The story of creating a database using the Google Analytics API

Introduction

Google Analytics is a web access analysis tool provided by Google. It gives you a wide range of information about visitors to a website, such as their region, access time, device, and browser. The GUI alone is already quite capable, but since the API is also provided for free, I tried connecting to it and building a database from the results.

Connect to API and save to MySQL

1. Enable the Analytics API

Enable the Analytics Reporting API from the Google API Console, create a service account, and download its JSON key file. The key file path and the view ID are used in the script below, and the service account's email address must be added as a user of the Analytics view.

2. Install the Google client library

sudo pip install --upgrade google-api-python-client
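
As a side note, the oauth2client package used in the script below is deprecated. Here is a minimal sketch of the equivalent credential setup with the google-auth package, assuming it is installed as well; the key file path is a placeholder.

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = '<REPLACE_WITH_JSON_FILE>'

# Build the Reporting API client from a service account key using google-auth.
credentials = service_account.Credentials.from_service_account_file(
    KEY_FILE_LOCATION, scopes=SCOPES)
analytics = build('analyticsreporting', 'v4', credentials=credentials)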

3. Try to connect

Pass the date range and the information you want to retrieve to the main function, and it returns the results as a DataFrame.

fetch_analytics_data.py

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = '<REPLACE_WITH_JSON_FILE>'
VIEW_ID = '<ID_NUMBER>'


def initialize_analyticsreporting():
    """Initializes an Analytics Reporting API V4 service object.

    Returns:
      An authorized Analytics Reporting API V4 service object.
    """
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)

    # Build the service object.
    analytics = build('analyticsreporting', 'v4', credentials=credentials)

    return analytics


def get_report(analytics, start_day, end_day, dimension_list):
    """Queries the Analytics Reporting API V4.

    Args:
      analytics: An authorized Analytics Reporting API V4 service object.
    Returns:
      The Analytics Reporting API V4 response.
    """
    return analytics.reports().batchGet(
        body={
            'reportRequests': [
                {
                    'viewId': VIEW_ID,
                    'dateRanges': [{'startDate': start_day, 'endDate': end_day}],
                    # 'metrics': [{'expression': 'ga:sessions'}],
                    'dimensions': dimension_list
                    #                         [{'name': 'ga:country'},
                    #                          {'name': 'ga:browser'},
                    #                          {'name': 'ga:city'},
                    #                          {'name': 'ga:sex'}
                    #                         ]
                }]
        }
    ).execute()


def fetch_columns_name(response):
    # Take the dimension names from the report's column header and drop the 'ga:' prefix.
    headers = response['reports'][0]['columnHeader']['dimensions']
    col_name = [name.replace('ga:', '', 1) for name in headers]
    #     col_name.append('visitor')
    return col_name


def split_dimensions(response, col_name):
    # Flatten the report rows into a DataFrame (one row per dimension combination).
    df = pd.json_normalize(response.get('reports'), sep=' ')["data rows"]
    df = pd.json_normalize(df[0])

    # split dimensions
    for i in range(len(df['dimensions'][0])):
        new_feature = [df['dimensions'][j][i] for j in range(df.shape[0])]
        feature_name = col_name[i]
        df[feature_name] = new_feature
    df.drop(["dimensions"], axis=1, inplace=True)

    # fetch visitor count (not used for now)
    # df["visitor"] = [int(list(df['metrics'][i][0].values())[0][0]) for i in range(df.shape[0])]
    df.drop(['metrics'], axis=1, inplace=True)

    return df

def main(start_day, end_day, dim_list):
    analytics = initialize_analyticsreporting()
    response = get_report(analytics, start_day, end_day, dim_list)
    col_name = fetch_columns_name(response)
    df = split_dimensions(response, col_name)
    return df

if __name__ == "__main__":
    # Example when run directly (date range and dimensions are placeholders).
    print(main('7daysAgo', 'today', [{'name': 'ga:country'}]))
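
For a quick check that the connection works, the module can also be imported from another script. The dimensions below are just an example; the 'ga:' prefixes are stripped from the resulting column names.

import fetch_analytics_data as fa

# '7daysAgo' and 'today' are relative date keywords accepted by the Reporting API.
dims = [{'name': 'ga:country'}, {'name': 'ga:browser'}, {'name': 'ga:dateHourMinute'}]
df = fa.main('7daysAgo', 'today', dims)
print(df.head())  # columns: country, browser, dateHourMinute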

4. Export to Database

Because the API limits how many dimensions can be requested in a single call, the data is fetched in four separate requests and merged afterwards.

import sys

import pandas as pd
from sqlalchemy import create_engine

import fetch_analytics_data as fa  # the module defined above

# -----------------------------------------------------------------
dim_list1 = [
    ## Geo Network
    {'name': 'ga:country'},
    {'name': 'ga:city'},
    {'name': "ga:continentId"},

    ## Platform of Device
    {'name': "ga:browser"},

    ## Users
    {'name': 'ga:userType'},
    {'name': 'ga:sessionCount'},
    {'name': 'ga:daysSinceLastSession'},
    # {'name': 'ga:userDefinedValue'},
    # {'name': 'ga:userBucket'},

    ## session
    {'name': 'ga:sessionDurationBucket'},

    ## Time
    {'name': 'ga:dateHourMinute'},
]

# ----------------------------------------------------------------
dim_list2 = [
    ## Geo Network
    {'name': 'ga:city'},

    ## Platform of Device
    # {'name':"ga:operatingSystem"},
    # {'name':'ga:mobileDeviceBranding'},
    # {'name':'ga:mobileDeviceModel'},
    # {'name':"ga:mobileInputSelector"},
    # {'name':"ga:mobileDeviceInfo"},
    # {'name':'ga:mobileDeviceMarketingName'},
    {'name': 'ga:deviceCategory'},

    ## Page Tracking
    {'name': 'ga:pagePath'},
    {'name': 'ga:exitPagePath'},
    {'name': 'ga:pageDepth'},
    # {'name':'ga:pageTitle'},

    ## Time
    {'name': 'ga:dateHourMinute'},
]

# ---------------------------------------------------------------
dim_list3 = [
    ## Geo Network
    {'name': 'ga:city'},

    ## Traffic Sources
    {'name': "ga:referralPath"},
    # {'name': "ga:campaign"},  # all not set
    {'name': "ga:source"},
    {'name': "ga:medium"},
    {'name': "ga:sourceMedium"},
    {'name': "ga:hasSocialSourceReferral"},

    ## Time
    {'name': 'ga:dateHourMinute'},
]

# -----------------------------------------------------------------
dim_list4 = [
    ## Geo Network
    {'name': 'ga:city'},

    ## Platform of Device
    {'name': "ga:operatingSystem"},
    {'name': 'ga:mobileDeviceBranding'},
    {'name': 'ga:mobileDeviceModel'},
    {'name': "ga:mobileInputSelector"},
    {'name': "ga:mobileDeviceInfo"},

    ## Time
    {'name': 'ga:dateHourMinute'},
]

# -----------------------------------------------------------

def database_engine():
    # Create a SQLAlchemy engine; replace <Database_URL> with your own connection string.
    db_url = '<Database_URL>'
    engine = create_engine(db_url)
    return engine
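
# Example connection string for MySQL with the PyMySQL driver (placeholder
# credentials; adjust to your environment). to_sql() below creates the table
# automatically if it does not exist:
#   db_url = 'mysql+pymysql://<user>:<password>@<host>:3306/<dbname>?charset=utf8mb4'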

def main():
    start_day = sys.argv[1]
    # Use a second command-line argument as the end date if given,
    # otherwise fetch a single day.
    end_day = sys.argv[2] if len(sys.argv) > 2 else start_day
    df1 = fa.main(start_day, end_day, dim_list1)
    df2 = fa.main(start_day, end_day, dim_list2)
    df3 = fa.main(start_day, end_day, dim_list3)
    df4 = fa.main(start_day, end_day, dim_list4)
    # merge the four partial reports on the shared key columns
    df = pd.merge(df1, df2, on=['city', 'dateHourMinute'], how='outer')
    df = df.merge(df3, on=['city', 'dateHourMinute'], how='outer')
    df = df.merge(df4, on=['city', 'dateHourMinute'], how='outer')
    en = database_engine()
    df.to_sql('analytics_table', con=en, if_exists='append', index=False)

if __name__ == '__main__':
    main()
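
To confirm that the rows actually reached MySQL, here is a minimal read-back check. It reuses the same placeholder connection URL, and analytics_table is the table name used by to_sql() above.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('<Database_URL>')  # same connection string as in database_engine()
count = pd.read_sql('SELECT COUNT(*) AS n FROM analytics_table', con=engine)
print(count)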

