Page Level Query Analysis at Scale with Google Colab, Python, & the GSC API
by Calefyb Tech digital marketing services
Anyone who does SEO as part of their job knows that there’s a lot of value in analyzing which queries are and are not sending traffic to specific pages on a site.
The most common uses for these datasets are to align on-page optimizations with existing rankings and traffic, and to identify gaps in ranking keywords.
However, working with this data is extremely tedious because it’s only available in the Google Search Console interface, and you can only look at one page at a time.
On top of that, to get information on the text included in the ranking page, you either need to manually review it or extract it with a tool like Screaming Frog.
Even then, the resulting view only covers one page at a time, and as mentioned, the actual text extraction has to be done separately as well.
Given these apparent issues with the readily available data at the SEO community’s disposal, the data engineering team at Inseev Interactive has been spending a lot of time thinking about how we can improve these processes at scale.
One specific example that we’ll be reviewing in this post is a simple script that allows you to get the above data in a flexible format for many great analytical views.
Better yet, all of this is available with only a few input variables.
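For context, page-level query data of this kind can be requested from the Search Console API's Search Analytics endpoint by grouping on both the page and query dimensions. The sketch below only builds such a request body. The function name, the default window, and the sample dates are illustrative assumptions, not code taken from the tool itself.

```python
from datetime import date, timedelta


def build_gsc_request(end: date, days: int = 28, start_row: int = 0,
                      row_limit: int = 25000) -> dict:
    """Build a Search Analytics request body grouped by page and query.

    Hypothetical helper for illustration; the field names follow the
    Search Analytics query request format.
    """
    start = end - timedelta(days=days - 1)  # inclusive date window
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["page", "query"],  # one row per page/query pair
        "rowLimit": row_limit,            # the API caps this at 25,000 rows per call
        "startRow": start_row,            # raise this to page through more rows
    }


body = build_gsc_request(date(2021, 4, 30), days=30)
print(body)

# With google-api-python-client and an authorized `service` object, a body
# like this would typically be sent as:
#   service.searchanalytics().query(siteUrl=site_url, body=body).execute()
```

Grouping on both dimensions is what removes the one-page-at-a-time limitation of the interface, since every page and query combination comes back as its own row.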
A quick rundown of tool functionality
The tool automatically compares the text on-page to the Google Search Console top queries at the page-level to let you know which queries are on-page as well as how many times they appear on the page. An optional XPath variable also allows you to specify the part of the page you want to analyze text on.
This means you’ll know exactly what queries are driving clicks/impressions that are not in your <title>, <h1>, or even something as specific as the first paragraph within the main content (MC). The sky’s the limit.
For those of you not familiar with XPath, we’ve also provided some quick expressions you can use, as well as instructions for creating site-specific XPath expressions, within the “Input Variables” section of the post.
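To make the comparison concrete, here is a minimal sketch of the idea: pull the text from one part of a page with an XPath expression, then count how often each query appears in it. It uses only the Python standard library, whose ElementTree module supports a limited XPath subset and needs well-formed markup, so it is a stand-in rather than the scraper the tool actually uses. The sample page, queries, and function names are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree cannot parse messy HTML).
SAMPLE_PAGE = """<html>
  <head><title>SEO Agency San Diego | Example Co</title></head>
  <body>
    <h1>San Diego SEO Services</h1>
    <p>Our SEO agency helps brands grow. Talk to an SEO agency today.</p>
  </body>
</html>"""


def extract_text(markup: str, xpath: str) -> str:
    """Return the whitespace-normalized text of every element matching xpath."""
    root = ET.fromstring(markup)
    parts = [" ".join(element.itertext()) for element in root.findall(xpath)]
    return " ".join(" ".join(parts).split())


def count_query(text: str, query: str) -> int:
    """Count case-insensitive, whole-phrase occurrences of query in text."""
    pattern = r"\b" + re.escape(query.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def check_queries(markup: str, xpath: str, queries: list) -> dict:
    """Map each query to the number of times it appears in the selected text."""
    text = extract_text(markup, xpath)
    return {query: count_query(text, query) for query in queries}


queries = ["seo agency", "seo services"]
print(check_queries(SAMPLE_PAGE, ".//title", queries))  # <title> only
print(check_queries(SAMPLE_PAGE, ".//h1", queries))     # <h1> only
print(check_queries(SAMPLE_PAGE, ".", queries))         # whole page
```

A count of zero for a query that drives clicks or impressions is the signal of interest: the page ranks for a term that the selected element never mentions.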
Post setup usage & datasets
Once the process is set up, all that’s required is filling out a short list of variables and the rest is automated for you.
The output includes multiple automatically generated CSV datasets, as well as a structured file format to keep things organized. A simple pivot of the core analysis CSV can provide many useful layouts.
… Even some “new metrics”?
Okay, not technically “new,” but if you exclusively use the Google Search Console user interface, then you likely haven’t had access to metrics like these before: “Max Position,” “Min Position,” and “Count Position” for the specified date range. All of these are explained in the “Running your first analysis” section of the post.
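As a rough illustration of how metrics like these can be derived, the sketch below assumes each one is computed from per-day rows for a page and query pair: the highest and lowest daily average position in the date range, and the number of days with data. That reading is an assumption on our part, and the sample rows and field names are invented for the example.

```python
from collections import defaultdict

# Hypothetical per-day rows for one page, as if pulled with a date dimension.
daily_rows = [
    {"date": "2021-04-01", "page": "/services/", "query": "seo agency", "position": 8.2},
    {"date": "2021-04-02", "page": "/services/", "query": "seo agency", "position": 6.5},
    {"date": "2021-04-03", "page": "/services/", "query": "seo agency", "position": 11.0},
    {"date": "2021-04-02", "page": "/services/", "query": "seo services", "position": 4.0},
]


def summarize_positions(rows: list) -> dict:
    """Group rows by (page, query) and compute max, min, and count of positions."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["page"], row["query"])].append(row["position"])
    return {
        key: {
            "max_position": max(positions),    # numerically largest, i.e. worst ranking
            "min_position": min(positions),    # numerically smallest, i.e. best ranking
            "count_position": len(positions),  # number of daily rows with data
        }
        for key, positions in grouped.items()
    }


summary = summarize_positions(daily_rows)
for (page, query), stats in summary.items():
    print(page, query, stats)
```

Note that a larger position number means a lower ranking, so under this assumption the "max" value is the worst position seen in the period.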
To really demonstrate the impact and usefulness of this dataset, in the video below we use the Colab tool to:
- [3 Minutes]: Find non-brand <title> optimization opportunities for https://www.inseev.com/ (around 30 pages in the video, but you could do any number of pages)
- [3 Minutes]: Convert the CSV to a more usable format
- [1 Minute]: Optimize the first title with the resulting dataset
Okay, you’re all set for the initial rundown. Hopefully we were able to get you excited before moving into the somewhat dull setup process.
Keep in mind that at the end of the post, there is also a section including a few helpful use cases and an example template! To jump directly to each section of this post, please use the following links:
- One-time setup of the script in Google Colab
- Running your first analysis
- Practical use cases and templates
[Quick Consideration #2] — This tool has been heavily tested by the members of the Inseev team. Most bugs [specifically with the web scraper] have been found and fixed, but like any other program, it is possible that other issues may come up.
- If you encounter any errors, feel free to reach out to us directly at email@example.com or firstname.lastname@example.org, and I or another member of the data engineering team at Inseev will be happy to help you out.
- If new errors are encountered and fixed, we will always upload the updated script to the code repository linked in the sections below so the most up-to-date code can be utilized by all!
Things you’ll need:
- Google Drive
- Google Cloud Platform account
- Google Search Console access
Video walkthrough: tool setup process
Below you’ll find step-by-step editorial instructions in order to set up the entire process. However, if following editorial instructions isn’t your preferred method, we recorded a video of the setup process as well.
As you’ll see, we start with a brand new Gmail and set up the entire process in approximately 12 minutes, and the output is completely worth the time.
https://www.youtube.com/embed/MzJ30CcTzAw
Keep in mind that the setup is one-off, and once set up, the tool should work on command from there on!
Editorial walkthrough: tool setup process
- Download the files from Github and set up in Google Drive
- Set up a Google Cloud Platform (GCP) Project (skip if you already have an account)
- Create the OAuth 2.0 client ID for the Google Search Console (GSC) API (skip if you already have an OAuth client ID with the Search Console API enabled)
- Add the OAuth 2.0 credentials to the config.py file
Part one: Download the files from Github and set up in Google Drive
Download source files (no code required)
1. Navigate to the Github repository (https://github.com/jmelm93/query-optmization-checker).
2. Select “Code” > “Download Zip”
*You can also use ‘git clone https://github.com/jmelm93/query-optmization-checker.git’ if you’re more comfortable using the command prompt.
Initiate Google Colab in Google Drive
If you already have a Google Colaboratory setup in your Google Drive, feel free to skip this step.
1. Navigate to Google Drive.
2. Click “New” > “More” > “Connect more apps”.
3. Search “Colaboratory” > Click into the application page.
4. Click “Install” > “Continue” > Sign in with OAuth.
5. Click “OK” with the prompt checked so Google Drive automatically sets appropriate files to open with Google Colab (optional).
Import the downloaded folder to Google Drive & open in Colab
1. Navigate to Google Drive and create a folder called “Colab Notebooks”.
IMPORTANT: The folder needs to be called “Colab Notebooks” as the script is configured to look for the “api” folder from within “Colab Notebooks”.
2. Import the folder downloaded from Github into Google Drive.
At the end of this step, you should have a folder in your Google Drive that contains the files downloaded from Github, including the “api” folder that holds config.py.
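One way to sanity-check the folder layout from a Colab cell is sketched below. It assumes Google Drive is mounted at Colab's usual location (/content/drive, which exposes your files under /content/drive/MyDrive) and simply searches “Colab Notebooks” for an “api” folder containing config.py. The helper name is hypothetical, and the exact nesting inside your Drive may differ from what the script expects.

```python
from pathlib import Path

# Assumed Colab mount point. In a Colab cell, Drive is typically mounted with:
#   from google.colab import drive
#   drive.mount("/content/drive")
DRIVE_ROOT = Path("/content/drive/MyDrive")


def find_config(drive_root: Path = DRIVE_ROOT):
    """Return the first api/config.py found under "Colab Notebooks", or None."""
    notebooks = drive_root / "Colab Notebooks"
    if not notebooks.is_dir():
        return None
    for candidate in sorted(notebooks.rglob("config.py")):
        if candidate.parent.name == "api":
            return candidate
    return None


config_path = find_config()
if config_path is None:
    print('No "api/config.py" found under "Colab Notebooks" (is Drive mounted?)')
else:
    print(f"Found config file at: {config_path}")
```

Outside of Colab the default path will not exist, so the check just reports that nothing was found.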
Part two: Set up a Google Cloud Platform (GCP) project
If you already have a Google Cloud Platform (GCP) account, feel free to skip this part.
1. Navigate to the Google Cloud page.
2. Click on the “Get started for free” CTA (CTA text may change over time).
3. Sign in with the OAuth credentials of your choice. Any Gmail email will work.
4. Follow the prompts to sign up for your GCP account.
You’ll be asked to supply a credit card to sign up, but there is currently a $300 free trial and Google notes that they won’t charge you until you upgrade your account.
Part three: Create the OAuth 2.0 client ID for the Google Search Console (GSC) API
1. Navigate to the Google Search Console API page in the Google Cloud API Library.
2. After you log in to your desired Google Cloud account, click “ENABLE”.
3. Configure the consent screen.
- In the consent screen creation process, select “External,” then continue onto the “App Information.”
Example below of minimum requirements:
- Skip “Scopes”
- Add the email(s) you’ll use for the Search Console API authentication under “Test Users”. These can include emails other than the one that owns the Google Drive, for example a client’s email that you use to access the Google Search Console UI and view their KPIs.
4. In the left-rail navigation, click into “Credentials” > “CREATE CREDENTIALS” > “OAuth Client ID”.
5. Within the “Create OAuth client ID” form, fill in:
- Application Type = Desktop app
- Name = Google Colab
- Click “CREATE”
6. Save the “Client ID” and “Client Secret”, as these will be added to the config.py file in the “api” folder from the Github files we downloaded.
- These should have appeared in a popup after hitting “CREATE”
- The “Client Secret” is functionally the password to your Google Cloud (DO NOT post this to the public/share it online)
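To show where these two values end up, the sketch below assembles them into the “installed” client configuration layout that Google uses for Desktop app OAuth clients. The placeholder values and the helper name are assumptions for illustration, and the repository's own config.py may store the credentials differently.

```python
# Read-only scope for Search Console data.
GSC_READONLY_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"

# Placeholders only: paste your own values, and never commit or share them.
CLIENT_ID = "YOUR_CLIENT_ID.apps.googleusercontent.com"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"


def build_client_config(client_id: str, client_secret: str) -> dict:
    """Build a Desktop-app ("installed") OAuth client configuration dict."""
    if not client_id or not client_secret:
        raise ValueError("Both the client ID and the client secret are required.")
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
        }
    }


client_config = build_client_config(CLIENT_ID, CLIENT_SECRET)
print(sorted(client_config["installed"]))

# With the google-auth-oauthlib package installed, a dict like this can be
# passed to InstalledAppFlow.from_client_config(client_config,
# scopes=[GSC_READONLY_SCOPE]) to start the sign-in flow.
```

Keeping the secret in a single config file, rather than scattered through notebook cells, makes it easier to avoid sharing it by accident.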
Part four: Add the OAuth 2.0 credentials to the config.py file
1. Return to Google Drive and navigate into the “api” folder.
2. Click into config.py.
Created on May 4th 2021 06:11.