Page Level Query Analysis at Scale with Google Colab, Python, & the GSC API


Anyone who does SEO as part of their job knows that there’s a lot of value in analyzing which queries are and are not sending traffic to specific pages on a site.

The most common uses for these datasets are to align on-page optimizations with existing rankings and traffic, and to identify gaps in ranking keywords.

However, working with this data is extremely tedious because it’s only available in the Google Search Console interface, where you can only look at one page at a time.

On top of that, to get information on the text included in the ranking page, you either need to manually review it or extract it with a tool like Screaming Frog.

But even that view is only viable one page at a time, and as mentioned, the actual text extraction would still have to be handled separately.

Given these apparent issues with the readily available data at the SEO community’s disposal, the data engineering team at Inseev Interactive has been spending a lot of time thinking about how we can improve these processes at scale.

One specific example that we’ll be reviewing in this post is a simple script that allows you to get the above data in a flexible format for many great analytical views.

Better yet, this will all be available with only a few input variables.

A quick rundown of tool functionality

The tool automatically compares the on-page text to the Google Search Console top queries at the page level to let you know which queries are on the page, as well as how many times they appear. An optional XPath variable also allows you to specify the part of the page you want to analyze.

This means you’ll know exactly what queries are driving clicks/impressions that are not in your <title>, <h1>, or even something as specific as the first paragraph within the main content (MC). The sky’s the limit.

For those of you not familiar with XPath, we’ve also provided some quick expressions you can use, as well as instructions for creating site-specific XPath expressions, within the “Input Variables” section of the post.
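To make that comparison concrete, here’s a minimal sketch of the core idea, assuming the page text is pulled with lxml and each query is counted with a simple substring match. The function name, queries, and XPath below are illustrative only, not the tool’s actual variables.

```python
# Minimal sketch: pull the text found at a given XPath and count how often
# each query appears in it. Illustrative only - not the tool's actual code.
import requests
from lxml import html

def count_queries_on_page(url, queries, xpath="//body"):
    """Return {query: occurrences} for the text found at the given XPath."""
    page = html.fromstring(requests.get(url, timeout=30).text)
    text = " ".join(page.xpath(f"{xpath}//text()")).lower()
    return {q: text.count(q.lower()) for q in queries}

# Example usage with placeholder queries
print(count_queries_on_page(
    "https://www.inseev.com/",
    ["seo services", "digital marketing"],
    xpath="//h1",
))
```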

Post setup usage & datasets

Once the process is set up, all that’s required is filling out a short list of variables and the rest is automated for you.

The output includes multiple automated CSV datasets, as well as a structured file format to keep things organized. A simple pivot table of the core analysis CSV can then provide many useful views of the data.

… Even some “new metrics”?

Okay, not technically “new,” but if you exclusively use the Google Search Console user interface, then you likely haven’t had access to metrics like these before: “Max Position,” “Min Position,” and “Count Position” for the specified date range, all of which are explained in the “Running your first analysis” section of the post.
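For context, metrics like these fall out of a simple aggregation of the row-level data returned by the GSC API. A rough illustration in pandas, where the column names mirror the API’s response fields but the sample data is invented:

```python
# Hedged illustration of deriving Max/Min/Count Position from raw GSC rows.
# The DataFrame below is made-up sample data.
import pandas as pd

rows = pd.DataFrame({
    "query":    ["seo services", "seo services", "link building"],
    "position": [4.2, 6.8, 11.0],
    "clicks":   [12, 3, 1],
})

summary = rows.groupby("query").agg(
    max_position=("position", "max"),
    min_position=("position", "min"),
    count_position=("position", "count"),
    clicks=("clicks", "sum"),
).reset_index()

print(summary)
```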


To really demonstrate the impact and usefulness of this dataset, in the video below we use the Colab tool to:

  1. [3 Minutes] — Find non-brand <title> optimization opportunities for https://www.inseev.com/ (around 30 pages in video, but you could do any number of pages)
  2. [3 Minutes] — Convert the CSV to a more usable format
  3. [1 Minute] — Optimize the first title with the resulting dataset

Okay, you’re all set for the initial rundown. Hopefully we were able to get you excited before moving into the somewhat dull setup process.

Keep in mind that at the end of the post, there is also a section including a few helpful use cases and an example template!

[Quick Consideration #1] — The web scraper built into the tool DOES NOT support JavaScript rendering. If your website uses client-side rendering, the full functionality of the tool unfortunately will not work.
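If you’re unsure whether your site falls into that bucket, one rough check is to fetch the raw HTML and see whether the element you care about is already in it. A quick sketch (not part of the tool):

```python
# Rough check: is the key content present in the raw, non-rendered HTML?
# If it's empty here but visible in the browser, the page likely relies on
# client-side rendering and the built-in scraper will miss it.
import requests
from lxml import html

raw = requests.get("https://www.inseev.com/", timeout=30).text
h1_text = html.fromstring(raw).xpath("//h1//text()")
print("Found in raw HTML:" if h1_text else "Not in raw HTML (likely JS-rendered)", h1_text)
```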

[Quick Consideration #2] — This tool has been heavily tested by the members of the Inseev team. Most bugs [specifically with the web scraper] have been found and fixed, but like any other program, it is possible that other issues may come up.

  • If you encounter any errors, feel free to reach out to us directly at jmelman@inseev.com or info@inseev.com, and either myself or one of the other members of the data engineering team at Inseev would be happy to help you out.
  • If new errors are encountered and fixed, we will always upload the updated script to the code repository linked in the sections below so the most up-to-date code can be utilized by all!

One-time setup of the script in Google Colab (in less than 20 minutes)

Things you’ll need:

  1. Google Drive
  2. Google Cloud Platform account
  3. Google Search Console access

Video walkthrough: tool setup process

Below you’ll find step-by-step editorial instructions to set up the entire process. However, if following written instructions isn’t your preferred method, we recorded a video of the setup process as well.

As you’ll see, we start with a brand new Gmail and set up the entire process in approximately 12 minutes, and the output is completely worth the time.

https://www.youtube.com/embed/MzJ30CcTzAw

Keep in mind that the setup is one-off, and once set up, the tool should work on command from there on!

Editorial walkthrough: tool setup process

Four-part process:

  1. Download the files from Github and set up in Google Drive
  2. Set up a Google Cloud Platform (GCP) Project (skip if you already have an account)
  3. Create the OAuth 2.0 client ID for the Google Search Console (GSC) API (skip if you already have an OAuth client ID with the Search Console API enabled)
  4. Add the OAuth 2.0 credentials to the Config.py file

Part one: Download the files from Github and set up in Google Drive

Download source files (no code required)

1. Navigate to the Github repository.

2. Select “Code” > “Download Zip”

*You can also use "git clone https://github.com/jmelm93/query-optmization-checker.git" if you’re more comfortable using the command prompt.

Initiate Google Colab in Google Drive

If you already have a Google Colaboratory setup in your Google Drive, feel free to skip this step.

1. Navigate to Google Drive.

2. Click “New” > “More” > “Connect more apps”.


3. Search “Colaboratory” > Click into the application page.


4. Click “Install” > “Continue” > Sign in with OAuth.


5. Click “OK” with the prompt checked so Google Drive automatically sets appropriate files to open with Google Colab (optional).

Import the downloaded folder to Google Drive & open in Colab

1. Navigate to Google Drive and create a folder called “Colab Notebooks”.

IMPORTANT: The folder needs to be called “Colab Notebooks” as the script is configured to look for the “api” folder from within “Colab Notebooks”.


2. Import the folder downloaded from Github into Google Drive.

At the end of this step, you should have a folder in your Google Drive containing the query optimization checker files and the README.md.
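If you want to confirm the structure from inside a Colab notebook before running anything, a quick check like the one below can help. It assumes Colab’s default Drive mount point; on some accounts the root folder appears as “My Drive” rather than “MyDrive”.

```python
# Optional sanity check from a Colab cell: mount Drive and confirm the
# expected "Colab Notebooks/api" folder exists. Adjust the path if your
# Drive root shows up as "My Drive" instead of "MyDrive".
import os
from google.colab import drive

drive.mount("/content/drive")
api_path = "/content/drive/MyDrive/Colab Notebooks/api"
print("api folder found" if os.path.isdir(api_path) else f"Missing: {api_path}")
```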

Part two: Set up a Google Cloud Platform (GCP) project

If you already have a Google Cloud Platform (GCP) account, feel free to skip this part.

1. Navigate to the Google Cloud page.

2. Click on the “Get started for free” CTA (CTA text may change over time).


3. Sign in with the OAuth credentials of your choice. Any Gmail email will work.

4. Follow the prompts to sign up for your GCP account.

You’ll be asked to supply a credit card to sign up, but there is currently a $300 free trial and Google notes that they won’t charge you until you upgrade your account.

Part three: Create an OAuth 2.0 client ID for the Google Search Console (GSC) API

1. Navigate to the Google Search Console API page within the Google Cloud console.

2. After you log in to your desired Google Cloud account, click “ENABLE”.


3. Configure the consent screen.

  • In the consent screen creation process, select “External,” then continue on to the “App Information.”

At a minimum, fill in the “App information” fields and the “Developer contact information” section.
  • Skip “Scopes”
  • Add the email(s) you’ll use for the Search Console API authentication into the “Test Users”. These can include emails other than the one that owns the Google Drive, for example a client’s email address that you use to access their Google Search Console UI and view their KPIs.

4. In the left-rail navigation, click into “Credentials” > “CREATE CREDENTIALS” > “OAuth Client ID”.


5. Within the “Create OAuth client ID” form, fill in:

  • Application Type = Desktop app
  • Name = Google Colab
  • Click “CREATE”

6. Save the “Client ID” and “Client Secret,” as these will be added to the config.py file inside the “api” folder from the Github files we downloaded (a quick sketch of how such credentials are typically used follows this step).

  • These should have appeared in a popup after hitting “CREATE”
  • The “Client Secret” is functionally the password to your Google Cloud account (DO NOT post it publicly or share it online)
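The repo’s script handles the authorization for you once these values are in place, but for context, here is a hedged sketch of how a desktop-app client ID and secret are typically used to authorize the Search Console API in Python. The file name and flow below are assumptions for illustration, not the tool’s actual code.

```python
# Context-only sketch of a standard installed-app OAuth flow for the
# Search Console API. The repo's script does this for you; the file name
# "client_secret.json" and the rest are illustrative.
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

# client_secret.json would contain the Client ID / Client Secret created above
flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)

# List the Search Console properties the authorized account can access
service = build("searchconsole", "v1", credentials=creds)
print(service.sites().list().execute())
```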

Part four: Add the OAuth 2.0 credentials to the Config.py file

1. Return to Google Drive and navigate into the “api” folder.

2. Click into config.py.
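The Client ID and Client Secret saved in the previous part get pasted into this file. As a rough idea of what that looks like (the variable names below are illustrative; match whatever placeholders the repo’s config.py already contains):

```python
# Illustrative only: the actual config.py in the repo may use different
# variable names. Paste the values from the OAuth popup between the quotes,
# and never commit or share the client secret.
CLIENT_ID = "1234567890-example.apps.googleusercontent.com"  # "Client ID" value
CLIENT_SECRET = "REPLACE-WITH-YOUR-CLIENT-SECRET"            # "Client Secret" value
```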
