Google Search Console Low Hanging Fruits Analysis for SEO: Using Python, Pandas, Regression, and Bayesian libraries.

This project provides a comprehensive Python-based exploratory data analysis (EDA) of Google Search Console data to identify SEO optimization opportunities. The analysis focuses on "low-hanging fruit" - pages that rank in positions 11-30 and have potential for quick traffic gains.

The idea is straightforward: scan your Search Console data and spot the SEO opportunities that are already within reach. But don’t worry, the goal here isn’t to turn marketers into data scientists. The goal is to make the thinking clearer, faster, and more confident, using analysis as a flashlight.

The “Positions 11–30” Sweet Spot

Let’s talk about the range this project focuses on: average rankings between 11 and 30.

That’s not an arbitrary slice. It’s a particularly interesting zone because it’s where pages often live when they’re already relevant, already indexed, already earning impressions—but not yet getting the clicks they deserve. In other words: Google is already giving you a chance, just not a prime one.

And that’s why marketers love this category once they see it clearly. Because moving a page from position 18 to position 9 can feel dramatically more achievable than moving something from position 68 to position 9.
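
As a rough illustration of that slice, here is a minimal pandas sketch that keeps only rows whose average position falls between 11 and 30. The column names (`query`, `clicks`, `impressions`, `position`), the sample rows, and the helper `low_hanging_fruit` are assumptions made for this example, not the project's actual schema or code.

```python
import pandas as pd

# Hypothetical rows; column names mimic a typical Search Console export
# and are assumptions, not the project's real data.
df = pd.DataFrame({
    "query": ["kw_a", "kw_b", "kw_c", "kw_d"],
    "clicks": [12, 0, 3, 1],
    "impressions": [900, 450, 300, 1200],
    "position": [4.2, 18.0, 11.5, 68.3],
})

def low_hanging_fruit(data, lo=11, hi=30):
    """Return rows whose average position lies in [lo, hi], busiest first."""
    mask = data["position"].between(lo, hi)  # inclusive on both ends
    return data.loc[mask].sort_values("impressions", ascending=False)

candidates = low_hanging_fruit(df)
print(candidates)
```

Sorting by impressions puts the queries Google already shows most often at the top of the shortlist.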

Sample Data

Below is the sample dataset we exported for this project. The original keywords have been masked/redacted to protect the source.

Sample data from the csv file we exported from Google Search Console
💡
All the code for this analysis is on Github for reference.

INTRODUCTION TO REGRESSION ANALYSIS for SEO

Regression analysis is a statistical method that examines the relationship between a dependent variable (the outcome we want to predict) and one or more independent variables (the factors that may influence the outcome).

Key Concepts:

The dependent variable is the outcome you want to predict or explain—meaning it changes in response to other factors. In an SEO context, common dependent variables include clicks (how many times users click your search results), CTR or click-through rate (the percentage of impressions that turn into clicks), and position (your ranking in search results).

Independent variables are the inputs you use to predict the dependent variable. They are treated as the “explanatory” factors in the model. In SEO analysis, examples include impressions (how many times your result is shown), position (when you are trying to predict CTR), and clicks plus impressions (when you are trying to predict position).

Regression analysis focuses on the relationship between these variables by estimating how changes in the independent variables are associated with changes in the dependent variable. For example, it can help answer questions like how increasing impressions affects clicks, how position impacts click-through rate, or whether you can predict search position based on clicks and impressions.
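
To make the idea concrete, here is a minimal sketch of a one-variable regression of clicks on impressions using NumPy's `polyfit`. The numbers are synthetic and deliberately built to follow an exact straight line, so the fitted slope and intercept simply recover the values used to construct them. Real Search Console data will be far noisier.

```python
import numpy as np

# Synthetic data (an assumption for illustration): clicks are constructed
# as exactly 0.002 * impressions + 1, so the fit should recover those values.
impressions = np.array([300, 400, 500, 800, 1200], dtype=float)
clicks = 0.002 * impressions + 1.0

# Fit a straight line: clicks ~ slope * impressions + intercept
slope, intercept = np.polyfit(impressions, clicks, deg=1)

print(f"slope = {slope:.4f} clicks per extra impression")
print(f"intercept = {intercept:.2f} clicks")
```

Here impressions is the independent variable and clicks is the dependent variable. Swapping the roles answers a different question.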

Why Regression Analysis Matters for SEO: Regression is valuable because it helps identify which factors most strongly influence search performance and quantifies their impact in measurable terms. Instead of relying on intuition, you can translate changes in metrics into expected outcomes (for instance, estimating how many additional clicks might result from a CTR improvement). This supports data-driven optimization strategies and enables forecasting future performance based on current trends and relationships in your SEO data.
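
The "additional clicks from a CTR improvement" estimate is simple arithmetic: impressions multiplied by the change in CTR. The sketch below uses a hypothetical helper, `extra_clicks`, and made-up numbers. It also assumes impressions stay constant, which will not hold exactly when rank changes.

```python
def extra_clicks(impressions, current_ctr, target_ctr):
    """Estimated additional clicks if CTR moves from current to target,
    assuming impressions stay the same (a simplifying assumption)."""
    return impressions * (target_ctr - current_ctr)

# Hypothetical example: 10,000 impressions, CTR rising from 0.2% to 1.0%
gain = extra_clicks(10_000, 0.002, 0.010)
print(f"Estimated extra clicks: {gain:.0f}")
```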

Comprehensive position impact analysis:

The four-panel chart shows, at a glance, how moving up or down the rankings changes CTR, clicks, and impressions, and how visibility relates to CTR, so you can see the payoff of better positions.

  • Top-left: CTR vs. rank. Points show actual CTR by position; curved fit shows
    CTR peaks on higher ranks and fades as rank worsens.
  • Top-right: Clicks vs. rank. Points show clicks by position; curved fit shows
    clicks drop as rank moves down.
  • Bottom-left: Impressions vs. rank. Points show how many times you were seen;
    straight line shows impressions usually decline as rank worsens.
  • Bottom-right: CTR vs. impressions. Points tie visibility to CTR; straight
    line shows whether getting seen more tends to raise or lower CTR in this
    data.
It’s a quick snapshot of how rank shifts affect CTR, clicks, and impressions, and of how impressions relate to CTR, so you see why climbing the rankings pays off.
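
The article does not say which curve is fitted in the top panels. As one plausible sketch, the example below fits a second-degree polynomial to CTR by position with NumPy. The degree and the synthetic, smoothly decaying CTR values are both assumptions for illustration.

```python
import numpy as np

# Synthetic CTR that decays as position worsens (illustrative only)
positions = np.arange(11, 31, dtype=float)
ctr = 0.02 * np.exp(-0.1 * (positions - 11))

# Assumed curve type: a quadratic in position
coeffs = np.polyfit(positions, ctr, deg=2)
fitted = np.polyval(coeffs, positions)

print(f"fitted CTR at position 11: {fitted[0]:.4f}")
print(f"fitted CTR at position 30: {fitted[-1]:.4f}")
```

A straight-line fit for the bottom panels is the same call with `deg=1`.
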
A histogram of the CTR distribution. As you can see, more than 30 keywords have zero CTR.
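
Counting the zero-CTR keywords behind a histogram like this takes only a few lines. The sketch below uses made-up rows and assumed column names. `numpy.histogram` returns the same bin counts a plotted histogram would draw.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; CTR is derived as clicks / impressions
df = pd.DataFrame({
    "clicks": [0, 0, 2, 1, 0],
    "impressions": [300, 410, 520, 800, 290],
})
df["ctr"] = df["clicks"] / df["impressions"]

zero_ctr = int((df["ctr"] == 0).sum())       # keywords with no clicks at all
share = zero_ctr / len(df)                   # fraction of the sample
counts, edges = np.histogram(df["ctr"], bins=5)

print(f"{zero_ctr} of {len(df)} keywords ({share:.0%}) have zero CTR")
print("bin counts:", counts)
```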

Impact of Average Position on CTR

It’s a bubble scatter of CTR vs. rank by four color-coded rank bands (11–15, 16–20, 21–25, 26–30), with bigger bubbles for more impressions and a dashed curve showing the overall CTR-vs-position pattern.
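
The four rank bands can be produced with `pandas.cut`. Average positions are fractional, so this sketch assumes a value such as 15.4 belongs to the 11–15 band (each band is a half-open interval such as [11, 16)). The article does not state how band edges were handled, so treat that choice as an assumption.

```python
import pandas as pd

positions = pd.Series([11.0, 15.0, 15.4, 20.0, 23.7, 30.0])

# Half-open bins [11, 16), [16, 21), [21, 26), [26, 31)
bands = pd.cut(
    positions,
    bins=[11, 16, 21, 26, 31],
    right=False,
    labels=["11–15", "16–20", "21–25", "26–30"],
)
print(pd.DataFrame({"position": positions, "band": bands}))
```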

Impressions vs. CTR: Regression analysis

It’s CTR vs. how often you show up (impressions). Dots = queries; color/size = rank. The black line shows the overall trend: do more impressions typically coincide with higher (or lower) CTR?

For a marketer, the regression line in this particular data basically says “how often you’re shown (impressions) isn’t driving CTR here.” The flat slope means more impressions don’t automatically raise or lower CTR, so you shouldn’t expect volume alone to fix click-through.

Takeaway: The chart tests whether queries that are shown more (impressions) also get higher or lower CTR, while letting you spot clusters by position via color/size. Slope ≈ 0.0000: CTR barely changes as impressions vary. The data is heavily skewed by the many keywords with zero CTR.
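
A slope measured in "CTR per impression" is tiny almost by construction, because CTR is a small fraction and impressions run into the hundreds. A scale-free correlation coefficient is a useful companion when judging whether the relationship is really flat. The sketch below computes both on made-up values and is not the project's code.

```python
import numpy as np

# Made-up values with several zero-CTR queries, as in the article's data
impressions = np.array([280, 300, 350, 420, 500, 900, 1500], dtype=float)
ctr = np.array([0.0, 0.004, 0.0, 0.003, 0.0, 0.002, 0.001])

slope, intercept = np.polyfit(impressions, ctr, deg=1)
r = np.corrcoef(impressions, ctr)[0, 1]  # Pearson correlation, unit-free

print(f"slope = {slope:.4f} CTR per impression")
print(f"Pearson r = {r:.2f}")
```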

Top 10 keywords to optimize for based on impressions alone (low-hanging fruit and quick wins)
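
A shortlist like this can be built by ranking the in-range queries by impressions. The sketch below uses `DataFrame.nlargest` on generated placeholder rows. The column names and the `kw_` labels are assumptions.

```python
import pandas as pd

# Placeholder data: 15 queries with rising impressions and positions 10..24
df = pd.DataFrame({
    "query": [f"kw_{i}" for i in range(15)],
    "impressions": [100 * (i + 1) for i in range(15)],
    "position": [10.0 + i for i in range(15)],
})

in_range = df[df["position"].between(11, 30)]   # keep positions 11-30 only
top10 = in_range.nlargest(10, "impressions")    # ten most-shown queries

print(top10[["query", "impressions", "position"]])
```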

Position vs Impression: Relationship

Trend: The plot fits a straight regression line (purple) through the points to show the overall direction—whether more impressions tend to come with better (lower) positions or not.
Reading the line: A downward slope means higher impression volume is associated with better ranks; a flat slope means little relationship in this sample.

It’s a quick check of whether impressions go hand in hand with better rankings.

Pairplots of SEO metrics: Clicks vs. CTR vs. Impressions

It’s a grid of small charts showing how each metric relates to every other one, plus the distribution of each metric on the diagonal—use it to eyeball which pairs move together (or don’t) and to spot any oddball points.

Purpose: a quick matrix view to spot relationships, clusters, or outliers across all metrics
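
A numeric companion to the pairplot is a correlation matrix, which puts a single number on each pair you would otherwise eyeball. The sketch below uses `DataFrame.corr` on made-up rows with assumed column names. The plot itself could come from a plotting library such as seaborn, which the article does not specify.

```python
import pandas as pd

# Made-up rows; CTR is derived from clicks and impressions
df = pd.DataFrame({
    "clicks": [0, 1, 0, 2, 3],
    "impressions": [300, 400, 350, 800, 1200],
    "position": [22.0, 18.5, 25.1, 14.2, 12.0],
})
df["ctr"] = df["clicks"] / df["impressions"]

corr = df.corr()  # pairwise Pearson correlations
print(corr.round(2))
```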

Bayesian Hierarchical Analysis

INTRODUCTION

This hierarchical analysis examines how SEO performance metrics vary
across different impression levels. By grouping keywords into Low, Mid,
and High impression categories, we can uncover patterns and insights
that would be missed in aggregate analysis.

IMPRESSION GROUPING: Keywords were divided into three impression groups:

  • Low: < 319.0 impressions
  • Mid: 319.0 - 458.0 impressions
  • High: > 458.0 impressions
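
The thresholds above look like tercile cut points, although the article does not say how they were chosen. Assuming equal-sized groups were intended, `pandas.qcut` produces this kind of split. The impression values below are invented for the example.

```python
import pandas as pd

impressions = pd.Series([270, 290, 310, 330, 380, 450, 470, 700, 1500])

# Split into three equal-count groups (terciles), an assumed grouping rule
groups = pd.qcut(impressions, q=3, labels=["Low", "Mid", "High"])

print(pd.DataFrame({"impressions": impressions, "group": groups}))
```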
GROUP STATISTICS
----------------
                 Impressions                          ...       CTR    Clicks    
                       count        mean         std  ...       std      mean sum
Impression_Group                                      ...                        
Low                       19  287.947368   19.392077  ...  0.002699  0.368421   7
Mid                       16  382.750000   42.008729  ...  0.003092  0.750000  12
High                      18  736.777778  318.016566  ...  0.002967  1.222222  22

[3 rows x 9 columns]
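
A table of this shape can be produced with a grouped aggregation. The middle columns are elided in the printed output, so the Position and CTR aggregations in this sketch are a guess that merely matches the nine-column count. The rows are invented.

```python
import pandas as pd

# Invented rows; group labels are assigned by hand for brevity
df = pd.DataFrame({
    "Impression_Group": ["Low", "Low", "Mid", "Mid", "High", "High"],
    "Impressions": [280, 300, 350, 420, 600, 900],
    "Position": [17.0, 19.5, 18.2, 16.4, 15.1, 14.0],
    "Clicks": [0, 1, 1, 1, 2, 3],
})
df["CTR"] = df["Clicks"] / df["Impressions"]

# Assumed aggregation spec: only the Impressions, CTR std, and Clicks
# columns are visible in the article's output.
stats = df.groupby("Impression_Group").agg({
    "Impressions": ["count", "mean", "std"],
    "Position": ["mean", "std"],
    "CTR": ["mean", "std"],
    "Clicks": ["mean", "sum"],
})

pd.set_option("display.max_columns", None)  # print every column, no "..."
print(stats)
```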
KEY INSIGHTS
-----------
1. POSITION PATTERNS:
   - Low impression group: Average position = 17.07
   - Mid impression group: Average position = 18.13
   - High impression group: Average position = 16.28
   - The High impression group has the best (lowest) average position, but the pattern is not monotonic: the Mid group ranks worse than the Low group

2. CTR PATTERNS:
   - Low impression group: Average CTR = 0.001
   - Mid impression group: Average CTR = 0.002
   - High impression group: Average CTR = 0.002

3. IMPRESSION-CLICK RELATIONSHIP:
   - Low impression group: 7.0 total clicks
   - Mid impression group: 12.0 total clicks
   - High impression group: 22.0 total clicks

4. PERFORMANCE VARIABILITY:
   - Position variability is highest in: Low impression group
   - CTR variability is highest in: Mid impression group (by the CTR std column in the table above)

Compare how rank relates to impression volume separately for Low, Mid, and High impression segments. Read it like: in each segment, does getting more impressions coincide with better or worse ranks? A downward red line means higher impressions go with better rank; flat means little relationship; upward would mean more impressions despite weaker rank.

Check the slope in each panel: Downward means more impressions generally align with better rank; flat means little relationship. Spot outliers: Points far from the trendline indicate keywords/pages over- or under-performing for that bucket. Compare panels: If only High slopes down, top-volume terms rank best; if Low/Mid slope down, improving rank could move them into higher-impression buckets.
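
The per-panel slopes can also be read off numerically. This sketch fits one straight line per impression group with NumPy. The data is synthetic and constructed so that the High group slopes down, the Low group is flat, and the Mid group slopes up. The column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data built to give a different slope in each group
df = pd.DataFrame({
    "group": ["Low"] * 4 + ["Mid"] * 4 + ["High"] * 4,
    "impressions": [280, 290, 300, 310, 330, 370, 410, 450, 500, 700, 900, 1100],
    "position": [18, 18, 18, 18, 15, 16, 17, 18, 20, 18, 16, 14],
}).astype({"impressions": float, "position": float})

# One linear fit per group: position ~ slope * impressions + intercept
slopes = {
    name: np.polyfit(g["impressions"], g["position"], deg=1)[0]
    for name, g in df.groupby("group")
}

for name, s in slopes.items():
    trend = "down" if s < -1e-9 else "up" if s > 1e-9 else "flat"
    print(f"{name}: slope = {s:+.4f} ({trend})")
```

A negative slope corresponds to the downward line described above: more impressions alongside better (lower) positions.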

The Hidden Gold on Page 2: Turning High-Impression Keywords into Traffic

This analysis shows that many of your “next wins” are already getting impressions but not clicks. By focusing on the page‑2 and page‑3 SERP sweet spot (positions 11–30), you can unlock meaningful CTR gains without major content rewrites, just by running basic SEO sanity checks and optimizations. Combine that with strategic internal linking and a few authority boosts, and those high‑impression, low‑click queries can quickly turn into consistent traffic drivers.
