Google Algorithm Leak Opens a Firehose


API Documentation Reveals the Inner Workings of Google’s Search Algorithm and That Google Just Might Be Evil After All

In our social media and news-driven world small things can blow up faster than a fat kid at a cupcake cart. The stock market can rise or fall on the word choice of the Federal Reserve Chairman. A simple tweet can go viral, leading to massive backlash far beyond the original intent. A short, cryptic “drop” (Internet post) by Q caused hundreds of QAnon followers to assemble in downtown Dallas awaiting the return of JFK Jr. (True story 1, 2, 3)

But when evidence of how Google ranks search results is revealed, it’s like seeing the Virgin Mary burned into a grilled cheese sandwich. (Another true story 4, 5)

What Was the Google Algorithm Leak in May 2024?

In March 2024, thousands of documents containing over 2,500 pages of API documentation about Google’s ranking algorithm were leaked on GitHub. (6, 7) In May, an industry insider with access to Google ex-employees vouched for their authenticity and forwarded them to SEO guru and Moz co-founder Rand Fishkin. Fishkin reviewed the documents and talked to the then-anonymous source, Erfan Azimi, an SEO and the founder of EA Eagle Digital. (8)

Erfan Azimi, CEO and director of SEO for digital marketing agency EA Eagle Digital

On May 27, Fishkin published his findings. (8) Within one day, search engine optimization (SEO) sites and SEOs were excitedly discussing (9) the revelations in these documents.

The leak contradicted what Google has said publicly about what drives rankings and revealed that Google re-ranks sites behind the scenes with “Twiddlers.”

Many of their claims directly contradict public statements made by Googlers over the years, in particular the company’s repeated denial that click-centric user signals are employed, denial that subdomains are considered separately in rankings, denials of a sandbox for newer websites, denials that a domain’s age is collected or considered, and more.

Rand Fishkin, May 27, 2024

While the leaked documents mention various ranking factors, none are weighted. The main takeaway remains the same as we’ve been discussing for years: once the SEO basics are covered, it’s all about creating helpful content that engages users. And probably no surprise, a page’s link diversity and relevance remain key, and PageRank is still very much alive within Google’s ranking features. (6)

To understand the technical API documentation, Fishkin enlisted the help of Mike King, the Founder and CEO of iPullRank. (10) King published a summary of what he found in the documentation, (11) including what he called Google’s “gaslighting” about what is and isn’t a ranking factor. King claims the documentation reveals misstatements meant to throw off spammers and SEOs, including the following:

  • Google says it doesn’t use a site authority score, when in fact it considers overall domain authority.
    • The inference is that bad content can bring down the rankings of good (helpful, authoritative, engaging, expert) content because the whole site’s authority is downgraded.
  • Contrary to public statements, Google uses click history to promote, demote, or reinforce a ranking. Bad clicks, good clicks, last longest clicks, unsquashed clicks, and unsquashed last longest clicks are all considered metrics. (According to a Google patent, “Squashing is a function that prevents one large signal from dominating the others.”)
    • King states that the speculation that click-through rate and “dwell time” are viewed as engagement and ranking factors is indirectly proven.
      • “The bottom line here is that you need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank.”
  • Another search algorithm secret is the use of “Twiddlers”. Twiddlers are re-ranking functions that boost or demote a page after the main search algorithm ranks it. According to King, twiddlers can limit the content type appearing in search results to promote diversity.
    • The lesson here is that to rank well for a query, you may need the same content in different formats such as a blog, a video, and an image.
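The leaked documentation names Twiddlers but does not include their code. Purely as an illustration of the concept King describes, a re-ranking pass that demotes over-represented content types to promote diversity might look like this sketch (all names, caps, and weights are hypothetical, not from the leak):

```python
# Hypothetical sketch of a "Twiddler"-style re-ranking pass: the primary
# algorithm produces scored results, then a second pass demotes extra
# results of the same content type so one format cannot dominate the page.
# All constants and field names below are invented for this example.

MAX_PER_TYPE = 2          # assumed cap per content type (blog, video, image)
DIVERSITY_PENALTY = 0.5   # assumed score multiplier for over-represented types

def diversity_twiddler(results):
    """Re-rank results so no content type dominates the top of the page.

    `results` is a list of dicts with 'url', 'type', and 'score' keys,
    already sorted by the primary ranking score (highest first).
    """
    seen = {}
    for r in results:
        seen[r["type"]] = seen.get(r["type"], 0) + 1
        if seen[r["type"]] > MAX_PER_TYPE:
            r["score"] *= DIVERSITY_PENALTY  # demote, don't remove
    return sorted(results, key=lambda r: r["score"], reverse=True)

ranked = [
    {"url": "a.com/post1", "type": "blog", "score": 0.95},
    {"url": "a.com/post2", "type": "blog", "score": 0.90},
    {"url": "a.com/post3", "type": "blog", "score": 0.88},
    {"url": "b.com/clip", "type": "video", "score": 0.70},
]
reranked = diversity_twiddler(ranked)
```

In this sketch, the third blog result is demoted below the video, which matches the takeaway above: offering the same content in multiple formats hedges against diversity re-ranking.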

What Not to Do for SEO

King further examines algorithmic demotions included in the documentation.

  • Anchor Mismatch – When a link’s anchor text does not match the target site it points to, the link is demoted in the ranking calculations.
  • SERP Demotion – A signal indicating demotion based on factors observed from the SERP, suggesting potential user dissatisfaction with the page as likely measured by clicks.
  • Nav Demotion – Presumably, this is a demotion applied to pages exhibiting poor navigation practices or user experience issues.
  • Exact Match Domains Demotion – In late 2012, Matt Cutts announced that exact match domains would not get as much value as they did historically. There is a specific feature for their demotion.
  • Location demotions – There is an indication that “global” pages and “super global” pages can be demoted. This suggests that Google attempts to associate pages with a location and rank them accordingly. (Said another way, Google interprets some queries as having local intent, and looks for local-optimized content, not just information.)
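The documentation names these demotion signals but not how they are applied or weighted. As a purely illustrative model (every factor value here is invented), one could imagine each active signal multiplying down a page’s base score:

```python
# Hypothetical illustration of how named demotion signals might combine.
# The leak names the signals (anchor mismatch, SERP demotion, nav demotion,
# exact-match domain, location); the multipliers below are invented.

DEMOTION_FACTORS = {
    "anchor_mismatch": 0.8,
    "serp_demotion": 0.7,
    "nav_demotion": 0.85,
    "exact_match_domain": 0.9,
    "location_mismatch": 0.75,
}

def apply_demotions(base_score, active_signals):
    """Multiply a page's base score by each active demotion factor."""
    score = base_score
    for signal in active_signals:
        score *= DEMOTION_FACTORS.get(signal, 1.0)
    return score

# A page flagged for anchor mismatch and poor navigation: 1.0 * 0.8 * 0.85
demoted = apply_demotions(1.0, ["anchor_mismatch", "nav_demotion"])
```

The point of the sketch is that demotions compound: a page tripping several of these signals falls much further than one tripping a single signal.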

More of Mike King’s Conclusions (11) from the Leaked API Documentation

  • Font size and bolding of key terms and links are considered when categorizing the content.
  • Long documents get truncated, so put the most important content first.
  • Short content is scored for originality, so don’t publish repetitive “thin” content.
  • Dates are important. Google looks at the “byline” date in the content, any date in the URL or page title, and the last content update to gauge “freshness”, which is rewarded in rankings.
    • Don’t put dates in URLs or meta titles but do specify the published date AND update that date when the content is refreshed.
  • There’s no character limit on meta titles. It’s still best practice to limit title length to what can be displayed in the SERPs to get click-throughs, but longer titles can be used to help rankings.

What Does Google Count as User Engagement?

The information revealed during the Google antitrust trial (12) combined with this leaked API documentation provides more detail on what constitutes engagement. This is valuable ammunition for SEOs to demonstrate that ranking involves more than just keywords.

As previously stated, clicks beget more clicks. Not only do click-throughs reinforce the queries for which content will rank, but post-click behavior can further improve rankings.

Fishkin’s summary (8) observed that “a handful of modules in the documentation make reference to features like ‘goodClicks,’ ‘badClicks,’ ‘lastLongestClicks,’ impressions, squashed, unsquashed, and unicorn clicks.” Systems called Navboost and Glue use this click data to determine search result rankings.

For example, bad clicks are those followed by a quick “back” navigation, meaning the search result was not a good user experience for the query. Google has said that bounce rate wasn’t a ranking factor but merely an indication of content quality and relevance; that no longer seems true. And “lastLongestClicks” implies that “dwell time” is measured as engagement and is a ranking factor.
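The leak names these click signals but not the thresholds behind them. A minimal sketch of how a session of clicks might be labeled, assuming invented dwell-time cutoffs (the signal names come from the leaked modules; everything else here is an assumption):

```python
# Sketch of deriving "goodClick", "badClick", and "lastLongestClick" labels
# from dwell time. Thresholds and logic are assumptions for illustration.

BAD_CLICK_MAX_SECONDS = 10    # assumed: quick "back" navigation = bad click
GOOD_CLICK_MIN_SECONDS = 30   # assumed: meaningful dwell = good click

def classify_clicks(session):
    """Label each click in a search session by dwell time.

    `session` is a list of (url, dwell_seconds) tuples in click order.
    Returns a dict mapping url -> set of labels.
    """
    labels = {url: set() for url, _ in session}
    longest = None
    for url, dwell in session:
        if dwell <= BAD_CLICK_MAX_SECONDS:
            labels[url].add("badClick")
        elif dwell >= GOOD_CLICK_MIN_SECONDS:
            labels[url].add("goodClick")
        if longest is None or dwell > longest[1]:
            longest = (url, dwell)
    if session and longest[0] == session[-1][0]:
        # the last result clicked also held the user the longest:
        # a strong sign the search ended in satisfaction
        labels[longest[0]].add("lastLongestClick")
    return labels

clicks = [("a.com", 5), ("b.com", 12), ("c.com", 120)]
result = classify_clicks(clicks)
```

Under these assumed thresholds, the quick bounce from a.com is a bad click, while c.com, the final and longest visit, collects both the good-click and last-longest-click labels.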

Search clicks and engagement are powerful enough that user clicks within a geographic region can outweigh the big ranking signals like optimized content and highly relevant backlinks.

What’s Mike King’s Conclusion from the Leaked Documentation?

In short, his best advice is to make great content and promote it well.

After reviewing these features that give Google its advantages, it is quite obvious that making better content and promoting it to audiences that it resonates with will yield the best impact on those measures.