The Implications of the Yandex Source Code Leak
Updated: Feb 23
If you’re involved in Search Engine Optimization (SEO), you’ve likely heard of the recent Yandex source code leak. Even if you’re not, the news was notable in many ways; it’s an unprecedented event with ample room for further developments. Some implications should be pretty obvious. SEOs have much to learn from the codebase that’s now available, for instance. This may present opportunities for cheap SEO tricks but also for genuine, informed creativity toward SEO improvements. Other implications may be less obvious, like how Google may respond and adjust. So, while the story is still developing, let us explore the current implications of the Yandex source code leak.
Yandex as a search engine
Initial reactions have included indifference, mainly from those who underestimate Yandex as a search engine. While understandable, this may not be wise – as we’ll see next. MoversTech CRM advises that marketers don’t overlook any notable search engines regarding overall SEO performance and penetrating specific markets.
Indeed, Yandex is not Google, neither in market share nor in essence. StatCounter reports that Yandex holds a modest 0.85% global market share, compared with Google’s 92.9%. However, stopping at that one statistic likely misses the bigger picture. Market share-wise, for instance, consider that Yandex is:
The third-largest search engine in Europe, at 1.71% - higher than Yahoo and DuckDuckGo combined
The fifth-largest search engine in Asia, at 0.9% - only 0.08% behind Yahoo
The largest search engine in Russia, at 54.37% - considerably ahead of Google’s 43.54%
The final point alone should highlight that Yandex is key for marketers seeking to penetrate the Russian market.
That’s not all, however; Yandex, as a company, also has a very notable lead in compound annual growth rate (CAGR). Indeed, Finbox reports that Yandex holds an impressive revenue forecast CAGR of 36.4%. Compare this to its competitors, as far as search engines go:
Alphabet Inc holds one of 10.2%
Microsoft Corporation holds one of 10.1%
Baidu Inc holds one of 8%
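To put these forecast rates in perspective, it helps to see how they compound. The snippet below is a minimal sketch, not Finbox’s methodology: it assumes each rate holds constant over a five-year horizon (an assumption based on the "5y" suffix in the Finbox source URL), and `growth_multiple` is a hypothetical helper name used only for illustration.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the factor by which a value grows when compounded yearly at `cagr`."""
    return (1.0 + cagr) ** years


# Forecast CAGRs quoted in the article (as reported by Finbox).
forecast_cagr = {
    "Yandex": 0.364,
    "Alphabet Inc": 0.102,
    "Microsoft Corporation": 0.101,
    "Baidu Inc": 0.08,
}

YEARS = 5  # assumed horizon, not stated in the article text

for company, rate in forecast_cagr.items():
    multiple = growth_multiple(rate, YEARS)
    print(f"{company}: revenue x{multiple:.2f} after {YEARS} years at {rate:.1%}")
```

Under these assumptions, a rate in the mid-thirties compounds to several times the starting revenue over the period, while rates around 10% yield well under a doubling.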
In combination, these should help highlight why Yandex should not be overlooked.
The overlap between Yandex and Google
That said, Yandex is not Google, so the implications of the Yandex source code leak may still be underestimated. Here, then, the overlap between the two should be stressed. For one, software engineers from both companies share a broader ecosystem. They attend the same conferences where they share findings, and many have worked for both companies, some for many years. But more importantly, Yandex has been using open-source Google technologies for a number of years too. The leak confirms that it relies quite heavily on Google’s own workings, as the Search Engine Results Pages (SERPs) of the two are quite similar.
For instance, the leak confirms that Yandex’s Deep Structured Semantic Models (DSSM) use some Google data for calculations. Combine that with the fact that Yandex scrapes Google, as well as Bing and others, and the overlap becomes substantive. Put simply, the idea SEOs had of “learning” Google through Yandex is not far-fetched. And since 93% of all online interactions start with a search engine, there’s real value in doing so.
The implications of the Yandex source code leak for SEO
So, with all that in mind, what are the main implications of the Yandex leak? While the story continues to develop, some are certain.
#1 SEO will inevitably change
The first and most obvious implication is that SEO will inevitably change. The leak has presented the first opportunity to explore a major search engine’s code to this extent. While the codebase is incomplete and too vast to make sense of just yet, SEOs will undoubtedly explore it.
Consider, for instance, the following notable findings thus far.
How many ranking factors are there?
Google-centric SEO has long held that 200+ ranking factors are at play as far as Google goes. SEO demonstrably works even when based on this tenet, so it seemed like safe, received knowledge. Because of this belief, one can imagine the community’s shock when Martin MacDonald shared a codebase file that outlined 1922 ranking factors. That’s an almost tenfold increase, even if Yandex is not Google. And yet, while that’s the number currently circulating, 1922 is still too low. The full codebase reveals more subsets, whose ranking factors add up to a stunning 17854. A series of Jupyter notebooks outlines another 2000, which engineers were presumably considering for addition. Together, those come to just shy of 20000 ranking factors, each with its own practical or theoretical role.
How do these factors work?
How these factors work poses the more substantive question, then – and the bigger headache. This is where the implications of the Yandex source code leak for SEO come into full swing. MatrixNet documentation states that scoring relies on “tens of thousands of factors, which significantly increases the relevance of search results.” This comes in addition to customization based on search queries, which groups ranking factors into:
Static, which are factors describing a web page itself
Dynamic, which are factors that describe the interaction or match between a web page and a query
Query-specific, which are factors unique to search geolocation, history, and so on
How are these factors weighed?
This, of course, raises the question of weights. Indeed, whether there are 17854 ranking factors or 20000+, they can’t all weigh the same. Yandex may be using complex calculations with different slots for individual factors, but it does indeed weigh factors. The most positively impactful ones include having a dot com domain, word clickability, and, likely, direct keyword matches through Yandex Bar searches. The most negatively impactful ones include:
The presence of ads.
The age of content.
The location matches the content and the user.
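The grouping and weighting described above can be made concrete with a toy model. The sketch below is purely illustrative: every factor name and weight is hypothetical (none is taken from the leaked code), and a plain linear weighted sum is swapped in for simplicity. It is not how MatrixNet or any real search engine combines its factors.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Mapping, Sequence


class FactorGroup(Enum):
    STATIC = "static"                  # describes the web page itself
    DYNAMIC = "dynamic"                # describes the match between page and query
    QUERY_SPECIFIC = "query-specific"  # geolocation, history, and so on


@dataclass(frozen=True)
class Factor:
    name: str
    group: FactorGroup
    weight: float  # hypothetical; the sign gives the direction of impact


# Hypothetical factors and weights, loosely inspired by the examples in the text.
FACTORS: Sequence[Factor] = (
    Factor("is_dot_com", FactorGroup.STATIC, 0.5),
    Factor("has_ads", FactorGroup.STATIC, -0.25),
    Factor("content_age_years", FactorGroup.STATIC, -0.125),
    Factor("keyword_match", FactorGroup.DYNAMIC, 1.0),
    Factor("searcher_history_match", FactorGroup.QUERY_SPECIFIC, 0.25),
)


def score(values: Mapping[str, float], factors: Sequence[Factor] = FACTORS) -> float:
    """Linear weighted sum of factor values; missing factors count as 0."""
    return sum(f.weight * values.get(f.name, 0.0) for f in factors)


def score_by_group(
    values: Mapping[str, float], factors: Sequence[Factor] = FACTORS
) -> Dict[FactorGroup, float]:
    """Break the same weighted sum down by factor group."""
    totals = {group: 0.0 for group in FactorGroup}
    for f in factors:
        totals[f.group] += f.weight * values.get(f.name, 0.0)
    return totals


# A made-up page: a dot com domain with ads, two-year-old content, and a keyword match.
page = {"is_dot_com": 1.0, "has_ads": 1.0, "content_age_years": 2.0, "keyword_match": 1.0}
print(score(page))
print(score_by_group(page))
```

Even in this toy form, the point carries over: negative weights for ads and content age can cancel out a positive signal such as the domain, so the weights matter as much as the factor count.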
Others, for which weight can’t be guessed yet, include a dot ru domain; Yandex seems to demote Russian domains. They also include whether a page is a shop and how often the same URL appears in SERPs. Some of them have seen prior speculation, while others have not, and their weights remain a substantial unknown.
#2 Yandex will change
While the SEO community grapples with these questions and the data at hand, the implications of the Yandex source code leak also expand to Yandex’s own operations. For one, that’s because the event itself is unprecedented, so public and market perceptions may still only be guessed. The leaked repository includes data for Yandex products and services, including:
The Yandex search engine and indexing bot
Yandex Maps
Alice, an AI assistant
Yandex Taxi
Yandex Direct, an ads service
Yandex Mail
Yandex Disk, a cloud storage service
Yandex Market
Yandex Travel
Yandex360, a workspaces service
Yandex Cloud
Yandex Pay
Yandex Metrika
In a statement to BleepingComputer, Yandex asserts that “the content [of the leaked repository] differs from the current version of the repository used in Yandex services.” They describe the leak as “code fragments from an internal repository” and say they “are conducting an internal investigation.” Yandex is well aware of the ongoing discussion surrounding the leak. As such, one can reasonably expect they will change course in response to the public’s insights. No search engine would want SEOs to “game” its algorithms through such insights, which has long been Google’s stance.
#3 Google will respond
And finally, while the issue does not directly involve Google, one can safely assume it will respond as well. There’s the pressing issue of SEO insights, for one, but also likely legal involvement over all the scraping. There’s even a likely benefit in its own engineers acquiring insights and drawing inspiration. One can only guess, but it’s arguably a very safe guess that Google will not remain idle. Whether that means revising its own security protocols, adjusting ranking factors and looking out for SEOs, lawyering up, benefitting itself, or any combination of the above, there is too much buzz not to get involved.
The bottom line
In summary, many of the implications of the Yandex source code leak can only be guessed. How Yandex’s investigation and follow-up actions will go, how Google will react and possibly adjust, and more are up in the air. What’s certain is that the SEO landscape will change irreversibly. The sheer wealth of ranking factors, their weights and roles, and their very nature should enlighten the community. The leak should, at the very least, evolve SEO away from the current, simplified understandings of search engines’ inner workings.
The codebase continues to be explored, too, including with the use of ChatGPT. As more missing pieces are put into place and connections are understood, this unprecedented event should shed more light on the craft of SEO than ever before.
Sources:
https://gs.statcounter.com/search-engine-market-share
https://gs.statcounter.com/search-engine-market-share/all/europe
https://gs.statcounter.com/search-engine-market-share/all/asia
https://gs.statcounter.com/search-engine-market-share/all/russian-federation
https://finbox.com/NASDAQGS:YNDX/explorer/revenue_proj_cagr_5y
https://www.bleepingcomputer.com/news/security/yandex-denies-hack-blames-source-code-leak-on-former-employee/
Image source: https://unsplash.com/photos/zcpj2sUdUrQ