Over the last two to three years, the volume of automated traffic sweeping across the internet has grown sharply. Overzealous bots used to be an occasional nuisance. Today they're a routine pest that we are constantly fighting.
Counting bots
There's no easy way to measure the volume of bot traffic out there. According to Thales Group's Bad Bot Report, 2024 was the year when automated traffic (bots) overtook human traffic on the internet; other security researchers place that milestone as early as 2021. Either way, it's clear that most of that bot traffic is bad. According to Thales, bad bots accounted for a quarter of all traffic in 2020 and are now closing in on one-third. They make up a growing share of a growing total.
Barracuda's recent research adds more bad news: Bad bots are becoming more varied and, on average, more advanced as well. There are more threats than ever to identify, and they're getting better at avoiding detection.
The Open Worldwide Application Security Project (OWASP) lists 21 types of bad bot behaviour, ranging from the downright criminal, like verifying stolen credit card details, to less malicious activities like content scraping.
A lot of the recent growth in bot traffic is down to content scraping. This is unsurprising: training AI models such as large language models (LLMs) takes enormous amounts of data, and scraping is one of the easiest ways to get it. These content scrapers don't always obey robots.txt files or other polite requests to stay away. They can send thousands of requests per minute, whether you want them or not (hint: you don't).
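To make the "polite request" part concrete, here is a minimal sketch of the check a well-behaved crawler is supposed to run before fetching a page, using Python's standard robotparser module and example.com as a stand-in address. The catch is that the check is entirely voluntary: a scraper that skips it can keep hammering your URLs regardless.

```python
# A minimal sketch of how a *polite* crawler is supposed to behave:
# fetch robots.txt and honour it before requesting anything else.
# Nothing in the protocol forces a scraper to run this check.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # example.com stands in for any site
parser.read()

# "Bytespider" is one of the scraper user agents we name later in this post.
if parser.can_fetch("Bytespider", "https://example.com/blog/some-article"):
    print("robots.txt allows this fetch")
else:
    print("robots.txt asks this bot to stay away, but compliance is voluntary")
```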
How bad "bad" bots are
You might think that if most of these bad bots only grab whatever is publicly published on your site, then there's no big problem. But even at their best, bad bots are disruptive and unwelcome. Some follow every link they can find; others brute-force URLs just to see what turns up. Unlike human visitors, they can get caught in loops, endlessly clicking through your site without doing anything useful, and possibly causing quite a lot of harm. Those "clicks" are just milliseconds apart. They eat up bandwidth and soak up CPU, slowing down response times for everyone else.
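Those millisecond-apart clicks are also what makes this kind of traffic stand out. Here is a rough sketch, not SiteHost's actual protections and with made-up thresholds, of a sliding-window counter that flags any client IP requesting far faster than a person could click:

```python
# A rough illustration (not production tooling) of spotting request bursts:
# count requests per IP over a short sliding window and flag anything far
# beyond human browsing speed. Window and threshold are arbitrary examples.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 50

recent_requests = defaultdict(deque)  # ip -> timestamps of recent requests


def is_burst(ip: str) -> bool:
    """Record one request from `ip` and report whether it exceeds the window limit."""
    now = time.monotonic()
    window = recent_requests[ip]
    window.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS_PER_WINDOW
```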
If the internet had no protection against bad bots, more than half the world's hosting resources would be devoted to non-human users. In that world, bots would likely receive your fastest page loads, leaving a worse experience for the real people you want to impress. As Google's web.dev article, 'Why Speed Matters', puts it:
Performance can also have a material effect on whether your website's users follow through. Slow sites have a negative impact on revenue, and fast sites are shown to increase conversion rates and improve business outcomes.
And this is before we factor in more malicious bots that are designed to steal sensitive data or probe for broader vulnerabilities.
We saw the previews
As an infrastructure operator, we have had a front-row seat to the boom in unwanted, automated traffic. In previous years we experienced bot surges that were big enough and surprising enough to blog about at the time. Now we know that they were just the warm-up.
In the summer of 2022-23 we blocked thousands of IPs from Azure, all because of one bot. It was unusual at the time for a bot attack to use thousands or millions of IPs, and we were surprised to trace those addresses back to Azure. It was a sign of things to come when Microsoft's Abuse team didn't even respond to our messages about the huge amount of harmful traffic they were sending our way. (Our inboxes are still open, btw.)
Fast-forward to mid-2024 and we were busy forcing Bytespider and Claudebot back from Cloud Containers. IP-blocking (or rate limiting) was much less effective this time, but user agents gave us a reliable way to identify the bots. Through analysis, testing and a quick-but-careful rollout of the right protections, we soon had the issue under control.
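User agent strings are self-reported and easy to fake, but when a scraper announces itself, even a simple tally of an access log makes it stand out. Here is a rough sketch of that kind of analysis; the combined-format Nginx log and its path are assumptions for illustration, not our actual tooling:

```python
# A rough sketch of the analysis that surfaces noisy crawlers: tally
# requests per user agent from a combined-format access log.
# The log path and format are assumptions for illustration only.
import re
from collections import Counter

# In the combined log format, the user agent is the final quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"$')

counts = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        match = USER_AGENT_RE.search(line.strip())
        if match:
            counts[match.group(1)] += 1

# Bots like Bytespider and Claudebot identify themselves in their user agents.
for agent, hits in counts.most_common(10):
    print(f"{hits:>8}  {agent}")
```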
The days when a single bot attack was blog-worthy are long gone now.
This article is the first in a series about today's bot-filled internet. The next article will cover our full-time, proactive efforts to analyse, identify and repel bad bot traffic, day in and day out. This has been a big operational change for us, mostly invisible from the outside. It's only going to become more important over time.
Bot protection takes more than a few firewalls or IP reputation lists. It has become an ongoing, proactive effort that we see as integral to the service that we offer. Every web host in the world faces the same problem, but not every host puts the same amount of energy into dealing with it.
All of this is our way of saying that if we haven't written about any massive bot attacks lately, it's not because they've gone away. It's because they've become normal. We are working so hard to keep you as unaffected as possible that we've changed the way SiteHost operates. We'll have more on the topic soon.
Main image created using Gemini.