How to Detect Bot Traffic and Protect Your Site from Web Scraping

Tech Insights for Professionals

12 April 2022

Bot traffic can be highly damaging to a website. How can you spot and prevent these activities?


Protecting your website from malicious activity needs to play a major role in any firm's security strategy, and one important form of activity to focus on is guarding against bot traffic. These connections can damage businesses in a number of ways, from exposing them to data breaches to disrupting critical online activities. This can result in significant costs, so investing in the right tools and technology now can pay major dividends later.

The importance of bot detection

Bot detection should be a growing concern for any business. According to figures from Imperva, for instance, more than a quarter of all web traffic in 2020 (25.6% of web requests) originated from malicious bots.

A major risk of bot traffic is distributed denial of service (DDoS) attacks. These seek to flood your server with requests in order to overwhelm its capacity, meaning legitimate traffic can't be served. This is more than just a nuisance; effective DDoS attacks can shut down operations completely, resulting in major loss of revenue or reputation, and are increasingly being used as part of wider attacks such as ransomware attempts.

Another risk of bot traffic is web scraping. This involves a bot going through your site to gather and extract data. While there are legitimate uses for this, such as gathering publicly-available data for business leads or market analysis, it can also be used by criminals.

For instance, web scraping can gather data for use in spear-phishing attacks or password cracking. At the same time, these bots drain your resources, harm your user experience and increase your server costs.

Challenges of bot detection

A key question for firms is how you separate bad bot traffic from legitimate visitors. This isn't always as simple as filtering out all bot activity. For example, if you want your site to be discoverable by search engines like Google, you'll have to allow its crawlers to work.
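
A practical example of this: Google documents a way to confirm that a visitor claiming to be Googlebot really is one, using a reverse DNS lookup on the requesting IP followed by a forward lookup to confirm the result. A minimal sketch of that check (in Python, error handling kept deliberately simple) might look like this:

```python
import socket

def is_genuine_googlebot(ip_address: str) -> bool:
    """Verify a crawler claiming to be Googlebot via a reverse/forward DNS check.

    Google publishes this technique: the reverse DNS name of a genuine Googlebot
    IP ends in googlebot.com or google.com, and a forward lookup of that name
    resolves back to the same IP address.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the original IP.
        _, _, resolved_ips = socket.gethostbyname_ex(hostname)
        return ip_address in resolved_ips
    except (socket.herror, socket.gaierror):
        # No reverse DNS record at all is itself a suspicious sign.
        return False

# Example: only trust a "Googlebot" user agent if DNS verification passes.
# print(is_genuine_googlebot("66.249.66.1"))
```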

But telling malicious bots apart from other traffic isn’t always easy. Sophisticated bots are designed to look and act in much the same way as human users and, if they’re detected, botnet makers often have millions of compromised machines they can use to launch an attack, so simply playing whack-a-mole to block suspicious IP addresses is often ineffective.

What's more, bots are an extremely easy and cost-effective way for a malicious actor to attack a business. With the rise of bots-as-a-service making it easy for anyone to target a website, it can be difficult for defenses to keep up with the volume and variety of threats they face.

5 ways to identify bot traffic

To combat this issue, the first step must be to recognize what bot traffic looks like in order to filter it out and protect your business. But as noted above, this is no easy task in an age where bots can be made to appear more human-like than ever. However, there are still a few telltale signs that you can look out for, especially if you're faced with a cheaper, less sophisticated attack.

  1. High page views: Your analytics can tell you a lot about what your normal traffic profile looks like, and stats that stand out from the usual are always worth investigating. Traffic that visits many more pages on your site than normal is often a clear sign of an incoming DDoS attack (see the sketch after this list for one way to surface this from raw logs).
  2. High bounce rate: Not every bot attack wants to cause disruption by clogging up your servers. Some web scraping bots are designed to load a page, take what they need and leave again in a matter of milliseconds, so an unusually high and fast bounce rate for your pages is another sign something's not right.
  3. Abnormal session durations: Page views that last only a few milliseconds are a common sign of bot traffic, but unusually long session durations need to be looked at too. Sessions stretching far beyond the time a human would plausibly spend on a page can be another indicator that it isn't a person doing the viewing.
  4. Unusual locations: Spikes in traffic from parts of the world where you don't do business are a major red flag, as these are unlikely to be legitimate customers.
  5. Poor conversions: Conversions that meet your site's goals - whether that's signing up for a newsletter or filling in a data capture form to view gated content - are usually a good sign. However, you need to look at the quality of these conversions. If many of them contain junk data, or the email addresses frequently bounce, this is another indicator that bots are scraping your site.
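
To make the first three signals concrete, here is a minimal sketch of how they might be pulled out of raw access-log data. It assumes a simple list of (ip, timestamp, path) tuples rather than any particular analytics platform, and the thresholds are purely illustrative - tune them against your own baseline traffic profile:

```python
from collections import defaultdict

def flag_suspicious_ips(records, max_pages=200, min_interval_s=0.5):
    """Flag IPs whose request volume or timing looks non-human.

    `records` is a hypothetical list of (ip, unix_timestamp, path) tuples,
    e.g. parsed from a web server access log; thresholds are illustrative.
    """
    by_ip = defaultdict(list)
    for ip, ts, _path in records:
        by_ip[ip].append(ts)

    suspicious = {}
    for ip, timestamps in by_ip.items():
        timestamps.sort()
        page_views = len(timestamps)
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        fast_hits = sum(1 for g in gaps if g < min_interval_s)
        # Very high page views (sign 1) or mostly sub-second repeat requests
        # (signs 2 and 3) both point towards automated traffic.
        if page_views > max_pages or (gaps and fast_hits / len(gaps) > 0.8):
            suspicious[ip] = {"page_views": page_views, "fast_hits": fast_hits}
    return suspicious
```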

Bot detection methods and techniques

Knowing what to look for is only the first step, however. While a good antimalware defense and firewall can help automate the process of spotting bot traffic and deploy countermeasures to block illegitimate requests, there are also a few things you can do within the design of your site itself to discourage the use of bots such as web scrapers. That said, you also need to be aware of the potential issues these measures can cause.

Captchas

These have been around for a while and aim to filter out bots by posing challenges only a human should be able to solve. They're still a common sight protecting more sensitive content, but they have a couple of issues. Firstly, they frustrate users, add friction and make your content less accessible, so you may inadvertently put off genuine visitors. Secondly, advances in artificial intelligence and image recognition in recent years have blunted their effectiveness, meaning sophisticated bots can often solve them anyway.

Rate limiting

Restricting visitors to a set number of interactions on your site is a good way to filter out bots without affecting legitimate users. For example, only allowing a certain number of searches per second for each IP address won't hinder a human who's looking for information, but it can trip up bots. However, if visitors are using a shared internet connection, you can get multiple legitimate requests from the same IP address, so you should go beyond this to also prevent activities such as very fast form submissions.
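
As an illustration of the idea rather than a production implementation, the sketch below shows a simple in-memory sliding-window limiter keyed by IP address. A real deployment would typically keep the counters in shared storage such as Redis and, as noted above, combine the IP check with per-session or per-form limits:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_s` seconds for each key.

    The key could be an IP address, a session ID or a form identifier; an
    in-memory dict like this only suits a single-process demo.
    """
    def __init__(self, max_requests=5, window_s=1.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        hits = self._hits[key]
        # Drop timestamps that have fallen outside the current window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

# Usage: reject the request (or add friction) when allow() returns False,
# e.g. respond with HTTP 429 for that client IP.
limiter = SlidingWindowLimiter(max_requests=5, window_s=1.0)
```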

Web application firewalls

Using a web application firewall can prevent bad bots from initiating attacks such as SQL injections, session hijacking and cross-site scripting, so should certainly be a part of any defense strategy. However, you shouldn't rely too heavily on these, as they work by looking for known threats and patterns, making them less effective at blocking advanced, adaptable bots that don’t use obvious attack signatures.   
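
To see why signature matching alone has limits, consider the toy filter below. The patterns are illustrative only (real rule sets such as the OWASP Core Rule Set are far larger and more nuanced), but the principle is the same: it catches naive payloads, while a bot that reshapes or encodes its requests slips straight past:

```python
import re

# Illustrative signatures only - not drawn from any real WAF rule set.
SIGNATURES = [
    re.compile(r"union\s+select", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
    re.compile(r"'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),
]

def looks_malicious(request_field: str) -> bool:
    """Return True if the input matches a known attack signature."""
    return any(sig.search(request_field) for sig in SIGNATURES)

# A naive payload is caught...
assert looks_malicious("id=1 UNION SELECT password FROM users")
# ...but a slightly reshaped one is not, which is why signatures alone
# struggle against adaptive bots that avoid obvious attack patterns.
assert not looks_malicious("id=1 UNION/**/SELECT password FROM users")
```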

Require authentication

Asking visitors to sign up for an account before using your website's services can also be an effective way of stopping web scrapers. Requiring email verification and a login will prevent many straightforward bots.

However, this can only be fully effective if you have multi-factor authentication (MFA), as advanced bots can create their own accounts. One issue with this is that it will again add friction for legitimate users. Even a simple sign-up screen will put some people off, and MFA even more so. Therefore, you need to determine how important the content you're protecting is and whether it's worth the added hassle and potential for lower traffic overall.
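
If you do decide to gate content behind verified accounts, the enforcement itself can be simple. The sketch below assumes a Flask-style application where the sign-up and email-verification flow sets session flags elsewhere; the flag names and the login route are illustrative, not part of any specific library:

```python
from functools import wraps
from flask import session, redirect, url_for

def verified_account_required(view):
    """Only serve gated content to logged-in, email-verified accounts.

    Assumes the login/verification flow sets these session flags; the
    names here are illustrative.
    """
    @wraps(view)
    def wrapper(*args, **kwargs):
        if not session.get("user_id") or not session.get("email_verified"):
            return redirect(url_for("login"))
        return view(*args, **kwargs)
    return wrapper

# Example usage on a route serving gated content:
# @app.route("/gated-report")
# @verified_account_required
# def gated_report():
#     return render_template("report.html")
```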
