Blocking Referer Spam (shorter version)

For the full tale, see Blocking Referer Spam. If you just want the Cliff's Notes version:

  • Referer spam refers to bozos who bombard sites with traffic, peppering the HTTP Referer: header with various URLs, the goal being to drive traffic back to their sites.
  • Even if you don't post Referers (really, referrers, but someone misspelled it in 1993 or so and we're stuck now), you'll get Referer Spam.
  • Sometimes it's combined with attempts to exploit open proxies and generate click-through revenue for affiliate and ad sites.

Sites using Apache can respond to referer spam by editing their server configuration files or their .htaccess files, using either access control directives or the Apache URL Rewrite Engine (mod_rewrite).

The simplest approach is to block by IP address:

deny from 127.0.0.1

Now this approach will still record a hit in your access log, but it will be recorded with a 403 status code. It is a bit of a sledgehammer (you could block everyone on the 127 network with deny from 127., for example) and requires frequent maintenance, since the spammers bounce around from system to system.
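As a rough sketch of blocking a whole network rather than a single host, the fragment below uses the older Order/Allow/Deny directives that match the deny from line above. The 192.0.2.0/24 range is a documentation-only placeholder, not an address from my own block list, and the commented-out lines show what I believe is the equivalent under the newer Require syntax in Apache 2.4.

```apache
# Apache 2.2-style access control (needs mod_access_compat on 2.4).
# 192.0.2.0/24 is a placeholder range; substitute the network you see spam from.
Order allow,deny
Allow from all
Deny from 192.0.2.0/24

# Assumed Apache 2.4 equivalent; use one style or the other, not both:
# <RequireAll>
#     Require all granted
#     Require not ip 192.0.2.0/24
# </RequireAll>
```

Either way the blocked client gets a 403, and the request still shows up in the access log.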

Another approach is to block by pattern:

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^.*pattern.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^.*another-pattern.*$ [NC]
RewriteRule . - [F,L]

Here pattern and another-pattern could be poker, gambling, high-stakes (just to pick a few of the words I'm blocking on). Any request whose Referer contains one of your patterns gets a 403 Forbidden response instead of the page. Again, you'll still get the hit recorded in your access log.
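Once the word list grows, one condition per word gets unwieldy. A minimal sketch of the same rule with the words folded into a single alternation follows; the three keywords are the ones named above, and the leading ^.* and trailing .*$ are dropped because an unanchored pattern already matches anywhere in the header.

```apache
RewriteEngine On
# One condition covering several keywords; NC makes the match case-insensitive.
RewriteCond %{HTTP_REFERER} (poker|gambling|high-stakes) [NC]
# "^" matches every request path; F returns 403 Forbidden and ends rule processing.
RewriteRule ^ - [F]
```

This should behave the same as the two-condition version, just with less to maintain.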

Note that blocking by pattern could block some legitimate requests. For example, in a few days I'm fairly certain that a search on poker referer spam will return this page. If you click on the link to this site you (should) get an error message back, assuming the search term is echoed in the Referer header.
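One way to soften that side effect, sketched here as an assumption rather than something I run, is to skip the block when the Referer looks like a search results page. The Google host pattern below is purely illustrative; conditions without an OR flag are ANDed together, so the rule fires only when the referrer is not a search engine and does contain a blocked word.

```apache
RewriteEngine On
# Exempt referrers that appear to come from a search engine (illustrative host pattern).
RewriteCond %{HTTP_REFERER} !^https?://([a-z0-9-]+\.)*google\.[a-z.]+/ [NC]
# Otherwise block any referrer containing one of the keywords.
RewriteCond %{HTTP_REFERER} (poker|gambling|high-stakes) [NC]
RewriteRule ^ - [F]
```

The obvious catch is that the Referer header is trivially forged, so a spammer who fakes a search engine referrer walks straight through the exemption.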

Why is this important? It may just be a hit, but it requires processing on your server. If the target of the hit is a CGI or other application (even a PHP script), it requires more processing and takes that capacity away from your legitimate users. If you're just managing a personal site then it's unlikely to be a significant problem, more just an annoyance. On the other hand, since the tools seem to blast away at practically any URL they can find on a site, you're open to a denial of service attack if the spammers stray across a CGI or other application.

I don't post referers on my site, but I get spammed nonetheless. There's one exception: I do echo back the Referer in a <link> element. That's the Referer the active client used to retrieve the page, not all the referers that hit the site.

My personal practice is a mix of blocking by pattern (variations on poker, gambling, texas-hold-em, and a list of domain names which I noticed generating the spam traffic) and blocking by IP. I block an entire Class A, for example. Yes, that effectively shuts out several countries from my site, but just to be clear, this is my site. It's not a public benefit, and if the net value to me of having traffic from that Class A is less than the impact of the bozos pounding away with spam robots, then I'll block that network.
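A mix like that can live in one place. The sketch below is an illustration, not my actual configuration: it flags spammy referrers with mod_setenvif and then denies both the flagged requests and a whole network in one access control section. The variable name spam_ref, the domain spam-domain.example, and the 10.0.0.0/8 range are all placeholders.

```apache
# Flag requests whose Referer matches a keyword or a known spam domain.
# spam_ref is a hypothetical variable name; the domain is a placeholder.
SetEnvIfNoCase Referer "(poker|gambling|texas-hold-em)" spam_ref
SetEnvIfNoCase Referer "spam-domain\.example" spam_ref

# Apache 2.2-style access control (needs mod_access_compat on 2.4).
Order allow,deny
Allow from all
# Pattern-based block: deny anything flagged above.
Deny from env=spam_ref
# IP-based block: a placeholder /8 standing in for the Class A.
Deny from 10.0.0.0/8
```

Because the flag is an ordinary environment variable, the main server configuration could also use it in the env= clause of CustomLog to keep these hits out of the regular access log.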
