Block SEMRush Bot

ashkir

One big piece of advice I have for fellow big Chevereto owners: check your logs for SemrushBot. If you find it, block it. Period. It hits my website 6,000+ times a minute with over 100 concurrent connections trying to crawl the albums and other pages.

Create a robots.txt file in your root folder.
Robots.txt:
User-agent: SemrushBot
Disallow: /
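Before blocking, it's worth confirming how hard the bot is actually hitting you. A quick sketch for counting SemrushBot requests in an access log (the log path and combined-log format are assumptions; adjust for your server, e.g. /var/log/nginx/access.log or Apache's access_log). A tiny sample log is built here so the commands can be tried anywhere:

```shell
# Build a small sample access log (stand-in for your real log file)
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET /album/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
5.6.7.8 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
1.2.3.4 - - [01/Jan/2024:00:00:03 +0000] "GET /album/2 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
EOF

# Count lines whose User-Agent mentions SemrushBot (case-insensitive)
hits=$(grep -ci 'SemrushBot' /tmp/sample_access.log)
echo "SemrushBot hits: $hits"
```

Run the same grep against your real log to see the request volume before and after adding the robots.txt rule.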
 

mkerala

I had issues with the FB bot, which sent my DB haywire, but somehow it has gotten less aggressive lately.

Google Analytics shows 90+ of my 130+ active users coming from China. There aren't many Chinese users on my site, so I guess it is another bot.
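If Analytics hints at bot traffic, the server log can confirm it. A sketch for listing the top requesting IPs (assumes combined log format where the client IP is the first field; the log path is an example, and the sample log below is only there so the pipeline can be tried anywhere):

```shell
# Build a small sample access log (stand-in for your real log file)
cat > /tmp/sample_ips.log <<'EOF'
9.9.9.9 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 100 "-" "bot"
9.9.9.9 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 100 "-" "bot"
1.1.1.1 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 100 "-" "browser"
9.9.9.9 - - [01/Jan/2024:00:00:04 +0000] "GET /c HTTP/1.1" 200 100 "-" "bot"
EOF

# Count requests per client IP, most active first
awk '{print $1}' /tmp/sample_ips.log | sort | uniq -c | sort -rn | head -5

# Grab just the single busiest IP for further inspection (whois, geo lookup)
top_ip=$(awk '{print $1}' /tmp/sample_ips.log | sort | uniq -c | sort -rn | head -1 | awk '{print $2}')
echo "Top requester: $top_ip"
```

A handful of IPs dominating the log is a strong sign of a crawler rather than real visitors.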
 

lovedigit

I have also been seeing a huge spike in Chinese traffic from Beijing.
I think that one is a bot too. It comes in waves and usually shows up as anywhere between 30 and 80+ active users.
 

lovedigit

Another resource-intensive bot is the Ahrefs bot. It doesn't serve any public purpose; it exists solely to crawl your website so potential competitors can gain an advantage.

I came up with a list of bots that are abusive in nature.
You may want to remove the advertising bots from the list if you're using Google AdSense or AdWords.

robots.txt:
#Huge offenders

#adsbot
User-agent: adsbot
Disallow: /

#BLEXbot webmeup-crawler
User-agent: BLEXBot
Disallow: /

#Semrush bots
User-agent: SemrushBot
Disallow: /

#Ahrefs bots
User-agent: AhrefsBot
Disallow: /

# http://mj12bot.com/
User-agent: MJ12bot
Disallow: /

# advertising-related bots:
User-agent: Mediapartners-Google*
Disallow: /

# Crawlers that are kind enough to obey, but which we'd rather not have
# unless they're feeding search engines.
User-agent: UbiCrawler
Disallow: /

User-agent: DOC
Disallow: /

User-agent: Zao
Disallow: /

# Some bots are known to be trouble, particularly those designed to copy
# entire sites. Please obey robots.txt.
User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: Zealbot
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Fetch
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: linko
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: Microsoft.URL.Control
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: larbin
Disallow: /

User-agent: libwww
Disallow: /

User-agent: ZyBORG
Disallow: /

User-agent: Download Ninja
Disallow: /

# Misbehaving: requests much too fast:
User-agent: fast
Disallow: /

#
# Sorry, wget in its recursive mode is a frequent problem.
# Please read the man page and use it properly; there is a
# --wait option you can use to set the delay between hits,
# for instance.
#
User-agent: wget
Disallow: /

#
# The 'grub' distributed client has been very poorly behaved.
#
User-agent: grub-client
Disallow: /

#
# Doesn't follow robots.txt anyway, but...
#
User-agent: k2spider
Disallow: /

#
# Hits many times per second, not acceptable
# http://www.nameprotect.com/botinfo.html
User-agent: NPBot
Disallow: /

# A capture bot, downloads gazillions of pages with no public benefit
# http://www.webreaper.net/
User-agent: WebReaper
Disallow: /
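One caveat with the list above: robots.txt is purely advisory, and several of these bots (as the comments note) ignore it. For those, a server-level block is more reliable. A minimal Apache 2.4 sketch using mod_setenvif, suitable for .htaccess or a vhost config (the bot names shown are illustrative; extend the pattern with any offenders from your own logs):

```apache
# Tag requests whose User-Agent matches a known abusive crawler
# (robots.txt is only advisory; this enforces the block server-side).
# Requires mod_setenvif and Apache 2.4 authorization syntax.
SetEnvIfNoCase User-Agent "SemrushBot|AhrefsBot|MJ12bot|BLEXBot" bad_bot

<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Blocked bots receive a 403 instead of consuming PHP and database resources on every hit.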