Here is a list of user agents to block with .htaccess.
The first requirement is a Linux server that allows .htaccess (some don't).
On a shared server the need for this is great.
I got the initial list from the 'net and have added to it since.
RewriteEngine on
# get rid of bad bots
RewriteCond %{HTTP_USER_AGENT} ^(.*)NjuiceBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)VideoSurf_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)mxbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Butterfly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^kame-rt [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)OneRiot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PycURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MLBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PHP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NjuiceBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer_Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^TencentTraveler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GeoHasher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)spbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)discobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Jayde [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Microsoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Twiceler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FreeWebMonitoring [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)webmoney [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)SurveyBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)kmbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ellerdale [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)abby [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Twingly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)RiverglassScanner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Gigabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Purebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)NSPlayer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)ShopperReports [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Player [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)DigExt [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)BlueDragon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)HeartRails [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)MaMa [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)YoudaoBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^mozilla$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Sogou [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)DotBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)AppleCoreMedia [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)thunderstone [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Daumoa [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yeti [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)iPhone [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Xenu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)GranParadiso [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)securecomputing [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)FyberSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Exabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)eSobiSubscriber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)RealMedia [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)DAP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)FDM [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)domino [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)huawei [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)AdCentriaIM [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)bitmagicbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)GT::WWW [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)hoge [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)WinHttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)woriobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Zend [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)CFNetwork [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)BigSlickGoFish [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)SiteBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)youdao [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)sindice [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)facebook [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)LMQueueBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)ContactBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)flobalobsicle [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)netcraft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)lwp-trivial [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)ZangoToolbar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)CCBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Twisted [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)iCafeMedia [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)huffingtonpost [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Twitturls [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Website [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)webcollage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)picsearch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)E71 [OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)tagoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)BiggerBetter [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)PostRank [NC]
RewriteRule ^(.*)$ http://www.yourwebsite.net [L,R=301]
See the web address above? This is where you want to redirect these assholes to. I use my IP_Trap, so mine looks like http://www.mysite.com/personal/ [no, that is not a real URL].
It keeps the scum out, and they do actually give up after a time...
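One thing to watch when the redirect target sits on the same site as this .htaccess: the blocked agent requests the trap URL, matches the same conditions again, and can end up in a redirect loop (unless the trap directory has rewrite rules of its own). A minimal sketch of a guard, assuming the trap lives under /personal/ (the placeholder path from above, not a real one) and using only three agents from the list for brevity:

```apache
RewriteEngine On

# Assumption: the trap directory is /personal/ (placeholder path).
# Skip requests that are already inside it, so the redirect
# does not match itself over and over.
RewriteCond %{REQUEST_URI} !^/personal/

# Conditions are ANDed by default, so this only fires for
# listed agents that are NOT already in the trap directory.
RewriteCond %{HTTP_USER_AGENT} (NjuiceBot|VideoSurf_bot|mxbot) [NC]

RewriteRule ^ http://www.mysite.com/personal/ [L,R=301]
```

If the redirect points at a different domain, as in the yourwebsite.net example, the guard is not needed.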
See also my tutorial on blocking user agents with my IP trap.
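For anyone who would rather refuse the request outright than redirect it, mod_rewrite's F flag answers with 403 Forbidden. A rough sketch under that assumption, with a few agents from the list standing in for the whole thing. Grouping them into one alternation is a shorthand used here for illustration, not how the list above is written:

```apache
RewriteEngine On

# One alternation instead of one RewriteCond per agent.
# No leading ^(.*) is needed: an unanchored pattern already
# matches anywhere in the User-Agent string.
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|DotBot|Exabot) [NC,OR]

# An empty User-Agent still needs its own anchored condition.
RewriteCond %{HTTP_USER_AGENT} ^$

# "-" means no substitution; F sends 403 Forbidden and stops processing.
RewriteRule ^ - [F]
```

The trade-off is that a 403 gives nothing to follow, whereas the redirect feeds the agent into the trap.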