A PHP snippet for detecting spider visits, based on HTTP_USER_AGENT

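During website development, we sometimes need to tell whether a visitor is a flesh-and-blood human or a spider.
– For example, when counting how many times the articles on a site are viewed, visits from spiders need to be filtered out.
– Some spiders are rogue and do not obey the rules in robots.txt, so they have to be blocked directly in code.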
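
The two cases above could be handled roughly as follows. This is only a sketch: `handle_visit()`, its parameters, and the 403 policy are hypothetical names and choices made for illustration, and it assumes the caller already knows whether the request came from a spider.

```php
<?php
// Hypothetical helper (not from the original post): decides what to do with
// a request once we know whether it came from a spider.
function handle_visit($is_robot, $block_robots, $views)
{
    if ($is_robot && $block_robots) {
        // Rogue spider: refuse the request and leave the counter untouched.
        return array('status' => 403, 'views' => $views);
    }
    if (!$is_robot) {
        // Human visitor: count the page view.
        $views++;
    }
    // Well-behaved spiders are served normally but not counted.
    return array('status' => 200, 'views' => $views);
}

// A human visitor on an article that already has 10 views.
$result = handle_visit(false, false, 10);
echo $result['status'], ' ', $result['views'], "\n";
```

In a real page the returned status would typically be sent with `http_response_code()` and the counter stored wherever the site keeps its statistics.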
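Code for detecting spider visits
It is simple: check whether HTTP_USER_AGENT contains the signature string of a known spider. If it does, the visitor is a spider and can be handled accordingly.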

// Spider UA signatures (lowercase; matched case-insensitively below)
$bots = array
(
	  'Google Bot' => 'googlebot'
	, 'Google (other)' => 'google' // also matches crawlers such as AdsBot-Google
	, 'MSN' => 'msnbot'
	, 'Alexa' => 'ia_archiver'
	, 'Lycos' => 'lycos'
	, 'Ask Jeeves' => 'jeeves'
	, 'Altavista' => 'scooter'
	, 'AllTheWeb' => 'fast-webcrawler'
	, 'Inktomi' => 'slurp@inktomi'
	, 'Turnitin.com' => 'turnitinbot'
	, 'Technorati' => 'technorati'
	, 'Yahoo' => 'yahoo'
	, 'Findexa' => 'findexa'
	, 'NextLinks' => 'findlinks'
	, 'Gais' => 'gaisbo'
	, 'WiseNut' => 'zyborg'
	, 'WhoisSource' => 'surveybot'
	, 'Bloglines' => 'bloglines'
	, 'BlogSearch' => 'blogsearch'
	, 'PubSub' => 'pubsub'
	, 'Syndic8' => 'syndic8'
	, 'RadioUserland' => 'userland'
	, 'Gigabot' => 'gigabot'
	, 'Become.com' => 'become.com'
	, 'Baidu' => 'baiduspider'
	, 'so.com' => '360spider'
	, 'Sogou' => 'sogou'
	, 'soso.com' => 'sosospider'
	, 'Yandex' => 'yandex'
	, 'Other spiders' => 'spider' // broad catch-all for any UA containing "spider"
);
// The header may be missing, so fall back to an empty string
$useragent = isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_robot = false;
foreach ( $bots as $name => $lookfor ) {
	// stripos() performs a case-insensitive substring search
	if ( stripos( $useragent, $lookfor ) !== false ) {
		$is_robot = true;
		break;
	}
}
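
The same check can also be wrapped in a reusable function. The sketch below is one possible way to do that, not a definitive implementation: the name `spider_name()` and the shortened `$bots` list are assumptions made for illustration. It returns the label of the first matching spider, or `null` when nothing matches.

```php
<?php
// Illustrative subset of signatures; extend it with the full list as needed.
$bots = array(
    'Google Bot' => 'googlebot',
    'MSN'        => 'msnbot',
    'Baidu'      => 'baiduspider',
    'Yandex'     => 'yandex',
);

// Hypothetical helper: returns the label of the first spider whose
// signature appears in the User-Agent string, or null if none matches.
function spider_name($useragent, array $bots)
{
    if (!is_string($useragent) || $useragent === '') {
        return null; // no User-Agent header: not a known spider
    }
    foreach ($bots as $name => $lookfor) {
        // Case-insensitive substring match against the signature.
        if (stripos($useragent, $lookfor) !== false) {
            return $name;
        }
    }
    return null;
}

// Usage: read the header defensively, since it may be absent.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$name = spider_name($ua, $bots);
echo $name === null ? "not a known spider\n" : "spider: $name\n";
```

Passing the list in as a parameter keeps the function free of global state, so the signatures can be extended or replaced without touching the matching logic.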

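The list of spider UA signatures in the code is clearly incomplete, but it is enough to identify most spiders. If you need more precise detection, see the following two links.
– [List of spider UAs](http://user-agent-string.info/list-of-ua/bots)
– [List of browser UAs](http://user-agent-string.info/list-of-ua)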