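Many crawlers spoof the Baiduspider user-agent. Genuine Baiduspider IPs have a reverse DNS (PTR) record that resolves to a Baidu hostname, for example: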
40.221.206.111.in-addr.arpa name = baiduspider-111-206-221-40.crawl.baidu.com.
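The reverse-lookup check above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the `baiduspider-` prefix and `.crawl.baidu.com` suffix are assumptions drawn from the single PTR example above, the function names are hypothetical, and the forward-confirmation step (resolving the hostname back to the IP) is an addition that the note itself does not mention. The resolver functions are injectable so the logic can be exercised without network access.

```python
import socket

# Assumption: pattern inferred from the one PTR example in this note.
CRAWL_PREFIX = "baiduspider-"
CRAWL_SUFFIX = ".crawl.baidu.com"


def hostname_looks_like_baiduspider(hostname):
    """Return True if a PTR hostname matches the assumed Baiduspider pattern."""
    host = hostname.rstrip(".").lower()  # PTR answers often end with a dot
    return host.startswith(CRAWL_PREFIX) and host.endswith(CRAWL_SUFFIX)


def is_genuine_baiduspider(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the hostname, then confirm it maps back to ip."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname_looks_like_baiduspider(hostname):
        return False
    try:
        # Forward confirmation: the hostname must resolve back to the same IP.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling `is_genuine_baiduspider("111.206.221.40")` with the default resolvers performs live DNS queries, so results depend on the current DNS data and on the resolver in use.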
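Analyzing one site's access logs yields the following reliable address ranges, listed with the number of crawl requests, the number of distinct IPs seen, and an example IP: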
subnet | crawl_times | ip_count | example |
123.125.71.0 | 1106531 | 76 | 123.125.71.36 |
220.181.108.0 | 1101992 | 75 | 220.181.108.121 |
180.76.15.0 | 412563 | 60 | 180.76.15.145 |
111.206.221.0 | 15614 | 80 | 111.206.221.40 |
Examples of fake spiders (or at least strongly suspected ones): 113.97.50.220, 123.125.67.152, 113.99.114.197, 58.218.213.134, 114.95.226.126, 123.125.143.11, 117.177.97.16, 39.108.233.0, 27.148.157.107, 122.10.64.19, 103.29.23.132
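The address ranges in the table can also serve as a cheap pre-filter before any DNS lookup. The sketch below assumes each listed subnet is a /24, which the note does not state explicitly (it is inferred from the `.0` addresses and the per-subnet IP counts of 80 or fewer). The names `TRUSTED_SUBNETS` and `in_trusted_subnet` are hypothetical.

```python
import ipaddress

# Assumption: every subnet from the table is treated as a /24.
TRUSTED_SUBNETS = [
    ipaddress.ip_network(net)
    for net in (
        "123.125.71.0/24",
        "220.181.108.0/24",
        "180.76.15.0/24",
        "111.206.221.0/24",
    )
]


def in_trusted_subnet(ip):
    """Return True if ip falls inside one of the ranges observed in the logs."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_SUBNETS)


# Example IPs from the table versus a few of the suspected fakes.
for ip in ("123.125.71.36", "111.206.221.40", "123.125.67.152", "113.97.50.220"):
    print(ip, in_trusted_subnet(ip))
```

These ranges come from a single site's logs, so an address outside them is not necessarily fake. A reverse DNS check remains the more reliable test for such addresses.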