<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Crawler stats</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<style type="text/css">
h1 { text-align: center }
</style>
</head>
<body>
<h1>Web crawlers</h1>

<p>This is a list of web crawlers that visited my main page between
September 10, 2001 and June 9, 2002.  An asterisk (*) marks a crawler
whose search output is not freely available.</p>

<table>
<tr><th>Hits</th><th>Agent</th><th>Output</th></tr>
<tr>
<td>33</td>
<td>Slurp/cat</td>
<td><a href="http://www.inktomi.com/">Inktomi *</a></td>
</tr>
<tr>
<td>11</td>
<td>Googlebot</td>
<td><a href="http://www.google.com/">Google</a></td>
</tr>
<tr>
<td>9</td>
<td>ArchitextSpider</td>
<td><a href="http://www.excite.com/">Excite</a></td>
</tr>
<tr>
<td>9</td>
<td>Pita</td>
<td><a href="http://pita.stanford.edu:8888/">Pita *</a></td>
</tr>
<tr>
<td>8</td>
<td>FAST-WebCrawler</td>
<td><a href="http://www.fast.no/">Fast</a></td>
</tr>
<tr>
<td>8</td>
<td>Scooter</td>
<td><a href="http://www.altavista.com/">AltaVista</a></td>
</tr>
<tr>
<td>7</td>
<td>ia_archiver</td>
<td><a href="http://www.archive.org/">Wayback</a></td>
</tr>
<tr>
<td>5</td>
<td>crawler</td>
<td><a href="http://www.almaden.ibm.com/cs/">IBM Research *</a></td>
</tr>
<tr>
<td>3</td>
<td>psbot</td>
<td><a href="http://www.picsearch.com/">Picsearch</a></td>
</tr>
<tr>
<td>3</td>
<td>Whizbang</td>
<td><a href="http://www.whizbang.com/">WhizBang</a></td>
</tr>
<tr>
<td>2</td>
<td>Ask Jeeves</td>
<td><a href="http://www.ask.com/">Ask Jeeves</a></td>
</tr>
<tr>
<td>2</td>
<td>SlySearch</td>
<td><a href="http://www.plagiarism.org/">Plagiarism</a></td>
</tr>
<tr>
<td>2</td>
<td>ZyBorg</td>
<td><a href="http://www.wisenut.com/">Wisenut</a></td>
</tr>
<tr>
<td>1</td>
<td>ASSORT</td>
<td><a href="http://pcmath126.unice.fr/">Gang Xiao *</a></td>
</tr>
<tr>
<td>1</td>
<td>RRC</td>
<td><a href="http://www.bigfoot.com/">Bigfoot *</a></td>
</tr>
<tr>
<td>1</td>
<td>moget</td>
<td><a href="http://goo.ne.jp/">Goo</a></td>
</tr>
<tr>
<td>1</td>
<td>tivraSpider</td>
<td><a href="http://www.ionaut.com/">Ionaut</a></td>
</tr>
<tr>
<td>1</td>
<td>HenrytheMiragoRobot</td>
<td><a href="http://www.mirago.co.uk/">Mirago</a></td>
</tr>
<tr>
<td>1</td>
<td>Openbot</td>
<td><a href="http://www.openfind.com.tw/">Openfind</a></td>
</tr>
<tr>
<td>1</td>
<td>Fake IE</td>
<td><a href="http://www.sureseeker.com/">Sureseeker</a></td>
</tr>
<tr>
<td>1</td>
<td>W3CRobot</td>
<td><a href="http://www.w3.org/">W3C *</a></td>
</tr>
<tr>
<td>1</td>
<td>cosmos</td>
<td><a href="http://www.xyleme.com/">Xyleme</a></td>
</tr>
</table>

<p>I also received six hits from machines running the
<a href="http://pauillac.inria.fr/~ailleret/prog/larbin/index-eng.html">larbin</a>
crawler, and one from autoemailspider.</p>

<p>Among the crawlers that feed public search engines, the number of hits
does not seem to correlate well with the quality of the search engine
(although most of the 1-hit engines really suck), and hits do not seem to
correspond to live database updates.</p>

</body>
</html>
