What's the Spinn3r Robot?

Spinn3r is a web service that crawls on behalf of dozens of companies, researchers, and web startups.

Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post as it is published, in real time. We provide the data so you can focus on building your application or mashup.

Spinn3r handles all the difficult tasks of running a spider/crawler including spam prevention, language categorization, ping indexing, and trust ranking.

Why are you reading my site?

Spinn3r indexes your site on behalf of our users so that your content can be used in their applications. Our users include search engines, analytics services, competitive intelligence services, and others.

Are you wasting my bandwidth?

Spinn3r uses very little bandwidth to monitor your site. We request each page only once and cache it after we have fetched it.

Can I tell Spinn3r to stop reading my site?

We currently monitor the XML feeds syndicated from your weblog. If you do not want us to index your feeds (or your HTML), let us know. Most people want us to index their site, so this rarely happens.

Spinn3r also supports robots.txt so you can block us this way as well.
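
As a rough illustration of how a robots.txt rule blocks one robot while leaving others alone, here is a minimal sketch using Python's standard urllib.robotparser module. The user-agent token "Spinn3r" and the example.com URLs are assumptions made for this example; check your access logs for the exact string the robot sends.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: "Spinn3r" is a placeholder user-agent token used for
# illustration only; the real token may differ.
ROBOTS_TXT = """\
User-agent: Spinn3r
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named robot is blocked everywhere; other robots are not.
    print(is_allowed(ROBOTS_TXT, "Spinn3r", "http://example.com/feed.xml"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "http://example.com/feed.xml"))
```

Running this kind of check against your own robots.txt before publishing it is an easy way to confirm that the rule blocks only the robot you intended.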

Why is Spinn3r requesting XML files that don't exist on my server?

We rely on web standards as much as possible to find the feeds on your site. Unfortunately, many websites break these standards in ways that can confuse robots, so we verify that your weblog software is configured correctly by requesting additional files. We avoid downloading entire files and use only conditional GETs to save bandwidth. The main drawback of this approach is that it generates 404 errors in your logs, but we do this only once per week.

Does Spinn3r index my feed?

If your site offers an RSS feed, we try to find it and index it. If not, we try to analyze your HTML instead. The best way to influence how Spinn3r indexes your site is to publish a full-content RSS feed (including all the HTML from your posts).
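
One common way for a robot to find a feed is the autodiscovery convention, where a page advertises its feed with a link element of rel "alternate" in its HTML head. The sketch below shows that convention using only the Python standard library. It is an illustration under that assumption, not a description of Spinn3r's actual discovery logic, and the names FeedLinkParser and discover_feeds are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html: str, base_url: str) -> list:
    """Return the feed URLs advertised in the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

If your weblog template includes such a link element pointing at your full-content feed, a robot that follows this convention can find the feed without guessing at file names.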

How does Spinn3r attempt to minimize my bandwidth usage?

  1. Compression. We use gzip compression to reduce the number of bytes transferred between our servers and yours. This usually results in significant bandwidth savings.
  2. Only fetch when your weblog has changed. We use the If-Modified-Since and ETag HTTP headers to prevent duplicate downloads. Not every weblog system supports these standards in all scenarios.
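
The two techniques above can be sketched from the client side as follows. This is a minimal illustration, not Spinn3r's implementation: the function names are hypothetical, and no network request is made. Note that the validator from a previous ETag response header is sent back in the If-None-Match request header.

```python
import gzip


def build_request_headers(etag=None, last_modified=None):
    """Build headers for a bandwidth-friendly fetch.

    etag and last_modified are the validators saved from the previous
    response, if the server sent any.
    """
    headers = {"Accept-Encoding": "gzip"}  # ask for a compressed body
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def read_response(status, headers, body, cached_body):
    """Return the current content given a response and the cached copy."""
    if status == 304:
        # Not Modified: the server sent no body, so reuse the cache.
        return cached_body
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body
```

A server that supports neither validator simply returns a full 200 response every time, which is why savings depend on the weblog software in use.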