Spinn3r

Permalink API

Returns the full HTML content for all sources in our index. It also includes our content extraction support, which lets users skip indexing site chrome (navigation, sidebars, and other non-article markup) within HTML posts.

This is the full firehose of Spinn3r content, meant to replace homegrown and proprietary crawlers.

Each item is the full HTML of a permalink published by a site in our index.

Each time we visit a site, we fingerprint URLs on the front page, and only crawl new URLs. We also crawl every URL found via RSS and Atom feeds.
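The fingerprint-and-skip step described above can be sketched roughly as follows. This is only an illustration and not Spinn3r's actual implementation: the SHA-1 hash, the URL normalization rules, and the names `fingerprint` and `new_urls` are all assumptions made for the example.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def fingerprint(url: str) -> str:
    """Return a stable hex digest for a lightly normalized URL (assumed scheme)."""
    parts = urlsplit(url.strip())
    # Lowercase the scheme and host and drop the fragment; keep path and query.
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def new_urls(front_page_urls, seen):
    """Return URLs whose fingerprints are not yet in `seen`, recording them as seen."""
    fresh = []
    for url in front_page_urls:
        fp = fingerprint(url)
        if fp not in seen:
            seen.add(fp)
            fresh.append(url)
    return fresh


# Hypothetical front-page snapshots from two visits to the same site.
seen_fingerprints = set()
first_visit = new_urls(
    ["http://Example.com/post-1", "http://example.com/post-2"], seen_fingerprints
)
second_visit = new_urls(
    ["http://example.com/post-2#comments", "http://example.com/post-3"], seen_fingerprints
)
print(first_visit)
print(second_visit)
```

On the second visit only the URL that was not fingerprinted earlier is returned, so only it would be queued for crawling.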

Feed API

Our feed API includes raw content from RSS feeds indexed by Spinn3r, including all metadata such as tags, author, and language. The feed API and RSS backend drive the permalink API, which adds crawler support for sites that don't have RSS.

Comment API

Right now there are no standards with significant market penetration for indexing comments made within the blogosphere.

We've written hand-tuned parsers for fetching the remaining comments, and we support the majority of content management systems in production (TypePad, WordPress, Movable Type, etc.).

The comment API lets developers index comments without having to worry about how each platform across the blogosphere publishes them.

Crawler API

This API is designed for customers with existing crawlers who want to use our infrastructure to prioritize their feed update frequency. This allows them to tie into our existing ping and spam infrastructure without a massive investment of time, money, and engineering resources.
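To make the idea of prioritizing feed updates concrete, here is a minimal sketch of how a customer's own crawler might reorder its fetch queue using ping and spam signals. It does not show the Crawler API itself: the `FeedScheduler` class, its methods, and the example feed URLs are all hypothetical, and the signals are assumed to arrive from some external source.

```python
import heapq


class FeedScheduler:
    """Toy scheduler: pinged feeds become due at once, flagged spam is skipped."""

    def __init__(self):
        self._heap = []         # entries are (due_time, sequence, feed_url)
        self._sequence = 0      # tie-breaker that keeps insertion order stable
        self._spam = set()
        self._latest_due = {}   # feed_url -> most recently scheduled due time

    def schedule(self, feed_url, due_time):
        self._latest_due[feed_url] = due_time
        heapq.heappush(self._heap, (due_time, self._sequence, feed_url))
        self._sequence += 1

    def ping(self, feed_url, now):
        # A ping signals fresh content, so the feed is rescheduled for right now.
        self.schedule(feed_url, now)

    def mark_spam(self, feed_url):
        self._spam.add(feed_url)

    def next_due(self, now):
        """Pop and return the next feed due at or before `now`, or None."""
        while self._heap and self._heap[0][0] <= now:
            due_time, _, feed_url = heapq.heappop(self._heap)
            if feed_url in self._spam:
                continue
            if self._latest_due.get(feed_url) != due_time:
                continue  # stale entry superseded by a later schedule or ping
            del self._latest_due[feed_url]
            return feed_url
        return None


# Hypothetical feeds; times are plain integers for readability.
scheduler = FeedScheduler()
scheduler.schedule("http://a.example/feed", 100)
scheduler.schedule("http://b.example/feed", 50)
scheduler.schedule("http://spam.example/feed", 10)
scheduler.mark_spam("http://spam.example/feed")
scheduler.ping("http://a.example/feed", now=20)
order = [scheduler.next_due(now=60) for _ in range(3)]
print(order)
```

In this sketch the pinged feed jumps ahead of its original schedule, the spam-flagged feed is never fetched, and the queue reports nothing further due once the remaining entries lie in the future.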

Source API

Spinn3r provides the ability to check the status of a weblog in our index, register it for indexing, and enumerate existing sources.

As Spinn3r expands, we're no longer limited to weblogs, and are now indexing forums, memetrackers, classified advertising sites, and other types of dynamic and social metadata.