Spinn3r
Need to index the blogosphere?
Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post as it is published, in real time. We supply the data so you can focus on building your application, mashup, or search engine. We discover weblogs and their RSS feeds, index their content, fetch the pages they link to, and index their comments.
How does it work?
Your developers call our API every few seconds to fetch the freshest content, syndicated to you in real time. With our Open Source reference client, you can be up and running in less than an hour, with massive cost savings: Spinn3r can save you up to $45k per month compared to running your own crawler!
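The polling pattern described above can be sketched as a loop that remembers where the previous call left off. This is a minimal sketch, assuming a hypothetical `fetch(cursor)` function that returns a batch of posts plus a cursor for the next call; Spinn3r's actual client API is not described here, so `fetch`, the cursor, and the five-second default interval are illustrative assumptions.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch(cursor) -> (posts, next_cursor).
# This only illustrates cursor-based polling, not Spinn3r's real client API.
FetchFn = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]

def poll(fetch: FetchFn, handle: Callable[[dict], None],
         interval_s: float = 5.0, max_rounds: int = 3) -> Optional[str]:
    """Call `fetch` every `interval_s` seconds, passing back the cursor from
    the previous call so each round returns only posts not yet seen."""
    cursor: Optional[str] = None
    for round_no in range(max_rounds):
        posts, next_cursor = fetch(cursor)
        for post in posts:
            handle(post)
        if next_cursor is not None:
            cursor = next_cursor  # resume after the last post received
        if round_no < max_rounds - 1:
            time.sleep(interval_s)  # wait before asking for newer content
    return cursor
```

Passing the cursor back on every call keeps each request small: the client asks only for what arrived since the last round instead of re-downloading the whole feed.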
Maximum Throughput
Twenty million blogs and counting. Expect a torrent of content from Spinn3r: more than 100k posts per hour. Not only do we index every A-list blog out there, we also watch every mainstream news source.
Our API is fast, very fast! We can deliver 24 hours of archived content in under an hour, so your bandwidth, not our API, will be the bottleneck.
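The bandwidth claim can be checked with a little arithmetic. This sketch uses the 100k posts per hour figure from above as a lower bound; the 10 KB average post size is a hypothetical value chosen for illustration, not a Spinn3r number, and the sketch ignores new posts that arrive while the backlog is being fetched.

```python
def catch_up_rate(posts_per_hour: float, backlog_hours: float,
                  catch_up_hours: float, avg_post_bytes: float):
    """Return (backlog posts, posts per second, megabits per second) needed
    to download `backlog_hours` of content within `catch_up_hours`."""
    backlog_posts = posts_per_hour * backlog_hours
    posts_per_sec = backlog_posts / (catch_up_hours * 3600)
    mbit_per_sec = posts_per_sec * avg_post_bytes * 8 / 1e6
    return backlog_posts, posts_per_sec, mbit_per_sec

# 24 hours of content at 100k posts/hour, fetched in one hour,
# assuming (hypothetically) 10,000 bytes per post.
backlog, pps, mbps = catch_up_rate(100_000, 24, 1, 10_000)
```

Under these assumptions the backlog is 2.4 million posts, which means roughly 667 posts per second or about 53 Mbit/s of sustained download: a rate that many client connections, rather than the API, will struggle to sustain.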
High Availability Architecture
Spinn3r is built on a fault-tolerant infrastructure and is monitored 24/7 to ensure 99.9% availability. Every component in our system has three redundant copies, with standby hardware already online in case of failure.
We're also hosted in a state-of-the-art data center with redundant power and three standby generators.
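To put the 99.9% figure and the three redundant copies in concrete terms, the sketch below computes the downtime budget for an availability target and the availability of a component with several copies. The redundancy formula assumes copies fail independently, a simplification; real failures are often correlated, and the 99% per-copy figure used in the test is hypothetical.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Maximum downtime, in minutes, allowed over a period at an availability target."""
    return (1.0 - availability) * period_hours * 60

def redundant_availability(per_copy: float, copies: int) -> float:
    """Availability of a component that stays up while any one copy is up.
    Assumes the copies fail independently of each other."""
    return 1.0 - (1.0 - per_copy) ** copies

month_budget = downtime_budget_minutes(0.999, 30 * 24)
year_budget = downtime_budget_minutes(0.999, 365 * 24)
```

At 99.9%, the budget works out to about 43 minutes of downtime in a 30-day month, or about 8.76 hours per year.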
Advanced Metadata
We can provide you with the top 10k or the top 10M weblogs, and can filter posts by language, by site, or by author. Tags, language, spam probability, rank, raw inbound link count, ETag, and HTTP status are all included in our API.
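As an illustration of filtering on this kind of metadata, the sketch below selects posts by language, site, author, and a spam-probability ceiling. The `Post` fields and the `select_posts` helper are hypothetical; they mirror the metadata listed above, not Spinn3r's actual schema, and they filter on the client side rather than through the API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class Post:
    # Hypothetical field names mirroring the metadata listed above.
    url: str
    site: str
    author: str
    language: str
    spam_probability: float

def select_posts(posts: Iterable[Post], language: Optional[str] = None,
                 site: Optional[str] = None, author: Optional[str] = None,
                 max_spam: float = 1.0) -> List[Post]:
    """Keep posts matching every filter that is set; None means 'any'."""
    selected = []
    for p in posts:
        if language is not None and p.language != language:
            continue
        if site is not None and p.site != site:
            continue
        if author is not None and p.author != author:
            continue
        if p.spam_probability > max_spam:  # drop likely spam
            continue
        selected.append(p)
    return selected
```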
We go beyond user-specified metadata, which is often incorrect. Language is a good example. We use a mathematical language-modeling technique that has been in production for more than two years. With only 200 bytes of text, we can identify the language of a post with 98% accuracy.
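The general idea of identifying a language from a short snippet can be sketched with a character-trigram naive Bayes classifier. This is a common textbook technique swapped in for illustration, not Spinn3r's production model; the tiny training samples in the test are placeholders and will not approach 98% accuracy. Only the 200-byte cutoff comes from the text above.

```python
import math
from collections import Counter
from typing import Dict

def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams, padding so word edges count."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

class TrigramLanguageID:
    """Naive Bayes over character trigrams with add-one smoothing."""

    def __init__(self, samples: Dict[str, str]):
        self.counts = {lang: trigrams(text) for lang, text in samples.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        vocab = set()
        for c in self.counts.values():
            vocab.update(c)
        self.vocab_size = len(vocab) + 1  # +1 leaves mass for unseen trigrams

    def log_likelihood(self, text: str, lang: str) -> float:
        counts, total = self.counts[lang], self.totals[lang]
        return sum(n * math.log((counts[g] + 1) / (total + self.vocab_size))
                   for g, n in trigrams(text).items())

    def detect(self, text: str, max_bytes: int = 200) -> str:
        # Score only the first `max_bytes` bytes; a multi-byte character
        # split at the cut is dropped instead of raising an error.
        snippet = text.encode("utf-8")[:max_bytes].decode("utf-8", "ignore")
        return max(self.counts, key=lambda lang: self.log_likelihood(snippet, lang))
```

Character trigrams work well on short snippets because even a couple of sentences contain dozens of them, and common ones such as "the" or "el " are strongly tied to particular languages.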
Indexed in Real Time
Spinn3r is updated in real time. When we receive a ping from any of the major ping providers, we fetch the pinged blog's content, index it, and make it available through our API.
We don't stop there. Our hybrid indexing technology also launches our crawlers every thirty minutes against sites that don't send pings.
When we see the content, so do you. We push the content to your application as soon as we see it, so you are always current.
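The hybrid approach, fetching immediately on a ping and recrawling non-pinging sites every thirty minutes, can be sketched as a small scheduler. `HybridScheduler`, `on_ping`, and `due` are hypothetical names; only the thirty-minute interval comes from the text, and timestamps are plain seconds for simplicity.

```python
from typing import Dict, Iterable, List

RECRAWL_INTERVAL_S = 30 * 60  # thirty minutes, per the text above

class HybridScheduler:
    """Pinged sites are fetched on the next tick; sites that never ping
    are recrawled whenever their thirty-minute interval has elapsed."""

    def __init__(self, polled_sites: Iterable[str]):
        # Non-pinging sites start out immediately due for a crawl.
        self.next_crawl: Dict[str, float] = {s: 0.0 for s in polled_sites}
        self.pinged: List[str] = []

    def on_ping(self, site: str) -> None:
        self.pinged.append(site)  # queue for immediate fetch

    def due(self, now: float) -> List[str]:
        """Return sites to fetch at time `now` (seconds), ping queue first."""
        batch, self.pinged = self.pinged, []
        for site, when in sorted(self.next_crawl.items()):
            if now >= when:
                batch.append(site)
                self.next_crawl[site] = now + RECRAWL_INTERVAL_S
        return batch
```

Pings give near-instant freshness for sites that send them, while the fixed interval bounds how stale a silent site can get to roughly thirty minutes.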
Full Crawler
Spinn3r goes beyond indexing raw RSS: we index the full HTML of each post. We then extract the body of the post, excluding the sidebar and other page chrome, and provide this content through a content extraction API.
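To show what separating a post body from page chrome involves, here is a deliberately simple heuristic: keep paragraph text and drop anything inside navigation, sidebar, header, footer, script, or style elements. This is an assumed heuristic for illustration, not Spinn3r's extraction method, and it does not handle unclosed `<p>` tags or layouts without semantic markup.

```python
from html.parser import HTMLParser
from typing import List

# Tags treated as page chrome; their text is dropped (an assumed heuristic).
CHROME_TAGS = {"nav", "aside", "header", "footer", "script", "style"}

class BodyExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chrome_depth = 0  # > 0 while inside a chrome element
        self.in_paragraph = False
        self.current: List[str] = []
        self.paragraphs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in CHROME_TAGS:
            self.chrome_depth += 1
        elif tag == "p" and self.chrome_depth == 0:
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag in CHROME_TAGS and self.chrome_depth > 0:
            self.chrome_depth -= 1
        elif tag == "p" and self.in_paragraph:
            text = " ".join("".join(self.current).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.chrome_depth == 0:
            self.current.append(data)

def extract_body(html: str) -> str:
    """Return the non-chrome paragraph text of a page, one paragraph per line."""
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```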
Open Source and Standards Based
The Spinn3r reference client is fully Open Source and standards-based. Our API delivers content as a version of RSS and Atom over HTTP.
Our Java API is fully documented and can run natively within your application or as a standalone daemon that indexes new content and saves it to disk. Because the content is RSS and Atom, it integrates easily with Perl, Python, JavaScript, or any language that has an XML parser.
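As an example of reading saved content with an ordinary XML parser, the sketch below pulls the title, link, and date out of standard RSS 2.0 items and Atom entries using Python's standard library. The exact "version of RSS and Atom" Spinn3r uses is not specified here, so this reads only standard elements; any Spinn3r-specific metadata elements are not covered, and `parse_feed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace prefix for ElementTree

def parse_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, link, and date from RSS 2.0 items and Atom entries."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):  # RSS 2.0 items (no namespace)
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    for entry in root.iter(f"{ATOM}entry"):  # Atom entries
        link = entry.find(f"{ATOM}link")
        items.append({
            "title": entry.findtext(f"{ATOM}title", default=""),
            "link": link.get("href", "") if link is not None else "",
            "date": entry.findtext(f"{ATOM}updated", default=""),
        })
    return items
```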
Archives
We currently have eight months of content online, more than 21TB of indexed data, ready for use with your application.
Evaluation
What are you waiting for? You can be using Spinn3r in a few minutes. Request an evaluation and we'll give you a full one-week trial period to try out Spinn3r. We also offer a thirty-day money-back guarantee: if, for whatever reason, Spinn3r doesn't work out for you, we'll give you a full refund on your first month of service.
So request an evaluation today and we'll have you up and running in no time!




