SMILA/UserAgent

If you're reading this, chances are you were looking through your server logs and noticed our web crawler visiting your site. If you see the agent "SMILA", that's probably a developer testing a new version of our web crawler, or someone running their own instance.

We are open-source developers, trying to build something useful for the world to use. It comes naturally to us to want to be good netizens. If you notice our web crawler misbehaving, please drop us a line at smila-dev@eclipse.org and we will investigate the problem. For more information about the SMILA project see http://www.eclipse.org/smila.

Our web crawler does retrieve and parse robots.txt files, but it does not yet look for robots META tags in HTML. These are the standard mechanisms for webmasters to tell web robots which portions of a site a robot is welcome to access.
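For reference, a robots META tag is a standard HTML element placed in a page's head section. A page using it to opt out of indexing and link-following would look roughly like this (note again that our crawler does not yet honor these tags):

```
<html>
  <head>
    <meta name="robots" content="noindex, nofollow">
    <title>Example page</title>
  </head>
  <body>...</body>
</html>
```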

Sysadmins/robots.txt

We're an open source project, so please understand that a misbehaving crawler appearing with our agent string may not have been run by us. Our code is out there for anyone to tinker with. However, whether or not we ran the crawler, we'd appreciate hearing about any bad behavior, so please let us know about it! If possible, please include the name of the domain and some representative log entries. We can be reached at smila-dev@eclipse.org.

Our crawler follows the robots.txt exclusion standard, which is described at http://www.robotstxt.org/. Depending on its configuration, our crawler may obey different sets of rules. To make it simple to send our crawler away, we'll always obey rules addressed to "SMILA" or "*".
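To see how a crawler evaluates such rules, here is a small sketch using Python's standard urllib.robotparser module (not SMILA's actual implementation, just an illustration of the matching logic; the domain and paths are made up):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks SMILA from one directory.
robots_txt = """\
User-agent: SMILA
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The rules for "SMILA" apply, so /private/ is off limits
# while the rest of the site remains fetchable.
print(rp.can_fetch("SMILA", "http://example.com/private/page.html"))
print(rp.can_fetch("SMILA", "http://example.com/public/page.html"))
```

A rule block for "*" would be matched the same way for any agent name, which is why either form is enough to turn our crawler away.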

To ban all bots from your site, place the following in your robots.txt file:

User-agent: *
Disallow: /

To ban all SMILA crawlers from your site, place the following in your robots.txt file:

User-agent: SMILA
Disallow: /