
SMILA/Documentation/Importing/VisitedLinks

Revision as of 07:33, 19 January 2012 by Juergen.schumacher.attensity.com


VisitedLinks: An auxiliary Service for crawler workers

ObjectStoreVisitedLinks service implementation

The bundle org.eclipse.smila.importing.state.objectstore provides an implementation of the VisitedLinks service. It uses the ObjectStore service to keep track of the visited state of links, in much the same way as the ObjectStoreDeltaService does.

The service uses the store named visitedlinks.

Configuration

As the ObjectStoreVisitedLinks service shares most of its code with the ObjectStoreDeltaService, it also has the same configuration properties as the delta service. The only difference is that they are read from org.eclipse.smila.importing.state.objectstore/visitedlinksstore.properties.

VisitedLinks ReST API

Currently there is only a simple REST API for VisitedLinks. It shows how many entries have been stored for each data source, and it allows deleting all entries of a single source or the entries of all sources.

Show active sources

  • URL: /smila/importing/visitedlinks
  • Method: GET
  • Response Code: 200 OK, if successful
  • Response JSON:
{"sources": [
  {
    "id": "web",
    "url": "http://localhost:8080/smila/importing/visitedlinks/web"
  }
]}
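
As a rough illustration of how a client might consume this listing, the following Python sketch builds the GET request and maps each source ID to its detail URL. The host and port are taken from the sample response above and are an assumption about the deployment; the helper names `list_sources_request` and `parse_sources` are hypothetical and not part of SMILA.

```python
import json
from urllib.request import Request

# Assumption: SMILA is reachable at localhost:8080, as in the sample response.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def list_sources_request(base_url=BASE_URL):
    """Build the GET request that lists all sources with stored entries."""
    return Request(base_url, method="GET")


def parse_sources(body):
    """Map each source ID to its detail URL, given the response JSON text."""
    return {source["id"]: source["url"] for source in json.loads(body)["sources"]}


# Sample body copied from the response shown above.
sample = (
    '{"sources": [{"id": "web", '
    '"url": "http://localhost:8080/smila/importing/visitedlinks/web"}]}'
)
print(parse_sources(sample))
```

To actually send the request, pass the result of `list_sources_request()` to `urllib.request.urlopen` and feed the decoded response body to `parse_sources`.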

Clear all sources

  • URL: /smila/importing/visitedlinks
  • Method: DELETE
  • Response Code: 200 OK, if successful
  • Response JSON: none

Get info about a single source

  • URL: /smila/importing/visitedlinks/<sourcename>
  • Method: GET
  • Response Code:
    • 200 OK, if successful,
    • 404 NOT FOUND, if the source currently has no entries.
  • Response JSON:

Contains the ID of the source and the number of entries. If there are more than 10000 entries, the number is only estimated because exact counting could take a long time. To force an exact count, add ?countExact=true to the request URL.

{
  "id": "web",
  "count": "123456"
}
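
The request for a single source, with or without the countExact parameter, could be sketched in Python as follows. The base URL (host and port) is an assumption, and `source_info_request` and `parse_count` are hypothetical helper names. Because the sample response carries the count as a JSON string, the sketch converts it to an integer.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: SMILA is reachable at localhost:8080.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def source_info_request(source, count_exact=False, base_url=BASE_URL):
    """Build the GET request for one source; optionally force exact counting."""
    url = f"{base_url}/{quote(source, safe='')}"
    if count_exact:
        url += "?" + urlencode({"countExact": "true"})
    return Request(url, method="GET")


def parse_count(body):
    """Return (source ID, entry count) from the response JSON text."""
    info = json.loads(body)
    # int() accepts both the string form shown in the sample and a plain number.
    return info["id"], int(info["count"])


print(source_info_request("web", count_exact=True).full_url)
print(parse_count('{"id": "web", "count": "123456"}'))
```

A caller that only needs a rough size can omit `count_exact` and accept the estimate for large sources.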

Clear a single source

  • URL: /smila/importing/visitedlinks/<sourcename>
  • Method: DELETE
  • Response Code: 200 OK, if successful
  • Response JSON: none
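
Both DELETE operations differ only in whether a source name is appended to the URL, so a client could cover them with one helper. The following Python sketch assumes the base URL shown; `clear_request` and `clear` are hypothetical names, not SMILA functions.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: SMILA is reachable at localhost:8080.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def clear_request(source=None, base_url=BASE_URL):
    """Build a DELETE request for one source, or for all sources if source is None."""
    url = base_url if source is None else f"{base_url}/{quote(source, safe='')}"
    return Request(url, method="DELETE")


def clear(source=None, base_url=BASE_URL):
    """Send the DELETE request and return the HTTP status code."""
    try:
        with urlopen(clear_request(source, base_url)) as response:
            return response.status
    except HTTPError as error:
        # Error statuses are raised as exceptions by urllib; report the code instead.
        return error.code


# Only the requests are built here; nothing is sent until clear() is called.
print(clear_request("web").get_method(), clear_request("web").full_url)
print(clear_request().get_method(), clear_request().full_url)
```

Calling `clear("web")` would remove the entries of the source web, while `clear()` without arguments would remove the entries of all sources, so the default should be used with care.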
