SMILA/Documentation/Importing/VisitedLinks
VisitedLinks: An auxiliary Service for crawler workers
ObjectStoreVisitedLinks service implementation
The bundle org.eclipse.smila.importing.state.objectstore provides an implementation of the VisitedLinks service that keeps track of the visited state of links in the ObjectStore service, in a similar way to the ObjectStoreDeltaService.
The service uses the store "visitedlinks".
Configuration
As the ObjectStoreVisitedLinks service shares most of its code with the ObjectStoreDeltaService, it also has the same configuration properties as the delta service. The only difference is that they are read from org.eclipse.smila.importing.state.objectstore/visitedlinksstore.properties.
VisitedLinks ReST API
Currently there is only a simple REST API for VisitedLinks. It shows how many entries have been stored for each data source, and it allows deleting the entries of a single source or the entries of all sources.
Show active sources
- URL: /smila/importing/visitedlinks
- Method: GET
- Response Code: 200 OK, if successful
- Response JSON:
{"sources": [ { "id": "web", "url": "http://localhost:8080/smila/importing/visitedlinks/web" } ]}
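A minimal client sketch for listing the active sources, using only the Python standard library. It assumes SMILA listens on http://localhost:8080, as in the sample response; the helper names list_sources_request and parse_sources are hypothetical and not part of SMILA.

```python
import json
import urllib.request

# Assumption: SMILA runs locally on port 8080, as in the sample response.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def list_sources_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the sources that currently have VisitedLinks entries."""
    return urllib.request.Request(base_url, method="GET")


def parse_sources(body: str) -> dict:
    """Map each source ID to its detail URL, read from the "sources" array."""
    data = json.loads(body)
    return {source["id"]: source["url"] for source in data.get("sources", [])}
```

Passing the request to urllib.request.urlopen sends it; parse_sources turns the sample response into {"web": "http://localhost:8080/smila/importing/visitedlinks/web"}.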
Clear all sources
- URL: /smila/importing/visitedlinks
- Method: DELETE
- Response Code: 200 OK, if successful
- Response JSON: none
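A sketch of clearing all sources from Python, again assuming SMILA on http://localhost:8080. The helper names are hypothetical. Since the endpoint returns no JSON, the sketch only checks the status code.

```python
import urllib.request

# Assumption: SMILA runs locally on port 8080.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def clear_all_sources_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a DELETE request that removes the VisitedLinks entries of all sources."""
    return urllib.request.Request(base_url, method="DELETE")


def clear_all_sources(base_url: str = BASE_URL) -> bool:
    """Send the request; True on 200 OK. urlopen raises HTTPError for error codes."""
    with urllib.request.urlopen(clear_all_sources_request(base_url)) as response:
        return response.status == 200
```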
Get info about sources
- URL: /smila/importing/visitedlinks/<sourcename>
- Method: GET
- Response Code:
  - 200 OK, if successful
  - 404 NOT FOUND, if the source currently has no entries
- Response JSON:
Contains the ID of the source and the number of entries. If there are more than 10000 entries, the number is only estimated because exact counting could take a long time. To force an exact count, add ?countExact=true to the request URL.
{ "id": "web", "count": "123456" }
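The following sketch requests the entry count of one source and optionally adds the countExact parameter. It assumes SMILA on http://localhost:8080, and percent-encoding of the source name is also an assumption. The helper names are hypothetical. A 404 response is mapped to None, because it means the source currently has no entries. The sample response sends "count" as a string, so the value is converted with int(), which would also accept a plain number.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: SMILA runs locally on port 8080.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def source_info_url(source: str, count_exact: bool = False, base_url: str = BASE_URL) -> str:
    """Build the info URL for one source (name percent-encoded as an assumption)."""
    url = f"{base_url}/{urllib.parse.quote(source, safe='')}"
    if count_exact:
        # Forces an exact count; may be slow for sources with more than 10000 entries.
        url += "?countExact=true"
    return url


def parse_count(body: str) -> int:
    """Extract the entry count; int() accepts the string form used in the sample."""
    return int(json.loads(body)["count"])


def get_entry_count(source: str, count_exact: bool = False) -> Optional[int]:
    """Return the number of entries, or None if the source has none (404)."""
    try:
        with urllib.request.urlopen(source_info_url(source, count_exact)) as response:
            return parse_count(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise
```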
Clear a single source
- URL: /smila/importing/visitedlinks/<sourcename>
- Method: DELETE
- Response Code: 200 OK, if successful
- Response JSON: none
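A sketch of clearing a single source, assuming SMILA on http://localhost:8080 and a percent-encoded source name. The helper names are hypothetical. As with clearing all sources, there is no response body, so success is read from the status code.

```python
import urllib.parse
import urllib.request

# Assumption: SMILA runs locally on port 8080.
BASE_URL = "http://localhost:8080/smila/importing/visitedlinks"


def clear_source_request(source: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a DELETE request that removes all entries of one source."""
    url = f"{base_url}/{urllib.parse.quote(source, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def clear_source(source: str) -> bool:
    """Send the request; True on 200 OK. urlopen raises HTTPError for error codes."""
    with urllib.request.urlopen(clear_source_request(source)) as response:
        return response.status == 200
```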