HBX Screen Scrape

Overview

HBX has a very basic screen-scraping capability. It can "capture" (or "scrape") data from a Relying Party Agent (RPA) site page when three conditions hold: (1) the page follows certain HTML conventions, (2) the Higgins server supporting HBX has a "form map" for the dummy <form> element that is used to identify the page, and (3) the schema of the Higgins Context associated with the RPA site contains properties for the fields in the form. The captured values are stored as the values of the corresponding properties of the Context, overwriting the current values.
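The capture behavior can be sketched roughly as follows. This is only an illustration under stated assumptions: the page does not specify how a form map is represented, so the sketch assumes a form map is a plain mapping from element id to Context property name, and the names capture, context, schema, form_map, and scraped are all hypothetical.

```python
def capture(context, schema, form_map, scraped):
    """Store scraped values in the context, overwriting current values.

    context  -- dict of property name -> current value (hypothetical shape)
    schema   -- set of property names the Context schema contains
    form_map -- dict of element id -> property name, or None if the
                server has no form map for this page (assumed shape)
    scraped  -- dict of element id -> text found on the page
    """
    if form_map is None:
        return context  # no form map for this page: nothing is captured
    for element_id, value in scraped.items():
        prop = form_map.get(element_id)
        # Only fields that the form map knows about and that exist
        # in the schema are stored; everything else is ignored.
        if prop is not None and prop in schema:
            context[prop] = value
    return context


if __name__ == "__main__":
    ctx = {"city": "Boston"}
    capture(ctx, {"city"}, {"cityField": "city"}, {"cityField": "Washington"})
    print(ctx)
```

Fields without a form map entry, or whose property is missing from the schema, are skipped in this sketch, which mirrors the three conditions stated above.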

When HBX requests a form map from the Higgins server it identifies the form map by concatenating:

host+name+id

where:

host is the host site
name is the name attribute of the form
id is the id attribute of the form, which is always "rpformcapture"
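As a minimal sketch of that concatenation, assuming the three parts are joined directly with no separator (the page shows only host+name+id) and that host is a bare host name, the key could be built like this. The function name form_map_key and the form name "keywords" are hypothetical.

```python
CAPTURE_FORM_ID = "rpformcapture"  # fixed id value required by HBX


def form_map_key(host: str, form_name: str) -> str:
    """Build the form map identifier as host + name + id.

    Assumption: plain concatenation with no separator characters.
    """
    return host + form_name + CAPTURE_FORM_ID


if __name__ == "__main__":
    print(form_map_key("beta.idmashup.net", "keywords"))
```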

HTML coding

The page content to be scraped must be contained within a dummy <form>...</form> structure. The form MUST have an id attribute whose value MUST be "rpformcapture". The form MUST also have a name attribute; its value identifies the block of content that is to be scraped (captured on the broker server).

For example:
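A hedged illustration of such a form, embedded in a short Python script that locates it and reads its name. The form name "profile" and the element id "city" are hypothetical, and Python's standard html.parser is used purely for demonstration; it is not how HBX itself finds the form.

```python
from html.parser import HTMLParser

# Hypothetical page fragment that follows the conventions above.
PAGE = """
<form id="rpformcapture" name="profile">
  <span id="city">Washington</span>
</form>
"""


class CaptureFormFinder(HTMLParser):
    """Remember the name of the first form whose id is "rpformcapture"."""

    def __init__(self):
        super().__init__()
        self.form_name = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (self.form_name is None and tag == "form"
                and attributes.get("id") == "rpformcapture"):
            self.form_name = attributes.get("name")


def find_capture_form_name(html):
    """Return the capture form's name attribute, or None if not found."""
    finder = CaptureFormFinder()
    finder.feed(html)
    finder.close()
    return finder.form_name


if __name__ == "__main__":
    print(find_capture_form_name(PAGE))
```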

Every individual element within the form MUST have an id attribute. For example:

<a id="existKeywords" href="http://beta.idmashup.net/taggregator?tag=usercentric">user-centric</a>

Another example:

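Incorrect (the value is bare text, so no element with an id encloses it): <label>City: </label>Washington

Correct (the value is wrapped in an element that has an id; the span tag and the id "city" shown here are illustrative): <label>City: </label><span id="city">Washington</span>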

The tags used in the above examples are not important; all that matters is that each tag has an id attribute.
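A page author could check this rule before publishing with something like the sketch below. It applies the rule literally to every element inside the capture form and reports the tags that lack an id. The function name elements_missing_id is hypothetical, and the check is an illustration built on Python's standard html.parser, not part of HBX.

```python
from html.parser import HTMLParser


class MissingIdChecker(HTMLParser):
    """Collect tag names inside the capture form that have no id."""

    def __init__(self):
        super().__init__()
        self.in_capture_form = False
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            # Only the form with the required id is inspected.
            self.in_capture_form = attributes.get("id") == "rpformcapture"
        elif self.in_capture_form and "id" not in attributes:
            self.missing.append(tag)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_capture_form = False


def elements_missing_id(html):
    """Return the tags within the capture form that lack an id attribute."""
    checker = MissingIdChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


if __name__ == "__main__":
    sample = ('<form id="rpformcapture" name="profile">'
              '<p id="intro">About</p><span>Washington</span></form>')
    print(elements_missing_id(sample))
```

An empty list means every element in the form carries an id; elements outside the capture form are ignored.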
