
HBX Screen Scrape


Overview

HBX has a basic screen-scraping capability. It can "capture" (scrape) data from a Relying Party Agent (RPA) site page and store the data as the values of the appropriate properties of a Higgins Context, overwriting any current values. This works only if three conditions hold: the page follows the HTML conventions described below; the Higgins server supporting HBX has a "form map" for the dummy <form> element that identifies the page; and the schema of the Higgins Context associated with the RPA site contains properties for the fields in the form.

When HBX requests a form map from the Higgins server, it identifies the form map by concatenating:

host+name+id

where:

  • host is the host site
  • name is the name attribute of the form
  • id is the fixed value "rpformcapture"
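
The concatenation above can be sketched in a few lines of Python. This is an illustration only, not HBX code: the function name form_map_key is hypothetical, and it assumes the three parts are joined with no separator and that host is a bare host name, neither of which this page confirms.

```python
CAPTURE_FORM_ID = "rpformcapture"  # fixed id value required by the HTML convention


def form_map_key(host: str, form_name: str) -> str:
    """Build a form map identifier as host + name + id.

    Assumption: plain concatenation with no separator between the parts.
    """
    return host + form_name + CAPTURE_FORM_ID


if __name__ == "__main__":
    # Host and form name taken from the examples on this page.
    print(form_map_key("beta.idmashup.net", "idmashup_profile"))
```

Because the id part is always the same, the host and the form's name attribute are what distinguish one form map from another.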


HTML coding

The page content to be scraped must be contained within a dummy <form>...</form> structure. The form MUST have an id attribute whose value MUST be "rpformcapture". The form MUST also have a name attribute; its value identifies the block of content that is to be scraped (captured on the broker server).

For example:

<form name="idmashup_profile" id="rpformcapture">
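
As a rough illustration of how a page could be checked for this convention, the following Python sketch looks for a form whose id is "rpformcapture" and returns its name. It uses only the standard library's html.parser; the class and function names are hypothetical and are not part of HBX.

```python
from html.parser import HTMLParser


class CaptureFormFinder(HTMLParser):
    """Remembers the name of the first <form id="rpformcapture"> seen."""

    def __init__(self):
        super().__init__()
        self.form_name = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "form" and attr.get("id") == "rpformcapture"
                and self.form_name is None):
            # May still be None if the form lacks the required name attribute.
            self.form_name = attr.get("name")


def capture_form_name(html: str):
    """Return the capture form's name, or None if no usable form is found."""
    finder = CaptureFormFinder()
    finder.feed(html)
    finder.close()
    return finder.form_name
```

A page without such a form, or with a capture form that has no name attribute, yields None, which corresponds to a page that cannot be scraped under these rules.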

Every individual element within the form MUST have an id attribute. For example:

<a id="existKeywords" href="http://beta.idmashup.net/taggregator?tag=usercentric">user-centric</a>

Another example:

Incorrect: <label>City: </label>Washington
Correct: <label>City: </label><span id="city">Washington</span>

The particular tags used in the examples above are not important; all that matters is that each tag has an id attribute.
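
To show why the id attribute matters, here is a minimal Python sketch of a scraper that collects text only from elements carrying an id inside the capture form. It is an assumption-laden illustration, not the HBX implementation: the names CaptureScraper and scrape are hypothetical, and keying the captured values by element id is a guess about how values would be matched to Context properties.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class CaptureScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.stack = []   # (tag, id or None) for open elements inside the form
        self.values = {}  # element id -> captured text

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form" and attr.get("id") == "rpformcapture":
            self.in_form = True
            return
        if self.in_form and tag not in VOID_TAGS:
            elem_id = attr.get("id")
            self.stack.append((tag, elem_id))
            if elem_id:
                self.values.setdefault(elem_id, "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False
            self.stack.clear()
            return
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.in_form:
            return
        # Attribute text to the nearest enclosing element that has an id.
        for _tag, elem_id in reversed(self.stack):
            if elem_id:
                self.values[elem_id] += data
                return
        # Text with no id-bearing ancestor cannot be captured and is dropped.


def scrape(html: str) -> dict:
    scraper = CaptureScraper()
    scraper.feed(html)
    scraper.close()
    return {key: text.strip() for key, text in scraper.values.items()}
```

With this sketch, the anchor example above is captured under "existKeywords", while bare text such as Washington placed directly after a label is silently lost because no enclosing element has an id.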

See also HBX Form Fill
