HBX Screen Scrape
Latest revision as of 10:47, 16 December 2008
HBX has a very basic screen-scraping capability. It can "capture" (or "scrape") data from a Relying Party Agent (RPA) site page and store that data as the values of the appropriate properties of a Higgins Context, overwriting any current values. This works only when three conditions hold: the page follows certain HTML conventions; the Higgins server supporting HBX has a "form map" for the dummy <form> element that is used to identify the page; and the schema of the Higgins Context associated with the RPA site contains the properties for the fields in the form.
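The storing step can be pictured with a minimal sketch. Everything about the data shapes here is an assumption made for illustration: the page does not say what a form map contains, so `form_map` is taken to be a mapping from a field id on the page to a property name, `schema` a set of property names, and `context` a plain dictionary. The function name `store_scraped` is hypothetical and is not part of HBX or Higgins.

```python
def store_scraped(context, schema, form_map, scraped):
    """Copy scraped values into the context and return the properties written.

    ASSUMPTIONS (not from the HBX source): form_map maps a page field id to a
    property name, schema is a set of property names, context is a dict.
    """
    written = []
    for field_id, value in scraped.items():
        prop = form_map.get(field_id)
        if prop is None or prop not in schema:
            # No mapping for this field, or the schema lacks the property.
            continue
        context[prop] = value  # overwrites any current value
        written.append(prop)
    return written


context = {"city": "Boston"}
schema = {"city", "keywords"}
form_map = {"cityField": "city"}  # hypothetical field id and property name

print(store_scraped(context, schema, form_map,
                    {"cityField": "Washington", "unmapped": "x"}))
print(context)
```

In this sketch a field without a mapping, or a mapped property missing from the schema, is skipped rather than treated as an error; whether HBX behaves that way is not stated on this page.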
When HBX requests a form map from the Higgins server, it identifies the form map by concatenating three values:
- host: the host site
- name: the value of the form's name attribute
- id: always "rpformcapture"
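As a rough illustration of building such an identifier, the sketch below joins the three parts into one string. The order (host, name, id) and the "/" separator are assumptions chosen for the example, since the page only says that the values are concatenated. The function name `form_map_key` is hypothetical.

```python
def form_map_key(host, form_name, form_id="rpformcapture", sep="/"):
    """Join the three identifying parts into a single lookup key.

    ASSUMPTION: both the order of the parts and the separator are
    illustrative; the source does not specify either.
    """
    return sep.join([host, form_name, form_id])


# "example.org" is a placeholder host; the form name comes from the page's example.
print(form_map_key("example.org", "idmashup_profile"))
```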
The page content to be scraped must be contained within a dummy <form>...</form> structure. The form MUST have an id attribute whose value MUST be "rpformcapture". The form MUST also have a name attribute; its value identifies the block of content that is to be scraped (captured on the broker server). For example:
- <form name="idmashup_profile" id="rpformcapture">
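A page author could check for the capture form with a short script along these lines. This is a sketch using Python's standard `html.parser` module, not the code HBX itself runs; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

CAPTURE_ID = "rpformcapture"


class CaptureFormFinder(HTMLParser):
    """Records the name attribute of the first form whose id is rpformcapture."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.form_name = None

    def handle_starttag(self, tag, attrs):
        if tag != "form" or self.found:
            return
        attr = dict(attrs)
        if attr.get("id") == CAPTURE_ID:
            self.found = True
            self.form_name = attr.get("name")


def find_capture_form(html):
    """Return (found, name); name is None when the name attribute is missing."""
    finder = CaptureFormFinder()
    finder.feed(html)
    return finder.found, finder.form_name


page = '<form name="idmashup_profile" id="rpformcapture"></form>'
print(find_capture_form(page))
```

A result of `(True, None)` would indicate a form with the right id but no name attribute, which does not meet the requirement above.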
Every individual element within the form MUST have an id attribute. For example:
- <a id="existKeywords"
- Incorrect: <label>City: </label>Washington
- Correct: <label>City: </label> Washington
The particular tags used in the examples above are not important; all that matters is that each tag has an id attribute.
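One way to read this rule is that only text wrapped in an element carrying an id can be picked up, whatever the tag. The sketch below follows that reading: it collects the text of each id-bearing element inside the capture form and ignores everything else. It is an illustration built on Python's standard `html.parser`, not HBX's implementation, and it assumes well-formed markup. The `span` tag and the `city` id are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IdTextScraper(HTMLParser):
    """Collects text found directly inside id-bearing elements of the capture form."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while inside the rpformcapture form
        self.stack = []      # id (or None) of each open element in the form
        self.values = {}     # element id -> captured text

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if not self.inside:
            if tag == "form" and attr.get("id") == "rpformcapture":
                self.inside = True
            return
        if tag in VOID_TAGS:
            return
        element_id = attr.get("id")
        self.stack.append(element_id)
        if element_id:
            self.values.setdefault(element_id, "")

    def handle_endtag(self, tag):
        if not self.inside:
            return
        if tag == "form":
            self.inside = False
            self.stack.clear()
        elif tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when the innermost open element has an id.
        if self.inside and self.stack and self.stack[-1]:
            self.values[self.stack[-1]] += data


def scrape(html):
    scraper = IdTextScraper()
    scraper.feed(html)
    return scraper.values


form = '<form name="idmashup_profile" id="rpformcapture">%s</form>'
bare = form % '<label>City: </label>Washington'
wrapped = form % '<label>City: </label><span id="city">Washington</span>'

print(scrape(bare))     # bare text has no id-bearing wrapper
print(scrape(wrapped))  # text inside the id-bearing span is captured
```

Under this reading, the bare "Washington" is never captured because no element with an id encloses it, while the wrapped version yields a value keyed by the element's id.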
See also HBX Form Fill