Difference between revisions of "HBX Screen Scrape"


Revision as of 14:08, 18 March 2008


Overview

HBX has a very basic screen-scraping capability. It can "capture" (or "scrape") data from a Relying Party Agent (RPA) site page and store the data as the values of the corresponding properties of the Higgins Context, overwriting any current values. This works only if all of the following hold: the page follows certain HTML conventions; the Higgins server supporting HBX has a "form map" for the dummy <form> element that is used to identify the page; and the schema of the Higgins Context associated with the RPA site contains the properties for the fields in the form.

When HBX requests a form map from the Higgins server, it identifies the form map by concatenating:

host+name+id

where:

  • host is the host site
  • name is the name attribute of the form
  • id = "rpformcapture"
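
The concatenation above can be sketched in a few lines of Python. This is only an illustration: the function name form_map_key is hypothetical, and it assumes the three parts are joined directly with no separator, which the description above does not state explicitly.

```python
# Fixed id value that marks a capture form, as given in the text.
FORM_CAPTURE_ID = "rpformcapture"


def form_map_key(host: str, form_name: str, form_id: str = FORM_CAPTURE_ID) -> str:
    """Build a form map identifier as host + name + id.

    Assumption: plain string concatenation with no separator character.
    """
    return host + form_name + form_id


if __name__ == "__main__":
    # Host and form name taken from the examples on this page.
    print(form_map_key("beta.idmashup.net", "idmashup_profile"))
```

If the server actually expects a delimiter between the parts, only the return expression would need to change.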


HTML coding

The page content to be scraped must be contained within a dummy <form>...</form> structure. The form MUST have an id attribute whose value MUST be "rpformcapture". The form MUST also have a name attribute; its value identifies the block of content that is to be scraped (captured on the broker server).

For example:

<form name="idmashup_profile" id="rpformcapture">

Every individual element within the form MUST have an id attribute. For example:

<a id="existKeywords"
xhref="http://beta.idmashup.net/taggregator?tag=usercentric">user-centric</a>

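Another example, in which the value to be scraped must be wrapped in an element that carries an id attribute (the span tag and the id value "city" are illustrative):

Incorrect: <label>City: </label>Washington
Correct: <label>City: </label> <span id="city">Washington</span>

The tags used in the examples above are not important; all that matters is that each tag has an id attribute.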
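
A page author could check these conventions before publishing. The following is a minimal sketch using Python's standard html.parser module; it is not part of HBX, the names CaptureFormChecker and check_capture_form are hypothetical, and it assumes well-formed (balanced) markup. It applies the "every element MUST have an id" rule literally to every element inside the form.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CaptureFormChecker(HTMLParser):
    """Collects violations of the capture-form conventions described above."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # 0 = outside the capture form
        self.found_form = False
        self.form_name = None
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Only a form whose id is "rpformcapture" starts a capture block.
            if tag == "form" and attrs.get("id") == "rpformcapture":
                self.found_form = True
                self.form_name = attrs.get("name")
                if not self.form_name:
                    self.problems.append("capture form has no name attribute")
                self.depth = 1
            return
        # Inside the form: every element is expected to carry an id.
        if not attrs.get("id"):
            self.problems.append(f"<{tag}> inside the form has no id attribute")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def check_capture_form(html: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    checker = CaptureFormChecker()
    checker.feed(html)
    checker.close()
    if not checker.found_form:
        return ['no <form id="rpformcapture"> found']
    return checker.problems


if __name__ == "__main__":
    page = ('<form name="idmashup_profile" id="rpformcapture">'
            '<label>City: </label>Washington</form>')
    for problem in check_capture_form(page):
        print(problem)
```

Bare text such as "Washington" is not an element, so this sketch does not flag it; it only reports elements that lack an id and a capture form that is missing or unnamed.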

See also HBX Form Fill
