
Difference between revisions of "Talk:SMILA/Documentation/Solr"

==== More Like This / What's related ====
 
  
Solr can return documents that are ''related'' to a given document, a feature it calls ''More Like This'' (MLT). Two modes are supported:
 
 
# The first returns the top N related documents for every item in the search result list (SRL), see [http://wiki.apache.org/Solr/MoreLikeThis]
 
# The second does this ad hoc for just one document and uses its own request handler, see [http://wiki.apache.org/Solr/MoreLikeThisHandler]
 
The first variant is considerably more expensive than the second, because related documents have to be computed for every item in the result list rather than for a single document.
 
 
Both modes are supported through SMILA and are configured very similarly. SMILA doesn't do anything special to the arguments you pass in with the record and hands them on to Solr as-is, except that it does any needed URL encoding for you. While you may assign specific data types to the parameters, this is not necessary: all values may be given as strings, since this is what is passed on to Solr anyhow.
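
The pass-through described above can be pictured with a small Python sketch. It is not SMILA code: the function name <tt>to_query_string</tt> and the way nested maps are flattened are assumptions made for illustration only. It merely shows that every value ends up as a URL-encoded string.

```python
from urllib.parse import urlencode, parse_qs

def to_query_string(params):
    """Flatten a (possibly nested) dict of parameters and URL-encode it.

    Hypothetical helper: nested maps are assumed to only group parameters,
    so their own keys are dropped and their entries are lifted to the top.
    """
    flat = {}

    def walk(node):
        for key, value in node.items():
            if isinstance(value, dict):
                walk(value)
            else:
                flat[key] = str(value)  # typed or not, it is sent as a string

    walk(params)
    return urlencode(flat)

# Parameter values taken from the sample record in this section.
solr_query = {
    "qt": "/mlt",
    "fl": "Id,score,Size",
    "mlt": {"mlt": "true", "mlt.fl": "Content", "mlt.mindf": 1, "mlt.mintf": 1},
}

query_string = to_query_string(solr_query)
print(query_string)
```

Whether <tt>mlt.mindf</tt> is given as the number 1 or the string "1" makes no difference to the encoded result, which matches the remark that typing the parameters is optional.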
 
 
Which mode is active ultimately depends on your handler configuration in solrconfig.xml. Here we assume SMILA's default setup, which binds the MLT handler to <tt>/mlt</tt> and a normal query to <tt>/select</tt>.
 
 
Both modes share most of the MLT parameters, but each also needs or supports some mode-specific ones.
 
 
 
<source lang="xml">
 
<record>
 
 
 
  <!-- The Lucene query expression that is executed in both cases. -->
 
  <Val key="query">euklid</Val>
 
  ...
 
  <Map key="_solr.query">
 
    <!-- Selects the Solr request handler. Set it to /mlt when you want to use the MLT handler. -->
 
    <Val key="qt">/mlt</Val>
 
    <!-- determines the list of fields returned for both the normal results as well as the MLT results  --> 
 
    <Val key="fl" >Id,score,Size</Val>
 
    ...
 
    <Map key="mlt">
 
      <Val key="mlt" >true</Val>
 
      <Val key="mlt.fl" >Content</Val>
 
      <Val key="mlt.mindf">1</Val>
 
      <Val key="mlt.mintf">1</Val>
 
      ...
 
    </Map>
 
  </Map>
 
</record>
 
</source>
 
 
 
===== MLT Results without Handler =====
 
 
In this case Solr adds the <tt>moreLikeThis</tt> section on the same level as the normal <tt>response</tt> section, so you would have to look up the MLT docs for each result item manually. SMILA instead transforms the Solr result and nests the MLT information inside the corresponding SMILA result item, like so:
 
 
<source lang="xml">
 
<Seq key="records">
 
  <Map>
 
    <Val key="_recordid">file:Euklid.html</Val>
 
    <Val key="_weight" type="double">0.7635468</Val>
 
    <Seq key='_mlt'>
 
      <Map>
 
        <Val key="_recordid">file:Archytas_von_Tarent_7185.html</Val>
 
        <Val key="_weight" type="double">0.5511907</Val>
 
        <Val key="Size" type="long">47934</Val>
 
        ...               
 
      </Map>
 
      <Map>
 
        <Val key="_recordid">file:Aristoxenos.html</Val>
 
        <Val key="_weight" type="double">0.44604447</Val>
 
        <Val key="Size" type="long">39332</Val>
 
        ...               
 
      </Map>
 
      ...
 
    </Seq>
 
    ...
 
  </Map>
 
  ...
 
</Seq>
 
</source>
 
 
This sample shows the Solr result item with the Id <tt>file:Euklid.html</tt>. With MLT turned on, it contains a nested <tt>_mlt</tt> Seq that holds the N related docs for that result item, each represented by a Map (MLT-Map). Note that this prevents you from having a Solr document field named <tt>_mlt</tt> and having it returned in this MLT mode. The Val elements in each MLT-Map are defined by the list of fields in the <tt>fl</tt> parameter. So how do the <tt>_recordid</tt> and <tt>_weight</tt> Vals get in there if the value of <tt>fl</tt> is actually <tt>Id,score,Size</tt>? SMILA treats the fields <tt>Id</tt> and <tt>score</tt> specially and automatically maps them to <tt>_recordid</tt> and <tt>_weight</tt>. Any other field that you include through <tt>fl</tt> is added to the MLT result item as a Val element whose key is the field name, as shown for ''Size'' here.
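
The nesting and the field mapping described above can be sketched in Python as follows. This is an illustration, not SMILA's implementation: the input is assumed to be a Solr response already parsed into dicts, with <tt>moreLikeThis</tt> keyed by document Id, and the names <tt>to_item</tt> and <tt>nest_mlt</tt> are hypothetical.

```python
# Fields that are mapped to special record keys, as described above.
FIELD_TO_KEY = {"Id": "_recordid", "score": "_weight"}

def to_item(doc):
    """Rename Id/score; keep every other returned field under its own name."""
    return {FIELD_TO_KEY.get(name, name): value for name, value in doc.items()}

def nest_mlt(solr_response):
    """Attach the related docs of each result doc as a nested '_mlt' list."""
    related = solr_response.get("moreLikeThis", {})
    records = []
    for doc in solr_response["response"]["docs"]:
        item = to_item(doc)
        mlt_docs = related.get(doc["Id"], {}).get("docs", [])
        if mlt_docs:
            item["_mlt"] = [to_item(mlt_doc) for mlt_doc in mlt_docs]
        records.append(item)
    return records

# Assumed response shape, filled with the values from the sample above.
solr_response = {
    "response": {"docs": [{"Id": "file:Euklid.html", "score": 0.7635468}]},
    "moreLikeThis": {
        "file:Euklid.html": {
            "docs": [
                {"Id": "file:Archytas_von_Tarent_7185.html",
                 "score": 0.5511907, "Size": 47934},
                {"Id": "file:Aristoxenos.html",
                 "score": 0.44604447, "Size": 39332},
            ]
        }
    },
}

records = nest_mlt(solr_response)
print(records[0]["_mlt"][0]["_recordid"])
```

Because the related docs are stored under the key <tt>_mlt</tt> of the result item, a document field of the same name would collide with it, which is the restriction mentioned above.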
 
 
===== MLT Results with Handler =====
 
 
For performance reasons, the more common use case of MLT is to return the related docs for just one document. This is done by making a request against the MLT handler itself.
 
 
The document for which you want the related docs is usually known, e.g. from a previous search whose rendered result list contains a link to fetch or show related docs. In this case the query just selects the given document by its Id (as shown in the example below). You may also provide any other query here. However, if the query returns more than one doc, only one of them is selected, depending on the other MLT parameters, and only the related docs for that document are returned.
 
 
The query record differs from the one above as follows:
 
 
<source lang="xml">
 
<record>
 
 
 
  <!-- This Lucene query selects a document by its Id. Note the escaping of the Id string! -->
 
  <Val key="query">Id:file\:Euklid.html</Val>
 
  ...
 
  <Map key="_solr.query">
 
    <!-- Selects the Solr MLT request handler. -->
 
    <Val key="qt">/mlt</Val>
 
    ...
 
  </Map>
 
</record>
 
</source>
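
The escaping of the Id in the query above can be automated. The following Python sketch is an assumption-laden illustration, not part of SMILA: the helper names <tt>escape_lucene</tt> and <tt>id_query</tt> are hypothetical, and the character set is the usual list of Lucene query syntax specials, which may differ slightly between Lucene versions.

```python
# Characters with special meaning in the classic Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(value):
    """Prefix every special character (and whitespace) with a backslash."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL or ch.isspace() else ch
        for ch in value
    )

def id_query(record_id, id_field="Id"):
    """Build a query that selects exactly one document by its Id."""
    return f"{id_field}:{escape_lucene(record_id)}"

query = id_query("file:Euklid.html")
print(query)  # Id:file\:Euklid.html
```

Only the colon inside the Id value is escaped here; the colon that separates the field name from the value has to stay unescaped.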
 
 
The results of such an MLT request are contained in the standard <tt>records</tt> Seq, the same way that normal search results are returned, except that they represent MLT docs.
 
 
<source lang="xml">
 
<Seq key="records">
 
  <Map>
 
    <Val key="_recordid">file:Archytas_von_Tarent_7185.html</Val>
 
    <Val key="_weight" type="double">0.5511907</Val>
 
    <Val key="Size" type="long">47934</Val>
 
  </Map>
 
  <Map>
 
    <Val key="_recordid">file:Aristoxenos.html</Val>
 
    <Val key="_weight" type="double">0.44604447</Val>
 
    <Val key="Size" type="long">39332</Val>
 
  </Map>
 
  ...
 
</Seq>
 
 
</source>
 
 
With <tt>mlt.interestingTerms=details</tt>, the result record contains the following additional information:
 
 
<source lang="xml">
 
<Map key="_solr.result">
 
    ...
 
    <Map key="interestingTerms">
 
      <Val key="Content:euklid" type="double">1.0</Val>
 
      <Val key="Content:geometrie" type="double">1.0</Val>
 
      ...
 
    </Map>
 
    ...
 
</Map>
 
</source>
 
 
With <tt>mlt.interestingTerms=list</tt>, it contains only the list of terms:
 
 
<source lang="xml">
 
<Map key="_solr.result">
 
    ...
 
    <Seq key="interestingTerms">
 
     
 
      <Val>euklid</Val>
 
      <Val>geometrie</Val>
 
      ...
 
    </Seq>
 
    ...
 
</Map>
 
</source>
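
Judging by the two samples above, the keys of the <tt>details</tt> variant have the form <tt>field:term</tt>, while the <tt>list</tt> variant carries only the terms. A minimal Python sketch for taking such keys apart on the client side, assuming the map has already been read into a dict (the helper name <tt>split_details</tt> is hypothetical):

```python
def split_details(details):
    """Turn {'field:term': boost} into a list of (field, term, boost) tuples."""
    result = []
    for key, boost in details.items():
        # Split at the first colon only; a term may itself contain colons.
        field, _, term = key.partition(":")
        result.append((field, term, boost))
    return result

# Values taken from the 'details' sample above.
details = {"Content:euklid": 1.0, "Content:geometrie": 1.0}

# Dropping field and boost leaves the same terms as in the 'list' sample.
terms = [term for _field, term, _boost in split_details(details)]
print(terms)
```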
 

Latest revision as of 06:35, 17 November 2011
