Talk:SMILA/Documentation/Solr

More Like This / What's related

Solr offers a feature called More Like This (MLT) that returns documents related to a given document. Two modes are supported:

The first returns the top N related documents for every item in the search result list, see [1].
The second does this ad hoc for a single document, using a dedicated request handler, see [2].

The first variant is considerably more expensive than the second, because it runs an MLT lookup for every item in the result list rather than for a single document.

Both modes are supported through SMILA and are configured very similarly. SMILA does nothing special with the arguments you pass in with the record: it hands them on to Solr as-is, apart from applying any necessary URL encoding. You may assign specific data types to the parameters, but this is not necessary; all values may be given as strings, since that is what is passed on to Solr anyway.

Which mode is active ultimately depends on your handler configuration in solrconfig.xml. Here we assume SMILA's default setup, which binds the MLT handler to /mlt and normal queries to /select.
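
For orientation, a handler binding of this kind could look roughly like the following in solrconfig.xml. This is only a sketch using Solr's stock handler classes; the default for mlt.fl is an assumption borrowed from the Content field used in the examples on this page, and SMILA's shipped configuration may differ.

<!-- Sketch only, not SMILA's shipped solrconfig.xml -->
<requestHandler name="/select" class="solr.SearchHandler"/>
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- assumption: Content is the field to compute similarity on -->
    <str name="mlt.fl">Content</str>
  </lst>
</requestHandler>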

Both modes share most of the MLT parameters, but each also needs or supports some specific ones.

<record>
 
  <!-- This is the Lucene query expression that is executed in both cases. -->
  <Val key="query">euklid</Val>
  ...
  <Map key="_solr.query">
    <!-- This selects the Solr request handler. Set it to /mlt when you want to use the MLT handler. -->
    <Val key="qt">/mlt</Val>
    <!-- Determines the list of fields returned for both the normal results and the MLT results. -->
    <Val key="fl" >Id,score,Size</Val>
    ...
    <Map key="mlt">
      <Val key="mlt" >true</Val>
      <Val key="mlt.fl" >Content</Val>
      <Val key="mlt.mindf">1</Val>
      <Val key="mlt.mintf">1</Val>
      ...
    </Map>
  </Map>
</record>
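
For the first mode (MLT results attached to a normal search), the same record would presumably point qt at the normal search handler instead. A minimal sketch, assuming SMILA's default /select binding described above. The mlt.count value is purely illustrative; mlt.count is Solr's standard parameter for the number of related docs returned per result.

<record>
  <Val key="query">euklid</Val>
  <Map key="_solr.query">
    <!-- assumption: /select is the normal search handler, as in SMILA's default setup -->
    <Val key="qt">/select</Val>
    <Val key="fl">Id,score,Size</Val>
    <Map key="mlt">
      <Val key="mlt">true</Val>
      <Val key="mlt.fl">Content</Val>
      <Val key="mlt.mindf">1</Val>
      <Val key="mlt.mintf">1</Val>
      <!-- illustrative value: return up to 3 related docs per result item -->
      <Val key="mlt.count">3</Val>
    </Map>
  </Map>
</record>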

MLT Results w/o Handler

In this case Solr adds the moreLikeThis section at the same level as the normal response section, so you would have to look up the MLT docs for each result item yourself. SMILA instead transforms the Solr result and nests the MLT information inside each of its result items, like so:

<Seq key="records">
  <Map>
    <Val key="_recordid">file:Euklid.html</Val>
    <Val key="_weight" type="double">0.7635468</Val>
    <Seq key='_mlt'>
      <Map>
        <Val key="_recordid">file:Archytas_von_Tarent_7185.html</Val>
        <Val key="_weight" type="double">0.5511907</Val>
        <Val key="Size" type="long">47934</Val>
        ...                
      </Map>
      <Map>
        <Val key="_recordid">file:Aristoxenos.html</Val>
        <Val key="_weight" type="double">0.44604447</Val>
        <Val key="Size" type="long">39332</Val>
        ...                
      </Map>
      ... 
    </Seq>
    ...
  </Map>
   ... 
</Seq>

This sample shows the result item with the ID file:Euklid.html. With MLT turned on, it contains a nested _mlt Seq that holds the N related docs for that result item, each represented by a Map (the MLT-Map). As a consequence, you cannot have a Solr document field named _mlt and have it returned in this MLT mode. The Val elements in each MLT-Map are determined by the list of fields in the fl parameter. So how do the _recordid and _weight Vals get in there when the value of fl is actually Id,score,Size? SMILA defines the fields Id and score and automatically maps them to _recordid and _weight. Any other field that you include through fl is added to the MLT result item as a Val element whose key is the field name, as shown here for Size.
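
For comparison, the raw Solr XML response that SMILA transforms has roughly the following shape. This is a hand-written sketch rather than captured output: the values are copied from the sample above, and attributes such as numFound and start are omitted. Note how the moreLikeThis list sits next to the normal result and is keyed by document ID.

<response>
  <!-- normal search results -->
  <result name="response">
    <doc>
      <str name="Id">file:Euklid.html</str>
      <float name="score">0.7635468</float>
    </doc>
  </result>
  <!-- related docs, one result list per document ID from the section above -->
  <lst name="moreLikeThis">
    <result name="file:Euklid.html">
      <doc>
        <str name="Id">file:Archytas_von_Tarent_7185.html</str>
        <float name="score">0.5511907</float>
        <long name="Size">47934</long>
      </doc>
    </result>
  </lst>
</response>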

MLT Results with Handler

The more common use of MLT, for performance reasons, is to return the related docs for just one document. This is done by sending the request to the MLT handler itself.

The document for which you want the related docs is usually known, e.g. from a previous search whose rendered result list contains a link to fetch or show related docs. In this case the query simply selects that document by its Id (as shown in the example below). You may also provide any other query here. However, if the query returns more than one doc, Solr selects just one of them, depending on the other MLT parameters, and returns only the related docs for that document.

The differences from the query record above are as follows:

<record>
 
  <!-- This is the Lucene query that selects a document by its Id. Note the escaping of the ID string! -->
  <Val key="query">Id:file\:Euklid.html</Val>
  ...
  <Map key="_solr.query">
    <!-- This selects the Solr MLT request handler. -->
    <Val key="qt">/mlt</Val>
    ...
  </Map>
</record>

The results of such an MLT request are returned in the standard records Seq in the same way as normal search results, except that the entries are the MLT docs.

<Seq key="records">
  <Map>
    <Val key="_recordid">file:Archytas_von_Tarent_7185.html</Val>
    <Val key="_weight" type="double">0.5511907</Val>
    <Val key="Size" type="long">47934</Val>
  </Map>
  <Map>
    <Val key="_recordid">file:Aristoxenos.html</Val>
    <Val key="_weight" type="double">0.44604447</Val>
    <Val key="Size" type="long">39332</Val>
  </Map>
  ...
</Seq>

If mlt.interestingTerms=details is set, the result record contains the following additional information:

<Map key="_solr.result">
    ...
    <Map key="interestingTerms">
      <Val key="Content:euklid" type="double">1.0</Val>
      <Val key="Content:geometrie" type="double">1.0</Val>
      ...
    </Map>
    ...
  </Seq>
</Map>

or, with mlt.interestingTerms=list, just:

<Map key="_solr.result">
    ...
    <Seq key="interestingTerms">
 
      <Val>euklid</Val>
      <Val>geometrie</Val>
      ...
    </Seq>
    ...
  </Seq>
</Map>
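
To request the interesting terms, the parameter would be passed along with the other MLT parameters in the query record. A minimal sketch, assuming it belongs in the mlt Map like the other mlt.* values shown earlier:

<Map key="_solr.query">
  <Val key="qt">/mlt</Val>
  <Map key="mlt">
    <Val key="mlt.fl">Content</Val>
    <!-- details: terms with their boost values; list: terms only -->
    <Val key="mlt.interestingTerms">details</Val>
  </Map>
</Map>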
