Difference between revisions of "SMILA/Specifications/Smila Security Concept"

Revision as of 04:19, 13 January 2009

Description

This page describes security (authorization) in SMILA. Records may be associated with security information, and services may use this information to restrict or grant access to data (records). Authentication (e.g. login to a SMILA-based web application) is outside the scope of this document.

Some thoughts about security information in SMILA:

  • for the majority of business cases READ access rights will suffice, but any kind of access rights should be representable
  • security information of a record is most likely a list of Users (Principals) and/or Groups that have certain access rights on that record
    • maybe Users XOR Groups is easier to handle than allowing to combine both
    • access rights that exclude Users/Groups from reading could be supported
    • security based on Groups has some benefits over security based on Users
      • less data is stored in the search index (Groups do not have to be resolved to their members)
      • membership of a group can be changed without the need to reindex the record
  • a data source (e.g. an NTFS filesystem) may be connected to a security provider (e.g. an LDAP server). Let's call such a provider a DSSP (data source security provider)
  • for data sources that are not connected to a DSSP and do not provide any security information, a defined constant READ_ALL is used instead of specific security information. Search engines need this value for filtering: in general the filter expression contains the user and the groups the user is a member of, for example ... and (trustee="stuc07" or trustee="group1" or trustee="group2"). To include all documents that have no access restrictions, the clause or trustee="READ_ALL" is added. Without a value set, filtering is not possible.
  • a data source may enforce access rights on its data, but the Agent/Crawler may not be able to find out who holds those rights (e.g. on a web server)
  • security information may differ between data sources. SMILA will not provide any functionality for harmonizing security information; it is used as provided by the DSSP.
  • Names/IDs of users and groups may not be unique. For example, two DSSPs may each provide a group named "authenticated users" with totally different sets of users. In such cases it will be necessary to couple the security information with the data source ID (e.g. by simply concatenating the data source ID and the User/Group IDs).
  • DSSP-specific services are needed (e.g. used in the Connectivity Framework and in the Search Framework) to
    • resolve all subgroups of a group
    • resolve all users of a group
    • resolve all groups a user is a member of
  • in the search process these resolving services are used by a login/single sign-on component to get the security information for the current user. This is part of the application logic, but the basic functionality has to be provided
  • search results are filtered against the provided security information, so that only the records a user has access to are returned. As there is always a delta between the access rights stored in the index and the access rights on the data source, an online check for each search result entry could be executed for high-risk data.
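
The filter expression and the ID coupling described in the list above can be sketched in a few lines of Python. This is only an illustration, not SMILA code: the function names and the ":" separator are assumptions, and only the constant READ_ALL and the trustee syntax come from the text.

```python
from typing import List

READ_ALL = "READ_ALL"  # constant named in the list above


def qualify(data_source_id: str, trustee_id: str) -> str:
    """Couple a user/group ID with its data source ID by concatenation."""
    # The ":" separator is an assumption; the text only asks for a simple concat.
    return f"{data_source_id}:{trustee_id}"


def trustee_filter(user: str, groups: List[str]) -> str:
    """Build an OR clause over the user, the user's groups and READ_ALL."""
    trustees = [user, *groups, READ_ALL]
    return "(" + " or ".join(f'trustee="{t}"' for t in trustees) + ")"


expression = trustee_filter("stuc07", ["group1", "group2"])
qualified = qualify("ntfs1", "authenticated users")
```

Because READ_ALL is always appended, documents from data sources without security information still match the clause.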


Discussion

Technical proposal

Data model

As we don't know which use cases SMILA will be used for, we should not restrict security information to READ access rights, but provide a generic representation of security information. The default use case will be indexing and search, for which READ access will suffice. Security information should be kept separate from record metadata, but represented by reusing classes of the data model. The record itself is annotatable, so we can store the security information as annotations on the record. For this purpose a specific annotation ACCESS_RIGHTS is defined. It contains subannotations for the various access right types (e.g. READ, WRITE, DELETE), which in turn contain annotations for entities (e.g. PRINCIPALS and GROUPS). New access right types or entities can easily be added, but the Security Converters/Resolvers have to be adapted to support them.


Indexing

During indexing the ACCESS_RIGHTS annotations on records are created by Crawlers/Agents. The IndexOrderConfiguration should allow configuring which annotations are created for each record (which access right types, and whether to use principals, groups, or both). It should also be possible to disable the creation of these annotations if no security information is used in SMILA. Crawlers/Agents should pass security information on as provided by the data source (e.g. SIDs). Further processing of this data will be done by the Security Converters/Resolvers. Here is an example of the ACCESS_RIGHTS record annotations:

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>SID_0815</V>
        </An>
        <An n="GROUPS">
            <V>SID_4711</V>
            <V>SID_2525</V>
        </An>
    </An>
    <An n="WRITE">
        <An n="PRINCIPALS">
            <V>SID_0815</V>
        </An>
    </An>
    ...
</An>
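
How a Crawler/Agent might assemble such an annotation can be sketched as follows. The sketch uses Python's standard xml.etree.ElementTree purely for illustration; the function build_access_rights and its input layout are assumptions and not part of the SMILA data model API.

```python
import xml.etree.ElementTree as ET


def build_access_rights(rights):
    """Build an ACCESS_RIGHTS annotation.

    rights maps access type -> entity -> list of IDs,
    e.g. {"READ": {"GROUPS": ["SID_4711"]}} (hypothetical input layout).
    """
    root = ET.Element("An", {"n": "ACCESS_RIGHTS"})
    for access_type, entities in rights.items():
        type_node = ET.SubElement(root, "An", {"n": access_type})
        for entity, ids in entities.items():
            entity_node = ET.SubElement(type_node, "An", {"n": entity})
            for trustee_id in ids:
                ET.SubElement(entity_node, "V").text = trustee_id
    return root


annotation = build_access_rights({
    "READ": {"PRINCIPALS": ["SID_0815"], "GROUPS": ["SID_4711", "SID_2525"]},
    "WRITE": {"PRINCIPALS": ["SID_0815"]},
})
xml_text = ET.tostring(annotation, encoding="unicode")
```

Adding a new access right type or entity only requires another key in the input mapping, which mirrors the extensibility described in the data model.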


Regular Pipelets/ProcessingServices will not take these annotations into account. Before a record is stored in a search index, the security annotations have to be converted to regular attributes that are indexable. SecurityConverters (general or index-specific) perform this transformation of annotations into attributes. Here is an example of how the READ access rights could be represented as regular attributes:

<A n="ReadGroups">
    <L>
        <V>SID_4711</V>
        <V>SID_2525</V>
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>SID_0815</V>
    </L>
</A>
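
A minimal sketch of this index-mode conversion, assuming the annotation arrives as an XML string. The function convert_for_index and the mapping table are hypothetical; only the attribute names ReadGroups and ReadUsers come from the example above.

```python
import xml.etree.ElementTree as ET

# Attribute names follow the example above; the mapping table is an assumption.
ATTRIBUTE_NAMES = {
    ("READ", "GROUPS"): "ReadGroups",
    ("READ", "PRINCIPALS"): "ReadUsers",
}


def convert_for_index(access_rights_xml):
    """Flatten an ACCESS_RIGHTS annotation into indexable attribute lists."""
    root = ET.fromstring(access_rights_xml.strip())
    attributes = {}
    for type_node in root.findall("An"):
        for entity_node in type_node.findall("An"):
            key = (type_node.get("n"), entity_node.get("n"))
            if key in ATTRIBUTE_NAMES:  # other access types are ignored here
                values = [v.text for v in entity_node.findall("V")]
                attributes.setdefault(ATTRIBUTE_NAMES[key], []).extend(values)
    return attributes


record_annotation = """
<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS"><V>SID_0815</V></An>
        <An n="GROUPS"><V>SID_4711</V><V>SID_2525</V></An>
    </An>
    <An n="WRITE">
        <An n="PRINCIPALS"><V>SID_0815</V></An>
    </An>
</An>
"""
attributes = convert_for_index(record_annotation)
```

In this sketch WRITE rights are simply dropped, matching the default use case in which only READ access matters for search.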

There may be use cases where human-readable names should be indexed instead of IDs, or where Groups should be resolved to their members. In such cases a SecurityConverter can use a SecurityResolver to handle these tasks. For example, it could resolve human-readable names for the principals and groups:

<A n="ReadGroups">
    <L>
        <V>empolis\group1</V>
        <V>empolis\group9</V>
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>empolis\testuser</V>
    </L>
</A>

or it could also resolve the groups to their members:

<A n="ReadUsers">
    <L>
        <V>empolis\testuser</V>
        <V>empolis\group1member1</V>
        <V>empolis\group1member2</V>
        <V>empolis\group2member1</V>
        <V>empolis\group2member2</V>
    </L>
</A>
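
Both resolver tasks can be sketched with two small functions. The lookup tables below are hypothetical stand-ins for a DSSP such as an LDAP server, and the function names are assumptions.

```python
# Hypothetical directory data standing in for a DSSP lookup.
NAMES = {
    "SID_0815": "empolis\\testuser",
    "SID_4711": "empolis\\group1",
    "SID_2525": "empolis\\group9",
}
MEMBERS = {
    "SID_4711": ["empolis\\group1member1", "empolis\\group1member2"],
    "SID_2525": ["empolis\\group9member1"],
}


def resolve_names(ids, names):
    """Replace IDs by human-readable names; unknown IDs pass through unchanged."""
    return [names.get(trustee_id, trustee_id) for trustee_id in ids]


def expand_groups(user_names, group_ids, members):
    """Return the users plus all members of the given groups, without duplicates."""
    users = list(user_names)
    for group_id in group_ids:
        for member in members.get(group_id, []):
            if member not in users:
                users.append(member)
    return users


read_groups = resolve_names(["SID_4711", "SID_2525"], NAMES)
read_users = expand_groups(resolve_names(["SID_0815"], NAMES), ["SID_4711", "SID_2525"], MEMBERS)
```

Expanding groups at index time trades a larger index and reindexing on membership changes for a simpler search-time filter, as noted in the description.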

Often data sources with and without security restrictions will be used together in one index (e.g. a filesystem and a public web site). For data sources without security information, the SecurityConverter should generate a default value named ALL_READ and fill all required attributes with it. This is needed during the search process, because results without any security information would otherwise be filtered out of the result list. A SecurityResolver may also use this value to replace generic groups (like authenticated_users or domain_users).

<A n="ReadGroups">
    <L>
        <V>ALL_READ</V>
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>ALL_READ</V>
    </L>
</A>
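
The default handling could look like the following sketch. It assumes that the default is applied only when a record carries no read rights at all, so that a restricted record is never opened up by accident; the function name and the REQUIRED_ATTRIBUTES tuple are hypothetical.

```python
ALL_READ = "ALL_READ"
REQUIRED_ATTRIBUTES = ("ReadGroups", "ReadUsers")  # names from the example above


def fill_defaults(attributes):
    """Fill all required attributes with ALL_READ if the record has no read rights."""
    if any(attributes.get(name) for name in REQUIRED_ATTRIBUTES):
        return dict(attributes)  # record is restricted: leave it untouched
    defaults = {name: [ALL_READ] for name in REQUIRED_ATTRIBUTES}
    return {**attributes, **defaults}


public_record = fill_defaults({})
restricted_record = fill_defaults({"ReadUsers": ["SID_0815"]})
```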


Search

A search client represents security information in exactly the same way as a Crawler does during indexing. Most likely this will only be the ID of the user executing the search.

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>SID_0815</V>
        </An>
    </An>
</An>

The security annotations are then again processed by a SecurityConverter (now in search mode) that transforms the security annotations into a filter annotation for the security attributes in the index. Note that ALL_READ is always included in the filter! (see the Search concept for details on Filters)

<A n="ReadUsers">
    <An n="filter">
        <V n="type">enumeration</V>
        <V n="mode">include</V>
        <V>SID_0815</V>
        <V>ALL_READ</V>
    </An>
</A>
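
A sketch of the search-mode conversion that produces such a filter annotation. The function build_filter is hypothetical; the element and attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET

ALL_READ = "ALL_READ"


def build_filter(attribute_name, ids):
    """Create an include-enumeration filter over the given IDs plus ALL_READ."""
    attribute = ET.Element("A", {"n": attribute_name})
    filter_node = ET.SubElement(attribute, "An", {"n": "filter"})
    ET.SubElement(filter_node, "V", {"n": "type"}).text = "enumeration"
    ET.SubElement(filter_node, "V", {"n": "mode"}).text = "include"
    for value in [*ids, ALL_READ]:  # ALL_READ is always part of the filter
        ET.SubElement(filter_node, "V").text = value
    return attribute


user_filter = build_filter("ReadUsers", ["SID_0815"])
filter_values = [v.text for v in user_filter.find("An") if v.get("n") is None]
```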

Again, SecurityResolvers may be used by the SecurityConverter for various tasks. For example, if only groups are used for security checking, all groups the given user is a member of have to be determined, and this information is then used to create the filter. For example

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>SID_0815</V>
        </An>
    </An>
</An>

is resolved to

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>SID_0815</V>
        </An>
        <An n="GROUPS">
            <V>SID_4711</V>
            <V>SID_0190</V>
        </An>
    </An>
</An>

and then converted to the filter

<A n="ReadGroups">
    <An n="filter">
        <V n="type">enumeration</V>
        <V n="mode">include</V>
        <V>SID_4711</V>
        <V>SID_0190</V>
        <V>ALL_READ</V>
    </An>
</A>
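
The resolution step can be sketched as a lookup that collects every group of a user. The membership table is hypothetical, and following nested groups is an assumption based on the subgroup resolving service mentioned in the description.

```python
# Hypothetical lookup: member ID -> groups it directly belongs to.
DIRECT_GROUPS = {
    "SID_0815": ["SID_4711"],
    "SID_4711": ["SID_0190"],  # nested group membership (assumed)
}


def groups_of(user_id, direct_groups):
    """Collect all direct and inherited groups of a user; safe against cycles."""
    found, pending = [], [user_id]
    while pending:
        current = pending.pop()
        for group in direct_groups.get(current, []):
            if group not in found:
                found.append(group)
                pending.append(group)
    return found


def group_filter_values(user_id, direct_groups):
    """Values for a ReadGroups filter: the user's groups plus ALL_READ."""
    return [*groups_of(user_id, direct_groups), "ALL_READ"]


values = group_filter_values("SID_0815", DIRECT_GROUPS)
```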

If security is NOT relevant for the search process, simply don't use any SecurityConverters in your index and search pipelines (and configure your agents/crawlers accordingly to reduce data load)!


Enhancement for DeltaIndexing

A change to a document's security information leads to an update of the search index. It may be desirable to distinguish between changes to the security information and changes to the document itself. One could therefore introduce a second hash token that is created from the security information and stored in the DeltaIndexingManager. If only the security hash has changed during a crawl, the complete processing of the document need not be executed; only the security information is updated, which saves processing overhead. To support this, the CrawlerController needs to add some kind of flag to the Record (e.g. a special Attribute) that shows whether the regular hash or the security hash changed. In the Router this Attribute could be used in rules to trigger different Pipelines: the "complete processing pipeline" if the regular hash (or both hashes) changed, or the "security update pipeline" if only the security hash changed. As not all indexes support updating selected attributes, and most will likely support only delete/add logic on whole documents, the already processed data of the record must be loaded (either from the index or from the XML/Binary-Storage) and merged with the current Record (e.g. by a special Pipelet). This is an optional enhancement that is independent of the security concept. However, it should be implemented only after the security concept has been implemented and tested.
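
The routing decision based on two hashes can be sketched as follows. This is an illustration only: SHA-256, the function names and the returned labels are assumptions, and the real DeltaIndexingManager and Router interfaces are not shown.

```python
import hashlib


def digest(text):
    """Hash token for a piece of content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def choose_pipeline(stored, content, security):
    """Decide which pipeline to trigger.

    stored is the (content_hash, security_hash) pair from the last crawl,
    or None if the document is new. Returns (label, current_hashes).
    """
    current = (digest(content), digest(security))
    if stored is None or stored[0] != current[0]:
        return "complete", current  # regular hash (or both hashes) changed
    if stored[1] != current[1]:
        return "security-update", current  # only the security hash changed
    return "skip", current  # nothing changed


first, state = choose_pipeline(None, "document body", "SID_0815")
unchanged, _ = choose_pipeline(state, "document body", "SID_0815")
rights_changed, _ = choose_pipeline(state, "document body", "SID_4711")
body_changed, _ = choose_pipeline(state, "new body", "SID_4711")
```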
