Description

This page describes security in SMILA, specifically authorization. Records may carry security information, and services may use that information to restrict or grant access to data (records). Authentication (e.g. login to a SMILA-based web application) is outside the scope of this document.

Some thoughts about security information in SMILA:

For the majority of business cases READ access rights will suffice, but any kind of access right should be representable.
The security information of a record is most likely a list of Users (Principals) and/or Groups that have certain access rights on that record.
- Allowing either Users or Groups, but not both, may be easier to handle than allowing a combination of both.
- Access rights that exclude Users/Groups from reading could be supported.
- Security based on Groups has some benefits over security based on Users:
  - Less data is stored in the search index (Groups do not have to be resolved to their members).
  - Membership of a group can be changed without the need to reindex the record.
A data source (e.g. an NTFS filesystem) may be connected to a security provider (e.g. an LDAP server). Let's call these __DSSP__ (data source security provider).
For data sources that are not connected to a DSSP and do not provide any security information, a defined constant EVERYONE is used instead of any specific security information. This value is needed for filtering by search engines, because in general the filter expression contains the user and the groups he or she is a member of, for example ... and (trustee="stuc07" or trustee="group1" or trustee="group2"). To include all documents that have no access restrictions, the clause or trustee="EVERYONE" is added. If no value is set, filtering is not possible, because records without any trustee value cannot be matched by the filter.
A data source may enforce access rights on its data, but the Agent/Crawler may not be able to find out who holds the rights to access the data (e.g. on a web server).
Security information may differ between data sources. SMILA will not provide any functionality for harmonizing security information; it is used as provided by the DSSP.
Names/IDs of users and groups may not be unique. For example, two DSSPs may each provide a group named "authenticated users" with totally different sets of users. In such cases it will be necessary to couple the security information with the data source ID (e.g. by simply concatenating the data source ID and the User/Group IDs).
DSSP-specific services are needed (e.g. used in the Connectivity Framework and in the Search Framework) to
- resolve all subgroups of a group
- resolve all users of a group
- resolve all groups a user is a member of
In the search process these resolving services are used by a login/single sign-on component to get the security information for the current user. This is part of the application logic, but the basic functionality has to be provided.
Search results are filtered against the provided security information, returning only the records a user has access to. Because the access rights stored in the index can always differ from the current access rights on the data source, an online check for each search result entry could be executed for high-risk data.
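
The filter expression and the ID coupling described in the list above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class `TrusteeFilter`, the `:` separator between data source ID and User/Group ID, and the applied prefixing of every trustee are hypothetical choices, not part of SMILA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not SMILA API): builds the trustee part of a search filter. */
final class TrusteeFilter {
    /** Constant used for records without any access restrictions. */
    static final String EVERYONE = "EVERYONE";

    private TrusteeFilter() { }

    /** Couples an ID with its data source ID; the ':' separator is an assumption. */
    static String qualify(String dataSourceId, String trusteeId) {
        return dataSourceId + ":" + trusteeId;
    }

    /** Returns the user, the user's groups and EVERYONE as one OR expression. */
    static String buildExpression(String dataSourceId, String user, List<String> groups) {
        List<String> trustees = new ArrayList<>();
        trustees.add(qualify(dataSourceId, user));
        for (String group : groups) {
            trustees.add(qualify(dataSourceId, group));
        }
        // Records without restrictions carry EVERYONE and must stay visible.
        trustees.add(EVERYONE);

        StringJoiner expression = new StringJoiner(" or ", "(", ")");
        for (String trustee : trustees) {
            expression.add("trustee=\"" + trustee + "\"");
        }
        return expression.toString();
    }
}
```

Calling `TrusteeFilter.buildExpression("ds1", "stuc07", List.of("group1", "group2"))` yields an expression with three qualified trustees followed by the unqualified EVERYONE clause.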

Discussion

User:Leo.sauermann.dfki.de (16.1.2009):

I agree that groups alone are simpler than groups and users combined. From experience, copying the model of operating systems (NTFS, ext, hfs: one owner, multiple groups) could be safer than an optimization to groups alone where we don't fully understand the implications.
- User:Daniel.stucky.empolis.com: We don't want to enforce any security model/pattern. As a framework we should let users model security the way they want, based on users, groups, or both. But we should offer suggestions (based on experience) on which approach is better for which use case.
Multiple names/IDs of the same user may exist across multiple systems. Each of these is the __user-id native to the DSSP__.
- User:Daniel.stucky.empolis.com: I will add a separate section Ambiguous SIDs in datasources discussing this topic.
Inside the SMILA record format, SMILA-specific user-ids or the user-ids native to the DSSP could be used.
- When we use DSSP-proprietary ids, we should represent them as URIs.
- When we have SMILA-internal user ids also, we should represent them as UUIDs (for decentralized management) or integers (for optimization).
- SMILA may have internal user and group IDs represented as "numbers" which are centrally mapped to external user-ids native to the DSSP. A central "SMILA user id and group id" user-identification database provides means to find the internal number of a user for a user-id native to a DSSP. Optionally, the user-identification database can map multiple DSSP user-ids to one SMILA user-id (if they belong to the same user).
There may be both the users and groups given by the DSSP and indexed, and custom user and group rights that are added later as part of the SMILA index.

To sum up my view on users, groups and DSSP integration: external DSSP user IDs should be mapped centrally to SMILA-specific IDs. SMILA then has (like LDAP, Unix, or other security systems) its own security database for user identification. For authorization, membership of users in groups must be resolvable using external DSSP services; for identification, a connection to the DSSP's user/password system must be available.

If SIDs are resolved to more human-readable names by a SecurityResolver, then they should be transported in different XML attributes than the original SID.
- User:Daniel.stucky.empolis.com: This is up to the SecurityConverter and its configuration. In this step the SECURITY annotations are converted to regular attributes. How these attributes are named and what values they contain should be completely configurable. It should not be necessary to index both SIDs and human-readable names. If someone wants to do this, different attributes are indeed needed.

<A n="ReadGroups">
    <L>
        <V>4711</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
        <V>2525</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
    </L>
</A>
<A n="ReadGroupsDatasourceSpecific">
    <L>
        <V>empolis\group1</V> <!-- a group SID resolved to a human-readable group name -->
        <V>empolis\group9</V> <!-- a group SID resolved to a human-readable group name -->
    </L>
</A>
<A n="ReadUsersDatasourceSpecific">
    <L>
        <V>empolis\testuser</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
    </L>
</A>

I suggest using the constant name EVERYONE instead of ALL_READ.
- User:Daniel.stucky.empolis.com: Yes, it fits better when used in the context of access rights other than READ.
In the attribute values for ReadGroups and ReadUsers, only numeric values and predefined constants (such as EVERYONE) are allowed. More constants will be defined. This could be optimized by defining integer constants for EVERYONE.
In the attribute values for ReadUsersDatasourceSpecific, URIs are preferred. These should be expressed using standardized formats. For LDAP, RFC 4516 specifies the URI format.
SMILA is an enterprise framework. There should be better guidelines for representing the "no ACL" option than not using certain pipelets.
LDAP may not be the right standard to use in the chart; SASL could be used instead (LDAP is compatible with SASL).
The SecurityResolver will need functionality to map DSSP user/group IDs to SMILA user/group IDs:

/**
 * Maps a datasource-specific user ID to a SMILA user ID.
 * TODO: how to identify the datasource here? The String type of datasourceID is an assumption.
 */
long resolveDataSourceUserToSID(String datasourceSpecificUserID, String datasourceID);

/**
 * Maps a datasource-specific group ID to a SMILA group ID.
 * TODO: how to identify the datasource here? The String type of datasourceID is an assumption.
 */
long resolveDataSourceGroupToSID(String datasourceSpecificGroupID, String datasourceID);

User:Daniel.stucky.empolis.com: SMILA currently has no user or user rights management, as this was not a requirement for the framework. Therefore it was never intended to convert external user/group IDs to internal SMILA user/group IDs. Of course there will be clients of SMILA that have a separate user management (e.g. a Liferay Portal). It is the job of such a client to map its logins to the logins for external data sources, like the ones indexed by SMILA. SMILA could provide some services that offer such functionality, but this is not required. The security information provided by Crawlers/Agents should be usable in its raw, unmodified form. If desired, Resolvers can convert it into more human-readable names, e.g. plain text or standardized URIs. Again, we should not enforce any model/pattern.

Technical proposal

The basic idea is that a record created by an Agent/Crawler contains "raw" security information. This optional information is processed by special Pipelets in the executed pipeline, which prepare the security information to be stored with the record's metadata in a search index.

Data model

As we don't know which use cases SMILA will be used for, we should not restrict security information to READ access rights, but provide a generic representation of security information. The default use case will be indexing and search, for which READ access will suffice. Security information should be kept separate from record metadata, while reusing classes of the data model to represent it. The record itself is annotatable, so we can store the security information as annotations on the record. For this purpose a specific annotation ACCESS_RIGHTS is defined. It contains subannotations for the various access right types (e.g. READ, WRITE, DELETE), which in turn contain annotations for entities (e.g. PRINCIPALS and GROUPS). New access right types or entities can easily be added, but the Security Converters/Resolvers have to be adapted to support them.

Indexing

During indexing, the security information for a record is read from the data source by Crawlers/Agents, which create the ACCESS_RIGHTS annotations from it and store them in the record. The IndexOrderConfiguration should allow configuring which annotations are created for each record (which access right types, and whether to use principals, groups, or both). It should also be possible to disable the creation of these annotations if no security information is used in SMILA. Crawlers/Agents should pass on security information as provided by the data source, e.g. SIDs (Security IDs). Further processing of this data is done by the Security Converters/Resolvers. Here is an example of the ACCESS_RIGHTS record annotations:

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
        </An>
        <An n="GROUPS">
            <V>4711</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
            <V>2525</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
        </An>
    </An>
    <An n="WRITE">
        <An n="PRINCIPALS">
            <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
        </An>
    </An>
    ...
</An>
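
The configurable creation of these annotations could look roughly like the following sketch. It deliberately uses plain Java collections instead of the SMILA record classes; `AccessRightsBuilder` and its constructor flags are hypothetical names chosen for illustration, standing in for whatever the IndexOrderConfiguration would provide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not SMILA API): collects raw SIDs into an ACCESS_RIGHTS-like structure. */
final class AccessRightsBuilder {
    private final Set<String> enabledRightTypes; // e.g. READ, WRITE
    private final boolean usePrincipals;
    private final boolean useGroups;
    // right type -> entity type (PRINCIPALS/GROUPS) -> unmodified SIDs
    private final Map<String, Map<String, List<String>>> rights = new LinkedHashMap<>();

    AccessRightsBuilder(Set<String> enabledRightTypes, boolean usePrincipals, boolean useGroups) {
        this.enabledRightTypes = enabledRightTypes;
        this.usePrincipals = usePrincipals;
        this.useGroups = useGroups;
    }

    /** Adds the SIDs for one right type; right types that are not enabled are ignored. */
    AccessRightsBuilder add(String rightType, List<String> principals, List<String> groups) {
        if (!enabledRightTypes.contains(rightType)) {
            return this;
        }
        Map<String, List<String>> entities = new LinkedHashMap<>();
        if (usePrincipals) {
            entities.put("PRINCIPALS", new ArrayList<>(principals));
        }
        if (useGroups) {
            entities.put("GROUPS", new ArrayList<>(groups));
        }
        rights.put(rightType, entities);
        return this;
    }

    /** Returns the collected structure; it is empty if nothing was enabled. */
    Map<String, Map<String, List<String>>> build() {
        return rights;
    }
}
```

With only READ enabled, a call such as `add("WRITE", ...)` leaves the structure untouched, which corresponds to switching off an access right type in the configuration.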

Regular Pipelets/ProcessingServices will not take these annotations into account. Before a record is stored in a search index, the security annotations have to be converted to regular attributes that are indexable. SecurityConverters (general or index-specific) perform this transformation of annotations into attributes. Here is an example of how the READ access rights could be represented as regular attributes:

<A n="ReadGroups">
    <L>
        <V>4711</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
        <V>2525</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
    </L>
</A>
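
A minimal sketch of such a conversion in indexing mode is shown below. It assumes that the annotations are available as nested maps and that attribute names are derived as right type plus entity type (READ and PRINCIPALS become ReadUsers, READ and GROUPS become ReadGroups). Both the class `IndexSecurityConverter` and this naming rule are illustrative assumptions; the naming is meant to be configurable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical indexing-mode converter; plain maps stand in for annotations and attributes. */
final class IndexSecurityConverter {
    private IndexSecurityConverter() { }

    /** Flattens right type -> entity type -> SIDs into attributes such as ReadUsers and ReadGroups. */
    static Map<String, List<String>> toAttributes(Map<String, Map<String, List<String>>> accessRights) {
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, List<String>>> right : accessRights.entrySet()) {
            String prefix = capitalize(right.getKey()); // READ -> Read
            for (Map.Entry<String, List<String>> entity : right.getValue().entrySet()) {
                // PRINCIPALS is mapped to "Users" to match the attribute names used on this page.
                String suffix = "PRINCIPALS".equals(entity.getKey()) ? "Users" : capitalize(entity.getKey());
                attributes.put(prefix + suffix, new ArrayList<>(entity.getValue()));
            }
        }
        return attributes;
    }

    /** Turns a non-empty upper-case name such as GROUPS into Groups. */
    private static String capitalize(String upperCaseName) {
        return upperCaseName.substring(0, 1) + upperCaseName.substring(1).toLowerCase(Locale.ROOT);
    }
}
```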

There may be use cases where more human-readable names should be indexed instead of SIDs. Another use case is resolving Groups to their members (either as SIDs or as human-readable names). In these cases a SecurityConverter can make use of a SecurityResolver to handle the task. For example, it could resolve datasource-specific human-readable names for the principals and groups:

<A n="ReadGroups">
    <L>
        <V>empolis\group1</V> <!-- a group SID resolved to a human-readable group name -->
        <V>empolis\group9</V> <!-- a group SID resolved to a human-readable group name -->
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>empolis\testuser</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
    </L>
</A>

or it could also resolve the members of the groups:

<A n="ReadUsers">
    <L>
        <V>empolis\testuser</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
        <V>empolis\group1member1</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
        <V>empolis\group1member2</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
        <V>empolis\group2member1</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
        <V>empolis\group2member2</V> <!-- a user/principal SID resolved to a human-readable user/principal name -->
    </L>
</A>

Often data sources with and without security restrictions will be used together in one index (e.g. a filesystem and a public web site). For data sources without restrictions, the SecurityConverter should generate a default value named EVERYONE and fill all required attributes with it. This is needed during the search process, because otherwise results without any security information would be filtered out of the result list. A SecurityResolver may also use this value to replace generic groups (like authenticated_users or domain_users).

<A n="ReadGroups">
    <L>
        <V>EVERYONE</V>
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>EVERYONE</V>
    </L>
</A>
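
One way to apply this default is sketched below. The sketch assumes that EVERYONE is only filled in when a record carries no security information at all, so that a record restricted to certain users is not accidentally opened through an empty group attribute. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not SMILA API): fills security attributes of unrestricted records. */
final class EveryoneDefault {
    static final String EVERYONE = "EVERYONE";

    private EveryoneDefault() { }

    /**
     * Returns a copy of the attributes. If none of the given security attributes
     * has a value, each of them is set to EVERYONE; otherwise nothing is changed.
     */
    static Map<String, List<String>> apply(Map<String, List<String>> attributes,
                                           List<String> securityAttributes) {
        Map<String, List<String>> result = new LinkedHashMap<>(attributes);
        boolean hasSecurityInfo = false;
        for (String name : securityAttributes) {
            List<String> values = attributes.get(name);
            if (values != null && !values.isEmpty()) {
                hasSecurityInfo = true;
            }
        }
        if (!hasSecurityInfo) {
            for (String name : securityAttributes) {
                result.put(name, List.of(EVERYONE));
            }
        }
        return result;
    }
}
```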

Search

A search client represents security information in exactly the same way as a Crawler does during indexing. Most likely this will only be the ID of the user executing the search.

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
        </An>
    </An>
</An>

The security annotations are then processed again by a SecurityConverter (now in search mode) that transforms them into a filter annotation for the security attributes in the index. Note that EVERYONE is always included in the filter (see the Search concept for details on filters).

<A n="ReadUsers">
    <An n="filter">
        <V n="type">enumeration</V>
        <V n="mode">include</V>
        <V>0815</V>
        <V>EVERYONE</V>
    </An>
</A>

Again, SecurityResolvers may be used by the SecurityConverter for various tasks. For example, if only groups are used for security checking, all groups the given user is a member of have to be determined, and this information is then used to create the filter. In that case

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
        </An>
    </An>
</An>

is resolved to

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>0815</V> <!-- an unmodified user/principal id (SID) as provided by a DSSP -->
        </An>
        <An n="GROUPS">
            <V>4711</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
            <V>0190</V> <!-- an unmodified group id (SID) as provided by a DSSP -->
        </An>
    </An>
</An>

and then converted to the filter

<A n="ReadGroups">
    <An n="filter">
        <V n="type">enumeration</V>
        <V n="mode">include</V>
        <V>4711</V>
        <V>0190</V>
        <V>EVERYONE</V>
    </An>
</A>
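
The search-mode steps above (resolve the user's groups, then build the include filter) can be sketched as follows. The `membershipLookup` function stands in for a call to a SecurityResolver, and `SearchFilterBuilder` is a hypothetical name. Sorting the group IDs is only done here to make the result deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical search-mode helper (not SMILA API): computes the values of a group filter. */
final class SearchFilterBuilder {
    static final String EVERYONE = "EVERYONE";

    private SearchFilterBuilder() { }

    /**
     * Returns the values for an include-enumeration filter on a group attribute:
     * all groups of the principal in sorted order, followed by EVERYONE.
     */
    static List<String> groupFilterValues(String principal,
                                          Function<String, Set<String>> membershipLookup) {
        List<String> values = new ArrayList<>(new TreeSet<>(membershipLookup.apply(principal)));
        // EVERYONE is always part of the filter so that unrestricted records are found.
        values.add(EVERYONE);
        return values;
    }
}
```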

If security is NOT relevant for the search process, simply do not use any SecurityConverters in your index and search pipelines (and configure your agents/crawlers accordingly to reduce data load).

Security Converters and Resolvers

At some points in the SMILA framework the security information needs to be converted. We can distinguish between real conversion and resolving of security information (this list may not be complete):

Converters
- preparation for the search index (e.g. converting from Annotation to Attribute representation)
- combining data source and security information (e.g. adding a domain or data source ID prefix to the security information)
Resolvers
- resolve a Principal's Sub-Principals (e.g. members of a group, subgroups of a group)
- resolve a Principal's memberships (e.g. get all groups the user is a member of)
- resolve properties of a Principal (e.g. human readable names of Principal IDs)

Handling of security information should be optional (configurable).
Crawlers return unmodified security information as provided by the data source.
Search clients provide the Principals (in general a user context) for which the search is executed.
Converters are implemented as Pipelets/ProcessingServices. Converters may be generic or search index specific.
The execution logic of Converters differs between indexing (conversion to attributes) and search (conversion to filter annotations); it may be better to separate these tasks into different pipelets.
Converters may use Resolvers for further processing of security information.
Resolvers are implemented as OSGi services, not as ProcessingServices.
Resolvers may be used by Converters or by any other component in SMILA (e.g. a login component of a search application).

Here is an illustration of the proposed architecture of security resolvers and converters. Note that the use of Resolvers is optional:

Here is a proposal for the SecurityResolver interface.

import java.util.Properties;
import java.util.Set;

/**
 * Resolves security information about the principals (users and groups) of a DSSP.
 */
interface SecurityResolver
{
    /**
     * Returns all properties of the given principal.
     */
    Properties getProperties(String principal);

    /**
     * Returns all principals that are members of the given group, including any subgroups.
     */
    Set<String> resolveGroupMembers(String group);

    /**
     * Returns all groups the given principal is a member of.
     */
    Set<String> resolveMembership(String principal);

    /**
     * Checks if the given principal is a group.
     */
    boolean isGroup(String principal);
}
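
To illustrate how the proposed interface might be implemented, here is a small in-memory resolver. It is only a sketch under assumptions: a real resolver would query a DSSP, the `InMemorySecurityResolver` class and its `addGroup` and `setDisplayName` methods are hypothetical, and `resolveGroupMembers` is interpreted here as returning the users of the group and of all nested subgroups. The interface is repeated so that the example compiles on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/** Repeated from the proposal so that the example is self-contained. */
interface SecurityResolver {
    Properties getProperties(String principal);
    Set<String> resolveGroupMembers(String group);
    Set<String> resolveMembership(String principal);
    boolean isGroup(String principal);
}

/** Hypothetical resolver backed by an in-memory group table instead of a real DSSP. */
final class InMemorySecurityResolver implements SecurityResolver {
    private final Map<String, Set<String>> directMembers = new HashMap<>();
    private final Map<String, String> displayNames = new HashMap<>();

    void addGroup(String group, Set<String> members) {
        directMembers.put(group, new HashSet<>(members));
    }

    void setDisplayName(String principal, String name) {
        displayNames.put(principal, name);
    }

    @Override
    public Properties getProperties(String principal) {
        Properties properties = new Properties();
        String name = displayNames.get(principal);
        if (name != null) {
            properties.setProperty("name", name); // the key "name" is an assumption
        }
        return properties;
    }

    @Override
    public Set<String> resolveGroupMembers(String group) {
        Set<String> users = new HashSet<>();
        collectUsers(group, users, new HashSet<>());
        return users;
    }

    private void collectUsers(String group, Set<String> users, Set<String> visited) {
        if (!visited.add(group)) {
            return; // guards against cyclic group nesting
        }
        for (String member : directMembers.getOrDefault(group, Collections.emptySet())) {
            if (isGroup(member)) {
                collectUsers(member, users, visited);
            } else {
                users.add(member);
            }
        }
    }

    @Override
    public Set<String> resolveMembership(String principal) {
        Set<String> groups = new HashSet<>();
        collectGroups(principal, groups);
        return groups;
    }

    private void collectGroups(String member, Set<String> groups) {
        for (Map.Entry<String, Set<String>> entry : directMembers.entrySet()) {
            // groups.add returns false for known groups, which also stops cycles
            if (entry.getValue().contains(member) && groups.add(entry.getKey())) {
                collectGroups(entry.getKey(), groups);
            }
        }
    }

    @Override
    public boolean isGroup(String principal) {
        return directMembers.containsKey(principal);
    }
}
```

Memberships are resolved transitively in both directions, so a user in a subgroup is also reported as a member of the parent group.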

Ambiguous SIDs in datasources

Enhancement for DeltaIndexing

A change of the security information of a document leads to an update of the search index. It may be desirable to distinguish between changes of the security information and changes of the document itself. For this purpose one could introduce a second hash token that is created from the security information and stored in the DeltaIndexingManager. If during a crawl only the hash for the security information has changed, the complete processing of the document does not need to be executed, only the update of the security information (thus saving processing overhead). To support this, the CrawlerController needs to add some kind of flag to the Record (e.g. a special Attribute) that shows whether the regular hash or the security hash changed. In the Router this Attribute could be used in rules to trigger different Pipelines: the "complete processing pipeline" if the regular hash (or both hashes) changed, or the "security update pipeline" if only the security hash changed. As not all indexes support updating selected attributes, and most will likely support only a delete/add logic based on whole documents, the already processed data of the record must be loaded (either from the index or from the XML/Binary-Storage) and merged with the current Record (e.g. by a special Pipelet).

This is an optional enhancement that is totally independent of the security concept. However, it should be implemented only after the security concept has been implemented and tested.
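
The routing decision based on the two hashes can be expressed compactly. The following is a sketch only: `DeltaDecision` and the enum constants are hypothetical names, and a record without stored hashes is assumed to be new and therefore sent through the complete pipeline.

```java
/** Hypothetical sketch of the two-hash delta check; the names are not SMILA API. */
final class DeltaDecision {
    enum Action { SKIP, SECURITY_UPDATE_PIPELINE, COMPLETE_PROCESSING_PIPELINE }

    private DeltaDecision() { }

    /**
     * Compares the stored hashes with the hashes computed during the current crawl.
     * Stored hashes may be null for records that were never indexed before.
     */
    static Action decide(String storedContentHash, String storedSecurityHash,
                         String newContentHash, String newSecurityHash) {
        boolean contentChanged = !newContentHash.equals(storedContentHash);
        boolean securityChanged = !newSecurityHash.equals(storedSecurityHash);
        if (contentChanged) {
            // Also covers the case that both hashes changed.
            return Action.COMPLETE_PROCESSING_PIPELINE;
        }
        if (securityChanged) {
            return Action.SECURITY_UPDATE_PIPELINE;
        }
        return Action.SKIP;
    }
}
```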

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

SMILA/Specifications/Smila Security Concept

