Description

This page describes security in SMILA, specifically authorization. Records may be associated with security information, and services may use that information to restrict or grant access to data (records). Authentication (e.g. login to a SMILA-based web application) is outside the scope of this document.

Some thoughts about security information in SMILA:

for the majority of business cases READ access rights will suffice, but any kind of access rights should be representable
security information of a record is most likely a list of Users (Principals) and/or Groups that have certain access rights on that record
- maybe Users XOR Groups is easier to handle than allowing to combine both
- access rights that exclude Users/Groups from reading could be supported
- security based on Groups has some benefits over security based on Users (LeoSauermann agrees that groups are simpler than both groups and users. From experience, copying the model of operating systems (NTFS, ext, hfs: one owner, multiple groups) could be safer than an optimization to groups alone where we don't fully understand the implications. 16.1.2009)
  - less data is stored in the search index (Groups do not have to be resolved to their members)
  - membership of a group can be changed without the need to reindex the record
a data source (e.g. an NTFS filesystem) may be connected to a security provider (e.g. an LDAP server). Let's call these __DSSP__ (data source security provider)
for data sources that are not connected to a DSSP and do not provide any security information, a defined constant READ_ALL is used instead of any specific security information. This value is needed for filtering by search engines, because in general the filter expression contains the user and the groups the user is a member of, for example ... and (trustee="stuc07" or trustee="group1" or trustee="group2"). To include all documents that have no access restrictions, the clause or trustee="READ_ALL" is added. Without such a value, filtering is not possible.
a data source may enforce access rights on its data, but the Agent/Crawler may be unable to find out who has the right to access the data (e.g. a webserver)
security information may differ between data sources. SMILA will not provide any functionality for harmonizing security information; it is used as provided by the DSSP.
Multiple names/IDs of the same user may exist across multiple systems. Each of them is the __user-id native to the DSSP__.
LeoSauermann suggests (16.1.2009): Inside the SMILA record format, SMILA-specific user-ids or the user-ids native to the DSSP could be used.
- When we use DSSP-proprietary ids, we should represent them as URIs.
- When we have SMILA-internal user ids also, we should represent them as UUIDs (for decentralized management) or integers (for optimization).
- SMILA may have internal user and group IDs represented as "numbers" which are centrally mapped to external user-ids native to the DSSP. A central "SMILA user id and group id" user-identification database provides means to find the internal number of a user for a user-id native to a DSSP. Optionally, the user-identification database can map multiple DSSP user-ids to one SMILA user-id (if they are the same).
Names/IDs of users and groups may not be unique. For example, two DSSPs may provide a group named "authenticated users" with a totally different set of users. For such cases it will be necessary to somehow couple the security information with the data source ID (e.g. via a simple concatenation of the data source ID and the User/Group IDs).
DSSP specific services are needed (e.g. used in the Connectivity Framework and in the Search Framework) to
- resolve all subgroups of a group
- resolve all users of a group
- resolve all groups a user is a member of
There may be both user and group rights given by the DSSP and indexed, and custom user and group rights that are added later as part of the SMILA index.
in the search process these resolving services are used by a login/single sign-on component to get the security information for the current user. This is part of the application logic, but the basic functionality has to be provided
search results are filtered against the provided security information, only returning the records a user has access to. As there is always a delta between the access rights stored in the index and the access rights on the data source, an online check for each search result entry could be executed for high-risk data.
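
The filter expression from the READ_ALL item above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and not part of SMILA, the field name trustee and the constant READ_ALL are taken from the example in the text, and the values are not escaped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of SMILA: builds the trustee clause of a search filter. */
final class TrusteeFilterSketch {

    /** Constant for records without access restrictions, as named in the text. */
    static final String READ_ALL = "READ_ALL";

    /** Joins the user, the user's groups and READ_ALL into one or-clause. */
    static String buildTrusteeClause(String user, List<String> groups) {
        List<String> trustees = new ArrayList<>();
        trustees.add(user);
        trustees.addAll(groups);
        trustees.add(READ_ALL); // always match unrestricted records as well

        StringBuilder clause = new StringBuilder("(");
        for (int i = 0; i < trustees.size(); i++) {
            if (i > 0) {
                clause.append(" or ");
            }
            // Assumption: values contain no quotes, so no escaping is done here.
            clause.append("trustee=\"").append(trustees.get(i)).append('"');
        }
        return clause.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(buildTrusteeClause("stuc07", List.of("group1", "group2")));
    }
}
```

A record indexed without the READ_ALL value would match none of the terms, which is why the constant has to be written at indexing time and added at search time.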

LeoSauermann sums up his view on users and groups and DSSP-integration:

External DSSP user ids should be mapped centrally to SMILA specific IDs. SMILA then has (like LDAP, Unix, or other security systems) its own security database for user identification. Membership of users in groups must be resolvable using external DSSP services for authorization; for identification, a connection to the DSSP's user/password system must be available.

Discussion

Technical proposal

The basic idea is that a record created by an Agent/Crawler contains "raw" security information. This optional information is processed by special Pipelets in the executed pipeline that prepare the security information to be stored with the record's metadata in a search index.

Data model

As we don't know for what use cases SMILA will be used, we should not restrict security information to READ access rights, but provide a generic representation of security information. The default use case will be indexing and search, for which READ access will suffice. Security information should be separated from record metadata, though represented by reusing classes of the datamodel. The record itself is annotatable, so we can store the security information as annotations in the record. Therefore a specific annotation ACCESS_RIGHTS is defined. It contains subannotations for various access right types (e.g. READ, WRITE, DELETE), which in turn contain annotations for entities (e.g. PRINCIPALS and GROUPS). New access right types or entities can easily be added, but the Security Converters/Resolvers have to be adapted to support them.

Indexing

During indexing, the security information is read from the data source by Crawlers/Agents and stored in records as ACCESS_RIGHTS annotations. In the IndexOrderConfiguration it should be configurable which annotations are created for each record (which access right types, and whether to use principals, groups, or both). It should also be possible to disable the creation of these annotations if no security information is used in SMILA. Crawlers/Agents should pass security information as provided by the data source (e.g. SMILA user IDs = SIDs). Further processing of this data will be done by the Security Converters/Resolvers. Here is an example for the ACCESS_RIGHTS record annotations:

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>815</V> <!-- a SMILA specific user/principal id (SID) -->
        </An>
        <An n="GROUPS">
            <V>4711</V> <!-- a SMILA specific group id (SID) -->
            <V>2525</V> <!-- a SMILA specific group id (SID) -->
        </An>
    </An>
    <An n="WRITE">
        <An n="PRINCIPALS">
            <V>815</V> <!-- a SMILA specific user/principal id (SID) -->
        </An>
    </An>
    ...
</An>

Regular Pipelets/ProcessingServices will not take these annotations into account. Before a record is stored in a search index, the security annotations have to be converted to regular attributes that are indexable. SecurityConverters (general or index specific) perform this transformation of annotations into attributes. The user and group IDs are also converted from source-specific formats to SMILA-specific formats. A central lookup table is used for matching source-specific user names to SMILA user names. Here is an example of how the READ access rights could be represented as regular attributes. SID is the ID internal to SMILA:

<A n="ReadGroups">
    <L>
        <V>4711</V> <!-- a SMILA specific group id (SID) -->
        <V>2525</V> <!-- a SMILA specific group id (SID) -->
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>815</V> <!-- a SMILA specific user/principal id (SID) -->
    </L>
</A>
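
The conversion from the ACCESS_RIGHTS annotations to the ReadGroups and ReadUsers attributes could look roughly like the following Java sketch. It does not use the SMILA record API: as an assumption for illustration, the annotations are modeled as nested maps (access right type, then entity, then IDs), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index-mode converter sketch, not the SMILA implementation. */
final class IndexSecurityConverterSketch {

    /**
     * Converts the READ part of the access rights into indexable attributes.
     *
     * @param rights access right type (e.g. "READ") -> entity ("PRINCIPALS" or "GROUPS") -> IDs
     * @return attribute name ("ReadGroups", "ReadUsers") -> attribute values
     */
    static Map<String, List<String>> toReadAttributes(Map<String, Map<String, List<String>>> rights) {
        Map<String, List<String>> read = rights.getOrDefault("READ", Map.of());
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        attributes.put("ReadGroups", List.copyOf(read.getOrDefault("GROUPS", List.of())));
        attributes.put("ReadUsers", List.copyOf(read.getOrDefault("PRINCIPALS", List.of())));
        return attributes;
    }

    public static void main(String[] args) {
        // Same values as in the annotation example of this section.
        Map<String, Map<String, List<String>>> rights = Map.of(
                "READ", Map.of("PRINCIPALS", List.of("815"), "GROUPS", List.of("4711", "2525")),
                "WRITE", Map.of("PRINCIPALS", List.of("815")));
        System.out.println(toReadAttributes(rights));
    }
}
```

Other access right types (WRITE, DELETE) are ignored here because only READ is needed for the search use case; a generic converter would iterate over all types instead.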

There may be use cases where the datasource-specific user and group representation should be used instead of SMILA IDs, for example because it is more human-readable. In this case a SecurityConverter can make use of a SecurityResolver to handle these tasks. It could resolve datasource-specific human-readable names for the principals and groups, but these must be transported in different XML attributes than the SMILA SID numbers.

<A n="ReadGroupsDatasourceSpecific">
    <L>
        <V>empolis\group1</V>
        <V>empolis\group9</V>
    </L>
</A>
<A n="ReadUsersDatasourceSpecific">
    <L>
        <V>empolis\testuser</V>
    </L>
</A>

or it could also resolve the members of the groups

<A n="ReadUsersDatasourceSpecific">
    <L>
        <V>empolis\testuser</V>
        <V>empolis\group1member1</V>
        <V>empolis\group1member2</V>
        <V>empolis\group2member1</V>
        <V>empolis\group2member2</V>
    </L>
</A>

Often data sources with and without security restrictions will be used together in one index (e.g. a filesystem and a public web site). For data sources without security considerations, the constant EVERYONE must be entered as the group allowed for reading. Each record must have at least one group set for reading rights. This is needed during the search process, as otherwise possible results without any security information would be filtered from the result list. A SecurityResolver may also use EVERYONE to replace generic groups (like authenticated_users or domain_users).

<A n="ReadGroups">
    <L>
        <V>EVERYONE</V>
    </L>
</A>
<A n="ReadUsers">
    <L>
        <V>EVERYONE</V>
    </L>
</A>
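
A minimal Java sketch of the EVERYONE rule described above: generic groups are replaced by EVERYONE, and a record without any read group gets EVERYONE as its only group. The class name, the method name and the idea of passing the generic group names as a parameter are assumptions for illustration only, and the sketch works on group names before any mapping to numeric SIDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of SMILA. */
final class EveryoneDefaultSketch {

    static final String EVERYONE = "EVERYONE";

    /**
     * Returns the read groups to index for a record.
     *
     * @param readGroups    groups delivered by the data source, may be empty
     * @param genericGroups group names that should be treated as EVERYONE
     */
    static List<String> normalizeReadGroups(List<String> readGroups, Set<String> genericGroups) {
        Set<String> result = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String group : readGroups) {
            result.add(genericGroups.contains(group) ? EVERYONE : group);
        }
        if (result.isEmpty()) {
            result.add(EVERYONE); // every record needs at least one read group
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        Set<String> generic = Set.of("authenticated_users", "domain_users");
        System.out.println(normalizeReadGroups(List.of(), generic));
        System.out.println(normalizeReadGroups(List.of("4711", "domain_users"), generic));
    }
}
```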

In the attribute values for ReadGroups and ReadUsers, only numeric values and predefined constants (currently EVERYONE) are allowed. More constants will be defined. This could be optimized by defining an integer constant for EVERYONE.

In the attribute values for ReadUsersDatasourceSpecific, URIs are preferred. These should be expressed using standardized formats. For LDAP, RFC 4516 specifies the URI format.
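
As a rough illustration of the URI representation, the following Java sketch builds an LDAP URL of the form ldap://host:port/dn from a distinguished name using java.net.URI. The class name, host name and DN are made-up examples, and the sketch does not claim to cover every encoding rule of RFC 4516; it only relies on the generic percent-encoding of the URI constructor.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper, not part of SMILA. */
final class LdapUriSketch {

    /** Builds an LDAP URL that names a single entry by its distinguished name. */
    static String toLdapUri(String host, int port, String distinguishedName) {
        try {
            // The multi-argument URI constructor percent-encodes characters
            // that are not legal in a URI path (for example spaces).
            return new URI("ldap", null, host, port, "/" + distinguishedName, null, null)
                    .toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build LDAP URI for " + distinguishedName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toLdapUri("ldap.example.com", 389, "uid=testuser,ou=people,dc=example,dc=com"));
    }
}
```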

Search

A search client represents security information in exactly the same way as a Crawler does during indexing. Most likely this will only be the ID of the user executing the search.

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>815</V>
        </An>
    </An>
</An>

The security annotations are then processed again by a SecurityConverter (now in search mode) that transforms the security annotations into a filter annotation for the security attributes in the index. Note that EVERYONE is always included in the filter (see the Search concept for details on Filters).

<A n="ReadUsers">
    <An n="filter">
        <V n="type">enumeration</V>
        <V n="mode">include</V>
        <V>815</V>
        <V>EVERYONE</V>
    </An>
</A>

Again, SecurityResolvers may be used by the SecurityConverter for various tasks. For instance, if only groups are used for security checking, then all groups the provided user is a member of have to be determined, and this information is then used to create the filter. For example

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>815</V>
        </An>
    </An>
</An>

is resolved to

<An n="ACCESS_RIGHTS">
    <An n="READ">
        <An n="PRINCIPALS">
            <V>815</V>
        </An>
        <An n="GROUPS">
            <V>4711</V>
            <V>190</V>
        </An>
    </An>
</An>

and then converted to the filter

<A n="ReadGroups">
    <An n="filter">
        <V n="type">enumeration</V>
        <V n="mode">include</V>
        <V>4711</V>
        <V>190</V>
        <V>EVERYONE</V>
    </An>
</A>
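
The search-mode steps shown above (resolve the groups of the searching user, then build the include filter values with EVERYONE appended) can be sketched in Java as follows. The names are hypothetical, and the membership lookup is passed in as a plain function so that the sketch does not depend on any concrete SecurityResolver implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical search-mode converter sketch, not the SMILA implementation. */
final class SearchFilterSketch {

    static final String EVERYONE = "EVERYONE";

    /**
     * Returns the values of the include filter on the ReadGroups attribute.
     *
     * @param principal         ID of the user executing the search
     * @param resolveMembership lookup that returns all groups of a principal
     */
    static List<String> readGroupsFilterValues(String principal,
            Function<String, Set<String>> resolveMembership) {
        List<String> values = new ArrayList<>(resolveMembership.apply(principal));
        values.add(EVERYONE); // unrestricted records must always match
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for a real resolver: user 815 is a member of groups 4711 and 190.
        Function<String, Set<String>> lookup =
                p -> new LinkedHashSet<>(List.of("4711", "190"));
        System.out.println(readGroupsFilterValues("815", lookup));
    }
}
```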

If security is NOT relevant for the search process, then simply don't use any SecurityConverters in your index and search pipelines (and configure your agents/crawlers appropriately to reduce data load)!

LeoSauermann questions this: SMILA is an enterprise framework. There should be better guidelines for representing the "no ACL" option.

Security Converters and Resolvers

At some points in the SMILA framework the security information needs to be converted. We can distinguish between real conversion and resolving of security information (this list may not be complete):

Converters
- preparations for Search Index (e.g. converting from Annotation to Attribute representation)
- combining data source and security information (like adding a domain or data source Id prefix to the security information)
Resolvers
- resolve a Principal's Sub-Principals (e.g. members of a group, subgroups of a group)
- resolve a Principal's membership (e.g. get all groups the user is a member of)
- resolve properties of a Principal (e.g. human readable names of Principal IDs)
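
The converter task of combining data source and security information can be as simple as a prefix. The following Java sketch is one possible approach; the class name, the method name and the ':' separator are assumptions, and a real implementation would have to make sure that the separator cannot occur inside a data source ID.

```java
/** Hypothetical helper, not part of SMILA. */
final class QualifiedPrincipalSketch {

    /** Assumed separator between data source ID and principal ID. */
    static final char SEPARATOR = ':';

    /** Prefixes a user or group ID with the ID of the data source it comes from. */
    static String qualify(String dataSourceId, String principalId) {
        return dataSourceId + SEPARATOR + principalId;
    }

    public static void main(String[] args) {
        // The same group name from two DSSPs yields two distinct index values.
        System.out.println(qualify("ldap1", "authenticated users"));
        System.out.println(qualify("ldap2", "authenticated users"));
    }
}
```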

Handling of security information should be optional (configurable).
Crawlers return unmodified security information as provided by the data source.
Search clients provide Principals (in general a user context) for which the search is executed.
Converters are implemented as Pipelets/ProcessingServices. Converters may be generic or search index specific.
The execution logic of Converters differs between indexing (conversion to attributes) and search (conversion to filter annotations); it may be better to separate these tasks into different pipelets.
Converters may use Resolvers for further processing of security information.
Resolvers are implemented as OSGi services, not as ProcessingServices!
Resolvers may be used by Converters or any other component in SMILA (e.g. a login component of a search application).

Here is an illustration of the proposed architecture of security resolvers and converters. Note that the use of Resolvers is optional:

LeoSauermann: LDAP is maybe not the right standard here, SASL could be used (LDAP is compatible with SASL).

Here is a proposal for the SecurityResolver interface.

interface SecurityResolver
{
    /**
     * Returns all properties of the given principal.
     */
    Properties getProperties(String principal);
 
    /**
     * Returns all principals that are members of the given group, including any subgroups.
     */
    Set<String> resolveGroupMembers(String group);
 
    /**
     * Returns all groups the given principal is member of.
     */    
    Set<String> resolveMembership(String principal);
 
    /**
     * Checks if the given principal is a group.
     */    
    boolean isGroup(String principal);
 
    /**
     * Maps a datasource specific user id to a SMILA user id.
     * todo: how to identify the datasource here?
     */
    long resolveDataSourceUserToSID(String datasourceSpecificUserID, String datasourceID);
 
    /**
     * Maps a datasource specific group id to a SMILA group id.
     * todo: how to identify the datasource here?
     */
    long resolveDataSourceGroupToSID(String datasourceSpecificGroupID, String datasourceID);
 
}
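
To make the intended semantics of the group-related methods more concrete, here is a small in-memory Java sketch. It is not an implementation of the proposed interface (it does not declare implements, and it omits the property and SID methods); it only mirrors isGroup, resolveGroupMembers and resolveMembership on top of a map from group to direct members. The class name and the map-based storage are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory resolver sketch, not part of SMILA. */
final class InMemoryResolverSketch {

    /** group -> direct members (users or subgroups) */
    private final Map<String, Set<String>> directMembers;

    InMemoryResolverSketch(Map<String, Set<String>> directMembers) {
        this.directMembers = directMembers;
    }

    boolean isGroup(String principal) {
        return directMembers.containsKey(principal);
    }

    /** All principals in the group, including subgroups and their members. */
    Set<String> resolveGroupMembers(String group) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(group);
        while (!todo.isEmpty()) {
            for (String member : directMembers.getOrDefault(todo.pop(), Set.of())) {
                // result.add returns false for already seen members,
                // which also protects against cyclic group definitions.
                if (result.add(member) && isGroup(member)) {
                    todo.push(member);
                }
            }
        }
        return result;
    }

    /** All groups the principal belongs to, directly or through nested groups. */
    Set<String> resolveMembership(String principal) {
        Set<String> result = new LinkedHashSet<>();
        for (String group : directMembers.keySet()) {
            if (resolveGroupMembers(group).contains(principal)) {
                result.add(group);
            }
        }
        return result;
    }
}
```

A real resolver would delegate these lookups to the DSSP (for example a directory server) instead of holding the membership data in memory.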

Enhancement for DeltaIndexing

A change of the security information of a document leads to an update of the search index. It may be desirable to distinguish between changes of the security information and changes of the document itself. Therefore one could introduce a second hash token that is created from the security information and stored in the DeltaIndexingManager. If during a crawl only the hash for the security information has changed, the complete processing of the document does not need to be executed; only the security information has to be updated, which saves processing overhead. To support this, the CrawlerController needs to add some kind of flag to the Record (e.g. a special Attribute) that shows whether the regular hash or the security hash changed. In the Router this Attribute could be used in rules to trigger different Pipelines: the "complete processing pipeline" if the regular hash (or both hashes) changed, or the "security update pipeline" if only the security hash changed. As not all indexes support updating selected attributes, and most will likely support only a delete/add logic based on whole documents, the already processed data of this record must be loaded (either from the index or from the XML/Binary-Storage) and merged with the current Record (e.g. by a special Pipelet). This is an optional enhancement that is totally independent of the security concept. However, it should be implemented only after the security concept has been implemented and tested.
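
The routing decision based on the two hashes can be sketched in Java as follows. The enum values and the method are hypothetical names chosen for this illustration; they do not correspond to existing SMILA or DeltaIndexingManager APIs.

```java
/** Hypothetical sketch of the two-hash decision, not part of SMILA. */
final class DeltaDecisionSketch {

    enum Action { SKIP, SECURITY_UPDATE_PIPELINE, COMPLETE_PIPELINE }

    /**
     * Decides which pipeline a crawled record should be routed to.
     * The old hashes may be null if the record has not been indexed before.
     */
    static Action decide(String oldContentHash, String newContentHash,
            String oldSecurityHash, String newSecurityHash) {
        if (!newContentHash.equals(oldContentHash)) {
            // Content changed (or record is new): full processing, also covers "both changed".
            return Action.COMPLETE_PIPELINE;
        }
        if (!newSecurityHash.equals(oldSecurityHash)) {
            return Action.SECURITY_UPDATE_PIPELINE;
        }
        return Action.SKIP; // nothing changed since the last crawl
    }

    public static void main(String[] args) {
        System.out.println(decide("c1", "c1", "s1", "s2"));
    }
}
```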

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

SMILA/Specifications/Smila Security Concept

Contents

Description

Discussion

Technical proposal

Data model

Indexing

Search

Security Converters and Resolvers

Enhancement for DeltaIndexing
