PTP/designs/SCI

Revision as of 12:16, 12 December 2010 by Tuhongj.cn.ibm.com (Talk | contribs)

Introduction

SCI (Scalable Communication Infrastructure) is a lightweight communication library that provides scalable message transmission for a client-server model, in particular for a central server associated with a large number of clients. Internally, SCI uses a classical tree-based hierarchical structure to build message transmission paths between the server and the clients. Typically, the server is considered the front end and the clients are considered the back ends.

Installation

Get the source code from the Eclipse CVS repository or download it into your Eclipse workspace. To download the SCI source from the Eclipse CVS repository:

  • Change to a directory where the SCI source will be extracted
  • Set CVSROOT by issuing the command export CVSROOT=:pserver:anonymous@dev.eclipse.org/cvsroot/tools
  • Check out the SCI source by issuing the command cvs checkout org.eclipse.ptp/tools/sci/org.eclipse.ptp.sci
  • The SCI source will be located in the org.eclipse.ptp/tools/sci/org.eclipse.ptp.sci subdirectory
  • Note that you can specify qualifiers to the cvs command to extract SCI source other than at the HEAD (latest) level.

To download the SCI source into your Eclipse workspace:

  • Start Eclipse
  • Open the Eclipse installation wizard by clicking the Help menu then clicking Install New Software.
  • To download the initial SCI source, select the latest Eclipse release download site, Helios in 2010, from the Work with: dropdown. To download SCI updates, select the "Eclipse Project Update site"
  • Open the General Purpose Tools node in the software list and check the checkbox next to "PTP Scalable Communication Infrastructure (SCI)"
  • Click Next and follow the remaining prompts in the installation wizard
  • The SCI source code will be installed in the plugins directory of your Eclipse installation. The installation process will create a subdirectory where the directory name includes a time stamp. For instance, org.eclipse.ptp.sci_1.0.0.201006142322.
  • Once you download the SCI source, you must transfer the entire contents of this directory to the system where you will build SCI using FTP or other file transfer mechanism.

To configure, build and install SCI, change to the source directory and run the following commands:

./configure --prefix=/install/location
make
make install

If you want to enable the OpenSSL security mechanism in SCI, add the --enable-openssl option to the configure command.

Assuming SCI is installed into the directory /opt/sci, the SCI daemon scid is located in /opt/sci/sbin. You must have root privileges to start scid. Once you have root privileges, run the command /opt/sci/sbin/scid. You can also modify your system startup scripts to start scid at system startup.

Topology

Typically, a SCI session contains the following processes:

  • A front end process (FE)
  • One or multiple back end processes (BEs)

In stand-alone agent mode, which is explained below, there are also:

  • Zero or multiple agent processes (scia)

The processes form a tree-based structure: the front end is the root and the back ends are the leaves. Communication takes place between the front end and the back ends. Messages are forwarded by agents, and they can also be filtered by plug-ins running in the front end or in agents as they pass upstream or downstream.

SCI supports both stand-alone agent mode and embedded agent mode, selected by the environment variable SCI_EMBED_AGENT=[yes|no]. The interfaces for both modes are almost identical and are transparent to users, with the exception of a connection call-back function that can be used only in embedded mode. In stand-alone mode, the scia processes forward messages from the FE to the BEs. In embedded mode, the agents are embedded into the BEs, in which case the BEs are called embedded agents (EAs). The tree-based hierarchical structures are the same for both modes. The front end is actually the root agent.

[Figure: Stand-alone agent mode]
[Figure: Embedded agent mode]

Note: in the second diagram, a big block stands for a real entire back end which contains the user application code (BE) and the embedded agents (EA). Embedded agents are linked with the back end and run on separate threads from the application code. Not all the back ends have embedded agents. A back end will receive configuration information during SCI_Initialize which indicates if it should create embedded agents.

Transmitting Messages

The SCI_Bcast and SCI_Upload functions are the most important functions for transmitting messages. SCI_Bcast is used to send messages from front end to any of the back ends while SCI_Upload is used to send messages from a back end to the front end.

SCI provides a one-sided communication model for message transmission: a message handler, which is a call-back function, is registered when SCI_Initialize is called, and it is called automatically when a message is received. SCI has both an interrupt mode and a polling mode, and the message handler is triggered by an incoming message in either mode. In interrupt mode, the handler is called in a separate SCI thread; in polling mode, it is called within the SCI_Poll function. Typically, the main thread blocks on SCI_Poll while waiting for a message.

Filtering

SCI provides a plug-in mechanism for message filtering, which is done in SCI agents (scia or EA, and also the FE). Only one filter can be used downstream (FE->BEs) per agent, while multiple filters can be cascaded upstream (BEs->FE) per agent/embedded agent.

Groups

A group may contain one or multiple back end IDs. Users can create groups and communicate with groups of back ends through a set of group APIs. A back end ID is a predefined group consisting of only that back end. The SCI_GROUP_ALL is also a predefined group which contains all the back ends.

Internally, a group provides two kinds of information. The first is the back end IDs belonging to this group. The second is, for a particular agent or FE, the direct children of the agent or FE.

Launch modes

SCI implements two launch modes: internal launch and external launch.

Internal launching means the SCI back ends are forked by the scids directly when the front end calls SCI_Initialize.

The external launching mode is intended for cases where the back ends have to be launched by a third-party job launcher other than scid, for example when the back end is a debug engine that has to be launched by another job launcher such as POE/PMD. To enable this mode, the SCI_CLIENT_ID and SCI_JOB_KEY environment variables must be set before the back ends call SCI_Initialize. The front end must also set the same value for SCI_JOB_KEY and call SCI_Initialize() so that the back end can find its parent agent and connect back to the SCI tree hierarchy. Both the front end and the back ends should also set SCI_USE_EXTLAUNCHER=yes.

It is convenient to use scid to help build the session in both internal and external launching modes, but a job can also run without scid in either mode. For internal launching, if the user specifies SCI_REMOTE_SHELL=ssh, an ssh session is used to launch the agent or the back end remotely. For external launching, three additional environment variables must be set at the back ends: SCI_PARENT_HOSTNAME (the host name of the parent scia or FE), SCI_PARENT_PORT (the port of the parent scia or FE), and SCI_PARENT_ID (the agent id of the parent scia or FE). In addition, the front end must set SCI_REMOTE_SHELL=true to bypass the connection to scid; alternatively, SCI_REMOTE_SHELL can be set to a script that helps build the routing information.
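
The environment setup that external launching requires on the back end side could be sketched as follows. This is only an illustration built on the POSIX setenv call: the helper name set_ext_launch_env and the example values are assumptions, and in practice the third-party job launcher would normally export these variables itself before the back end starts.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of SCI): exports the variables that a back
 * end needs in external launching mode before it calls SCI_Initialize().
 * Returns 0 on success and -1 if any setenv() call fails.
 */
int set_ext_launch_env(const char *job_key, int client_id)
{
    char id_buf[32];

    assert(job_key != NULL);
    snprintf(id_buf, sizeof(id_buf), "%d", client_id);

    if (setenv("SCI_JOB_KEY", job_key, 1) != 0)
        return -1;
    if (setenv("SCI_CLIENT_ID", id_buf, 1) != 0)
        return -1;
    if (setenv("SCI_USE_EXTLAUNCHER", "yes", 1) != 0)
        return -1;
    return 0;
}
```

A back end would call such a helper, for example set_ext_launch_env("12345", 3) with made-up values, before SCI_Initialize(). The front end has to use the same SCI_JOB_KEY value.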

Working model

The figures below are two examples of typical programming models. The handler is triggered once a message arrives.

[Figure: Working model 1]
[Figure: Working model 2]

APIs

All the function definitions and the related data structures can be referenced in sci.h and man pages. All the APIs except SCI_Initialize() and SCI_Terminate() are thread-safe.

SCI Environment initialize/terminate/query functions

int SCI_Initialize(sci_info_t *info);

This must be the first function called in a SCI job for both front end and back end.

typedef struct {
    sci_end_type_t          type;
    SCI_self_init_hndlr     *connect_hndlr;
    union {
        sci_fe_info_t           fe_info;
        sci_be_info_t           be_info;
    } _u;
#define fe_info _u.fe_info
#define be_info _u.be_info
} sci_info_t;
type
used to specify SCI_FRONT_END or SCI_BACK_END.
connect_hndlr
used when users want to define their own connection method. Only embedded agent mode can use it.
fe_info
used in the front end while be_info is used in the back end.
typedef struct {
    sci_mode_t           mode;
    SCI_msg_hndlr        *hndlr;
    void                 *param;
    SCI_err_hndlr        *err_hndlr;
    char                 *hostfile;
    char                 *bepath;
    char                 **beenvp;
    sci_filter_list_t    filter_list;
    char                 **host_list;
    char                 reserve[64];
} sci_fe_info_t;
mode
used to specify SCI_INTERRUPT or SCI_POLLING.
hndlr
a call back function which is called when messages arrive.
param
the message handler hndlr’s input parameter.
hostfile or host_list
used to specify the host names or IP addresses where the back ends will be launched; these two parameters are mutually exclusive. If the hostfile field is set in this structure, it can be overridden at runtime through the environment variable SCI_HOST_FILE. Normally a hostfile is convenient, but if the host list is retrieved at runtime, using host_list directly is preferable.

The host list entries can be either host names or IP addresses. If IP addresses are used, no name resolution happens internally. When working at a very large scale, users may prefer IP addresses to reduce processing time.

bepath
used to specify the path of the back end, which can be changed at runtime through the environment variable SCI_BACKEND_PATH.
beenvp
used to pass the environment variables other than SCI_ to the back ends. The last element in this array is required to be NULL. The format of each environment variable is “XXX=YYY”.
filter_list
used to specify a set of filters to be loaded during initializing. The filters will be loaded before the back ends are launched and before any messages are uploaded.
err_hndlr
intended for failover and recovery in the future. Simply set it to NULL to ignore it.
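
Because beenvp must be a NULL-terminated array of "XXX=YYY" strings, it can be worth validating the array before passing it to SCI_Initialize(). The sketch below shows one way to do that; check_beenvp is a hypothetical helper and is not part of SCI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of SCI): checks a NULL-terminated
 * environment array of the kind passed in beenvp.
 * Returns the number of entries, or -1 if an entry is not of the form
 * "NAME=VALUE" with a non-empty NAME.
 */
int check_beenvp(char **envp)
{
    int count = 0;

    assert(envp != NULL);
    for (; envp[count] != NULL; count++) {
        const char *eq = strchr(envp[count], '=');
        if (eq == NULL || eq == envp[count])
            return -1;
    }
    return count;
}
```

For example, an array such as { "DEBUG_LEVEL=2", "WORKDIR=/tmp", NULL } (variable names invented for illustration) yields 2, while an entry without '=' is rejected.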
typedef struct {
    sci_mode_t       mode;
    SCI_msg_hndlr    *hndlr;
    void             *param;
    SCI_err_hndlr    *err_hndlr;
    char             reserve[64];
} sci_be_info_t;
mode
used to specify SCI_INTERRUPT or SCI_POLLING.
hndlr
the message handler
param
the message handler’s input parameter.
err_hndlr
intended for failover and recovery in the future. This field should be set to NULL.

int SCI_Terminate();

This must be the last function called in a SCI job. The entire SCI session terminates and all resources are freed when this function returns.

int SCI_Query(sci_query_t query, void *ret_val);

typedef enum {
    JOB_KEY,
    NUM_BACKENDS,
    BACKEND_ID,
    POLLING_FD,
    NUM_FILTERS,
    FILTER_IDLIST,
    AGENT_ID,
    NUM_SUCCESSORS,
    SUCCESSOR_IDLIST,
    HEALTH_STATUS,
    AGENT_LEVEL
} sci_query_t;

The individual enumerations for sci_query_t have the following meanings

  • JOB_KEY: get the job key.
  • NUM_BACKENDS: get the number of back ends under this agent or FE.
  • BACKEND_ID: get the back end id.
  • POLLING_FD: get the file descriptor which can be used by select or poll.
  • NUM_FILTERS: get the number of filters.
  • FILTER_IDLIST: get the filters’ ids. Query NUM_FILTERS first to get the number of filters, then allocate enough space to hold the filter id list. The output variable ret_val should be an int array.
  • AGENT_ID: get the agent id.
  • NUM_SUCCESSORS: get the number of successors.
  • SUCCESSOR_IDLIST: get the ids of the successors. Query NUM_SUCCESSORS first, then allocate enough space to hold the successor id list. The output variable ret_val should be an int array.
  • HEALTH_STATUS: get the working status of a SCI front end, agent, or back end. (normal or exited).
  • AGENT_LEVEL: get the agent’s level. (FE is at level 0, the FE’s direct successors are level 1, and the successors’ successors are level 2 and so on).

Below is a table to summarize where each SCI query may be issued:

Front End Filter Back End
JOB_KEY X X X
NUM_BACKENDS X X
BACKEND_ID X
POLLING_FD X X
NUM_FILTERS X X X
FILTER_IDLIST X X X
AGENT_ID X X
NUM_SUCCESSORS X X
SUCCESSOR_IDLIST X X
HEALTH_STATUS X X X
AGENT_LEVEL X X

SCI Communication functions

int SCI_Bcast(int filter_id, sci_group_t group, int num_bufs, void *bufs[], int sizes[]);

This function broadcasts messages from the FE to the BEs; only the FE can call it. SCI_Bcast() sends a single message that is composed of all the message fragments in bufs.

filter_id
the id of the filter which is going to filter a message when it is processed by an agent/embedded agent/front end.
group
the destination of the message; it can be a user defined group id or simply a back end id. SCI_GROUP_ALL means the destinations are all the back ends.
num_bufs
the number of bufs.
bufs[]
the messages array.
sizes[]
the messages’ length array corresponding to bufs[].

int SCI_Upload(int filter_id, sci_group_t group, int num_bufs, void *bufs[], int sizes[]);

This function is used to upload messages from a back end to the front end.

filter_id
the id of the filter which is going to filter the message.
group
ignored
num_bufs, bufs[], sizes[]
have the same meaning as the parameters in SCI_Bcast.

int SCI_Poll(int timeout);

This function will block and wait until a message arrives or the timeout interval is reached.

timeout
the timeout in milliseconds; a value < 0 means no timeout (wait indefinitely), and a value >= 0 means wait until the timeout expires.

The return value will be 0 for success and SCI_ERR_POLL_TIMEOUT when a timeout happens.

SCI Group manipulation functions

int SCI_Group_create(int num_bes, int *be_list, sci_group_t *group);

This function is used to create a new group. It is a blocking call so the caller can assume group is ready to use upon the return of the function.

num_bes
the number of back ends in the be_list.
be_list
the back end id list to be contained in the new group.
group
an output parameter; it is the created new group and can be used by SCI_Bcast as a set of destinations.

int SCI_Group_free(sci_group_t group);

This function is used to free an existing group which was previously created by SCI_Group_create.

group
the group to be freed.

int SCI_Group_operate(sci_group_t group1, sci_group_t group2, sci_op_t op, sci_group_t *newgroup);

group1 & group2
the groups participating in the operation.
newgroup
the result group.
typedef enum {
    SCI_UNION,
    SCI_INTERSECTION,
    SCI_DIFFERENCE
} sci_op_t;
SCI_UNION
the newgroup is the union of group1 & group2.
SCI_INTERSECTION
the newgroup is the intersection of group1 & group2.
SCI_DIFFERENCE
the newgroup is the difference of group1 & group2.
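
The three operators behave like ordinary set operations on back end id lists. The sketch below illustrates the semantics on plain sorted int arrays. It is not how SCI implements groups internally, the names set_op_t and combine_ids are invented for the example, and it assumes that the difference means the members of group1 that are not in group2, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: mirrors the names in sci_op_t but is not the SCI type. */
typedef enum { OP_UNION, OP_INTERSECTION, OP_DIFFERENCE } set_op_t;

/*
 * Combines two ascending, duplicate-free lists of back end ids.
 * 'out' must have room for na + nb entries. Returns the result length.
 */
size_t combine_ids(const int *a, size_t na, const int *b, size_t nb,
                   set_op_t op, int *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j]) {            /* id only in a */
            if (op != OP_INTERSECTION)
                out[n++] = a[i];
            i++;
        } else if (a[i] > b[j]) {     /* id only in b */
            if (op == OP_UNION)
                out[n++] = b[j];
            j++;
        } else {                      /* id in both lists */
            if (op != OP_DIFFERENCE)
                out[n++] = a[i];
            i++;
            j++;
        }
    }
    /* Leftovers of a belong to the union and to the difference. */
    while (i < na) {
        if (op != OP_INTERSECTION)
            out[n++] = a[i];
        i++;
    }
    /* Leftovers of b belong to the union only. */
    while (j < nb) {
        if (op == OP_UNION)
            out[n++] = b[j];
        j++;
    }
    return n;
}
```

With a = {0, 1, 2, 3} and b = {2, 3, 4}, the union is {0, 1, 2, 3, 4}, the intersection is {2, 3}, and the difference under the stated assumption is {0, 1}.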

int SCI_Group_operate_ext(sci_group_t group, int num_bes, int *be_list, sci_op_t op, sci_group_t *newgroup);

group
an existing group.
num_bes and be_list
the back end ids to be added to or removed from the group
newgroup
the result group.
op
has the same meaning as the one in SCI_Group_operate.

int SCI_Group_query(sci_group_t group, sci_group_query_t query, void *ret_val);

group
the group id to be queried.
ret_val
the output parameter which saves the result. The user is responsible for allocating sufficient space to hold the result. Typically, GROUP_MEMBER_NUM is queried before GROUP_MEMBER and GROUP_SUCCESSOR_NUM is queried before GROUP_SUCCESSOR in order to determine the size of the result.
typedef enum {
    GROUP_MEMBER_NUM,
    GROUP_MEMBER,
    GROUP_SUCCESSOR_NUM,
    GROUP_SUCCESSOR
} sci_group_query_t;
GROUP_MEMBER_NUM
get the number of members of this group. This number is the number of back ends which have the caller agent (or FE) as their common ancestor.
GROUP_MEMBER
get the member list of this group.
GROUP_SUCCESSOR_NUM
get the number of successors which have the group members. A successor means it is a direct child of the caller and some of the group members are under it. This is intended for a filter which wants to send messages through a specified path.
GROUP_SUCCESSOR
get the successor list. A successor is a direct child of the caller and, unless it is a back end, an ancestor of some of the group members. This is intended for users who want to send messages to some of the group members through specified paths instead of to all the members.

Note: GROUP_MEMBER refers to the back ends belonging to this group that are descendants of the agent or front end where the query is issued, while a successor is a direct child of the caller. For example, suppose an SCI job has 8 back ends and the fanout is set to 2. When the query is issued in the front end, the GROUP_MEMBER of SCI_GROUP_ALL is 0, 1, ..., 7 and GROUP_MEMBER_NUM is 8, while the GROUP_SUCCESSOR consists of the agents whose ids are -2 and -3, and GROUP_SUCCESSOR_NUM is 2 according to the tree hierarchy. If the query is issued in agent -2, GROUP_MEMBER is 0, 1, ..., 3 and GROUP_MEMBER_NUM is 4, while GROUP_SUCCESSOR is -4 and -5 and GROUP_SUCCESSOR_NUM is 2.

SCI Filter related functions

int SCI_Filter_load(sci_filter_info_t *filter_info);

This function is used to load a filter plug-in.

filter_info
contains the information for the filter plug-in to be loaded by dlopen in all the agents/embedded agents/front end. The back ends do not load the filters, but they have the filter list information.
typedef struct {
    int              filter_id;
    char             *so_file;
} sci_filter_info_t;
filter_id
the id of this filter.
so_file
location of this filter plug-in.

int SCI_Filter_unload(int filter_id);

This function is used to unload a filter whose id is filter_id.

int SCI_Filter_bcast(int filter_id, int num_successors, int *successor_list, int num_bufs, void *bufs[], int sizes[]);

This function broadcasts the messages downstream to the destinations specified by the successor_list; it must be called in the filter.

filter_id
stands for the filter that handles this message in the next-hop agent/embedded agent; if set to SCI_FILTER_NULL, the original filter specified in the SCI_Bcast call in the front end is used.
num_bufs, bufs[], and sizes[]
have the same meaning as SCI_Bcast.

int SCI_Filter_upload(int filter_id, sci_group_t group, int num_bufs, void *bufs[], int sizes[]);

This function is used to transmit the messages to another filter or upper layer and it must be called in the filter.

filter_id
the destination filter in the same agent/embedded agent/front end; if set to SCI_FILTER_NULL, the message is tagged again with the original filter id specified in SCI_Upload in the back end and transmitted to the parent agent/embedded agent/front end. Filters are cascaded with this function.
group
ignored.
num_bufs, bufs[], and sizes[]
have the same meaning as SCI_Upload.

SCI Dynamic add/remove back end

int SCI_BE_add(sci_be_t *be);

This function is used to add a new back end dynamically.

typedef struct {
    int              id;
    char             *hostname;
    int              level;
} sci_be_t;
id
the target back end id. If it is a positive value, SCI assigns that id to the back end. If it is set to -1, the id of the new back end is allocated internally, and when the API returns this field holds the actual back end id.
hostname
the target node where the back end will be launched. It can be either a host name or an IP address.

int SCI_BE_remove(int be_id);

This function is used to remove a back end whose id is be_id.

API Function Return Values

#define SCI_SUCCESS                  (0)
#define SCI_ERR_INVALID_HOSTFILE     (-2001)
#define SCI_ERR_INVALID_ENDTYPE      (-2002)   
#define SCI_ERR_INITIALIZE_FAILED    (-2003)
#define SCI_ERR_INVALID_CALLER       (-2004)
#define SCI_ERR_GROUP_NOTFOUND       (-2005)
#define SCI_ERR_FILTER_NOTFOUND      (-2006)
#define SCI_ERR_INVALID_FILTER       (-2007)
#define SCI_ERR_BACKEND_NOTFOUND     (-2008)
#define SCI_ERR_UNKNOWN_INFO         (-2009)
#define SCI_ERR_UNINTIALIZED         (-2010)
#define SCI_ERR_GROUP_PREDEFINED     (-2011)
#define SCI_ERR_GROUP_EMPTY          (-2012)
#define SCI_ERR_INVALID_OPERATOR     (-2013)
#define SCI_ERR_FILTER_PREDEFINED    (-2014)
#define SCI_ERR_POLL_TIMEOUT         (-2015)
#define SCI_ERR_INVALID_JOBKEY       (-2016)
#define SCI_ERR_MODE                 (-2017)
#define SCI_ERR_FILTER_ID            (-2018)
#define SCI_ERR_INVALID_SUCCESSOR    (-2019)
#define SCI_ERR_BACKEND_EXISTED      (-2020)
#define SCI_ERR_NO_MEM               (-2021)
#define SCI_ERR_LAUNCH_FAILED        (-2022)
#define SCI_ERR_POLL_INVALID         (-2023)
#define SCI_ERR_INVALID_USER         (-2024)
#define SCI_ERR_INVALID_MODE         (-2025)
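
Since every API returns one of these codes, a small helper that turns a code into readable text can make diagnostics easier. The sketch below is only an illustration: sci_rc_name and sci_check are hypothetical helpers, not SCI functions, and only a handful of the codes listed above are repeated so that the example compiles without sci.h.

```c
#include <stdio.h>

/* A few return codes copied from the list above so that this sketch
 * compiles without sci.h; the guard avoids clashes if sci.h is included. */
#ifndef SCI_SUCCESS
#define SCI_SUCCESS               (0)
#define SCI_ERR_INVALID_CALLER    (-2004)
#define SCI_ERR_GROUP_NOTFOUND    (-2005)
#define SCI_ERR_FILTER_NOTFOUND   (-2006)
#define SCI_ERR_POLL_TIMEOUT      (-2015)
#define SCI_ERR_NO_MEM            (-2021)
#define SCI_ERR_LAUNCH_FAILED     (-2022)
#endif

/* Hypothetical helper (not part of SCI): maps a return code to its name. */
const char *sci_rc_name(int rc)
{
    switch (rc) {
    case SCI_SUCCESS:             return "SCI_SUCCESS";
    case SCI_ERR_INVALID_CALLER:  return "SCI_ERR_INVALID_CALLER";
    case SCI_ERR_GROUP_NOTFOUND:  return "SCI_ERR_GROUP_NOTFOUND";
    case SCI_ERR_FILTER_NOTFOUND: return "SCI_ERR_FILTER_NOTFOUND";
    case SCI_ERR_POLL_TIMEOUT:    return "SCI_ERR_POLL_TIMEOUT";
    case SCI_ERR_NO_MEM:          return "SCI_ERR_NO_MEM";
    case SCI_ERR_LAUNCH_FAILED:   return "SCI_ERR_LAUNCH_FAILED";
    default:                      return "unknown";
    }
}

/* Hypothetical helper: reports a failed call on stderr.
 * Returns 0 if rc is SCI_SUCCESS and 1 otherwise. */
int sci_check(const char *what, int rc)
{
    if (rc == SCI_SUCCESS)
        return 0;
    fprintf(stderr, "%s failed: %s (%d)\n", what, sci_rc_name(rc), rc);
    return 1;
}
```

A caller could then write, for instance, if (sci_check("SCI_Initialize", SCI_Initialize(&info))) return 1;. A program that polls with a timeout would probably want to treat SCI_ERR_POLL_TIMEOUT separately rather than as a failure.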

Code Examples

SCI Environment initialize/terminate/query functions

Front end

This is a minimal code sample for a front end. SCI_Initialize() is a complicated function. Users must specify the type, mode, bepath and host list to let SCI know the role of the program, the working mode, the back ends’ location as well as where to launch them. The call-back function handler is the key of the one-sided communication model used by SCI, that is, the handler will be triggered when a message is received. SCI_Query() queries for the number of back ends here. SCI_Terminate() ends the entire session.

void handler(void *user_param, sci_group_t group, void *buffer, int size)
{
    ...
}

int main(int argc, char *argv[])
{
    sci_info_t info;
    int num_bes;
    int rc;
    ...

    memset(&info, 0, sizeof(info));
    info.type = SCI_FRONT_END;
    info.fe_info.mode = SCI_INTERRUPT;
    info.fe_info.hostfile = hfile;
    info.fe_info.bepath = bpath;
    info.fe_info.hndlr = (SCI_msg_hndlr *)&handler;

    rc = SCI_Initialize(&info);
    ...
    rc = SCI_Query(NUM_BACKENDS, &num_bes);
    ...
    /* Doing communications */
    ...

    rc = SCI_Terminate();
    ...
    return 0;
}

Back end

In this code sample, the result is the user_param which is set as the info.be_info.param and is delivered as the input parameter for handler. Users can pass any type of variables into the handler through this argument when SCI_Initialize() is called instead of using global variables. SCI_Terminate() in a back end is blocking and internally waits for the SCI_Terminate() from the front end. The mode for both front end and back end can be either SCI_INTERRUPT or SCI_POLLING. The user code for a front end or a back end can choose its own mode without any dependency upon the mode chosen by the other end.

#define RST_SIZE 4096

void handler(void *user_param, sci_group_t group, void *buffer, int size)
{
	char *result = (char *)user_param;
	...
}

int main(int argc, char *argv[])
{
    sci_info_t info;
    int rc;
    char *result = (char *)malloc(RST_SIZE * sizeof(char)); 

    memset(&info, 0, sizeof(info));
    info.type = SCI_BACK_END;
    info.be_info.mode = SCI_INTERRUPT;
    info.be_info.hndlr = (SCI_msg_hndlr *)&handler;
    info.be_info.param = result;

    rc = SCI_Initialize(&info);
    ...
    rc = SCI_Terminate();
    free(result);
    return rc;
}

SCI Communication functions

Front end

In this code sample, the front end broadcasts the message to all of the back ends.

    bufs[0] = msg;
    sizes[0] = strlen(msg) + 1;
    rc = SCI_Bcast(SCI_FILTER_NULL, SCI_GROUP_ALL, 1, bufs, sizes);

Back end

In this code sample, the back end uploads a message to the front end. The group argument SCI_GROUP_ALL does not have the same meaning here as in the front end; it is only a tag that can carry information the user may need. If the code does not need it, any value is fine and it can be ignored. Both SCI_Bcast and SCI_Upload are thread-safe. This code sends the two bufs concatenated as one message.

    int sizes[2];
    void *bufs[2];
    bufs[0] = &my_id;
    sizes[0] = sizeof(my_id);
    bufs[1] = result;
    sizes[1] = strlen(result) + 1;
    rc = SCI_Upload(SCI_FILTER_NULL, SCI_GROUP_ALL, 2, bufs, sizes);

Polling mode

SCI_Poll is intended for a single-threaded programming model. If SCI_INTERRUPT mode is used, the message handler is called in a separate thread inside SCI. If SCI_POLLING is used, the user code must call SCI_Poll to handle messages, because the message handler is called inside SCI_Poll. When SCI_Poll returns successfully, the handler has been called once and a message has been processed.

void handler(void *user_param, sci_group_t group, void *buffer, int size)
{
	...
}

int main()
{
    sci_info_t info;
    int rc;

    ... /* fill in info, with the mode set to SCI_POLLING */
    rc = SCI_Initialize(&info);

    do {
        rc = SCI_Poll(-1);
    } while (rc == SCI_SUCCESS);

    rc = SCI_Terminate();
    return rc;
}

Polling a file descriptor

In another scenario, users have many file descriptors and want to poll them together to see whether data has arrived. For this purpose, SCI provides a file descriptor that can be polled. This file descriptor is for polling only: when a message arrives, FD_ISSET on polling_fd returns true, but users cannot read the message from this descriptor. They should call SCI_Poll instead, so that the message is handled in the message handler.

    int polling_fd, max_fd, rc, rt;
    fd_set read_fds;

    SCI_Query(POLLING_FD, &polling_fd);
    while (running) {
        FD_ZERO(&read_fds);
        FD_SET(polling_fd, &read_fds);
        ... /* add other file descriptors and compute max_fd */
        rc = select(max_fd + 1, &read_fds, NULL, NULL, NULL);
        if (rc > 0) {
            if (FD_ISSET(polling_fd, &read_fds)) {
                rt = SCI_Poll(-1);
                ...
            }
            ... /* use FD_ISSET to test other fds */
        }
    }
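
The select pattern used for the polling descriptor can be tried out without SCI. In the sketch below, any readable descriptor (for example one end of a pipe) stands in for the descriptor that SCI_Query(POLLING_FD, ...) would return; wait_readable is a hypothetical helper name, and in a real program SCI_Poll would be called once the descriptor is reported as ready.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Waits until 'fd' becomes readable or 'timeout_ms' elapses.
 * A negative timeout blocks indefinitely.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    fd_set read_fds;
    struct timeval tv;
    int rc;

    FD_ZERO(&read_fds);
    FD_SET(fd, &read_fds);
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
    }
    /* A NULL timeout pointer makes select() wait without a time limit. */
    rc = select(fd + 1, &read_fds, NULL, NULL, timeout_ms >= 0 ? &tv : NULL);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &read_fds)) ? 1 : 0;
}
```

Note that the fd_set variable is given its own name (read_fds) rather than being called fd_set, because FD_ZERO and related macros may refer to the fd_set type internally.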

SCI Group manipulation functions

The group creating and operating functions can be called only in the front end and they are intended for users who want to create a set of back ends as a group.

Creating a new group

This code sample creates a new group even_group whose members are the back ends with even-numbered ids. To retrieve the group members, users should normally call SCI_Group_query with GROUP_MEMBER_NUM first to get the number of members, so that enough space can be allocated for the member list before retrieving it with SCI_Group_query and GROUP_MEMBER.

    int i, rc;
    int even_size;
    int *even_list = NULL;
    int num_bes, even_list_benum;
    sci_group_t even_group;
    int *evenlist_query_res;

    rc = SCI_Query(NUM_BACKENDS, &num_bes);
    /* number of even ids in 0 .. num_bes-1 */
    even_size = (num_bes + num_bes % 2) / 2;
    even_list = (int *)malloc(sizeof(int) * even_size);
    for (i = 0; i < even_size; i++) {
        even_list[i] = i * 2;
    }
    rc = SCI_Group_create(even_size, even_list, &even_group);
    ...
    rc = SCI_Group_query(even_group, GROUP_MEMBER_NUM, &even_list_benum);
    ...
    evenlist_query_res = (int *)malloc(sizeof(int) * even_list_benum);
    rc = SCI_Group_query(even_group, GROUP_MEMBER, evenlist_query_res);

Operations on existing groups

In this example, group1 and group2 are existing groups, and group3 is the result group after the operation SCI_UNION, SCI_INTERSECTION, or SCI_DIFFERENCE. The names (UNION, INTERSECTION, and DIFFERENCE) are self-explanatory.

    sci_group_t group1, group2, group3;
    ...
    rc = SCI_Group_operate(group1, group2, SCI_UNION, &group3);
    ...
    rc = SCI_Group_operate(group1, group2, SCI_INTERSECTION, &group3);
    ...
    rc = SCI_Group_operate(group1, group2, SCI_DIFFERENCE, &group3);

Operations on an existing group and a set of back ends

In this example, group1 is an existing group and group2 is the result group after the operation SCI_UNION, SCI_INTERSECTION, or SCI_DIFFERENCE. The names (UNION, INTERSECTION, DIFFERENCE) are self-explanatory.

    sci_group_t group1, group2;
    int num_bes = 3;
    int be_list[3] = { 0, 3, 5 };
    SCI_Group_operate_ext(group1, num_bes, be_list, SCI_UNION, &group2);
    ...
    SCI_Group_operate_ext(group1, num_bes, be_list, SCI_INTERSECTION, &group2);
    ...
    SCI_Group_operate_ext(group1, num_bes, be_list, SCI_DIFFERENCE, &group2);

SCI Filter related functions

Filter plug-in

One or multiple filter plug-ins can be loaded in the front end or in agents/embedded agents. filter_initialize() is called when the filter is loaded, and filter_terminate() is called when the filter is unloaded. Users can prepare resources in filter_initialize() and store them in user_param, which is then delivered to filter_input() and filter_terminate(). filter_input() is triggered when a message passes through whose target filter id matches the id of this filter.

It is highly recommended not to use global variables in a filter, especially in embedded agent mode. With stand-alone agents, a global variable does no harm. With embedded agents, however, a back end may contain multiple embedded agents, each with the same set of filters, so filters that use global variables may race with each other when they run in different embedded agents.

#ifdef __cplusplus
extern "C" {
#endif

#define BUF_SIZE 4096

int filter_initialize(void **user_param)
{
    /* per-filter state, passed back to filter_input and filter_terminate */
    char *buffer = (char *)malloc(BUF_SIZE * sizeof(char));
    *user_param = buffer;
    return 0;
}

int filter_terminate(void *user_param)
{
    char *buffer = (char *) user_param;
    free(buffer);
    return 0;
}

int filter_input(void *user_param, sci_group_t group, void *buf, int size)
{
    void *bufs[1];
    int sizes[1];
    int rc;

    char *buffer = (char *) user_param;
    ...
    bufs[0] = buf;
    sizes[0] = size;
    rc = SCI_Filter_upload(SCI_FILTER_NULL, group, 1, bufs, sizes);
    ...

    return 0;
}

#ifdef __cplusplus
}
#endif

Assuming this filter’s file name is filter.c, users can compile it with the following commands:

Linux:

gcc -fpic -shared -o filter.so filter.c 

AIX:

xlc -qmkshrobj -o filter.so filter.c

Loading filter plug-ins

    #define FILTER 1
    sprintf(fpath, "%s/filter.so", pwd);
    bzero(&filter_info, sizeof(filter_info));
    filter_info.filter_id = FILTER;
    filter_info.so_file = fpath;
    rc = SCI_Filter_load(&filter_info);
    ... /* load multiple filters if needed */
    rc = SCI_Filter_unload(FILTER);

Loading multiple filters during SCI_Initialize

Users can load filters one by one through SCI_Filter_load, as the previous example shows, or load a set of filters at once when calling SCI_Initialize(). The filters passed in the call to SCI_Initialize() are guaranteed to be loaded before any messages are transmitted.

    sprintf(fpath[0], "%s/downfilter.so", pwd);
    sprintf(fpath[1], "%s/upfilter.so", pwd);
    sprintf(fpath[2], "%s/upfiltera.so", pwd);
    sprintf(fpath[3], "%s/upfilterb.so", pwd);
    filter_info[0].filter_id = DOWN_FILTER;
    filter_info[0].so_file = fpath[0];
    filter_info[1].filter_id = UP_FILTER;
    filter_info[1].so_file = fpath[1];
    filter_info[2].filter_id = UP_FILTER_A;
    filter_info[2].so_file = fpath[2];
    filter_info[3].filter_id = UP_FILTER_B;
    filter_info[3].so_file = fpath[3];

    bzero(&info, sizeof(info));
    info.type = SCI_FRONT_END;
    info.fe_info.mode = SCI_INTERRUPT;
    info.fe_info.hostfile = hfile;
    info.fe_info.bepath = bpath;
    info.fe_info.hndlr = (SCI_msg_hndlr *)&handler;
    info.fe_info.param = NULL;
    info.fe_info.filter_list.num = 4;
    info.fe_info.filter_list.filters = filter_info;

    rc = SCI_Initialize(&info);

Messages transmitting into a filter

In any agent that has loaded the filter specified in SCI_Bcast or SCI_Upload, the filter_input function is called to handle the message. If the filter is not loaded in an agent, the message is forwarded transparently to the next level until it reaches its destination.

In front end:

    rc = SCI_Bcast(DOWN_FILTER, SCI_GROUP_ALL, 1, bufs, sizes);

In back end:

    rc = SCI_Upload(UP_FILTER, group, 2, bufs, sizes);

Cascaded filters when up streaming

Assume all the filters do nothing but simply deliver the same message to its target. That means, in any agent/embedded agent/front end, the message hops are:

[Figure: Cascaded filters in upstream]

When the target filter id is specified as SCI_FILTER_NULL, the message will be transmitted to the upper layer. SCI_Filter_upload() can be called only in a filter. The embedded agent works exactly the same way.

In back end:

    rc = SCI_Upload(UP_FILTER, group, 2, bufs, sizes);

In upfilter.c’s filter_input:

    rc = SCI_Filter_upload(UP_FILTER_A, group, 1, bufs, sizes);

In upfiltera.c’s filter_input:

    rc = SCI_Filter_upload(UP_FILTER_B, group, 1, bufs, sizes);

In upfilterb.c’s filter_input:

    rc = SCI_Filter_upload(SCI_FILTER_NULL, group, 1, bufs, sizes);

Filters when down streaming

Although SCI_FILTER_NULL is used, the message still carries the original filter id DOWN_FILTER when it arrives at the agents in the next layer. That means the filter DOWN_FILTER specified in SCI_Bcast is used along the whole path. Unlike upstream filters, downstream filters cannot be cascaded. If users want to change the filter in one of the agents, they should specify a filter id other than SCI_FILTER_NULL; the filter_input of the new filter is then triggered in the next hop when the message arrives. The successor list also deserves attention: SCI_Query is used here to retrieve the successors of this agent. Users can build a new list, which must be a subset of this list, to change the destinations in an agent.

In front end:

    rc = SCI_Bcast(DOWN_FILTER, SCI_GROUP_ALL, 1, bufs, sizes);

In downfilter:

int filter_input(void *user_param, sci_group_t group, void *buf, int size)
{
    int rc;
    int num_successors;
    int *successor_ids;
    void *bufs[1];
    int sizes[1];

    rc = SCI_Query(NUM_SUCCESSORS, &num_successors);
    /* one int per successor id */
    successor_ids = (int *)malloc(num_successors * sizeof(int));
    rc = SCI_Query(SUCCESSOR_IDLIST, successor_ids);
    ...
    bufs[0] = buf;
    sizes[0] = size;
    rc = SCI_Filter_bcast(SCI_FILTER_NULL, num_successors, successor_ids, 1, bufs, sizes);
    ...
    free(successor_ids);
    return 0;
}

SCI Dynamic add/remove back end

The functions SCI_BE_add and SCI_BE_remove are thread-safe and can be called only in the front end. A new back end with the id be.id is added when SCI_BE_add() returns, and it is removed when SCI_BE_remove() is issued. be.id must be set either to a back end id that does not yet exist or to -1; in the latter case SCI allocates a valid id internally and stores it in the input/output parameter be.id. The level means that the back end will be launched by an agent whose level is >= level.

    sci_be_t be;

    rc = SCI_Query(NUM_BACKENDS, &num_bes);
    be.id = num_bes;
    be.hostname = hostname;
    be.level = 1;
    rc = SCI_BE_add(&be);
    ...
    rc = SCI_BE_remove(be.id);