OSEE/ReqAndDesign
Latest revision as of 19:53, 21 July 2016

Fast, highly relevant search

Requirements

  • no historical searches
  • history on name and url changes
  • history on search terms changes
  • access control

Design

  • Query Parser Steps
  1. make search text lowercase
  2. split search text into search terms by white space, allow quotes to enclose a search term that includes white space
  3. discard single char terms that are not digits
  4. transform words into singular form
  5. discard stop words. see http://www.gobloggingtips.com/wp-content/uploads/2014/08/Google-stopwords.txt
  6. discard duplicate terms
  7. sort terms
  8. compute query_hash with Arrays.deepHashCode() on the sorted terms from previous step
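The steps above can be sketched in Java as follows. This is a minimal sketch, not the real implementation: the stop-word list is a tiny sample standing in for the full Google list, the singularization is a naive trailing-s strip, and the class name is illustrative.

```java
import java.util.*;
import java.util.regex.*;

public class QueryParser {
   // assumption: a tiny sample stop-word list stands in for the full list
   private static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "and", "or", "to", "in");

   public static List<String> parse(String searchText) {
      // step 1: make search text lowercase
      String text = searchText.toLowerCase();

      // step 2: split into terms on white space, honoring quoted terms
      List<String> terms = new ArrayList<>();
      Matcher m = Pattern.compile("\"([^\"]+)\"|(\\S+)").matcher(text);
      while (m.find()) {
         terms.add(m.group(1) != null ? m.group(1) : m.group(2));
      }

      // step 3: discard single char terms that are not digits
      terms.removeIf(t -> t.length() == 1 && !Character.isDigit(t.charAt(0)));

      // step 4: transform words into singular form (assumption: naive trailing-s strip)
      terms.replaceAll(t -> t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);

      // step 5: discard stop words
      terms.removeIf(STOP_WORDS::contains);

      // steps 6 and 7: discard duplicate terms, then sort (TreeSet does both)
      return new ArrayList<>(new TreeSet<>(terms));
   }

   // step 8: compute query_hash with Arrays.deepHashCode() on the sorted terms
   public static long queryHash(List<String> sortedTerms) {
      return Arrays.deepHashCode(sortedTerms.toArray());
   }
}
```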

Data Model

osee_search_url(BIGINT url_id, BIGINT type, Varchar(128) title, Varchar(2048) url, valid_test_time) PRIMARY KEY (url_id) ORGANIZATION INDEX
  • title - search terms will be extracted from the title using the query parser steps and stored with rank name
  • url uniqueness is enforced via an atomic insert command:
 INSERT INTO osee_search_url (url_id, type, title, url, valid_test_time) SELECT ?,?,?,?,? FROM dual WHERE NOT EXISTS (SELECT 1 FROM osee_search_url m2 WHERE m2.url = ?)
  • maintenance operation to identify dead links: valid_test_time is updated with the current date and time when the link is tested as valid
osee_search_term(Varchar(128) text, INT rank, BIGINT term_group_id, term_group_size, BIGINT match_id) PRIMARY KEY (text, rank, match_id) ORGANIZATION INDEX COMPRESS 2
  • match_id can be an osee_search_url.url_id, a gamma_id, or any other id
  • the rank allows early exclusion of unwanted term types, such as content, with the sql criteria "rank < 200"
  • Results are returned in ascending rank order of the sum of all ranks for a match_id
  • search term rank types:
    • exact unique id [20] (employee id, charge number, USPS Tracking number, etc.)
    • keyword [100] terms that when searched together should represent a top result
    • (meta) tag [120] terms applied by human intelligence to classify and categorize
    • team [160]
    • content [200] terms extracted from the content body using the query parser steps
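The ordering rule above (ascending order of the sum of all ranks for a match_id, so lower totals come first) can be sketched as follows; TermMatch and orderResults are hypothetical names, not part of the design.

```java
import java.util.*;
import java.util.stream.*;

public class ResultRanking {
   // one row per matching search term: the match it belongs to and that term's rank
   public record TermMatch(long matchId, int rank) {}

   // sum the ranks per match_id, then return match_ids in ascending order of the sum
   public static List<Long> orderResults(List<TermMatch> matches) {
      Map<Long, Integer> rankSums = new HashMap<>();
      for (TermMatch m : matches) {
         rankSums.merge(m.matchId(), m.rank(), Integer::sum);
      }
      return rankSums.entrySet().stream()
         .sorted(Map.Entry.comparingByValue())
         .map(Map.Entry::getKey)
         .collect(Collectors.toList());
   }
}
```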
osee_search_query(BIGINT query_hash, Varchar(128) query_text, integer status)
  • CREATE INDEX OSEE_SEARCH_QUERY_TEXT_IDX ON OSEE_SEARCH_QUERY (QUERY_TEXT);
  • query_hash is computed using the Query Parser Steps
  • query_text exactly as entered
  • status
    • no results
    • unverified results
    • verified results
  • search prediction based on query_text with past queries of user sorted first
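The search-prediction bullet above can be sketched as a prefix match over past query_text values, with the current user's own past queries sorted first. PastQuery and predict are illustrative stand-ins; the real data would come from osee_search_query and osee_search_history.

```java
import java.util.*;
import java.util.stream.*;

public class SearchPrediction {
   // illustrative stand-in for a past query row (not the osee_search_query schema)
   public record PastQuery(String queryText, long userId) {}

   public static List<String> predict(String typed, long userId, List<PastQuery> history) {
      return history.stream()
         .filter(q -> q.queryText().startsWith(typed))
         // false sorts before true, so the user's own queries come first
         .sorted(Comparator.comparing((PastQuery q) -> q.userId() != userId)
            .thenComparing(PastQuery::queryText))
         .map(PastQuery::queryText)
         .distinct()
         .collect(Collectors.toList());
   }
}
```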
osee_search_share(BIGINT query_hash, BIGINT from_user, BIGINT to_user, timestamp time)

osee_search_history(BIGINT item_id, user_id, timestamp)

Rest Interface

  • Allow the request to specify the maximum number of links or all (default is 25). The top-ranked n links are returned as JSON {id, name, url, search terms, rank}
  • in order to set access control on a link to a group, you must yourself have access to that group; access may be granted to any other individual user

User Interface

When entering a new URL, the name is initially populated from the web page title.

Search results will include a "why" link that provides the rank information and allows users to improve it.

Fast, versioned tuple service

Requirements

  • Creation and deletion of tuples shall have transactional history
  • tuple operations shall be near constant time
  • shall efficiently support tuples of numbers and strings of length 2, 3, and 4

Use Cases

  • multiple tags per transaction with history
 osee_tuple3(tx_key_type, branch_id, tx_id, attr_id, gamma_id)
  • cross branch linking
 osee_tuple2(relation_type, a_art_id, b_art_id, gamma_id)
  • user defined tables
 osee_tuple2(BIGINT my_table_type, BIGINT row_id, String row_data, gamma_id)
  • the row_id allows the rows to be returned in order and also allows showing the history of a given row
  • the row_data stores the json representing the row data

Design

Data Model

osee_tuple2 (tuple_type, element1, element2, gamma_id) PRIMARY KEY (tuple_type, element1, element2) ORGANIZATION INDEX COMPRESS 2;
osee_tuple3 (tuple_type, element1, element2, element3, gamma_id) PRIMARY KEY (tuple_type, element1, element2, element3) ORGANIZATION INDEX COMPRESS 2;
osee_tuple4 (tuple_type, element1, element2, element3, element4, gamma_id) PRIMARY KEY (tuple_type, element1, element2, element3, element4) ORGANIZATION INDEX COMPRESS 3;
unique index: gamma_id
  • gamma_id provides for branching and history
  • elements can be of type number or string. A string is stored as a key to the osee_key_value table
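The number-or-string element rule above can be sketched as follows: numbers map directly to the BIGINT element columns, while strings are interned through a key/value mapping. The in-memory maps stand in for the osee_key_value table and the class name is illustrative.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

public class ElementEncoder {
   // in-memory stand-in for the osee_key_value table
   private final Map<String, Long> keyByValue = new HashMap<>();
   private final Map<Long, String> valueByKey = new HashMap<>();
   private final AtomicLong nextKey = new AtomicLong(1);

   // normalize a tuple element to the single BIGINT column used by osee_tuple*
   public long encode(Object element) {
      if (element instanceof Number n) {
         return n.longValue(); // numbers are stored directly
      }
      // strings are interned: the same value always yields the same key
      return keyByValue.computeIfAbsent((String) element, v -> {
         long k = nextKey.getAndIncrement();
         valueByKey.put(k, v);
         return k;
      });
   }

   public String lookup(long key) {
      return valueByKey.get(key);
   }
}
```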

Product Line Engineering

Requirements

  • User can quickly select a branch-view without requiring knowledge of the corresponding version of the product line. Select by branch-view name (i.e. Country X San), which is shorthand for selecting the My Product Line 2.0 branch with the Country X view.
  • View applicability has history and can vary independently by branch.
  • External tools can join on view applicability in their sql queries
  • need a list of branch-views to show in branch selection dialogs

Design

Data Model

  • item applicability is fully transactional on all branches
osee_txs (branch_id, transaction_id, gamma_id, tx_current, mod_type, app_id)

Default value for app_id in osee_txs is 1, which means the item is in all views (base). View artifacts are on the product line branches.

  • view applicability
osee_tuple2 (CoreTupleTypes.ViewApplicability, ArtifactId view, String applicabilityText, gammaId)

The view applicability tuples and view artifacts are on the product line branches, thus maintaining full transactional history. The applicability text is the feature applicability text or the name of the view for non-product-line uses of views. Each view will be mapped to applicability id 1 (Base) so that base will be included for all configs by simply joining on the view_id. The featureApplicabilityId is the randomly generated key to applicabilityText in osee_key_value.

  • named branch-views
osee_tuple3 (CoreTupleTypes.BranchView, BranchId branch, ArtifactId view, String name, BIGINT gammaId)

Named branch-views are stored as tuples on the common branch.

  • feature applicability definition
osee_tuple4 (app_feature_tuple_type, app_id, feature_id, value_attr_id, and_or_flag, gamma_id)

The Feature artifact has an attribute for each possible value. It also has a name, description, abbreviation, and type (single or multi-valued).

osee_tuple3 (compound_app_tuple_type, app_id, app_id, and_or_flag, gamma_id)
  • feature applicability involving more than one feature is stored as a list of feature applicability (recursive to any depth).
 -- return all items (referenced by gamma_id) currently in a view of a branch
 SELECT * FROM osee_txs txs, osee_tuple2 app WHERE branch_id = ? AND tx_current = 1 AND txs.app_id = app.e2 AND e1 = ?;

 -- applicability clause in SQL is only applied when using a branch view
 -- select applicability for a given view (e1)
 SELECT e2, value FROM osee_txs txs, osee_tuple2 app, osee_key_value WHERE tuple_type = 2 AND e1 = ? AND app.gamma_id = txs.gamma_id AND branch_id = ? AND tx_current = 1 AND e2 = key;
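The recursive compound applicability described above (a leaf tests a single feature value; a compound combines child applicabilities with an and/or flag, nested to any depth) can be sketched as follows. The class names and feature names are illustrative, not the tuple storage itself.

```java
import java.util.*;

public class Applicability {
   // an applicability expression evaluated against a view's feature values
   interface Expr {
      boolean applies(Map<String, String> featureValues);
   }

   // leaf: a single feature = value test
   record FeatureEquals(String feature, String value) implements Expr {
      public boolean applies(Map<String, String> featureValues) {
         return value.equals(featureValues.get(feature));
      }
   }

   // compound: children combined by the and/or flag, recursive to any depth
   record Compound(boolean isAnd, List<Expr> children) implements Expr {
      public boolean applies(Map<String, String> featureValues) {
         return isAnd
            ? children.stream().allMatch(c -> c.applies(featureValues))
            : children.stream().anyMatch(c -> c.applies(featureValues));
      }
   }
}
```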

Activity Logging and Monitoring

Requirements

  • shall handle creation/update of fine-grained log entries for at least 500 concurrent users
  • shall support logging by OSEE and other applications
  • the web of log entries related to an individual instance of a user request shall be able to be hierarchically related
  • log entries shall be quickly accessible based on any combination of source, user, timestamp, log type, duration, status
  • log entries shall be accessible (especially) when an application server is unresponsive
  • log entries shall be available until they are deleted by an admin or admin policy (applied by server automatically)
  • at run-time logging shall be enabled/disabled based on any combination of user, source, log level, and type
  • access control shall be applied on a log entry type basis

Design

Data Model

  • osee_activity db table
  • Log entry in Java: BIGINT entryId, BIGINT parentId, BIGINT typeId, BIGINT startTime, BIGINT duration, BIGINT agentId, BIGINT status, String msgArgs
  • entry_id - random long returned for log method call
  • parent_id - id of the entry used for grouping this and related entries. For root entries, it is the negative of the client's session id or of the server's id. Ranges are used to group by client/server kind (IDE client, app server, rest client).
  • type_id - foreign key to type_id in osee_log_type table
  • start_time - long with ms since epoch
  • duration - starts at -1 and is never updated if duration does not apply, otherwise updates when the associated job ends with duration in ms
  • account_id - long account id (the account_id returned from account management services)
  • status:
 0     initial value
 1-99  percent complete
 100   completed normally
 101   completed abnormally
  • msg_args - newline-separated list of strings used with String.format(msg_format, msg_args)
  • Each new log entry's parent_id, agent_id is mapped to the thread that created it (only the most recent mapping per thread is maintained)
  • When an exception is thrown, it is logged as a child of the parent corresponding to the current thread. If no mapping is found in ConcurrentHashMap<Thread, Pair<Long, Long>>()
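The per-thread mapping described in the two bullets above can be sketched as follows; a long[] pair stands in for OSEE's Pair type, and the class name is illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ThreadContext {
   // only the most recent (parent_id, account_id) pair per thread is maintained
   private static final ConcurrentHashMap<Thread, long[]> lastEntryByThread = new ConcurrentHashMap<>();

   // called whenever a new log entry is created on this thread
   public static void record(long parentId, long accountId) {
      lastEntryByThread.put(Thread.currentThread(), new long[] {parentId, accountId});
   }

   // used when logging an exception as a child of whatever this thread last did;
   // returns null when no mapping exists for the current thread
   public static long[] current() {
      return lastEntryByThread.get(Thread.currentThread());
   }
}
```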
  • Log entry type in DB: type_id, log_level, module, msg_format
    • type_id - a fine-grained application defined type, random id, defined as tokens and stored in the db for cross application support
    • log_level - as defined by java.util.logging.Level
    • module - application defined name of the software unit that uses this log entry type
    • msg_format - format defined by java.util.Formatter or if blank the raw message details are used directly

High Performance

  • 2 ConcurrentHashMaps are allocated with an initial configurable size: newLogEntries, updatedEntries
  • newly created log entries are added to newLogEntries using the entry_id as the key and the array of sql insert parameters as the value
  • updated log entries are checked for in newLogEntries and updated if they exist; otherwise the update map is checked and updated if the entry exists, else the entry is added to updatedEntries
  • A timer task runs at a configurable (short) periodic rate and batch inserts the log entries in the insert map and then runs the updates. This means that any update to a log entry that occurs in less than this configured time will not require a database update (e.g. writing the duration of a short operation). This also means only one thread writes to the log table per JVM.
  • new DrainingIterator(newLogEntries.values().iterator()) is used to iterate through the values and remove them one at a time during the batch insert
  • upon server shutdown the log must be flushed
  • data structure options: http://stackoverflow.com/questions/8203864/the-best-concurrency-list-in-java
  • Optimize JDBC Performance: http://www.precisejava.com/javaperf/j2ee/JDBC.htm
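The two-map batching scheme above can be sketched as follows. This is a sketch under stated assumptions: an in-memory list stands in for the osee_activity table, the rows are illustrative two-column arrays {entry_id, status}, and Iterator.remove() plays the role of DrainingIterator during the batch insert.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class ActivityWriter {
   private final ConcurrentHashMap<Long, Object[]> newLogEntries = new ConcurrentHashMap<>(1000);
   private final ConcurrentHashMap<Long, Object[]> updatedEntries = new ConcurrentHashMap<>(1000);
   final List<Object[]> database = new ArrayList<>(); // stand-in for the log table

   // a newly created entry is queued for batch insert, keyed by entry_id
   public void create(long entryId, Object[] insertParams) {
      newLogEntries.put(entryId, insertParams);
   }

   // an update is absorbed into the pending insert when possible,
   // otherwise it is queued in the update map
   public void updateStatus(long entryId, long status) {
      Object[] pending = newLogEntries.get(entryId);
      if (pending != null) {
         pending[1] = status; // not yet written, so no database update is needed
         return;
      }
      updatedEntries.merge(entryId, new Object[] {entryId, status},
         (oldRow, newRow) -> { oldRow[1] = status; return oldRow; });
   }

   // run by the periodic timer task, and on server shutdown to flush the log;
   // only this single writer touches the "table"
   public synchronized void flush() {
      // batch insert: drain newLogEntries while iterating (DrainingIterator's role)
      Iterator<Object[]> inserts = newLogEntries.values().iterator();
      while (inserts.hasNext()) {
         database.add(inserts.next());
         inserts.remove();
      }
      // then run the batched updates
      Iterator<Object[]> updates = updatedEntries.values().iterator();
      while (updates.hasNext()) {
         Object[] upd = updates.next();
         for (Object[] row : database) {
            if (row[0].equals(upd[0])) {
               row[1] = upd[1];
            }
         }
         updates.remove();
      }
   }
}
```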

Java API

 Long createThreadEntry(long userId, Long typeId);
 Long createThreadEntry(long userId, Long typeId, long parentId);
 Long createEntry(Long typeId, Object... messageArgs);
 Long createEntry(Long typeId, Long parentId, Object... messageArgs);
 void updateEntry(Long entryId, Long status);
 Long createExceptionEntry(Throwable throwable);
  • The first interface to the logging data can be the basic REST navigation

Exception Handling

Requirements

  • avoid unnecessary wrapping of exceptions

Design

Reference articles: "Checked exceptions: I love you, but you have to go", "Why should you use Unchecked exceptions over Checked exceptions", and "Clean Code by Example: Checked versus unchecked exceptions".

  • Use application specific exceptions that extend RuntimeException - application specific allows for setting exception breakpoints in the debugger
  • Do not declare any run-time exceptions in any method signatures
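The two bullets above can be sketched as a minimal application-specific unchecked exception; the class name and constructors here are illustrative, not necessarily OSEE's actual exception hierarchy.

```java
// Extending RuntimeException keeps method signatures free of throws clauses
// while still giving the debugger a distinct type to set breakpoints on.
public class OseeCoreException extends RuntimeException {
   // message built with java.util.Formatter-style formatting
   public OseeCoreException(String messageFormat, Object... args) {
      super(String.format(messageFormat, args));
   }

   // wrap a cause only when it adds information
   public OseeCoreException(Throwable cause, String message) {
      super(message, cause);
   }
}
```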