Skip to main content

Notice: This Wiki is now read only and edits are no longer possible. Please see: for the plan.

Jump to: navigation, search

Persistence in ICE

This article is concerned with discussing and evaluating how to implement a database and XML schemas in the ICE environment for various configuration files and other pieces of data necessary to be stored for various server and local environments. To start, we will focus on the creation of a database prototype by discussing the various types of requirements, listing several databases, the implementation of the code, and the tests for the task. The prototype will then evaluate the XML file necessity, the requirements, and list an example schema.

If you haven’t done so, please read the About ICE page before continuing. It will give you better insight to the task below.

Database Requirements

  • Free for Open Source applications (GPL license or something equivalent)
  • Encryption – database side and transmission
  • Full support of SQL 2003 and up
  • Fast for small and large databases (versatility)
  • Unicode support
  • Multi-platform support within JVM
  • Multiple database support for a single server
  • Vast popularity
  • Java compliant
  • Datasource API
  • Standalone Server
  • Tried and Tested
  • ACID Compliant
  • Multiple DataTypes
  • Write to DISC support

Secondary list - not requirements

  • User Access Controls
  • Multiple readers, single writer.
  • Multiple connections
  • Free for commercial use or “cheap” purchase cost
  • Strongly Typed

Database Comparisons

The following databases are considered researched:

  • Berkeley
  • Derby
  • SQLite
  • PosGreSQL
  • H2
  • DB2: Express C Edition
  • MySQL: innoDB


“Berkeley DB Java Edition (JE) is a pure Java database. Its design resembles that of Berkeley DB without replicating it exactly, and has a feature set that includes many of those found in the traditional Berkeley DB and others that are specific to the Java Edition. Since it is written in pure Java, no native code is required. It has a log structured storage architecture, which gives it different performance and concurrency characteristics.” – wiki on Berkeley DB.

Based on this information, it fulfills a few requirements. First off, it is open-source under the SleepyCat Public License and can handle large databases with much greater ease compared to other databases out there. It can support multiple operating systems because it runs in a JVM. Nevertheless, the biggest fault is it is not standalone and is initially required to run in the same application as the program.

So Berkeley DB Java Edition is ruled out.

Apache Derby

“Apache Derby (previously distributed as IBM Cloudscape) is a relational database management system (RDBMS) developed by the Apache Software Foundation that can be embedded in Java programs and used for online transaction processing. It has a 2 MB disk-space footprint.” – wiki on Derby

Once again, this database can be embedded in Java programs. On the positive side, it can be hosted locally and over the network. It supports basic Encryption and is popular within the coding community. Nevertheless, one of the major requirements in making our choice of the database is the speed, and as it turns out there are many other databases that can be faster than Derby.

“In terms of execution speed, Derby has not performed well in some comparisons to other embedded SQL databases such as the also free and open source H2 database.” – wiki on Derby

When there are over several million elements in a database, speed is key. Therefore, Derby is not a suitable database for ICE.


“SQLite is an ACID-compliant embedded relational database management system contained in a small C programming library. SQLite implements most of the SQL standard, using a dynamically and weakly typed SQL syntax that does not guarantee the domain integrity. In contrast to other database management systems, SQLite is not a separate process that is accessed from the client application, but an integral part of it. SQLite read operations can be multitasked, though writes can only be performed sequentially.” – wiki on SQLite

SQLite is free. It is ACID-compliant, and it is written in a very quick language. Many webservers use it, and many people are happy with it. Nevertheless, it falls into the same problem as Berkeley since there are issues of no remote access. The worst part is it does not have native Encryption support, which can be dangerous if the data is very sensitive.


“PostgreSQL, often simply Postgres, is an object-relational database management system (ORDBMS) available for many platforms including Linux, FreeBSD, Solaris, Microsoft Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software. PostgreSQL is developed by the PostgreSQL Global Development Group, consisting of a handful of community volunteers employed and supervised by companies such as Red Hat. It implements the majority of the SQL:2008 standard,[6] is ACID-compliant, is fully transactional (including all DDL statements), has extensible data types, operators, and indexes, and has a large number of extensions written by third parties.” – wiki on PosGresSQL

PostgreSQL is an open-source database with multiple operating system support, SQL standard 2008, and is vastly popular. It is also ACID-compliant, has third-party support, and works with java. PostgreSQL has major support from the community and has undergone various revisions over the past several years. It is a great choice for using as a database.


“H2 is a relational database management system written in Java. It can be embedded in Java applications or run in the client-server mode. The disk footprint (size of the jar file) is about 1 MB. The software is available as open-source software under modified versions of the Mozilla Public License or the original Eclipse Public License. The modification of the MPL is a shorter file header and the license name.” – wiki on H2

H2 is a database system created off the ideas of Hypersonic. It supports various forms of encryption, multiple platforms, is written in java, and is free for open-source use. The downside of this system is that it is not completely acid-compliant (missing the D) which can be vital to any system. Something to note, on the H2 website it mentions that POSTGRESQL does not support any type of encryption. This is bothersome, as many other websites (including POSTGRESQL website) state otherwise.

DB2 Express C version

“The IBM DB2 Enterprise Server Edition is a relational model database server developed by IBM. It primarily runs on Unix (namely AIX), Linux, IBM i (formerly OS/400), z/OS and Windows servers. DB2 also powers the different IBM InfoSphere Warehouse editions. Alongside DB2 is another RDBMS: Informix, which was acquired by IBM in 2001.” – wiki on IBM DB2

DB2 is another very popular database out on the market today. It has many features of encryption, java interfacing, and is multi-platform. It has various tools to make programming life easier, including SQL returns on execution in a database. Finally, it is ACID-compliant. Another good database to use with ICE, especially since its free for open source applications. The only problem is it is not fully multi-threaded and limits ram usage in the free version. At today's standards, 2 GB of RAM and maximum duo core support is almost like a minimum requirement for a database.

MySQL: innoDB

“MySQL is a relational database management system (RDBMS that runs as a server providing multi-user access to a number of databases. The MySQL development project has made its source code available under the terms of the GNU General Public License, as well as under a variety of proprietary agreements. Free-software-open source projects that require a full-featured database management system often use MySQL. For commercial use, several paid editions are available, and offer additional functionality. Applications which use MySQL databases include: TYPO3, Joomla, WordPress, phpBB, Drupal and other software built on the LAMP software stack. MySQL is also used in many high-profile, large-scale World Wide Web products, including Wikipedia, Google (though not for searches), Facebook, and Twitter.” – wiki on MySQL

MySQL is one of the most renown databases out there. It supports various types of SQL, it is free under the GNU license, and has various types of encryption and multiple operating system support. It is vastly popular and has many expandable tools to make databasing easier. A very strong contender for what the project might need in its foreseeable future.

Database of Choice

ICE development is based on MySQL. Future releases will address ICE database plugins for PostgreSQL and IBM DB2.

Design and Implementation

Once a database is chosen, then a wrapper will need to be created in order to connect the database to ICE. In essence, it will just be an interface used to link ICE to a database and handle the requests as necessary. Since ICE is agile and adaptive (the requirements are always changing and ICE is done iteratively), the code has to be modular enough to accept new types of databases. Keep in mind that there are several layers in order to handle these types of requests, so one will create just a direct piece of code for ICE to handle. It will initially not have all the functionality of database manipulation, but it will handle the simple requests of insertions, edits, deletions, and returns of the data to the user.

Since the program may be over an unsecured network and/or the project may be confidential, the database needs to consider encryption on the database side and via connection from the database to the program. Even if the program and database share the same server, there are ways of sniffing around computers that are used in the cloud.

For the insert, edit, and delete calls, there will be a strictly enforced no SQL injection policy to protect the database and to make sure the edits to the database are correct. In other words, ICE sends the database wrapper (DatabaseManager) an edited object with the information needing to be stored on the server (DatabaseItem), and the DatabaseManager handles the SQL insertion calls. By having the data in a passed DatabaseItem object, the program can check it for integrity before making those respective changes to the database.

Even if the database of choice is weakly typed, the DatabaseManager can still enforce a strongly-typed methodology this way. Nevertheless, the operation to obtain items from the database will be through restrictive SQL requests via a get() method in the DatabaseManager object. It will return an arraylist of DatabaseItem(s). It will be necessary to make sure there is no stack overflow due to the database being bigger than the memory AND to come up with a strategy to iterate over lists too big for the size “arraylist buffer”.

The prototype will also allow the ICE program to connect to MULTIPLE databases within one server. This is important because some projects might have specific requirements or our requirements themselves might change. I also considered making DatabaseItem abstract since there might be multiple types of data that may need to be stored on the database. If the ICE program needs to connect to multiple servers, then the program can make multiple instances of the prototype.

Finally, DatabaseManager will need to be able to handle the opening and closing of those connections to various databases and the server itself. Plus, it will need to be able to softly notify the program when it cannot connect to the database or encounters errors on SQL calls and insertion/edits/deletes, along with all of the other juicy goodness of problems that can go wrong with connections to a database from a separate program.

As a final note, one will need to make sure to apply several design patterns to the code. Factory pattern and strategy pattern are good places to start.

DriveManager versus DataSource API

For versatility reasons, this prototype will use the DataSource API for connecting to a database in java. There are several extra features, including datapooling, that will help out for future iterations of ICE.


Insert name of JDBC driver for X database in use with version number and references to author as necessary.

Testing architecture

The testing architecture will be broken up into several stages beginning with TDD. In Test Driven Development (TDD), the test code will have to be written first before one can begin implementing any of the source code for the actual program. The first stage of the testing will consist of having a TestDatabase on the server running. From there, one will go through the process of unit testing each method in each class for correct and incorrect uses. These tests will remain as a person continues to deliver more iterations of this prototype towards ICE. Once TDD is finished, there will be several types of databases created for the DatabaseManager to be tested on. These are more of “integration” tests for the DatabaseManager to commit itself to in order to ensure quality work. For now, I have devised 3 types of databases for it to be tested on.

  • TestDatabase1 will consist of a correct database implementation.
  • TestDatabase2 will consist of incorrect data in the database. It will have missing fields and may even contain several completely incorrect pieces of data. The real test on this database is to make sure the DatabaseManager completes these tests, notifying ICE of potential hazards but continuing to functioning nominally.
  • TestDatabase3 will be something implemented last and possibly after several iterations. It will be a database used to flood the server with “benchmarking” like tests within certain constraints. Overtime, this will show the performance numbers of ICE as it works with large amounts of data. If there are major dips or speed ups when going through the TestDatabase3, then these will be noted and recorded with time stamps to help narrow down possible positive or negative effects in the code (and also help narrow down what changes were made to reflect the changes in the database).

If and when it is integrated into ICE, these will be apart of the tests that will run nightly.

More types of tests will be added in the near future. These include sudden disconnections from database to program and ACID compliance testing.


ICE needs the ability to parse XML files for specific types of profiles within the program. The goal of this task is to produce a prototype for iterating and creating XML files for ICE. The prototype will be programmed in java with the modeling help of RSA. Schemas will help regulate the flow of data inside the XML files to the program to check for data integrity and correctness.

Since there will be many different configuration settings for the various possible simulation and models within ICE, having different XML profiles for each model and simulator is key to differentiating various conditions between them. Therefore, ICE needs to have an assortment of types of profiles. These profiles are broken down into simulator profiles and model profiles. The simulation profiles should define the computing platform and execution string; whereas, the model profile should define the simulation itself, such as input files and model setups. These settings should be portable between machines so that their settings can be transferred to different types of machines.

Introduction to XML

XML, or eXtensible Markup Language, is a markup language that is a flexible way to design documents that is readable by machines. (1) XML has the ability to break down documents into a tree like structure for easier formatting, separation of concerns, and gives the ability to programming languages, like Java, to create parsers that can iterate over the document in a clean and effective manner. Standard text files would require extra overhead of code in order to effectively parse a document compared to an XML Document parser.

XML files also have the ability to be compared to a specific type of standardization by creating schemas. These schemas can be compared to the XML files for quality assurance, thereby helping to promote document security and data integrity. Standard text files would need more code overhead to compare each piece of information; whereas, in languages in Java, the XML file can be compared to a schema for validation [7]. Each schema can contain limitations on the structure and ordering through sequences, tags and manipulation of documents through XSD indications. (1)(3)(4)(12).

Finally, XML files are a great way to store small pieces of portable information with some extra overhead compared to a standard text file. XML files, combined with their validations within schemas, are easily portable to multiple types of programs and different types of programming languages. (9)


The schema for this prototype will need to contain many various attributes of XSD indicators in order to give a maximum appeal. It needs to have sub elements, min/max occurrences, restrictions, and optional restrictions. For this particular area of study, I will borrow a schema from the main IBM website with some slight modifications. It exposes many areas of XSD without overbearing the beginning user with excess details. Including what is stated above; the listed schema also contains complex types, simple elements, and a simple element that is also a complex type (<name>). Even so, it does not have all of the order indicators or group indicators. For more information on the address schema, please see Appendix A for details and Appendix B for examples of valid XML documents to the schema.


Java API for XML Processing, or JAXP, is a parser used to manipulate XML files in the java programming language (11). JAXP allows a program to parse a stream of data using the simple API for XML Parsing (SAX) and the Document Object Model (DOM). JAXP also supports the use of Extensible Stylesheet Language Transformations (XSLT) standard, giving a program full control over the presentation on the streaming of data and allow the ease of transforming XML documents into other formats. Finally, it allows provides the ability to use namespaces so that the program can use DTDs that might have naming conflicts. (11).

XML Prototype Design

The XML Prototype, known as XMLDocumentManager, will be written as a wrapper for the XML DOM parser with JAXP. The JAXP DOM parser loads the XML document into memory for access, returning a reference to the Document for data manipulation. (5)(6) From there, the prototype will need to be able to compare the Document reference to a schema in order to check for data integrity, and to be able to close the Document when finished. The prototype will need to be able to create a document given a file path and load a document when given either a stream or a XML file [8]. Keep in mind that the schema will be declared in the program, therefore overriding the XML document’s declaration of a schema. (12) Finally, the prototype will need to be able to be integration tested with a manipulator class, called XMLAddressDocManipulator, which correlates to the schema (see Appendix A) and XML documents (see Appendix B and Appendix C).

Breakdown of methods in XMLDocumentManager:

Operation Arguments Description
load filepath: String, schemaPath: String returns a reference to a Document given a XML filepath and XSD Schema.
load filepath: String, schemaPath: String returns a reference to a Document given a XML Stream.
writeXMLFile document:Document, filePath: String creates, transforms and closes a created Document. Returns 0 if successful, throws an exception if it failed.
createDocumentParser schemaPath: String returns an empty reference to a Document.

Note that schemaFile is listed as an attribute. It is set in createDocumentParser, loaded by schemaPath, and used in writeToXML. Keep in mind that you can avoid schema validation in writeToXML by not calling createDocumentParser() or load(). Logical Architecture for XMLDocumentManager

Breakdown of methods in XMLAddressDocManipulator:

Operation Arguments Description
addressWrite map: HashMap, filePath: String, schemaPath: String returns a reference to a Document given a XML filepath and XSD Schema.
addressLoad stream: Stream, schemaPath: String returns a reference to a Document given a XML Stream.
addressLoad document:Document, filePath: String Creates, transforms and closes a created Document. Returns 0 if successful, throws an exception if it failed.
MLAddressDocManipulator none A constructor.
getTextFromElementsChild doc:Document, element:Element, item:String, text:String Returns text from the child item within element
append parent: Element, child: String Appends a child to a parent element. Returns new element of child.

To reiterate over functionality within XMLAddressDocManipulator to XMLDocumentManager:

  • addressLoad() function will call load() in XMLDocumentManager.
  • write() function will call createDocumentParser() and writeToXML() in XMLDocumentManager.

Implementing these pieces will be through Test Driven Development. The tests will be written first, then the actual source code.

Testing Design

The testing will be broken down into stages. The first stage will be unit testing the methods within the XMLDocumentManager class. This will include a series of schema validation tests, which will be discussed in more detail later. After that, the XMLDocumentManager integration tests will consist of using the XMLAddressDocManipulator class to utilize all the toolsets within XMLDocumentManager class. The types of tests will be discussed in the following sections. Keep in mind that the validateFiles() is not specifically oriented at testing the functionality of the methods within the testee classes. It is used to make sure the files required for the tests exist.

XMLDocumentManager TestLogicalArchitecture.png

Testing XML Prototype: XMLDocumentManager

This section will discuss the unit testing for XMLDocumentManager. Keep in mind that any exceptions caught will be flagged and the operations should return. References to schema is in Appendix A. Valid XML files are kept in Appendix B while invalid XML files to schema reside in Appendix C.

The load(filePath: String, schemaPath: String): Document operations should throw an exception if they are not test “1”

Table 1: load(filePath, schemaPath)

Test ID Description
Test 1 Passing a valid filename (Sample1.xml) with a valid XML file (addressSchema.xsd) should return a reference to Document.
Test 2 Passing a valid filename that is not a valid XML file (BadSample1.txt) should throw an exception.
Test 3 Passing a null filePath should cause the operation to throws an exception.
Test 4 Passing an empty XML file (BadSample3.xml) should throw an exception. The load(stream: Stream, schemaPath: String):

Document operations should throw an exception if they are not test “1”

Table2: load(stream, schemaPath)

Test ID Description
Test 1 Passing a valid filename(Sample1.xml- inputstream) with a valid XML file (addressSchema.xsd). Should return a reference to Document.
Test 2 Passing an empty XML file. Should throw an exception.

A separate test will be specifically used to validate the schemas within the load functions. Tests "1" and "2” should allow load() to return a Document. All the other tests will throw an exception.

Keep in mind that load(filePath, schemaPath) calls load(stream, schemaPath) once filePath is verified. Therefore, all validations are technically done within load(stream, schemaPath) and there is no need to test both load functions for validation.

Table 3: load(filePath, schemaPath)

Test ID Description
Test 1 Passing a valid filename with a valid schema. Should return a reference to a Document.
Test 2 Passing a valid filename with a valid schema. Should return a reference to a Document.
Test 3-9 Passing invalid XML(BadSample#.xml) files to be tested against schema. Should throw an exception.
Test 10 Passing a bad schema path should throw an exception.

The writeToXML operations should return 0 on tests “1”, "2", and "3" but throw an exception (fail) on the rest.

Table 4: writeToXML(document, filePath)

Test ID Description
Test 1 Passing a valid filepath and document. Return 0.
Test 2 Passing a duplicate filepath. Return 0.
Test 3 Passing an initialized document and filepath. Return 0.
Test 4 Passing a filepath equal to null. Throw an exception.
Test 5 Passing a document equal to null. Throw an exception.
Test 6 Passing an invalid filename (not .xml). Throw an exception.-

The createDocumentParser operations should return a Document on test “1” and throw an exception on the rest (failed)

Table 5: createDocumentParser(schemaPath)

Test ID Description
Test 1 Passing a valid schemaFile. Return an empty document.
Test 2 Passing a schemafile equal to null. Should throw an exception.
Test 3 Passing a nonexistent schemafile. Should throw an exception.

Testing XML Prototype: XMLAddressDocManipulator

This section will discuss the integration testing for XMLDocumentManager using the XMLAddressDocManipulator. Each operation inside of XMLAddressDocManipulator will have a series of tests to go through. The schema to be passed will be the schema in [Appendix A].

The load operations should return a HashMap for test “1”, but throw an exception for the rest.

Table 6: addressLoad(stream, schemaPath)

Test ID Description
Test 1 Pipe Sample1.xml into InputStream and test it.
Test 2 Pipe Sample1.xml into InputStream, but make schemaFilePath be null. Should throw an exception.
Test 3 Pipe BadSample1.xml into InputStream and throw an exception.

The load(filePath: String, schemaPath:String) should return a HashMap for test “1”, but throw an exception for the rest.

Table 7: addressLoad(filePath, schemaPath)

Test ID Description
Test 1 Use Sample1.xml and test it.
Test 2 Use Sample1.xml, but make schemaFilePath be null.
Test 3 Use BadSample1.xml and should throw an exception.

Due to the writing and verification natures of the write() function, it will need to be tested with the load() function for verification purposes. The write() operation should return 0 for test “1” and throw an exception (fail) for the rest.

Table 8: addressWrite(hashMap, filePath, schemaPath)

Test ID Description
Test 1 Data in Sample1.xml into hashmap and test it. Return 0.
Test 2 Data in Sample1.xml into hashmap, but pass null as schemaPath. Should throw an exception.
Test 3 Data in Sample1.xml into hashmap, but pass null as filePath. Should throw an exception.
Test 4 Data in BadSample1.xml into hashmap and test it. Should throw an exception.

Data Marshalling

Data Marshalling is the process of transforming the memory representation of an object to a data format suitable for storage or transmission and vice versa.

Data marshaling in the XML persistence prototype will consist of converting the XML file to a Document object in the XMLDocumentManager. Then the XMLAddressDocManipulator will convert the Document object to and from a HashMap.

Example of HashMap:

Key = <tag>
Value = information inside of tag.

Converting Sample1.xml

from Appendix B to a HashMap gives us…

HashMap temp:
Address, null
Name, null
title, Mr.
first-Name, Billy
last-Name, Madison
street, 111 blah street
city, Cashville
state, VA
postal-code – 99933-4455

Location in repository

Location in repository The prototype will be stored under,

Titled: XMLDocumentManager for model.

Titled: XMLDocumentManagerJava for Java code and tests.


  1. “Introduction to XML” Tidwell, Doug. 07 August 2002. IBM, Jan 18th 2012.
  2. “Introduction to XML” Tidwell, Doug. 07 August 2002. IBM, Jan 18th 2012.
  3. “XSD Indicators” W3Schools, Jan 18th 2012.
  4. “An XSD Example” W3Schools, Jan 18th 2012.
  5. “XML and Java – Parsing XML using Java Tutorial”., Jan 18th 2012.
  6. “How to read XML file in Java” mkyong. 04 August 2011. Mkyong, Jan 18th 2012.
  7. “Validating with XML Schema” Oracle, Jan 18th 2012.
  8. “How to create XML file in Java – (Dom Parser)” mkyong. 04 August 2011. Mkyong, Jan 18th 2012.
  9. “XML Schema Part 0: Primer Second Edition” Fallside, David C. and Walmsley, Priscilla. 28 Oct. 2004., Jan 18th 2012.
  10. “What’s the best way to validate an XML file against an XSD file?” McDowell, 16th Aug 2009. StackOverflow. Jan 19th, 2012.
  11. “Introduction to JAXP” Oracle, Jan 23rd, 2012.
  12. “Validating With XML Schema” Oracle, Jan 23rd, 2012.
  13. “How to create an InputStream from a Document or Node” Gary Kephart, 14th May 2009. Oracle, Jan 23rd, 2012.
  14. “Validator.validate() fails when using DOM Document instead of File” Mike Sucena 02 November 2010. Jan 26th, 2012.


Appendix A: Address Schema

Address Schema

    <xsd:schema xmlns:xsd="">
     <xsd:element name="address">
           <xsd:element ref="name"/>
           <xsd:element ref="street"/>
           <xsd:element ref="city"/>
           <xsd:element ref="state"/>
           <xsd:element ref="postal-code"/>
     <xsd:element name="name">
           <xsd:element ref="title" minOccurs="0" maxOccurs=”1”/>
           <xsd:element ref="first-Name"/>
           <xsd:element ref="last-Name"/>
     <xsd:element name="title"      type="xsd:string"/>
     <xsd:element name="first-Name" type="xsd:string"/>
     <xsd:element name="last-Name"  type="xsd:string"/>
     <xsd:element name="street"     type="xsd:string"/>
     <xsd:element name="city"       type="xsd:string"/>
     <xsd:element name="state">
         <xsd:restriction base="xsd:string">
           <xsd:length value="2"/>
     <xsd:element name="postal-code">
         <xsd:restriction base="xsd:string">
           <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>

Appendix B: Sample valid to schema XML files


       <title>Mr. </title>
     <street>111 blah street</street>
     <state> VA </state>


     <street>113 blam blvd</street>
     <state> WI </state>

Appendix C

BadSample1.xml: –no <address> starting tag

     <street>111 blah street</street>

BadSample2.xml: –no <name>

     <street>111 blah street</street>
     <state> VA </state>


Empty file (nothing inside)


– invalid restriction in <postal-code>, letter

     <street>111 blah street</street>
     <state> VA </state>

BadSample5.xml: – invalid restriction on <postal-code>, too short.

     <street>111 blah street</street>
     <state> VA </state>

BadSample6.xml: – missing <last-Name> tag under <name>

       <title>Mr. </title>
     <street>111 blah street</street>
     <state> VA </state>

BadSample7.xml: – added <country> tag after <state>

     <street>111 blah street</street>
     <state> VA </state>
        <country> USA </country>

Back to the top