Data Objects

Data Objects, Case Bases, and Object Pools

Creating Instances of Data Classes (Data Objects)

At run-time, instances of instantiable data classes (see Data Classes) can be created using a loaded model. The default model can be retrieved with ModelFactory.getDefaultModel(). The model's method createObject takes the name of the desired data class. After the object has been created, class-specific values can be set with the corresponding methods:

StringObject stringObjectA = (StringObject) ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObjectA.setNativeString("test");

For system data classes, the model provides methods that return the data class directly. Thus, the String class used above can also be retrieved with the method getStringSystemClass() and used to create a new object:

Model model = ModelFactory.getDefaultModel();
StringObject stringObjectB = (StringObject) model.getStringSystemClass().newObject();

For both system and user-defined data classes, the model provides the following method to access a class by its name:

StringObject stringObjectC = (StringObject) model.getClass(StringClass.CLASS_NAME).newObject();

Properties of Data Objects

In addition to its “main value”, each data object can hold properties that store additional information about the object. For example, maintenance information such as how often the object was retrieved is stored this way. The system knows several predefined properties, but user-defined ones are also possible. A property is a key-value pair in which both the key and the value must be strings. This restriction is necessary to be able to embed the properties into XML. If another data type has to be stored, an encoding to and from string must be implemented.
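Because property keys and values are restricted to strings, non-string values have to be encoded before storing and decoded when reading them back. The following sketch shows one way to do this for a retrieval counter and a date. It uses a plain Map<String, String> as a stand-in for an object's property table; the class PropertyCodec and any property keys are hypothetical and not part of the ProCAKE API.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical helper (not ProCAKE API): encodes non-string values as strings so they
// fit into a string-only key-value property table, represented here by a plain Map.
final class PropertyCodec {
    private PropertyCodec() {}

    // Integers are stored in their decimal string form.
    static String encodeInt(int value) {
        return Integer.toString(value);
    }

    static int decodeInt(String encoded) {
        return Integer.parseInt(encoded);
    }

    // Dates are stored in ISO-8601 form (yyyy-MM-dd), which needs no escaping in XML.
    static String encodeDate(LocalDate date) {
        return date.toString();
    }

    static LocalDate decodeDate(String encoded) {
        return LocalDate.parse(encoded);
    }

    // Increments a counter property, treating a missing entry as 0.
    static void incrementCounter(Map<String, String> properties, String key) {
        int current = properties.containsKey(key) ? decodeInt(properties.get(key)) : 0;
        properties.put(key, encodeInt(current + 1));
    }
}
```

Choosing a standard textual format such as ISO-8601 keeps the encoded values readable in the serialized XML and easy to decode again.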

Object Pools

In ProCAKE, data objects can be collected in pools. These pools come in two variants: readable object pools and writable object pools. A pool can contain arbitrary DataObjects, i.e., the objects can belong to different DataClasses.

Each object pool must have an identifier that is unique across the whole ProCAKE instance at run-time. This is guaranteed if the pool is created with the method ObjectPoolFactory.newObjectPool(). When custom object pools are created, care must be taken that their identifier differs from ObjectPoolFactory.POOL_NAME.
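For custom object pools, one defensive approach is a run-time registry that rejects identifiers that are already in use or reserved. The class PoolIdRegistry below is a hypothetical sketch, not part of ProCAKE; the reserved name is passed in as a parameter (in ProCAKE it would be the value of ObjectPoolFactory.POOL_NAME) so the example runs without the framework.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry (not ProCAKE API) enforcing unique custom pool identifiers at run-time.
final class PoolIdRegistry {
    private final String reservedName;             // e.g. the value of ObjectPoolFactory.POOL_NAME
    private final Set<String> usedIds = new HashSet<>();

    PoolIdRegistry(String reservedName) {
        this.reservedName = reservedName;
    }

    // Registers the identifier, or throws if it is reserved or already taken.
    synchronized void register(String poolId) {
        if (poolId.equals(reservedName)) {
            throw new IllegalArgumentException("Identifier is reserved: " + poolId);
        }
        if (!usedIds.add(poolId)) {
            throw new IllegalArgumentException("Identifier already in use: " + poolId);
        }
    }
}
```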

To access a data object in an object pool, the object must be identifiable. Since a data object can be an element of several pools, a simple identification number is not sufficient, because the identifier must be unique across all pools. Therefore, each data object has a specific objectId. This ID can be set manually, for example in the case base; in this case, the user is responsible for ensuring that it is unique. If an object without an ID is read into an object pool, an ID is generated. This ID consists of two parts: the base and the offset. The base is a namespace that depends on the location where the object is stored, which is usually an object pool. The offset must be unique within the object pool and is managed by the object pool itself. This scheme is only a necessary precondition for synchronizing and identifying objects, not a complete synchronization technique.
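The two-part ID scheme can be illustrated with a small generator in which the pool identifier serves as the base (namespace) and each pool hands out offsets from its own counter. This is a hypothetical sketch: the class names and the "/" separator are assumptions chosen for illustration and do not necessarily match the format produced by ObjectPoolFactory.newObjectId.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical two-part object ID: a base (namespace, here the pool id) plus an offset.
final class ObjectId {
    final String base;
    final String offset;

    ObjectId(String base, String offset) {
        this.base = base;
        this.offset = offset;
    }

    // The "/" separator is an assumption made for this sketch only.
    @Override
    public String toString() {
        return base + "/" + offset;
    }
}

// Each pool manages its own offsets: offsets only need to be unique per pool,
// while the base keeps IDs from different pools apart.
final class PoolIdGenerator {
    private final String poolId;
    private final AtomicLong nextOffset = new AtomicLong();

    PoolIdGenerator(String poolId) {
        this.poolId = poolId;
    }

    ObjectId next() {
        return new ObjectId(poolId, Long.toString(nextOffset.getAndIncrement()));
    }
}
```

Two pools may both hand out offset 0, yet the full IDs differ because their bases differ.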

New objectIds can be created manually with the method ObjectPoolFactory.newObjectId(String objectPoolId, String offset) or automatically by the method WriteableObjectPool.store(DataObject dataObject). To access the objects of a pool without using their IDs, the DataObjectIterator can be used. It extends the standard Iterator with one additional method, nextDataObject, which returns a DataObject.
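The idea behind DataObjectIterator, a standard Iterator plus a typed accessor, can be sketched with stand-in types. The DataObject interface and the ListDataObjectIterator class below are simplified, hypothetical placeholders and not ProCAKE's actual classes; in particular, the element type of the real iterator is not assumed here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified stand-in for ProCAKE's DataObject (hypothetical, for illustration only).
interface DataObject {
    String getId();
}

// Stand-in for DataObjectIterator: a standard Iterator plus a typed accessor.
interface DataObjectIterator extends Iterator<Object> {
    DataObject nextDataObject();
}

// Hypothetical implementation iterating over a list of data objects.
final class ListDataObjectIterator implements DataObjectIterator {
    private final List<DataObject> objects;
    private int position = 0;

    ListDataObjectIterator(List<DataObject> objects) {
        this.objects = objects;
    }

    @Override
    public boolean hasNext() {
        return position < objects.size();
    }

    // Generic Iterator method: returns the element as a plain Object.
    @Override
    public Object next() {
        return nextDataObject();
    }

    // Typed accessor: callers receive a DataObject without casting.
    @Override
    public DataObject nextDataObject() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return objects.get(position++);
    }
}
```

A typical loop is `while (it.hasNext()) { DataObject obj = it.nextDataObject(); ... }`, which avoids the cast that `next()` would require.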

Readable Object Pools

A ReadableObjectPool is a container for data objects. It provides special access methods that interpret the objectIds of the data objects. ReadableObjectPool is an interface, so it cannot be instantiated directly.

To check whether two object pools contain the same objects, the method hasSameValueAsIn(ReadableObjectPool objectPool) can be used. It returns true if both pools contain the same objects.

Writable Object Pools

A WriteableObjectPool extends ReadableObjectPool with additional methods to modify the pool, namely remove and store methods.

The remove method takes either a data object or the offset of an object and removes that object from the pool. The objectId is automatically removed from the object. The method removeAll() clears the entire pool.

The store method takes a data object and stores it in the pool. If the object does not already have an objectId, a new unique one is created automatically. If the object is already a member of the pool, nothing happens. The method storeAll stores all objects of a collection or of a ReadableObjectPool in the WriteableObjectPool.
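The store and remove semantics described above can be mirrored in a small in-memory model: store assigns an ID only to objects without one and ignores objects that are already members, remove clears the object's ID, and storeAll stores each element in turn. Everything here (Item, SimpleWritablePool, the "/" ID format) is hypothetical and only illustrates the documented behavior; it is not the ProCAKE implementation. Rejecting a different object with an already used ID follows the note in the Object IDs section that non-unique IDs cause an exception.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical data object with a mutable ID that is null until the item is stored.
final class Item {
    String id;
    final String value;

    Item(String value) {
        this.value = value;
    }
}

// Hypothetical in-memory pool mirroring the documented store/remove behavior.
final class SimpleWritablePool {
    private final String poolId;
    private final Map<String, Item> objects = new LinkedHashMap<>();
    private long nextOffset = 0;

    SimpleWritablePool(String poolId) {
        this.poolId = poolId;
    }

    // Generates an ID only if the item has none; storing a member again is a no-op.
    void store(Item item) {
        if (item.id == null) {
            String candidate;
            do {
                candidate = poolId + "/" + (nextOffset++);
            } while (objects.containsKey(candidate));
            item.id = candidate;
        } else if (objects.containsKey(item.id)) {
            if (objects.get(item.id) == item) {
                return; // already a member: nothing happens
            }
            throw new IllegalStateException("Duplicate object ID: " + item.id);
        }
        objects.put(item.id, item);
    }

    void storeAll(Collection<Item> items) {
        for (Item item : items) {
            store(item);
        }
    }

    // Removes the item (only if it is this exact member) and clears its ID.
    void remove(Item item) {
        if (item.id != null && objects.remove(item.id, item)) {
            item.id = null;
        }
    }

    void removeAll() {
        for (Item item : objects.values()) {
            item.id = null;
        }
        objects.clear();
    }

    int size() {
        return objects.size();
    }
}
```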

The method CAKEInstance.start returns a WriteableObjectPool. If a case base is passed to the method, all objects of the case base are stored in this pool.

Using a CSV Case Base

It is also possible to use a CSV file as a case base or to parse it into an existing one. A CSV file can only represent structural cases, so the parser creates aggregate objects. To read such a file, the class CakeCSVParser can be used. It requires the path of the CSV file, which is set with the method setFilename:

CakeCSVParser csvParser = new CakeCSVParser();
csvParser.setFilename("casebase.csv");

To use the parser, a model declaration containing the corresponding classes is necessary. Every name in the CSV file must be unique; otherwise, an exception is thrown. If the column names in the CSV file are equal to the attribute names of the aggregate class that each CSV row is mapped to, the CSV parser can be used without further configuration.

For example, a simple CSV file can look like this:

"attribute1","attribute2"
"value1","valueA"
"value2","valueB"

For the example above, the model must contain an aggregate class with attributes named attribute1 and attribute2. Otherwise, the file cannot be read without further information.
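Conceptually, each data row becomes one aggregate object whose attributes are named after the header columns. The sketch below shows this mapping with plain Java maps instead of ProCAKE aggregate objects and rejects duplicate header names, mirroring the uniqueness requirement. CsvRows is hypothetical, handles only simple quoted fields without embedded commas or quotes, and is not how CakeCSVParser is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified CSV-to-attribute mapping (no embedded commas or quotes).
final class CsvRows {
    private CsvRows() {}

    // Splits a line on commas and strips surrounding double quotes from each field.
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        for (String raw : line.split(",", -1)) {
            String f = raw.trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            result.add(f);
        }
        return result;
    }

    // Maps each data row to an attribute map keyed by the (unique) header names.
    static List<Map<String, String>> toAttributeMaps(List<String> lines) {
        List<String> header = fields(lines.get(0));
        Set<String> seen = new HashSet<>();
        for (String name : header) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Duplicate column name: " + name);
            }
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            List<String> values = fields(line);
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int i = 0; i < header.size(); i++) {
                attributes.put(header.get(i), i < values.size() ? values.get(i) : "");
            }
            rows.add(attributes);
        }
        return rows;
    }
}
```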

If the names of the CSV columns differ from those in the model class, or if the file does not contain any header information at all, a mapping file is required. This file must also be in CSV format. Its first column contains the name of the column in the CSV file, and its second column contains the name in the model. It might look like this:

"nameInCSV","nameInModell"
"attribute1","attributeA"
"attribute2","attributeB"

Here, attribute1 will correspond to attributeA in the model, as well as attribute2 will be mapped to attributeB.

Alternatively, the first column can contain the column number and the second column the attribute name in the model:

"rowInCSV","nameInModell"
1,"attributeA"
2,"attributeB"

Here, the values of the first column of the CSV file containing the examples will correspond to attributeA in the model, and the values of the second column to attributeB.
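Both mapping variants can be read into a lookup from CSV column to model attribute. The hypothetical sketch below loads the mapping rows (skipping the header row) and resolves a column either by its header name or by its 1-based position, matching the two examples above. The class ColumnMapping, its numeric-first-field rule, and the fallback to the header name are assumptions for illustration, not part of CakeCSVParser.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resolver for the two mapping-file variants:
// name-to-name rows, or 1-based column number to name rows.
final class ColumnMapping {
    private final Map<String, String> byName = new HashMap<>();
    private final Map<Integer, String> byIndex = new HashMap<>();

    // Parses mapping lines; the first line is treated as a header and skipped.
    ColumnMapping(List<String> mappingLines) {
        for (String line : mappingLines.subList(1, mappingLines.size())) {
            String[] parts = line.split(",", 2);
            String source = unquote(parts[0]);
            String target = unquote(parts[1]);
            if (source.matches("\\d+")) {
                byIndex.put(Integer.parseInt(source), target); // column number variant
            } else {
                byName.put(source, target);                    // column name variant
            }
        }
    }

    private static String unquote(String s) {
        String t = s.trim();
        return (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\""))
                ? t.substring(1, t.length() - 1) : t;
    }

    // Resolves a CSV column to a model attribute; falls back to the header name if unmapped.
    String attributeFor(String headerName, int oneBasedColumn) {
        if (byIndex.containsKey(oneBasedColumn)) {
            return byIndex.get(oneBasedColumn);
        }
        if (headerName != null && byName.containsKey(headerName)) {
            return byName.get(headerName);
        }
        return headerName;
    }
}
```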

A mapping file is set with the method setMappingFile:

csvParser.setMappingFile("mapping.csv");

When the configuration of the parser is finished, it can be started via the method createAggregateObjects(String className). The parameter className must be the name of the corresponding aggregate class in the model. The method will return a list of aggregate objects, filled with the values of the CSV file as aggregate attributes.

Because the creation of the objects runs in parallel, the order of the objects in the object pool isn’t deterministic.
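If a reproducible order is needed, for example in tests, the created objects can be sorted afterwards by a stable key. The sketch below uses a Java parallel stream as a stand-in for the parser's parallel object creation and then sorts the result; DeterministicOrder and the string "objects" are hypothetical and do not use the ProCAKE API. With real aggregate objects, a suitable attribute or the objectId would serve as the sort key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrates why parallel creation yields no fixed order, and how sorting restores one.
final class DeterministicOrder {
    private DeterministicOrder() {}

    // Creates "objects" (here: strings) in parallel; the insertion order into the queue is not fixed.
    static List<String> createInParallel(List<String> rows) {
        ConcurrentLinkedQueue<String> created = new ConcurrentLinkedQueue<>();
        rows.parallelStream().forEach(row -> created.add("obj:" + row));
        return new ArrayList<>(created);
    }

    // Sorting by a stable key (here the string itself) gives a reproducible order.
    static List<String> sorted(List<String> objects) {
        List<String> copy = new ArrayList<>(objects);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }
}
```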

Aggregate classes that contain nested aggregate classes of arbitrary depth can also be used. In this case, all classes must be defined in the model file, and the attribute names must be unique so that they, or their mapped names, can be found in the CSV file. This way, the flat information of the CSV file, which contains no hierarchical relations, can be mapped to an aggregate object with nested attributes that express hierarchical relationships.
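The mapping from a flat row to a nested aggregate can be pictured as follows: each unique attribute name is looked up in a description of the class hierarchy, and its value is placed at the corresponding nesting level. Uniqueness matters because the leaf name alone must identify its position. The schema format (leaf attribute name to path of enclosing attribute names) and the class FlatToNested are hypothetical; nested Map objects stand in for nested aggregate objects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: places flat CSV values into nested maps, using a schema that gives,
// for each unique leaf attribute, the path of enclosing aggregate attributes.
final class FlatToNested {
    private FlatToNested() {}

    @SuppressWarnings("unchecked")
    static Map<String, Object> nest(Map<String, String> flatRow, Map<String, List<String>> parentPaths) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : flatRow.entrySet()) {
            Map<String, Object> current = root;
            // Walk (and create) the enclosing aggregates; unique leaf names make this unambiguous.
            for (String parent : parentPaths.getOrDefault(entry.getKey(), List.of())) {
                current = (Map<String, Object>) current.computeIfAbsent(
                        parent, k -> new LinkedHashMap<String, Object>());
            }
            current.put(entry.getKey(), entry.getValue());
        }
        return root;
    }
}
```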

Cloning and Comparing Objects

The class DataObject provides dedicated methods for cloning data objects, DataObject copy(), and for comparing them, boolean hasSameValueAsIn(DataObject object). The method copy() makes a deep copy of the data object: it is called recursively for any nested data object, e.g., list elements or aggregate attributes. The following code creates a String object and copies it:

StringObject stringObject = (StringObject) ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObject.setNativeString("test");
StringObject copiedStringObject = (StringObject) stringObject.copy();

The object copiedStringObject is an instance of the same class (StringObject) and contains the same value as the original object ("test").
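The difference between a deep copy, as performed by copy(), and a shallow copy becomes visible with nested values. The sketch uses a hypothetical ListValue stand-in rather than ProCAKE's list objects: the deep copy recursively copies nested elements, so modifying the copy leaves the original unchanged, while the shallow copy shares nested elements with the original.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a nested data object: a list whose elements are strings or other ListValues.
final class ListValue {
    final List<Object> elements = new ArrayList<>();

    // Deep copy: nested ListValues are copied recursively (strings are immutable and can be shared).
    ListValue copy() {
        ListValue result = new ListValue();
        for (Object element : elements) {
            result.elements.add(element instanceof ListValue ? ((ListValue) element).copy() : element);
        }
        return result;
    }

    // Shallow copy, for comparison: nested ListValues are shared with the original.
    ListValue shallowCopy() {
        ListValue result = new ListValue();
        result.elements.addAll(elements);
        return result;
    }
}
```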

To check whether the values of two objects are equal, the method hasSameValueAsIn can be used. It performs no reference check but instead compares all relevant information of two data objects. What counts as relevant information differs for each data object. For instance, two integer objects are equal if their values are equal, whereas the test for equality of two lists (or even NEST graphs) is much more complex. Therefore, hasSameValueAsIn() makes recursive calls for nested data objects, analogous to copy(). The following code uses the objects from the copy example above:

copiedStringObject.hasSameValueAsIn(stringObject);  // returns true

This call returns true because both objects contain the same value. For a String object with a different value, the method would return false.

In general, hasSameValueAsIn returns false for objects that instantiate different data classes. The following list describes some exceptions and other specific behaviors of certain data classes:

  • When comparing Numeric objects, it is checked whether their numeric values are equal. For example, an Integer object with the value 1 and a Double object with the value 1.0 are identified as equal by this method.
  • When comparing a Numeric object with a String object, it is checked whether the String object contains the same numeric value. For example, an Integer object with the value 1 and a String object with the value "1" are identified as equal by this method.
  • When comparing Collection objects, it is checked whether they contain the same elements. Each element must appear at the same position in both collections if the data structures support positional information. For example, a List object and a Set object that both contain the same five values are identified as equal by this method. However, since a List object can contain an element more than once while a Set object cannot, a List with duplicate elements and a Set are identified as unequal.
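The comparison rules listed above can be summarized in a small recursive function over plain Java values, with Number, String, List, and other collections standing in for the corresponding ProCAKE data classes. This is an illustrative sketch of the documented behavior only, not ProCAKE's implementation. In particular, how collections of different kinds are matched is an assumption: two lists are compared position by position, while any other pair of collections must have equal size and mutually value-equal elements.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of value equality following the rules described above.
final class ValueEquality {
    private ValueEquality() {}

    static boolean sameValue(Object a, Object b) {
        if (a instanceof Number || b instanceof Number) {
            Double x = asNumber(a);
            Double y = asNumber(b);
            // Integer 1, Double 1.0 and String "1" all compare as the number 1.0.
            return x != null && y != null && x.doubleValue() == y.doubleValue();
        }
        if (a instanceof List && b instanceof List) {
            return samePositional((List<?>) a, (List<?>) b);
        }
        if (a instanceof Collection && b instanceof Collection) {
            return sameElements((Collection<?>) a, (Collection<?>) b);
        }
        // Other values (e.g. two strings) use ordinary equality; different kinds are unequal.
        return a != null && a.equals(b);
    }

    // Numbers are used directly; strings count only if they parse as a number.
    private static Double asNumber(Object o) {
        if (o instanceof Number) {
            return ((Number) o).doubleValue();
        }
        if (o instanceof String) {
            try {
                return Double.parseDouble(((String) o).trim());
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }

    // Two positional collections: same length and value-equal at every position.
    private static boolean samePositional(List<?> a, List<?> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!sameValue(a.get(i), b.get(i))) return false;
        }
        return true;
    }

    // Otherwise: same size, and every element has a value-equal counterpart in the other collection.
    private static boolean sameElements(Collection<?> a, Collection<?> b) {
        return a.size() == b.size() && containsAll(a, b) && containsAll(b, a);
    }

    private static boolean containsAll(Collection<?> source, Collection<?> target) {
        for (Object s : source) {
            boolean found = false;
            for (Object t : target) {
                if (sameValue(s, t)) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }
}
```

The size check is what makes a list with duplicate elements differ from a set with the same distinct values.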

Object IDs

Each data object in ProCAKE can have an objectId. This ID is necessary to identify an object in an object pool, so it must be unique.

The ID can be set manually in the case base or afterwards with the method setId(String objectId). In this case, the user is responsible for ensuring that the ID is unique; otherwise, an exception is thrown when the object is read into an object pool.

For example, an ID can be set in a casebase as follows:

<cdol:V id="exampleId" c="Void"/>

This represents a Void object that can be identified by the ID exampleId. It is important to ensure that no other object has this ID.

During runtime, the ID can be set as follows:

    voidObject.setId("exampleId");

Note that this requires a previously created object voidObject.

If an object without an ID is read into an object pool, an ID is generated. It consists of two parts: the base, a namespace that depends on the location where the object is stored (usually an object pool), and the offset, which must be unique within the object pool and is managed by the pool itself. Such objectIds can also be created manually with the method ObjectPoolFactory.newObjectId(String objectPoolId, String offset). This method is also called automatically by WriteableObjectPool.store(DataObject dataObject) if no ID is set.

Readers and Writers

This page contains a short overview of the (de-)serialization utilities provided within the ProCAKE framework which offer object IO capabilities for external projects.

IOUtil

ProCAKE provides several implementations for reading and writing objects. All available reader and writer implementations are registered in the ‘composition.xml’ configuration file. See the Configuration page for more details.
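The general pattern of choosing a registered reader, either by an explicit name or as the default for a requested result type, can be sketched in plain Java. ReaderRegistry, its method names, and the reader names in the usage are hypothetical and do not reflect IOUtil's actual internals; to stay self-contained, the sketch reads from strings instead of files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical name-keyed registry of readers that turn text content into objects.
final class ReaderRegistry {
    private final Map<String, Function<String, Object>> readersByName = new HashMap<>();
    private final Map<Class<?>, String> defaultReaderByType = new HashMap<>();

    // Registers a reader under a name; the first reader per result type becomes its default.
    void register(String name, Class<?> resultType, Function<String, Object> reader) {
        readersByName.put(name, reader);
        defaultReaderByType.putIfAbsent(resultType, name);
    }

    // Selects a reader explicitly by its name.
    Object read(String content, String readerName) {
        Function<String, Object> reader = readersByName.get(readerName);
        if (reader == null) {
            throw new IllegalArgumentException("No reader registered under " + readerName);
        }
        return reader.apply(content);
    }

    // Selects the default reader for the requested type and casts the result.
    <T> T read(String content, Class<T> resultType) {
        String name = defaultReaderByType.get(resultType);
        if (name == null) {
            throw new IllegalArgumentException("No reader for " + resultType.getName());
        }
        return resultType.cast(read(content, name));
    }
}
```

Keeping the lookup by name separate from the lookup by type lets callers either rely on a default or pin a specific implementation, which mirrors the two IOUtil.readFile variants below.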

Reading Objects

For instance, to read an object pool, the following method can be used:

WriteableObjectPool pool = IOUtil.readFile(filePath, WriteableObjectPool.class);

For reading an object with a specific reader, the name of the desired reader can be given as an argument as follows:

WriteableObjectPool pool = IOUtil.readFile(filePath, ObjectPoolParser.PARSERNAME);

To read a single data object, the ‘ObjectParser’ can be used:

DataObject object = IOUtil.readFile(filePath, ObjectParser.PARSERNAME);

The ‘StringReader’ can be used for reading arbitrary text files:

String stringObject = IOUtil.readFile(filePath, StringReader.READERNAME);

Writing Objects

To write an object pool to a file, the following method can be used:

WriteableObjectPool poolToWrite = ObjectPoolFactory.newObjectPool();
IOUtil.writeFile(poolToWrite, filePath);

A specific writer can also be used when writing an object. For this purpose, the name of the desired writer has to be given:

IOUtil.writeFile(poolToWrite, filePath, ObjectPoolWriter.WRITERNAME);

To write a single data object as XML, the ‘ObjectWriter’ can be used:

IOUtil.writeFile(dataObject, filePath, ObjectWriter.WRITERNAME);

XStreamUtil

ProCAKE provides a utility class for XStream, which supports both XML and JSON serialization. The utility class XStreamUtil is located in the package de.uni_trier.wi2.procake.utils.io.xstream and provides the methods toXML(), toJSON(), fromXML(), and fromJSON().

To serialize a RetrievalResultList object to XML for instance, simply call

// rrl: an existing RetrievalResultList, e.g., the result of a retrieval
String serializedList = XStreamUtil.toXML(rrl);

To deserialize a Query in XML representation to a Query object, call

// queryObjectXML: a String containing the XML representation of a query
QueryImpl query = (QueryImpl) XStreamUtil.fromXML(queryObjectXML);

Inheritance Structure

The data objects are organized in an object-oriented tree. Except for Union, the object hierarchy is the same as for the Data Classes.