Data Objects #
Data Objects, Case Bases, and Object Pools #
Creating Instances of Data Classes (Data Objects) #
Instances of instantiable data classes (see data classes) can be created at run time using a loaded model (the default model can be retrieved using ModelFactory.getDefaultModel()). The model has a method createObject, which requires the name of the desired data class. After the object has been created, class-specific values can be set using the corresponding methods:
StringObject stringObjectA = ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObjectA.setNativeString("test");
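For system data classes, the model has methods that return the data class directly. The String class used above can therefore also be obtained with the getStringSystemClass() method (here and below, model refers to a loaded model, e.g. the one returned by ModelFactory.getDefaultModel()):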
StringObject stringObjectB = model.getStringSystemClass().newObject();
For system and custom user data classes, the model provides the following method to access a class by its name:
StringObject stringObjectC = model.getClass(StringClass.CLASS_NAME).newObject();
Properties of Data Objects #
In addition to its “main value”, each data object can store properties that hold additional information about the object. For example, maintenance information such as how often the object was retrieved is stored this way. The system defines several properties, and user-defined ones are also possible. A property is a key-value pair in which both the key and the value must be strings. This restriction is necessary so that the properties can be embedded into XML. If another data type has to be handled, it must be encoded to and decoded from a string.
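Because property values must be strings, any other type has to be converted explicitly. The following is a minimal sketch in plain Java and does not use ProCAKE's API: the map only stands in for an object's property store, and the class name, the helper methods, and the key retrievalCount are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a data object's property store; not part of ProCAKE.
class PropertyCodecSketch {

  // Properties are string-to-string pairs, as described above.
  private final Map<String, String> properties = new HashMap<>();

  // Encodes an int as a string before storing it.
  void setIntProperty(String key, int value) {
    properties.put(key, Integer.toString(value));
  }

  // Decodes the stored string back to an int; returns defaultValue if the key is absent.
  int getIntProperty(String key, int defaultValue) {
    String raw = properties.get(key);
    return raw == null ? defaultValue : Integer.parseInt(raw);
  }

  public static void main(String[] args) {
    PropertyCodecSketch store = new PropertyCodecSketch();
    store.setIntProperty("retrievalCount", 3); // hypothetical key name
    int count = store.getIntProperty("retrievalCount", 0);
    store.setIntProperty("retrievalCount", count + 1);
    System.out.println(store.getIntProperty("retrievalCount", 0)); // prints 4
  }
}
```

The same pattern applies to any type that has an unambiguous string form, such as dates in ISO format.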
Object Pools #
In ProCAKE, data objects can be collected in pools. These pools exist in two variants: readable object pools and writeable object pools. A pool can contain arbitrary DataObjects, which means that the objects can belong to different DataClasses.
Each object pool must have an identifier, and this identifier must be unique across all of ProCAKE at run time. If the pool is created using the method ObjectPoolFactory.newObjectPool(), this is guaranteed. If custom object pools are created, care must be taken that the identifier differs from ObjectPoolFactory.POOL_NAME.
To access a data object in an object pool, each data object has to be identifiable. A data object can be an element of several pools, so a simple identification number is not sufficient, because the identifier must be unique across all pools. Therefore, a data object has a specific objectId. This id can be set manually, for example in the case base. In this case, the user is responsible for ensuring that the id is unique. If an object without an id is read into an object pool, an id is generated. This id consists of two parts: the base and the offset. The base is a namespace and depends on the location where the object is stored, which is usually an object pool. The offset is an id that must be unique within the object pool and is managed by the object pool itself. This is only a necessary precondition for the synchronization and identification of objects, not the complete synchronization technique.
New objectIds can be created manually using the method ObjectPoolFactory.newObjectId(String objectPoolId, String offset), or they are created automatically by the method WriteableObjectPool.store(DataObject dataObject). To access the objects of a pool without using the id, the DataObjectIterator can be used. It extends the standard Iterator and contains one additional method, nextDataObject, which returns a DataObject.
Readable Object Pools #
A ReadableObjectPool is a container for data objects. The pool provides special access methods that interpret the objectIds of the data objects. ReadableObjectPool is only an interface, so it cannot be instantiated directly.
To check whether two object pools contain the same objects, the method hasSameValueAsIn(ReadableObjectPool objectPool) can be used. It returns true if both pools contain the same objects.
Writeable Object Pools #
A WriteableObjectPool extends the ReadableObjectPool and contains additional methods to modify the pool. For this purpose, remove and store methods are provided.
The remove method takes either a data object or the offset of an object and removes that object from the pool. The objectId is automatically removed from the object. It is also possible to use the method removeAll() to clear the complete pool.
The store method takes a data object and stores it in the pool. If the object does not already have an objectId, a new unique one is created automatically. If the object is already a member of the pool, nothing happens. The method storeAll accepts a collection or a ReadableObjectPool and stores all of its objects in the WriteableObjectPool.
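The store and remove behavior described above can be illustrated with a small stand-alone model. This is only a sketch in plain Java, not ProCAKE's implementation: the classes PoolIdSketch and Item are hypothetical, and the "#" separator between base and offset is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a writeable pool; not ProCAKE's implementation.
class PoolIdSketch {

  // Hypothetical minimal object that only carries an id.
  static class Item {
    String id; // null until an id is set manually or assigned by a pool
  }

  private final String poolId;  // the "base" (namespace) of generated ids
  private int nextOffset = 0;   // offsets are managed by the pool itself
  private final Map<String, Item> items = new LinkedHashMap<>();

  PoolIdSketch(String poolId) {
    this.poolId = poolId;
  }

  // Stores the item. Generates a base + offset id if none is set.
  // Does nothing if the item is already a member of this pool.
  void store(Item item) {
    if (item.id != null && items.get(item.id) == item) {
      return;
    }
    if (item.id == null) {
      String candidate;
      do {
        candidate = poolId + "#" + nextOffset++; // separator is an assumption
      } while (items.containsKey(candidate));
      item.id = candidate;
    } else if (items.containsKey(item.id)) {
      throw new IllegalArgumentException("Id is not unique: " + item.id);
    }
    items.put(item.id, item);
  }

  // Removes the item from the pool and clears its id.
  boolean remove(Item item) {
    if (item.id == null || items.get(item.id) != item) {
      return false;
    }
    items.remove(item.id);
    item.id = null;
    return true;
  }

  int size() {
    return items.size();
  }

  public static void main(String[] args) {
    PoolIdSketch pool = new PoolIdSketch("pool-1");
    Item item = new Item();
    pool.store(item);
    pool.store(item); // already a member, so nothing happens
    System.out.println(item.id + " " + pool.size()); // prints pool-1#0 1
    pool.remove(item);
    System.out.println(item.id + " " + pool.size()); // prints null 0
  }
}
```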
The method CAKEInstance.start returns a WriteableObjectPool. If a case base was passed to the method, all objects of the case base are stored in this pool.
Using a CSV Case Base #
It is also possible to use a CSV file as a case base or to parse it into an existing one. A CSV file can only represent structural cases, so the parser is used to create aggregate objects. To read and use such a file, the class CakeCSVParser can be used. It requires the path of the CSV file, which is set with the method setFilename:
CakeCSVParser csvParser = new CakeCSVParser();
csvParser.setFilename("casebase.csv");
To use the parser, a declaration containing the corresponding classes is necessary. Every name in the CSV file must be unique; otherwise, an exception is thrown. If the column names in the CSV file equal the attribute names of the aggregate class that each CSV row is mapped to, the parser can be used without any further configuration.
For example, a simple CSV file can look like this:
"attribute1","attribute2"
"value1","valueA"
"value2","valueB"
In the above example, the model must contain an aggregate class with attributes named attribute1 and attribute2. Otherwise, the file cannot be read without further information.
If the column names in the CSV file differ from the attribute names in the model class, or if the file contains no header at all, a mapping file is required. This file must also be in CSV format. Its first column contains the name of the column in the CSV file, and its second column contains the attribute name in the model. It might look like this:
"nameInCSV","nameInModell"
"attribute1","attributeA"
"attribute2","attributeB"
Here, attribute1 corresponds to attributeA in the model, and attribute2 is mapped to attributeB.
Alternatively, the first column can contain the column number and the second column the attribute name in the model. This can look like:
"rowInCSV","nameInModell"
1,"attributeA"
2,"attributeB"
Here, the values of the first column of the CSV file containing the examples will correspond to attributeA in the model, and the values of the second column to attributeB.
To use such a mapping file, the method setMappingFile can be used. This might look like this:
csvParser.setMappingFile("mapping.csv");
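How the two mapping variants described above relate CSV columns to model attributes can be sketched in plain Java. This is an illustration only and not the logic of CakeCSVParser: the class CsvMappingSketch and its methods are hypothetical, and the simple line splitting assumes that fields contain no embedded commas or escaped quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of column-to-attribute mapping; not part of ProCAKE.
class CsvMappingSketch {

  // Splits a simple CSV line and strips surrounding double quotes.
  // Assumption: fields contain no embedded commas or escaped quotes.
  static List<String> splitLine(String line) {
    List<String> fields = new ArrayList<>();
    for (String field : line.split(",", -1)) {
      String trimmed = field.trim();
      if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
        trimmed = trimmed.substring(1, trimmed.length() - 1);
      }
      fields.add(trimmed);
    }
    return fields;
  }

  // Returns, for each data column, the model attribute it maps to (null if unmapped).
  // mappingRows are the lines of the mapping file without its header line.
  static String[] resolve(List<String> csvHeader, List<String> mappingRows) {
    String[] attributes = new String[csvHeader.size()];
    for (String row : mappingRows) {
      List<String> pair = splitLine(row);
      String source = pair.get(0);
      int column = source.matches("\\d+")
          ? Integer.parseInt(source) - 1  // variant 2: 1-based column number
          : csvHeader.indexOf(source);    // variant 1: column name
      if (column < 0 || column >= attributes.length) {
        throw new IllegalArgumentException("Unknown column: " + source);
      }
      attributes[column] = pair.get(1);
    }
    return attributes;
  }

  public static void main(String[] args) {
    List<String> header = splitLine("\"attribute1\",\"attribute2\"");
    String[] byName = resolve(header,
        Arrays.asList("\"attribute1\",\"attributeA\"", "\"attribute2\",\"attributeB\""));
    String[] byNumber = resolve(header,
        Arrays.asList("1,\"attributeA\"", "2,\"attributeB\""));
    System.out.println(Arrays.toString(byName));   // prints [attributeA, attributeB]
    System.out.println(Arrays.toString(byNumber)); // prints [attributeA, attributeB]
  }
}
```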
Once the parser is configured, it can be started via the method createAggregateObjects(String className). The parameter className must be the name of the corresponding aggregate class in the model. The method returns a list of aggregate objects whose attributes are filled with the values from the CSV file.
Because the objects are created in parallel, the order of the objects in the object pool is not deterministic.
It is also possible to use aggregate classes that contain nested aggregate classes at any depth. In this case, the classes need to be defined in the model file. The attribute names need to be unique so that they or their mapping partners can be found in the CSV file. This way, the information in the CSV file, which contains no hierarchical relations, can be mapped to an aggregate object with nested attributes that express hierarchical relationships.
Cloning and Comparing Objects #
In the class DataObject, ProCAKE provides dedicated methods for cloning data objects, i.e. DataObject copy(), and for comparing them, i.e. boolean hasSameValueAsIn(DataObject object).
The method copy() makes a deep copy of the data object. This means that the method is called recursively for any nested data object, e.g., list elements or aggregate attributes.
The following code gives an example, where a String object is created and copied:
StringObject stringObject = (StringObject) ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObject.setNativeString("test");
StringObject copiedStringObject = (StringObject) stringObject.copy();
The object copiedStringObject is an instance of the same class (StringObject) and contains the same value as the original object ("test").
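The recursive nature of a deep copy is easier to see on a nested structure than on a single string. The following sketch uses plain Java lists instead of ProCAKE data objects; the class DeepCopySketch and its method are hypothetical and only illustrate the idea of recursing into nested elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a recursive deep copy; not ProCAKE's copy().
class DeepCopySketch {

  // Copies lists recursively; immutable leaves (e.g. String, Integer) are reused.
  static Object deepCopy(Object value) {
    if (value instanceof List) {
      List<Object> copy = new ArrayList<>();
      for (Object element : (List<?>) value) {
        copy.add(deepCopy(element)); // recursive call for nested elements
      }
      return copy;
    }
    return value;
  }

  public static void main(String[] args) {
    List<Object> inner = new ArrayList<>(Arrays.asList("a", "b"));
    List<Object> outer = new ArrayList<>(Arrays.asList(inner, "c"));

    @SuppressWarnings("unchecked")
    List<Object> copy = (List<Object>) deepCopy(outer);

    System.out.println(copy.equals(outer));    // prints true (same values)
    System.out.println(copy.get(0) == inner);  // prints false (nested list was copied)

    inner.add("changed");                      // modifying the original ...
    System.out.println(copy.get(0));           // prints [a, b] (... leaves the copy untouched)
  }
}
```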
To check whether the values of two objects are equal, the method hasSameValueAsIn can be used. It performs no reference check but instead checks all relevant information of two data objects for equality. The definition of relevant information is different for each data object. For instance, two integer objects are simply equal if their values are equal. In contrast, the test for equality of two lists (or even NEST graphs) is much more complex. Therefore, hasSameValueAsIn() makes recursive calls for nested data objects, analogous to copy().
The following code gives an example, using the objects from the example of the copy method above:
copiedStringObject.hasSameValueAsIn(stringObject); // returns true
This method call returns true because both objects contain the same value. With a String object that contains a different value, the method would return false.
When the hasSameValueAsIn method is used for objects that instantiate different data classes, it returns false.
The following list contains a few more specific behaviors of some data classes:
- When comparing Numeric objects, it is checked whether the values are identical. For example, an Integer object with the value 1 and a Double object with the value 1.0 are identified as equal by this method.
- When comparing Numeric objects with String objects, it is checked whether the String object contains a numeric value. For example, an Integer object with the value 1 and a String object with the value "1" are identified as equal by this method.
- When comparing Collection objects, it is checked whether they contain the same elements. Each element must appear at the same position in both collections (if the data structure supports positional information). For example, a List object and a Set object that both contain the same five values are identified as equal by this method. However, if the List object contains an element more than once, the method identifies the two objects as unequal.
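The numeric rules from the list above can be mimicked with plain Java values. This is a hedged sketch, not the logic of hasSameValueAsIn: the class ValueComparisonSketch and its methods are hypothetical, and treating two strings as equal only if they are identical is an assumption of this example.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical illustration of value-based comparison; not ProCAKE's implementation.
class ValueComparisonSketch {

  // Converts numbers and numeric strings to BigDecimal; returns null for anything else.
  static BigDecimal toNumber(Object value) {
    if (value instanceof Integer || value instanceof Long) {
      return BigDecimal.valueOf(((Number) value).longValue());
    }
    if (value instanceof Double || value instanceof Float) {
      double d = ((Number) value).doubleValue();
      return Double.isNaN(d) || Double.isInfinite(d) ? null : BigDecimal.valueOf(d);
    }
    if (value instanceof String) {
      try {
        return new BigDecimal(((String) value).trim());
      } catch (NumberFormatException e) {
        return null; // the string does not contain a numeric value
      }
    }
    return null;
  }

  static boolean sameValue(Object a, Object b) {
    // Assumption: two strings are compared as strings, not as numbers.
    if (a instanceof String && b instanceof String) {
      return a.equals(b);
    }
    BigDecimal x = toNumber(a);
    BigDecimal y = toNumber(b);
    if (x != null && y != null) {
      return x.compareTo(y) == 0; // compareTo ignores scale, so 1 equals 1.0
    }
    return Objects.equals(a, b);
  }

  public static void main(String[] args) {
    System.out.println(sameValue(1, 1.0));   // prints true
    System.out.println(sameValue(1, "1"));   // prints true
    System.out.println(sameValue(1, "one")); // prints false
    System.out.println(sameValue(1, 2.5));   // prints false
  }
}
```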
Object IDs #
Each data object in ProCAKE can have an objectId. This id is necessary to identify an object in an object pool. For this purpose, the id must be unique.
The id can be set manually in the case base or afterwards by using the method setId(String objectId). In this case, the user is responsible for ensuring that the id is unique. Otherwise, an exception is thrown when the object is read into an object pool.
For example, an id can be set in a case base as follows:
<cdol:V id="exampleId" c="Void"/>
This represents a Void object that can be identified by the id exampleId. It is important to ensure that no other object has this id.
During runtime, the id can be set as follows:
voidObject.setId("exampleId");
Note that this requires an object voidObject that has been created beforehand.
If an object without an id is read into an object pool, an id is generated. This id consists of two parts: the base and the offset. The base is a namespace and depends on the location where the object is stored, which is usually an object pool. The offset is an id that must be unique within the object pool and is managed by the object pool itself. ObjectIds of this kind can be created manually using the method ObjectPoolFactory.newObjectId(String objectPoolId, String offset). This method is also called automatically by the method WriteableObjectPool.store(DataObject dataObject) if no id is set.
Readers and Writers #
This page contains a short overview of the (de-)serialization utilities provided within the ProCAKE framework which offer object IO capabilities for external projects.
IOUtil #
ProCAKE provides several implementations for reading and writing objects. All available reader and writer implementations are registered in the ‘composition.xml’ configuration file. See the Configuration page for more details.
Reading Objects #
For instance, to read an object pool, the following method can be used:
WriteableObjectPool pool = IOUtil.readFile(filePath, WriteableObjectPool.class);
For reading an object with a specific reader, the name of the desired reader can be given as an argument as follows:
WriteableObjectPool pool = IOUtil.readFile(filePath, ObjectPoolParser.PARSERNAME);
To read a single data object, the ‘ObjectParser’ can be used:
DataObject object = IOUtil.readFile(filePath, ObjectParser.PARSERNAME);
The ‘StringReader’ can be used for reading arbitrary text files:
String stringObject = IOUtil.readFile(filePath, StringReader.READERNAME);
Writing Objects #
To write an object pool to a file, the following method can be used:
WriteableObjectPool poolToWrite = ObjectPoolFactory.newObjectPool();
IOUtil.writeFile(poolToWrite, filePath);
When writing an object, a specific writer can also be used. For this purpose, the name of the desired writer has to be given:
IOUtil.writeFile(poolToWrite, filePath, ObjectPoolWriterImpl.WRITERNAME);
To write a data object as XML, the ‘ObjectWriterImpl’ can be used:
IOUtil.writeFile(dataObject, filePath, ObjectWriterImpl.WRITERNAME);
To write a data object or a String to a TXT file, the ‘StringWriter’ can be used:
IOUtil.writeFile(dataObject, filePath, StringWriter.WRITERNAME);
XStreamUtil #
ProCAKE provides a utility class for using XStream, which supports both XML and JSON serialization. The utility class XStreamUtil is located in the package de.uni_trier.wi2.procake.utils.io.xstream and contains the respective methods toXML(), toJSON(), fromXML(), and fromJSON().
To serialize a RetrievalResultList object to XML, for instance, simply call:
RetrievalResultList rrl;
String serializedList = XStreamUtil.toXML(rrl);
To deserialize a Query in XML representation to a Query object, call:
String queryObjectXML;
QueryImpl query = XStreamUtil.fromXML(queryObjectXML);
Inheritance Structure #
The data objects are organized in an object-oriented tree. Except for Union, the object hierarchy is the same as for the Data Classes.