Configuration #

Configuration at build-time via XML #

The configuration via XML files is meant to be used to provide an initial configuration for the start-up of ProCAKE.

General Configuration #

A benefit of the pattern-driven architecture of ProCAKE is extensibility. For instance, the various implementations for persistence, retrieval, and adaptation are organized in factories. Implementations can be registered and configured in an XML file (usually named composition.xml) that is read once at start-up.

Using the method CakeInstance.start, several configuration files can be provided. By default, the pre-defined composition file located under /de/uni_trier/wi2/composition.xml is used. The method CakeInstance.start(String composition) can be used as follows:

CakeInstance.start("/path/composition.xml");

Data Model #

In addition to the system classes, custom-structured classes (referred to as user classes) can be defined as sub-types of system classes. At start-up, the system can initialize the default model based on a configuration file. Even though several custom models can be defined, the usual practice is to use a single model definition as default (usually named model.xml). The definition of custom user data classes in model.xml is described on the page Data Classes. When only system data classes are used, a custom model definition is not necessary. However, some abstract system data classes, such as collections, require the definition of user classes. By default, ProCAKE is instantiated with an empty model. If you define custom classes in model.xml, make sure that the name of each new class is unique and that a referenced class is defined in the XML file before it is referenced.

The method CakeInstance.start(String composition, String model, String simModel, String casebase) can be used to specify an initial data model and similarity model as follows:

CakeInstance.start("/path/composition.xml", "/path/model.xml", "/path/simModel.xml", "/path/casebase.xml");

Specifying the model and simModel parameters also requires specifying the composition and casebase parameters. However, it is possible to set parameters to null, in which case the corresponding default files are used instead.
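
The null-fallback rule can be pictured with a few lines of Java. This is only a sketch: the class StartPathSketch and its method orDefault are hypothetical and not part of ProCAKE, and how ProCAKE resolves defaults internally is not shown here. Only the default composition path is taken from this page.

```java
/** Hypothetical helper, NOT part of ProCAKE: models "null means use the default file". */
class StartPathSketch {

    // Default composition path as stated on this page.
    static final String DEFAULT_COMPOSITION = "/de/uni_trier/wi2/composition.xml";

    /** Returns the given path, or the fallback when the caller passed null. */
    static String orDefault(String given, String fallback) {
        return given != null ? given : fallback;
    }

    public static void main(String[] args) {
        // A null argument selects the default file.
        System.out.println(orDefault(null, DEFAULT_COMPOSITION));
        // A non-null argument is used as it is.
        System.out.println(orDefault("/path/composition.xml", DEFAULT_COMPOSITION));
    }
}
```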

Similarity Model #

To allow for comparing system and user classes, similarity measures have to be defined. Analogous to the data model, several similarity models can be defined, although usually a single model (named sim.xml) is used as default. A similarity measure can be selected as default for a data class, so that the measure is applied to all sub-classes of that class. By default, without specifying a custom similarity model, a pre-defined similarity model file located under /de/uni_trier/wi2/sim.xml is used. When providing a custom model, you have to ensure that corresponding measures exist for all data classes used in the case representation. The definition of similarity measures in an XML file is described on the page Similarity Measures.

The method CakeInstance.start(String composition, String model, String simModel, String casebase) can be used to specify an initial data model and similarity model as follows:

CakeInstance.start("/path/composition.xml", "/path/model.xml", "/path/simModel.xml", "/path/casebase.xml");

Specifying the model and simModel parameters also requires specifying the composition and casebase parameters. However, it is possible to set parameters to null, in which case the corresponding default files are used instead.

Case Bases #

ProCAKE can be initialized with an existing case base. The contained objects are parsed from the given XML file and written to an object pool, which is returned by the CakeInstance.start method. Please note that a case base requires a corresponding model file if custom data classes have been used.

Using the casebase parameter also requires specifying the composition parameter in the method CakeInstance.start(String composition, String casebase). If the composition parameter is set to null, the default composition configuration is used.

CakeInstance.start("/path/composition.xml", "/path/casebase.xml");

In turn, if the path to a case base is set, the method returns an instance of WriteableObjectPool. This pool contains the objects that have been parsed from the given file.

Configuration at run-time via Java #

While the XML configuration is usually used for the initial instantiation of ProCAKE, the run-time configuration allows for further modifications.

General Configuration #

The pre-defined composition file located under /de/uni_trier/wi2/composition.xml contains several factory implementations whose factory objects can be used at run-time to bind or unbind implementations. Please refer to the interface cake.utils.composition.Factory for more details. Regarding factory implementations and IO objects, we highly recommend configuring them in the XML file and letting them be registered during the start of ProCAKE. Registering instances at run-time might lead to side effects.

Data Model #

Data classes can be created dynamically during run-time via Java code. For this purpose, it is necessary to access an existing data class of ProCAKE. Existing classes can easily be referenced by using methods of the default model, which can be accessed by using ModelFactory.getDefaultModel. Every data class can be retrieved by the method getClass(String name), where name must refer to the class name of an existing class. For the system data classes, these names can be found in the class interfaces. For example, the name of the system class DataClass can be accessed by DataClass.CLASS_NAME. For the system data classes, it’s also possible to use specific methods of the model, for example getDataSystemClass().

Using this existing class, the method createSubclass(String name) can be used. The string name is the name of the subclass, which has to be unique; otherwise, an exception is thrown. The method returns an instance of the new data class and adds the class to the model. Before the class can be used, the method finishEditing() has to be called. After that, the class can’t be edited anymore; any attempt to do so results in an exception.

The following code lines give an example of a data class creation. The created class is a subclass of the String class:

StringClass stringClass = ModelFactory.getDefaultModel().getClass(StringClass.CLASS_NAME);
StringClass customStringClass = (StringClass) stringClass.createSubclass("customStringClass");
customStringClass.finishEditing();

The instance customStringClass of the newly created subclass of StringClass can be used inside this specific code block. However, if an instance of the same class is required at some later point in time in another code block, the instance customStringClass might not be referenceable anymore. For this case, the newly created data class is also added to the model and can be retrieved later on via its unique class name.

StringClass customStringClass = (StringClass) ModelFactory.getDefaultModel().getClass("customStringClass");
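
The life cycle described above (unique name, registration in the model, freezing via finishEditing) can be illustrated with a self-contained sketch. SketchModel, SketchDataClass, getDataClass, and setDescription are hypothetical stand-ins invented for this illustration; they are not ProCAKE types, and the exception types are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a model; NOT ProCAKE's Model. */
class SketchModel {
    private final Map<String, SketchDataClass> classes = new HashMap<>();

    SketchModel() {
        // One pre-defined, already finished "system" class to derive from.
        SketchDataClass root = new SketchDataClass("String", null, this);
        root.finishEditing();
        register(root);
    }

    /** Registers a class; class names must be unique within the model. */
    void register(SketchDataClass dataClass) {
        if (classes.containsKey(dataClass.getName())) {
            throw new IllegalArgumentException("Class name already used: " + dataClass.getName());
        }
        classes.put(dataClass.getName(), dataClass);
    }

    /** Looks a class up by its unique name; returns null if it is unknown. */
    SketchDataClass getDataClass(String name) {
        return classes.get(name);
    }
}

/** Hypothetical stand-in for a data class; NOT ProCAKE's DataClass. */
class SketchDataClass {
    private final String name;
    private final SketchDataClass superClass;
    private final SketchModel model;
    private String description = "";
    private boolean editable = true;

    SketchDataClass(String name, SketchDataClass superClass, SketchModel model) {
        this.name = name;
        this.superClass = superClass;
        this.model = model;
    }

    String getName() { return name; }
    SketchDataClass getSuperClass() { return superClass; }
    String getDescription() { return description; }

    /** Creates a sub-class and registers it in the model under its unique name. */
    SketchDataClass createSubclass(String subclassName) {
        SketchDataClass subclass = new SketchDataClass(subclassName, this, model);
        model.register(subclass); // throws if the name is already taken
        return subclass;
    }

    /** Freezes the class: it can be used from now on, but no longer edited. */
    void finishEditing() { editable = false; }

    /** An example edit that is only allowed before finishEditing(). */
    void setDescription(String description) {
        if (!editable) {
            throw new IllegalStateException("Class can no longer be edited: " + name);
        }
        this.description = description;
    }
}
```

In this sketch, a second createSubclass call with the same name fails, and the class can still be looked up by name after the local variable has gone out of scope.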

Similarity Model #

Similarity measures can also be created or extended dynamically during run-time via Java code. For this purpose, it is necessary to access an existing similarity model of ProCAKE. A new similarity measure is created based on an existing template. A new instance can be created by specifying the name of a template and the data class the measure is responsible for. System data classes can be easily referenced by using methods of the default model, which can be accessed via ModelFactory.getDefaultModel().

After the creation of the measure, specific configurations can be made using the measure methods. All the available configurations can be found in the descriptions of the respective measures on page Similarity Measures.

Finally, the measure can be added to the similarity model using a unique name as identifier. Measures can override existing measures with the same name by using the method setForceOverride. If no override flag is set and there is already a measure with the same name, an exception is thrown.

The following example instantiates a String Equals measure:

SimilarityModel simModel = SimilarityModelFactory.getDefaultSimilarityModel();
SMStringEqual simMeasure = (SMStringEqual) simModel.createSimilarityMeasure(SMStringEqual.NAME, ModelFactory.getDefaultModel().getStringSystemClass());
simMeasure.setCaseInsensitive();
simMeasure.setForceOverride(true);
simModel.addSimilarityMeasure(simMeasure, "MyCustomStringEqual");
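
The override rule for measure names can be modelled in isolation. The sketch below uses hypothetical classes (SketchMeasure, SketchSimilarityModel) that only mimic the behaviour described above; they are not ProCAKE's SimilarityModel, and the exception type is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical measure that only carries its template name and the override flag. */
class SketchMeasure {
    private final String templateName;
    private boolean forceOverride = false;

    SketchMeasure(String templateName) { this.templateName = templateName; }

    String getTemplateName() { return templateName; }
    boolean isForceOverride() { return forceOverride; }
    void setForceOverride(boolean forceOverride) { this.forceOverride = forceOverride; }
}

/** Hypothetical registry of measures, keyed by their unique name. */
class SketchSimilarityModel {
    private final Map<String, SketchMeasure> measures = new HashMap<>();

    /** Adds the measure; an existing name is only replaced if the override flag is set. */
    void addSimilarityMeasure(SketchMeasure measure, String name) {
        if (measures.containsKey(name) && !measure.isForceOverride()) {
            throw new IllegalStateException("Measure name already in use: " + name);
        }
        measures.put(name, measure);
    }

    /** Returns the measure registered under the name, or null if there is none. */
    SketchMeasure getSimilarityMeasure(String name) {
        return measures.get(name);
    }
}
```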

New similarity measure templates can also be registered at run-time. For this purpose, it is required to create a Java class that implements the interface SimilarityMeasure. Example implementations of such classes can be found in all system similarity measures (e.g., SMStringEqualImpl and SMAggregateAverageImpl). The most important part of this implementation is the definition of the method compute(DataObject queryObject, DataObject caseObject, SimilarityValuator valuator), which computes a similarity from two data objects. The fully defined similarity measure can be registered in the similarity model as described above. In fact, defining a similarity measure at run-time via Java code is the only way to use fully custom measures in an application, since the configuration via XML only allows configuring existing system measures.
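
To give an impression of what the core of a custom measure might compute, here is a deliberately simplified sketch. It does not implement ProCAKE's SimilarityMeasure interface: the interface SketchStringMeasure works on plain strings instead of DataObjects and has no SimilarityValuator, and the common-prefix measure itself is an invented example. A similarity range of 0 to 1 is assumed.

```java
/** Hypothetical, simplified contract; ProCAKE's real compute method takes DataObjects and a SimilarityValuator. */
interface SketchStringMeasure {
    double compute(String queryValue, String caseValue);
}

/** Invented example measure: length of the shared prefix divided by the length of the longer string. */
class CommonPrefixMeasure implements SketchStringMeasure {

    @Override
    public double compute(String queryValue, String caseValue) {
        int maxLength = Math.max(queryValue.length(), caseValue.length());
        if (maxLength == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        int limit = Math.min(queryValue.length(), caseValue.length());
        int shared = 0;
        while (shared < limit && queryValue.charAt(shared) == caseValue.charAt(shared)) {
            shared++;
        }
        return (double) shared / maxLength;
    }

    public static void main(String[] args) {
        // "pizza" and "pizzeria" share the 4-character prefix "pizz"; the longer string has 8 characters.
        System.out.println(new CommonPrefixMeasure().compute("pizza", "pizzeria"));
    }
}
```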

Factories #

The factory classes are the connection between the interfaces and the implementations that should be used. If an object of an interface is needed, a request is sent to the corresponding factory, which returns the object.

Besides the methods of the factory interface, each factory contains two kinds of static methods:

  • get-methods: Each method that starts with get reuses an existing factory object (e.g., a registered instance of a reader or writer). Therefore, most of these methods begin with getDefault to emphasize this.
  • new-methods: Each method that starts with new creates a new object. In some factories, it is necessary to specify the newly created object as the default object; for this purpose, the new-method also exists with a boolean parameter.

The Factory interface defines four methods that are implemented by each factory implementation:

  • addParameter(AbstractParameter parameter): This method adds a parameter to the current factory.
  • boolean bind(Object implementation): This method binds an implementation to the factory. It returns true if binding was successful. It is usually called in the start-up process of ProCAKE.
  • reset(): This method has to be implemented individually by every factory in order to allow a proper restart of ProCAKE. For example, this method might reset the default factory object or other stateful variables, i.e., every variable that should be put into an initial state in order to allow a proper restart.
  • boolean unbind(Object implementation): This method unbinds an implementation from the factory. It returns true if unbinding was successful.
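
The four methods listed above can be sketched as a small, self-contained factory. SketchFactory and SketchWriterFactory are hypothetical; in particular, addParameter takes a name and a value here instead of ProCAKE's AbstractParameter, and the rule that the first bound implementation becomes the default is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified contract modelled on the four methods listed above. */
interface SketchFactory {
    void addParameter(String name, String value);
    boolean bind(Object implementation);
    void reset();
    boolean unbind(Object implementation);
}

/** Hypothetical factory that keeps bound implementations and one default object. */
class SketchWriterFactory implements SketchFactory {
    private final Map<String, String> parameters = new HashMap<>();
    private final List<Object> implementations = new ArrayList<>();
    private Object defaultObject;

    @Override
    public void addParameter(String name, String value) {
        parameters.put(name, value);
    }

    @Override
    public boolean bind(Object implementation) {
        if (implementation == null || implementations.contains(implementation)) {
            return false; // nothing to bind, or already bound
        }
        implementations.add(implementation);
        if (defaultObject == null) {
            defaultObject = implementation; // assumption: first binding becomes the default
        }
        return true;
    }

    @Override
    public boolean unbind(Object implementation) {
        boolean removed = implementations.remove(implementation);
        if (removed && implementation.equals(defaultObject)) {
            // Fall back to the next bound implementation, if any.
            defaultObject = implementations.isEmpty() ? null : implementations.get(0);
        }
        return removed;
    }

    @Override
    public void reset() {
        // Put every stateful variable back into its initial state.
        parameters.clear();
        implementations.clear();
        defaultObject = null;
    }

    Object getDefault() {
        return defaultObject;
    }
}
```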

There are several implementations of the Factory class already integrated in ProCAKE:

  • AdaptationFactory: This factory creates several adaptation instances.
  • IOFactory: This factory contains all registered Readers, Writers, and ContentHandlers of ProCAKE. To obtain such a component, first request the names of the registered components (e.g., the readers) and then request the component with the desired name.
  • LoggerFactory: This factory contains the default Logger of ProCAKE. The logger provides a unified method to log messages and inform available logging listeners.
  • MessageFormatterFactory: This factory contains the default MessageFormatter. The MessageFormatter loads and formats localized messages.
  • ModelFactory: This factory allows access to the implementations of the data model interfaces. ProCAKE contains a default model, which can be accessed by using the method getDefaultModel(). The factory can also manage several other models, which can be created by the method newModel(String name).
  • ObjectPoolFactory: This factory is used to create ObjectIds and object pools, i.e., case bases.
  • OntologyFactory: This factory is used to load an ontology from an .owl file and give it a name to be referenced from a URIClass in the model. Example loading the pizza.owl file via the composition.xml:
<Factory name="ontology" class="de.uni_trier.wi2.procake.utils.ontology.OntologyFactory">
  <Implementation class="de.uni_trier.wi2.procake.utils.ontology.OntologyFactoryObjectImpl">
    <Parameter name="ontologyName" value="pizza"/>
    <Parameter name="ontologyPath" value="/de/uni_trier/procake/test/ontology/pizza.owl"/>
  </Implementation>
</Factory>

Example loading the pizza.owl file via Java:

OntologyFactory.newOntology("pizza", "/de/uni_trier/procake/test/ontology/pizza.owl");
  • RetrievalFactory: This factory creates several retriever instances. A description of the available system retrievers can be found in the section about retrieval in this wiki.
  • SimilarityModelFactory: This factory provides the implementations of the SimilarityModel interfaces. ProCAKE contains a default similarity model, which can be accessed by using the method getDefaultSimilarityModel().

Also, an interface FactoryConfiguration exists. It contains the configurations of the factories including the implementations and the corresponding parameters. It only has the method List<FactoryInformation> getFactories() that returns a list of all factories which are specified in the configuration. The class FactoryInformation contains the name of the factory class and two lists: one list contains parameters of this factory, the other list contains implementations of it.

There is also the class FactoryObjectImplementation, which must be extended by every factory object. The initialization of each factory object is executed by the CompositionManager in three steps: First, the preInit method is called with the initialization parameters for all implementations. Then, the implementation is bound to the factory. Afterwards, the postInit method is called, which establishes connections of the implementation to other factories.
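
The order of the three initialization steps can be traced with a minimal sketch. All types below (SketchFactoryObject, SketchCompositionManager, SketchOntologyObject) are hypothetical; the method signatures, the plain list that stands in for a factory, and the trace list are assumptions made for illustration and do not reflect ProCAKE's actual classes.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical base class with the two hooks named above; NOT ProCAKE's FactoryObjectImplementation. */
abstract class SketchFactoryObject {
    abstract void preInit(Map<String, String> parameters);
    abstract void postInit();
}

/** Hypothetical manager that runs the three steps in the described order. */
class SketchCompositionManager {
    static void initialize(SketchFactoryObject implementation,
                           Map<String, String> parameters,
                           List<Object> factoryBindings,
                           List<String> trace) {
        implementation.preInit(parameters);   // step 1: hand over the parameters
        trace.add("preInit");
        factoryBindings.add(implementation);  // step 2: bind to the factory
        trace.add("bind");
        implementation.postInit();            // step 3: connect to other factories
        trace.add("postInit");
    }
}

/** Hypothetical implementation that reads one parameter and records that it was connected. */
class SketchOntologyObject extends SketchFactoryObject {
    String ontologyName;
    boolean connected;

    @Override
    void preInit(Map<String, String> parameters) {
        ontologyName = parameters.get("ontologyName");
    }

    @Override
    void postInit() {
        connected = true;
    }
}
```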

Data Objects, Case Bases, and Object Pools #

Creating Instances of Data Classes (Data Objects) #

Instances of data classes can be created during run-time using a loaded model (the default model can be retrieved using ModelFactory.getDefaultModel()). The model has a method createObject, which requires the name of the desired data class. After the object has been created, class-specific values can be set by using the corresponding methods:

StringObject stringObject = ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObject.setNativeString("test");

Object Pools #

In ProCAKE, data objects can be collected in pools. These pools exist in two variants, as readable object pools and as writeable object pools. A pool can contain arbitrary DataObjects. That means that the objects can belong to different DataClasses.

Each object pool must have an identifier, and this identifier must be unique across all of ProCAKE during run-time. If the pool is created using the method ObjectPoolFactory.newObjectPool(), this is guaranteed. If custom object pools are created, care must be taken that the identifier differs from ObjectPoolFactory.POOL_NAME.

To access a data object in an object pool, each data object has to be identifiable. A data object can be an element of several pools, so a simple identification number is not sufficient, because the id must be unique across all pools. Therefore, a data object has a specific objectId. This id can be set manually, for example in the case base file. In this case, the user is responsible for ensuring that the id is unique. If an object without an id is read into an object pool, an id is generated. This id consists of two parts: the base and the offset. The base is a namespace and depends on the location where the object is stored, which is usually an object pool. The offset is an id that must be unique within the object pool and is managed by the object pool itself. This is just a necessary pre-condition for realizing the synchronization and identification of objects, not the complete synchronization technique.

New objectIds can be created manually using the method ObjectPoolFactory.newObjectId(String objectPoolId, String offset) or are created automatically using the method WriteableObjectPool.store(DataObject dataObject). To access the objects of a pool without using the id, the DataObjectIterator can be used. It extends the standard Iterator and contains one additional method, nextDataObject, which returns a DataObject.
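
Why a two-part id is unique across pools can be seen in a short sketch. SketchObjectId and SketchIdGenerator are hypothetical classes, not ProCAKE's ObjectId; the counter-based offsets and the "#" separator in toString are assumptions chosen only for this illustration.

```java
import java.util.Objects;

/** Hypothetical two-part id: base (namespace) plus offset; NOT ProCAKE's ObjectId. */
final class SketchObjectId {
    private final String base;   // namespace, here the id of the owning pool
    private final String offset; // unique only within that namespace

    SketchObjectId(String base, String offset) {
        this.base = base;
        this.offset = offset;
    }

    String getBase() { return base; }
    String getOffset() { return offset; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SketchObjectId)) {
            return false;
        }
        SketchObjectId that = (SketchObjectId) other;
        return base.equals(that.base) && offset.equals(that.offset);
    }

    @Override
    public int hashCode() {
        return Objects.hash(base, offset);
    }

    @Override
    public String toString() {
        return base + "#" + offset; // separator chosen for this sketch only
    }
}

/** Hands out offsets for one pool, so that ids are unique within its namespace. */
final class SketchIdGenerator {
    private final String poolId;
    private long nextOffset = 0;

    SketchIdGenerator(String poolId) { this.poolId = poolId; }

    SketchObjectId newObjectId() {
        return new SketchObjectId(poolId, Long.toString(nextOffset++));
    }
}
```

Two pools may hand out the same offset, but the resulting ids still differ because their bases differ.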

Readable Object Pools #

A ReadableObjectPool is a container for data objects. The pool provides special access methods that interpret the ObjectIds of the data objects. ReadableObjectPool is only an interface, so it’s not possible to create instances of it directly.

To check whether two object pools contain the same objects, the method hasSameValueAsIn(ReadableObjectPool objectPool) can be used. It returns true if both pools contain the same objects.

Writeable Object Pools #

A WriteableObjectPool extends the ReadableObjectPool and contains additional methods to modify the pool, namely remove and store methods.

The remove method takes a data object or the offset of an object and removes this data object from the pool. The objectId is automatically removed from the object. It is also possible to use the method removeAll() to clear the complete pool.

The store method takes any kind of data object and stores it in the pool. A new unique objectId is created automatically if the object does not have one already. If the object is already a member of the pool, nothing happens. It’s also possible to use the method storeAll with a collection or a ReadableObjectPool, whose objects are then stored in the WriteableObjectPool.
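
The store and remove behaviour described above can be modelled in a few lines. SketchDataObject and SketchWriteablePool are hypothetical classes with plain string ids; they only mirror the described semantics and are not ProCAKE's pool implementation. The boolean return values are an assumption added to make the behaviour observable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical data object with an optional id; NOT ProCAKE's DataObject. */
class SketchDataObject {
    private String id; // null until an id is set manually or assigned by a pool
    private final String value;

    SketchDataObject(String value) { this.value = value; }

    String getId() { return id; }
    void setId(String id) { this.id = id; }
    String getValue() { return value; }
}

/** Hypothetical pool that mirrors the described store/remove semantics. */
class SketchWriteablePool {
    private final String poolId;
    private final Map<String, SketchDataObject> objects = new LinkedHashMap<>();
    private int nextOffset = 0;

    SketchWriteablePool(String poolId) { this.poolId = poolId; }

    /** Stores the object and assigns an id if it has none; returns false if nothing changed. */
    boolean store(SketchDataObject object) {
        if (object.getId() == null) {
            String candidate;
            do {
                candidate = poolId + "#" + nextOffset++;
            } while (objects.containsKey(candidate)); // skip ids that were set manually
            object.setId(candidate);
        }
        if (objects.containsKey(object.getId())) {
            return false; // an object with this id is already a member: nothing happens
        }
        objects.put(object.getId(), object);
        return true;
    }

    /** Removes the object from the pool and clears its id. */
    boolean remove(SketchDataObject object) {
        if (object.getId() == null || objects.get(object.getId()) != object) {
            return false; // not a member of this pool
        }
        objects.remove(object.getId());
        object.setId(null);
        return true;
    }

    /** Clears the complete pool and the ids of all former members. */
    void removeAll() {
        for (SketchDataObject object : objects.values()) {
            object.setId(null);
        }
        objects.clear();
    }

    int size() { return objects.size(); }
}
```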

When using the method CakeInstance.start, a WriteableObjectPool is returned. If a case base was passed to the method, all objects of the case base are stored in the pool. To write a pool to a case base file, the ObjectPoolWriterImpl can be used. For this, a ProCAKE writer has to be accessed and the name of the ObjectPoolWriterImpl has to be given. The following code can be used:

IOUtil.writeFile(pool, "casebase.xml", ObjectPoolWriterImpl.WRITERNAME);

Using a CSV Case Base #

It’s also possible to use a CSV file as a case base or to parse it into an existing one. In a CSV file, only structural cases can be represented, so the parser is used to create aggregate objects. To read and use such a file, the class CakeCSVParser can be used. It requires the path of the CSV file, which can be set by using the method setFilename:

CakeCSVParser csvParser = new CakeCSVParser();
csvParser.setFilename("casebase.csv");

To use the parser, a model declaration containing the corresponding classes is necessary. It’s important that every column name in the CSV file is unique; otherwise, an exception is thrown. If the column names in the CSV file are equal to the attribute names of the aggregate class that each CSV row is mapped to, the CSV parser can be used without any further configuration.

For example, a simple CSV file can look like this:

"attribute1","attribute2"
"value1","valueA"
"value2","valueB"

In the above example, the model must contain an aggregate class with attributes named attribute1 and attribute2. Otherwise, the file cannot be read without further information.

If there are different names in the CSV columns and the model class or the file doesn’t contain any header information at all, a mapping file is required. This file also has to be in the CSV format. It contains the name of the column in the CSV file as the first column and the name in the model as the second column. It might look like this:

"nameInCSV","nameInModell"
"attribute1","attributeA"
"attribute2","attributeB"

Here, attribute1 corresponds to attributeA in the model, and attribute2 is mapped to attributeB.
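
The effect of such a name mapping can be reproduced with a small, self-contained sketch. This is not how CakeCSVParser is implemented; the class CsvMappingSketch and its methods are hypothetical. The sketch assumes that the first row of the mapping file is a header, that values contain no commas or escaped quotes, and that unmapped columns keep their CSV name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a name mapping; NOT ProCAKE's CakeCSVParser. */
class CsvMappingSketch {

    /** Splits one simple CSV line; assumes no commas or escaped quotes inside values. */
    static List<String> splitLine(String line) {
        List<String> cells = new ArrayList<>();
        for (String cell : line.split(",", -1)) {
            String trimmed = cell.trim();
            if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
                trimmed = trimmed.substring(1, trimmed.length() - 1); // strip surrounding quotes
            }
            cells.add(trimmed);
        }
        return cells;
    }

    /** Reads a two-column mapping (CSV name to model name), skipping the assumed header row. */
    static Map<String, String> readMapping(List<String> mappingLines) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 1; i < mappingLines.size(); i++) {
            List<String> cells = splitLine(mappingLines.get(i));
            mapping.put(cells.get(0), cells.get(1));
        }
        return mapping;
    }

    /** Turns each data row into a map keyed by the attribute names of the model. */
    static List<Map<String, String>> toAttributeMaps(List<String> csvLines, Map<String, String> mapping) {
        List<String> header = splitLine(csvLines.get(0));
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            List<String> cells = splitLine(csvLines.get(i));
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int column = 0; column < header.size(); column++) {
                String csvName = header.get(column);
                // Columns without a mapping entry keep their CSV name.
                String modelName = mapping.containsKey(csvName) ? mapping.get(csvName) : csvName;
                attributes.put(modelName, cells.get(column));
            }
            rows.add(attributes);
        }
        return rows;
    }
}
```

Applied to the CSV file and the mapping file shown above, the first row yields attributeA with the value value1 and attributeB with the value valueA.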

Alternatively, the column number can be the first argument and the attribute name in the model the second one. This can look like:

"rowInCSV","nameInModell"
1,"attributeA"
2,"attributeB"

Here, the values of the first column of the CSV file containing the examples will correspond to attributeA in the model, and the values of the second column to attributeB.

To use such a mapping file, pass its path to the method setMappingFile:

csvParser.setMappingFile("mapping.csv");

When the configuration of the parser is finished, it can be started via the method createAggregateObjects(String className). The parameter className must be the name of the corresponding aggregate class in the model. The method will return a list of aggregate objects, filled with the values of the CSV file as aggregate attributes.

Because the creation of the objects runs in parallel, the order of the objects in the object pool isn’t deterministic.

It’s also possible to use aggregate classes that contain nested aggregate classes at any depth. In this case, the classes need to be defined in the model file. The attribute names need to be unique, so that they or their mapping partners can be found in the CSV file. This way, the information of the CSV file, which contains no hierarchical relations, can be mapped to an aggregate object that might contain nested attributes, thus expressing hierarchical relationships.
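
How a flat row can end up in a nested structure when attribute names are unique can be sketched as follows. The representation is hypothetical: a class definition is modelled as a map from attribute name to either a nested definition or null for a leaf attribute. This is an illustration of the idea only, not ProCAKE's aggregate classes or parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of mapping a flat row to nested attributes. */
class NestedAggregateSketch {

    /**
     * Fills a nested structure from a flat row.
     * In the definition, a Map value marks a nested aggregate and null marks a leaf attribute.
     * Because attribute names are unique, each leaf can be looked up directly in the row.
     */
    static Map<String, Object> fill(Map<String, Object> definition, Map<String, String> flatRow) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : definition.entrySet()) {
            if (entry.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) entry.getValue();
                result.put(entry.getKey(), fill(nested, flatRow)); // descend into the nested aggregate
            } else {
                result.put(entry.getKey(), flatRow.get(entry.getKey())); // leaf: take the column value
            }
        }
        return result;
    }
}
```

For a definition with the leaf name and a nested aggregate address containing city and zip, a flat row with the columns name, city, and zip is distributed over both levels.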