Configuration

Configuration #

Configuration at build-time via XML #

The configuration via XML files is meant to be used to provide an initial configuration for the start-up of ProCAKE.

General Configuration #

A benefit of the pattern-driven architecture of ProCAKE is extensibility. For instance, the various implementations for persistence, retrieval, and adaptation are organized in factories. Implementations can be registered and configured in an XML file (usually named composition.xml) that is read once at start-up.

Using the method CakeInstance.start, several configuration files can be provided. By default, the pre-defined composition file located under /de/uni_trier/wi2/composition.xml is used. The method CakeInstance.start(String composition) can be used as follows:

CakeInstance.start("/path/composition.xml");

Data Model #

In addition to the system classes, custom-structured classes (referred to as user classes) can be defined as sub-types of system classes. At start-up, the system can initialize the default model based on a configuration file. Even though several custom models can be defined, the usual practice is to use a single model definition as default (usually named model.xml). The definition of custom user data classes in the model.xml is described on page Data Classes. When only using system data classes, a custom model definition is not necessary. However, some abstract system data classes such as collections require the definition of user classes. By default, ProCAKE is instantiated with an empty model. If you define custom classes in the model.xml, make sure that the name of each new class is unique and that a referenced class is defined in the XML file before it is referenced by another class. Such references are, however, currently supported for aggregate classes.

The method CakeInstance.start(String composition, String model, String simModel, String casebase) can be used to specify an initial data model and similarity model as follows:

CakeInstance.start("/path/composition.xml","/path/model.xml","/path/simModel.xml","/path/casebase.xml");

Defining a model and simModel parameter also requires specifying a composition and casebase parameter. However, any of the parameters can be set to null, in which case the corresponding default file is used instead.

Similarity Model #

To allow for comparing data objects, similarity measures have to be defined for the corresponding system and user classes used, for example, in the cases. Analogous to the data model, several similarity models can be defined, although usually a single model (named sim.xml) is used.

To avoid defining a measure for all specific classes, measures can also be defined for super classes. The parent class of all classes is named Data.

When computing the similarity between two objects, the system first determines the least common class type of the objects and searches for a similarity measure that is defined for that class type. If no such measure is defined, it searches for available measures for super classes of the least common class. If no measure can be found at all, an error is thrown.
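The lookup described above can be sketched in plain Java. This is only an illustration under stated assumptions: SketchClass and MeasureLookupSketch are hypothetical stand-ins and not ProCAKE types, measures are represented by their names only, and the case without any measure is modelled as an empty Optional, whereas ProCAKE reports an error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a data class with a single super class.
final class SketchClass {
  final String name;
  final SketchClass superClass; // null for the root class

  SketchClass(String name, SketchClass superClass) {
    this.name = name;
    this.superClass = superClass;
  }
}

// Hypothetical registry that resolves a measure by walking up the class hierarchy.
final class MeasureLookupSketch {
  // Maps a class name to the name of the measure registered for that class.
  private final Map<String, String> measures = new HashMap<>();

  void register(SketchClass dataClass, String measureName) {
    measures.put(dataClass.name, measureName);
  }

  // Starts at the least common class and returns the first measure found on the way to the root.
  Optional<String> findMeasure(SketchClass leastCommonClass) {
    for (SketchClass c = leastCommonClass; c != null; c = c.superClass) {
      String measure = measures.get(c.name);
      if (measure != null) {
        return Optional.of(measure);
      }
    }
    return Optional.empty(); // no measure for the class or any of its super classes
  }
}
```

With a measure registered only for a super class, a lookup for any of its sub-classes resolves to that measure, which is why a measure defined for Data applies to every class that has no more specific measure.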

A similarity measure can be set as default for a data class, so that the measure is automatically selected for a class or its sub-classes in the event that several applicable measures exist.

If no custom similarity model is given at system start, a pre-defined similarity model file located under /de/uni_trier/wi2/sim.xml is loaded with some basic similarity measures as a fallback. The definition of similarity measures in an XML file is described on page Similarity Measures.

Three pre-defined similarity measures are instantiated in that event:

  1. ObjectEqual: If query and case object are of the type Atomic, their values are checked for equality.
  2. AggregateAverage: If query and case object are of the type Aggregate, it uses the arithmetic mean to aggregate all local similarities between the contained values.
  3. TableDataClass: If none of the previous measures is applicable, this measure is used and assesses the similarity as 0.0. Only for the comparison of a Void object as query to any other case object, the similarity is always 1.0.

Case Bases #

ProCAKE can be initialized with an existing case base. The contained objects are parsed from the given XML file and are written to an object pool, which is returned by the CakeInstance.start method. Please note that a case base requires a corresponding model file if custom data classes have been used.

Using the casebase parameter also requires the definition of the composition parameter in the method CakeInstance.start(String composition, String casebase). If the composition parameter is set to null, the default composition configuration is used.

CakeInstance.start("/path/composition.xml","/path/casebase.xml");

In turn, if the path to a case base is set, the method returns an instance of WriteableObjectPool. This pool contains the objects that have been parsed from the given file.

Configuration at run-time via Java #

While the XML configuration is usually used for the initial instantiation of ProCAKE, the run-time configuration allows for further modifications.

General Configuration #

The pre-defined composition file located under /de/uni_trier/wi2/composition.xml contains several factory implementations whose factory objects can be used at run-time to bind or unbind implementations. Please refer to the interface cake.utils.composition.Factory for more details. Regarding factory implementations and IO objects, we highly recommend configuring them in the XML file and letting them be registered during ProCAKE start. Registering instances during run-time might lead to side effects.

Data Model #

Data classes can be created dynamically during run-time via Java code. For this purpose, it is necessary to access an existing data class of ProCAKE. Existing classes can easily be referenced by using methods of the default model, which can be accessed by using ModelFactory.getDefaultModel. Every data class can be retrieved with the method getClass(String name), where name must refer to the class name of an existing class. For the system data classes, these names can be found in the class interfaces. For example, the name of the Data class can be accessed by DataClass.CLASS_NAME. For the system data classes, it’s also possible to use specific methods of the model, for example getDataSystemClass().

Using this existing class, the method createSubclass(String name) can be used. The string name is the name of the subclass, which has to be unique; otherwise, an exception is thrown. The method returns an instance of the new data class and adds the class to the model. Before the class can be used, the method finishEditing() has to be called. After that, the class can’t be edited anymore; any attempt to do so results in an exception.

The following code lines give an example of a data class creation. The created class is a subclass of the String class:

StringClass stringClass = ModelFactory.getDefaultModel().getClass(StringClass.CLASS_NAME);
StringClass customStringClass = (StringClass) stringClass.createSubclass("customStringClass");
customStringClass.finishEditing();

The instance customStringClass of the newly created subclass of StringClass can be used inside this specific code block. However, if an instance of the same class is required at some later point in time in another code block, the instance customStringClass might not be referenceable anymore. For this case, the newly created data class is also added to the model and can be retrieved later on via its unique class name.

StringClass customStringClass= (StringClass) ModelFactory.getDefaultModel().getClass("customStringClass");

Similarity Model #

Similarity measures can also be created or extended dynamically during run-time via Java code. For this purpose, it is necessary to access an existing similarity model of ProCAKE. A new similarity measure is created based on an existing template. A new instance can be created by specifying the name of a template and the data class the measure is responsible for. System data classes can be easily referenced by using methods of the default model, which can be accessed via ModelFactory.getDefaultModel().

After the creation of the measure, specific configurations can be made using the measure methods. All the available configurations can be found in the descriptions of the respective measures on page Similarity Measures.

Finally, the measure can be added to the similarity model using a unique name as identifier. Measures can override existing measures with the same name by using the method setForceOverride. If no override flag is set and there is already a measure with the same name, an exception is thrown.

The following gives an example instantiation of the SMStringEqual measure:

SimilarityModel simModel = SimilarityModelFactory.getDefaultSimilarityModel();
SMStringEqual simMeasure = (SMStringEqual) simModel.createSimilarityMeasure(SMStringEqual.NAME, ModelFactory.getDefaultModel().getStringSystemClass());
simMeasure.setCaseInsensitive();
simMeasure.setForceOverride(true);
simModel.addSimilarityMeasure(simMeasure, "MyCustomStringEqual");

New similarity measure templates can also be registered at run-time. For this purpose, it is required to create a Java class that implements the interface SimilarityMeasure. Example implementations of such classes can be found in all system similarity measures (e.g., SMStringEqualImpl and SMAggregateAverageImpl). The most important part of this implementation is the definition of the method compute(DataObject queryObject, DataObject caseObject, SimilarityValuator valuator), which computes a similarity from two data objects. The fully defined similarity measure can be registered in the similarity model as described above. In fact, defining a similarity measure at run-time via Java code is the only way to use fully custom measures in an application, since the configuration via XML only allows configuring existing system measures.

Factories #

The Factory classes are the connection between the interfaces and the implementations that should be used. If an object of an interface is needed, a request is sent to the corresponding factory, which returns the object.

Besides the methods of the factory interface, each factory contains two kinds of static methods:

  • get-methods: Each method that starts with get reuses an existing factory object (e.g., a registered instance of a reader or writer). Therefore, most of these methods begin with getDefault to emphasize this.
  • new-methods: Each method that starts with new creates a new object. In some factories, it is necessary to specify the newly created object as the default object; in these cases, the new-method exists with a boolean parameter.
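The difference between the two kinds of methods can be sketched as follows. SketchModelFactory is a hypothetical class written for this illustration only; its method names merely follow the naming convention described above, and a plain list stands in for the managed object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical factory; not part of ProCAKE.
final class SketchModelFactory {
  // The object that get-methods hand out again and again.
  private static List<String> defaultModel = new ArrayList<>();

  private SketchModelFactory() {}

  // get-method: reuses the object the factory already holds.
  static List<String> getDefaultModel() {
    return defaultModel;
  }

  // new-method: always creates a fresh object; the flag decides whether
  // the fresh object replaces the current default.
  static List<String> newModel(boolean setAsDefault) {
    List<String> model = new ArrayList<>();
    if (setAsDefault) {
      defaultModel = model;
    }
    return model;
  }
}
```

Two calls to the get-method return the same instance, while each call to the new-method returns a different one.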

The Factory interface defines four methods that are implemented by each factory implementation:

  • addParameter(AbstractParameter parameter): This method adds a parameter to the current factory.
  • boolean bind(Object implementation): This method binds an implementation to the factory. It returns true if binding was successful. It is usually called in the start-up process of ProCAKE.
  • reset(): This method has to be implemented individually by every factory in order to allow a proper restart of ProCAKE. For example, this method might reset the default factory object or other stateful variables, i.e., every variable that should be put into an initial state in order to allow a proper restart.
  • boolean unbind(Object implementation): This method unbinds an implementation from the factory. It returns true if unbinding was successful.

There are several implementations of the Factory class already integrated in ProCAKE:

  • AdaptationFactory: This factory creates several adaptation instances.
  • IOFactory: This factory contains all registered Readers, Writers, and ContentHandlers of ProCAKE. To get such a component, first the names of the available components have to be requested; afterwards, the component can be requested by its name.
  • LoggerFactory: This factory contains the default Logger of ProCAKE. The logger provides a unified method to log messages and inform available logging listeners.
  • MessageFormatterFactory: This factory contains the default MessageFormatter. The MessageFormatter loads and formats localized messages.
  • ModelFactory: This factory allows access to the implementations of the data model interfaces. ProCAKE contains a default model, which can be accessed by using the method getDefaultModel(). The factory can also manage several other models, which can be created by the method newModel(String name).
  • ObjectPoolFactory: This factory is used to create ObjectIds and object pools, i.e., case bases.
  • OntologyFactory: This factory is used to load an ontology from an .owl file and give it a name to be referenced from a URIClass in the model. Example loading the pizza.owl file via the composition.xml:
<Factory name="ontology" class="de.uni_trier.wi2.procake.utils.ontology.OntologyFactory">
  <Implementation class="de.uni_trier.wi2.procake.utils.ontology.OntologyFactoryObjectImpl">
    <Parameter name="ontologyName" value="pizza"/>
    <Parameter name="ontologyPath" value="/pizza.owl"/>
  </Implementation>
</Factory>

It is also possible to provide a second path as a backup to ontologyPath. If the ontology could not be loaded from the first path the backup path is used instead.

<Parameter name="ontologyPathBackup" value="/path/pizza.owl"/>

Example loading the pizza.owl file via Java:

OntologyFactory.newOntology("pizza", "/pizza.owl");
  • RetrievalFactory: This factory creates several retriever instances. A description of the available system retrievers can be found in the section about retrieval in this wiki.
  • SimilarityModelFactory: This factory provides the implementations of the SimilarityModel interfaces. ProCAKE contains a default similarity model, which can be accessed by using the method getDefaultSimilarityModel().

Also, an interface FactoryConfiguration exists. It contains the configurations of the factories including the implementations and the corresponding parameters. It only has the method List<FactoryInformation> getFactories() that returns a list of all factories which are specified in the configuration. The class FactoryInformation contains the name of the factory class and two lists: one list contains parameters of this factory, the other list contains implementations of it.

There is also the class FactoryObjectImplementation, which must be extended by every factory object. The initialization of each factory object is executed by the CompositionManager in three steps: First, the preInit method is called with the initialization parameters for all implementations. Then, the implementation is bound to the factory. Afterwards, the postInit method is called, which establishes connections of the implementation to other factories.
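The order of these three steps can be illustrated with a small, self-contained sketch. All types here (SketchFactoryObject, SketchFactory, LifecycleSketch) are hypothetical simplifications and not the ProCAKE classes; the sketch assumes that postInit runs only after every implementation has been bound, and it records the calls in a log so that the order can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified counterpart of a factory object.
abstract class SketchFactoryObject {
  abstract void preInit(Map<String, String> parameters);

  abstract void postInit();
}

// Hypothetical factory that only remembers what was bound to it.
final class SketchFactory {
  final List<SketchFactoryObject> bound = new ArrayList<>();

  boolean bind(SketchFactoryObject implementation) {
    return bound.add(implementation);
  }
}

final class LifecycleSketch {
  // Records the lifecycle calls in the order in which they happen.
  static final List<String> LOG = new ArrayList<>();

  // Creates a factory object that logs its lifecycle calls under the given name.
  static SketchFactoryObject named(String name) {
    return new SketchFactoryObject() {
      @Override
      void preInit(Map<String, String> parameters) {
        LOG.add(name + ".preInit");
      }

      @Override
      void postInit() {
        LOG.add(name + ".postInit");
      }
    };
  }

  static void build(SketchFactory factory, List<SketchFactoryObject> implementations) {
    for (SketchFactoryObject implementation : implementations) {
      implementation.preInit(Map.of()); // step 1: hand over the parameters
      if (factory.bind(implementation)) { // step 2: bind to the factory
        LOG.add("bound");
      }
    }
    for (SketchFactoryObject implementation : implementations) {
      implementation.postInit(); // step 3: runs once everything is bound
    }
  }
}
```

Deferring postInit to a second loop is what allows an implementation to rely on other implementations already being bound when it connects to them.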

Data Objects, Case Bases, and Object Pools #

Creating Instances of Data Classes (Data Objects) #

Instances of data classes can be created during run-time using a loaded model (the default model can be retrieved using ModelFactory.getDefaultModel()). The model has a method createObject, which requires the name of the desired data class. After the creation of the object/instance, class-specific values can be set by using the corresponding methods:

StringObject stringObject = ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObject.setNativeString("test");

For system data classes, the model has methods to return them directly. Thus, the String class used above can also be queried with the getStringSystemClass() method.

Object Pools #

In ProCAKE, data objects can be collected in pools. These pools exist in two variants, as readable object pools and as writeable object pools. A pool can contain arbitrary DataObjects. That means that the objects can belong to different DataClasses.

Each object pool must have an identifier, and this identifier must be unique across the whole ProCAKE instance during run-time. If the pool is created using the method ObjectPoolFactory.newObjectPool(), this is guaranteed. If custom object pools are created, care must be taken that the identifier differs from ObjectPoolFactory.POOL_NAME.

To access a data object in an object pool, each data object has to be identified. A data object can be an element of several pools, so a simple identification number is not sufficient, because the id must be unique across all pools. Therefore, a data object has a specific objectId. This id can be set manually, for example in the case base file. In this case, the user is responsible for ensuring that the id is unique. If an object without an id is read into an object pool, an id is generated. This id consists of two parts: the base and the offset. The base is a namespace and depends on the location where the object is stored, which is usually an object pool. The offset is an id that must be unique within the object pool and is managed by the object pool itself. This is just a necessary pre-condition to realize the synchronization and identification of objects, not the complete synchronization technique.

New objectIds can be created manually using the method ObjectPoolFactory.newObjectId(String objectPoolId, String offset) or are created automatically using the method WriteableObjectPool.store(DataObject dataObject). To access the objects of a pool without using the id, the DataObjectIterator can be used. It extends the standard Iterator and contains one additional method, nextDataObject, which returns a DataObject.
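The idea of an id composed of a base and an offset can be sketched as follows. IdPoolSketch is a hypothetical class and not the ProCAKE object pool; the "#" separator and the numeric offsets are assumptions made for this illustration, since the text above does not specify the concrete id format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pool that only models how ids are assigned.
final class IdPoolSketch {
  private final String poolId; // the base (namespace) of generated ids
  private final Map<String, Object> objects = new LinkedHashMap<>();
  private long nextOffset = 0;

  IdPoolSketch(String poolId) {
    this.poolId = poolId;
  }

  // Stores an object under a manually chosen id; the caller is responsible for uniqueness.
  void store(String id, Object object) {
    if (objects.containsKey(id)) {
      throw new IllegalArgumentException("duplicate id: " + id);
    }
    objects.put(id, object);
  }

  // Stores an object without an id and returns the generated id (base + "#" + offset).
  String store(Object object) {
    String id;
    do {
      id = poolId + "#" + nextOffset++;
    } while (objects.containsKey(id)); // skip offsets already taken by manual ids
    objects.put(id, object);
    return id;
  }

  Object get(String id) {
    return objects.get(id);
  }
}
```

Because the base differs between pools and the offset is unique within one pool, two generated ids cannot collide even if the same offset is used in different pools.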

Readable Object Pools #

A ReadableObjectPool is a container for data objects. The pool provides special access methods that interpret the ObjectIds of the data objects. ReadableObjectPool is only an interface, so it’s not possible to create instances of it directly.

To check whether two object pools contain the same objects, the method hasSameValueAsIn(ReadableObjectPool objectPool) can be used. It returns true if both pools contain the same objects.

Writeable Object Pools #

A WriteableObjectPool extends the ReadableObjectPool and contains additional methods to modify the pool, namely remove and store methods.

The remove method takes either a data object or the offset of an object and removes that data object from the pool. The objectId is automatically removed from the object. It is also possible to use the method removeAll() to clear the complete pool.

The store method takes any kind of data object and stores it in the pool. If the object does not already have an objectId, a new unique one is created automatically. If the object is already a member of the pool, nothing happens. It’s also possible to use the method storeAll with a collection or a ReadableObjectPool, whose objects are then stored in the WriteableObjectPool.

When using the method CakeInstance.start, a WriteableObjectPool is returned. If a case base was given to the method, all objects of the case base are stored in the pool. To write a pool to a case base file, the ObjectPoolWriterImpl can be used. For this, a ProCAKE writer has to be accessed and the name of the ObjectPoolWriterImpl has to be given. The following code can be used:

IOUtil.writeFile(pool, "casebase.xml", ObjectPoolWriterImpl.WRITERNAME);

Using a CSV Case Base #

It’s also possible to use a CSV file as a case base or to parse it into an existing one. Only structural cases can be represented in a CSV file, so the parser is used to create aggregate objects. To read and use such a file, the class CakeCSVParser can be used. It requires the path of the CSV file, which can be set by using the method setFilename:

CakeCSVParser csvParser = new CakeCSVParser();
csvParser.setFilename("casebase.csv");

To use the parser, a model declaration containing the corresponding classes is necessary. It’s important that every name in the CSV file is unique; otherwise, an exception is thrown. If the column names in the CSV file equal the attribute names of the aggregate class that each CSV example is mapped to, the CSV parser can be used without any further configuration.

For example, a simple CSV file can look like this:

"attribute1","attribute2"
"value1","valueA"
"value2","valueB"

In the above example, the model must contain an aggregate class with attributes named attribute1 and attribute2. Otherwise, the file cannot be read without further information.

If there are different names in the CSV columns and the model class or the file doesn’t contain any header information at all, a mapping file is required. This file also has to be in the CSV format. It contains the name of the column in the CSV file as the first column and the name in the model as the second column. It might look like this:

"nameInCSV","nameInModell"
"attribute1","attributeA"
"attribute2","attributeB"

Here, attribute1 corresponds to attributeA in the model, and attribute2 is mapped to attributeB.

Alternatively, the column number can be the first argument and the attribute name in the model the second one. This can look like:

"rowInCSV","nameInModell"
1,"attributeA"
2,"attributeB"

Here, the values of the first column of the CSV file containing the examples will correspond to attributeA in the model, and the values of the second column to attributeB.
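How the two mapping variants translate a CSV row into model attribute names can be sketched in plain Java. This is not the implementation of CakeCSVParser; CsvMappingSketch and its methods are hypothetical, and the simple line splitting assumes that cell values contain no embedded commas or escaped quotes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that applies a mapping file to a single CSV row.
final class CsvMappingSketch {

  // Splits one CSV line at commas and strips surrounding double quotes.
  static String[] cells(String line) {
    String[] parts = line.split(",");
    for (int i = 0; i < parts.length; i++) {
      parts[i] = parts[i].trim().replaceAll("^\"|\"$", "");
    }
    return parts;
  }

  // Reads the mapping lines (the first line is the header) into
  // key -> attribute name in the model.
  static Map<String, String> readMapping(List<String> mappingLines) {
    Map<String, String> mapping = new HashMap<>();
    for (String line : mappingLines.subList(1, mappingLines.size())) {
      String[] c = cells(line);
      mapping.put(c[0], c[1]);
    }
    return mapping;
  }

  // Maps one data row to model attribute names. A mapping key is either a
  // column name from the CSV header or a 1-based column number.
  // Pass null as header if the CSV file has no header line.
  static Map<String, String> mapRow(String[] header, String[] row, Map<String, String> mapping) {
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i < row.length; i++) {
      String byName = header == null ? null : mapping.get(header[i]);
      String byNumber = mapping.get(String.valueOf(i + 1));
      String attribute = byName != null ? byName : byNumber;
      if (attribute != null) {
        result.put(attribute, row[i]);
      }
    }
    return result;
  }
}
```

With the name-based mapping from above, the row "value1","valueA" yields attributeA=value1 and attributeB=valueA; the number-based mapping produces the same assignment without needing a header line.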

To use such a mapping file, the method setMappingFile can be used. This might look like this:

csvParser.setMappingFile("mapping.csv");

When the configuration of the parser is finished, it can be started via the method createAggregateObjects(String className). The parameter className must be the name of the corresponding aggregate class in the model. The method will return a list of aggregate objects, filled with the values of the CSV file as aggregate attributes.

Because the creation of the objects runs in parallel, the order of the objects in the object pool isn’t deterministic.

It’s also possible to use aggregate classes that contain nested aggregate classes to any depth. In this case, the classes need to be defined in the model file. The attribute names need to be unique, so that they or their mapping partners can be found in the CSV file. This way, the information of the CSV file, which contains no hierarchical relations, can be mapped to an aggregate object that might contain nested attributes, thus expressing hierarchical relationships.
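The mapping from a flat row to a nested structure can be sketched as follows. The sketch does not use ProCAKE classes: NestedAggregateSketch is hypothetical, an aggregate class is represented as a map from attribute names to either a class name (atomic attribute) or another such map (nested aggregate), and the resulting object is a nested map as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that fills a nested structure from a flat row.
final class NestedAggregateSketch {

  // schema: attribute name -> class name (String) for an atomic attribute,
  // or -> nested schema (Map) for a nested aggregate.
  // flatRow: unique attribute name -> value, as read from one CSV row.
  @SuppressWarnings("unchecked")
  static Map<String, Object> fill(Map<String, Object> schema, Map<String, String> flatRow) {
    Map<String, Object> object = new LinkedHashMap<>();
    for (Map.Entry<String, Object> attribute : schema.entrySet()) {
      if (attribute.getValue() instanceof Map) {
        // Nested aggregate: descend with the same flat row.
        object.put(
            attribute.getKey(), fill((Map<String, Object>) attribute.getValue(), flatRow));
      } else {
        // Atomic attribute: look the value up by its unique name.
        object.put(attribute.getKey(), flatRow.get(attribute.getKey()));
      }
    }
    return object;
  }
}
```

The lookup by name only works because attribute names are unique across all nesting levels, which is the reason for the uniqueness requirement stated above.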

ProCAKE (re-)starting process #

The start of a CakeInstance passes through the following 9 steps:

  1. Check for existing CakeInstanceCache and resetting of old factories
  2. Initialisation of factories by the CompositionManager
  3. Initialisation of the data model (if provided)
  4. Initialisation of the similarity model (if provided)
  5. Optional re-integration of data classes from preceding CakeInstance
  6. CakeInstanceCache reference retainment
  7. Validation of similarity model accordance with data model
  8. Transformation configuration
  9. Case base initialisation (optionally from file)

In the following a few key aspects will be highlighted.

Composition Manager #

The CompositionManager is responsible for initialising all factories and their implementations defined in the provided composition.xml.

Excerpt from CakeInstance.start():

CompositionManager cm = new CompositionManager();  
if (composition != null) {  
   cm.setConfigurationSource(new InputSource(IOUtil.getInputStream(composition)));  
} else {  
   cm.setConfigurationSource(new InputSource(IOUtil.getInputStream(ResourcePaths.PATH_COMPOSITION)));  
}  
CakeInstanceCache newCache = cm.build();

If a path to a composition file is given, the corresponding file is used as input. Otherwise the default path to "/de/uni_trier/procake/composition.xml" is taken.

Within CompositionManager.build():

public CakeInstanceCache build() {
  XMLConfigurationParser parser = new XMLConfigurationParser();
  parser.setConfigurationSource(this.configurationSource);
  FactoryConfiguration conf = parser.getConfiguration();
  // the cache created by build(FactoryConfiguration) is passed on to the caller
  return build(conf);
}

The CompositionManager has the composition file parsed by the XMLConfigurationParser, which returns a FactoryConfiguration. Based on the FactoryConfiguration, the build process of the factories is initiated:

public CakeInstanceCache build(FactoryConfiguration configuration) {
  List<FactoryInformation> factories = configuration.getFactories();
  List<FactoryObjectImplementation> implementations = new ArrayList<>();

  for (FactoryInformation factoryInformation : factories) {
    ...
  }

  // postInit is called only after all implementations have been bound
  for (FactoryObjectImplementation implementation : implementations) {
    implementation.postInit();
  }
  return new CakeInstanceCache(factories, implementations, true);
}

The FactoryConfiguration provides a list of FactoryInformations which represent all factories defined in the composition file. These will be stored in the CakeInstanceCache to facilitate a rigorous reset of all factories on a potential subsequent restart, even if the restart initializes different factories. For each FactoryInformation, a Factory is initialized as follows:

Class<?> clazz = Class.forName(factoryInformation.getClassName());  
  
// create new information  
Constructor<?> declaredConstructor = clazz.getDeclaredConstructor();  
declaredConstructor.setAccessible(true);  
Factory factory = (Factory) declaredConstructor.newInstance();  
  
// reset old factory information  
factory.reset();  
  
// set factory parameters  
for (AbstractParameter factoryParameter : factoryInformation.getFactoryParameters()) {  
  factory.addParameter(factoryParameter);
}

for (FactoryImplementationInformation implementationInformation : factoryInformation.getImplementations()) {  
  ...
}  

First, an instance of the factory is obtained via reflection to access the methods of the Factory interface. Then each factory is reset, which usually clears all static members of the corresponding Factory class. Next, the factory parameters are set. After that, the concrete implementations of the factory (i.e., instances of classes that derive from FactoryObjectImplementation) are initialized and bound to the factory. The FactoryInformation provides a list of FactoryImplementationInformation, based on which the FactoryObjectImplementations are initialized in three steps:

  • The preInit method is called with the initialization parameters for all implementations.
  • The implementation is bound to the factory, which usually means that the factory holds static references to its implementations. An example usage would be ModelFactory.getDefaultModel(), which as an intermediate step gets the stored reference to a ModelFactoryObjectImpl instance, which in turn holds a reference to the data model (or creates a new one on request). That model is returned.
  • The postInit method is called after all implementations are bound. The implementation can now establish connections to other factories.

So for each FactoryImplementationInformation the following code gets executed:

FactoryObjectImplementation implementation =  
    (FactoryObjectImplementation)  
        Class.forName(implementationInformation.getClassName())  
            .getDeclaredConstructor()  
            .newInstance();  
implementation.preInit(implementationInformation.getParameters());  
if (factory.bind(implementation)) {  
  ...// logging 
} else {  
  ...// logging 
}  
implementations.add(implementation);

Note that the implementation instance gets added to a list for a later call to the postInit-method (see the build-method above).

IOFactory and io components #

The IOFactory contains all registered Reader, Writer, and ContentHandler components of ProCAKE. By default these are:

  • contentHandlers:
    • WorkflowHandler
    • AdaptionConfigHandler
    • NESTWorkflowHandler
    • DataObjectContentHandler
    • ObjectPoolHandler
  • readers:
    • ObjectSAXParser
    • TransformationConfigParser
    • SimilarityModelParser
    • Deserializer
    • StringReader
    • ObjectPoolParser
    • AdaptionConfigParser
    • ModelParser
  • writers:
    • XercesSaxWorkflowWriter
    • StringWriter
    • ObjectPoolSaxWriter
    • XercesSaxNESTWorkflowWriter
    • ObjectSaxWriter
    • Serializer

All listed io components can be created using the IOFactory.newIO(String). The name of the io component must be either known or requested with one of the methods

  • IOFactory.getContentHandlerNamesFor(Class),
  • IOFactory.getReaderNamesFor(Class), or
  • IOFactory.getWriterNamesFor(Class).

When invoking IOFactory.newIO(String), a copy of a pre-initialized io component is created:

public static IO newIO(String name) {  
 IO io = getIOInternalObject(name);  
 if (io == null) {  
    return null;  
 }  
 try {  
    // We create a new instance by copying the original instance cached at system start.  
    // This ensures, that all temporary references cached during last use are discarded.
    return io.copy();  
 } catch (Exception e) {  
    e.printStackTrace();  
 }  
  
  return null;  
}

With getIOInternalObject, the IOFactory checks its static maps for an entry with the corresponding name and returns a previously registered instance of an io component. The copy method of the interface IO creates and returns a new instance of the corresponding class and sets its familyName to that of the instance being copied. Implementations of this method usually overwrite the property familyName declared in the class IOImpl, from which each implementation class typically derives. The io components are initialized when the IOFactory is instantiated by the CompositionManager (see above). Every io component from the list above is a defined implementation of the IOFactory. Each of them derives from IOImpl and therefore also from FactoryObjectImplementation, and is bound to the IOFactory. Here are two examples of the default IOFactory implementations present in the default composition.xml:

<Factory name="io_factory" class="de.uni_trier.wi2.procake.utils.io.IOFactory">
	<Implementation class="de.uni_trier.wi2.procake.data.io.xml.xerces_saxImpl.ModelParserImpl"/>
	<Implementation class="de.uni_trier.wi2.procake.data.io.xml.xerces_saxImpl.ObjectParser">
		<Parameter name="family" value="DataObjects_CAKE_Standard"/>
	</Implementation>
</Factory>

The preInit method of each FactoryObjectImplementation that derives from IOImpl checks whether there is a factory parameter named "family". If so, the static attribute FACTORY_PARAMETER_FAMILY declared in IOImpl is set accordingly, and the property familyName is set to its value as well. Otherwise, both are set to "undefined". Note that a call to setFamily(String) only changes the property of the instance, not the static attribute FACTORY_PARAMETER_FAMILY or its value.
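The copy-on-request behaviour of the IOFactory can be sketched in isolation. SketchIO and SketchIOFactory are hypothetical simplifications and not the ProCAKE classes; the sketch assumes that a copy keeps the family name of the registered instance and carries over no temporary state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical io component; only the copy behaviour is modelled.
class SketchIO {
  private String familyName = "undefined";
  private Object lastInput; // temporary state left over from the last use

  void setFamily(String familyName) {
    this.familyName = familyName;
  }

  String getFamily() {
    return familyName;
  }

  void setInput(Object input) {
    this.lastInput = input;
  }

  Object getInput() {
    return lastInput;
  }

  // Creates a fresh instance that keeps the family name but no temporary state.
  SketchIO copy() {
    SketchIO copy = new SketchIO();
    copy.familyName = this.familyName;
    return copy;
  }
}

// Hypothetical factory that hands out copies of registered instances.
final class SketchIOFactory {
  private static final Map<String, SketchIO> REGISTERED = new HashMap<>();

  private SketchIOFactory() {}

  static void bind(String name, SketchIO io) {
    REGISTERED.put(name, io);
  }

  // Returns a copy of the registered instance, or null for an unknown name.
  static SketchIO newIO(String name) {
    SketchIO registered = REGISTERED.get(name);
    return registered == null ? null : registered.copy();
  }
}
```

Since every request yields a new copy, state set on one returned component (such as an input stream) is never visible to the next caller.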

Data Model #

Within CakeInstance.start():

if (model != null) {  
  // overwrites the default model set in composition.xml  
  initDefaultModel(model);  
  newCache.setModelFileHash(model);
}  

If a path to a data model is given, it will be loaded as the default model and a reference will be retained in the CakeInstanceCache. The initialisation takes place as follows:

private static void initDefaultModel(String path) {  
  ModelParserImpl modelParser = (ModelParserImpl) IOFactory.newIO(ModelParserImpl.PARSERNAME);  
  modelParser.setModelToBeInitialized(ModelFactory.newModel(ModelFactory.DEFAULT_MODEL_NAME));  
  modelParser.setInputStream(IOUtil.getInputStream(path));  
  modelParser.read();  
}

First, an io component of type ModelParserImpl is obtained by using the newIO method of the IOFactory. Then the parser is configured to use a newly created data model (ModelImpl), whose reference is stored in the ModelFactoryObjectImpl and can be accessed via ModelFactory.getDefaultModel(). Note that when the ModelImpl instance is created, the system class tree is created and added to the data model. This includes the root class (DataClassImpl) as well as the following implementations of DataClass (all of which are defined as subclasses of the root class within the data model):

  • VoidClassImpl
  • AtomicClassImpl
  • CollectionClassImpl
  • UnionClassImpl
  • AggregateClassImpl
  • IntervalClassImpl
  • AbstractWorkflowItemClassImpl
  • NESTWorkflowClassImpl
  • NESTGraphItemClassImpl
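
The idea of a model that starts out with a system class tree and only accepts user classes derived from it can be sketched as follows. This is a simplified illustration under assumptions: ModelSketch, the root name "Data", and the short system class names are hypothetical and do not mirror ProCAKE's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical miniature of a data model; all names are illustrative only.
class ModelSketch {
    // Maps a class name to the name of its super class (null for the root).
    private final Map<String, String> superClassOf = new LinkedHashMap<>();

    ModelSketch() {
        // The system class tree is created together with the model.
        superClassOf.put("Data", null); // assumed root name
        String[] systemClasses = {"Void", "Atomic", "Collection", "Union", "Aggregate", "Interval"};
        for (String systemClass : systemClasses) {
            superClassOf.put(systemClass, "Data");
        }
    }

    // Adds a user class; the name must be new and the super class must exist.
    void createSubclass(String name, String superClass) {
        if (superClassOf.containsKey(name)) {
            throw new IllegalArgumentException("Class already exists: " + name);
        }
        if (!superClassOf.containsKey(superClass)) {
            throw new IllegalArgumentException("Unknown super class: " + superClass);
        }
        superClassOf.put(name, superClass);
    }

    // Walks up the tree to check whether a class derives from an ancestor.
    boolean isSubclassOf(String name, String ancestor) {
        for (String c = superClassOf.get(name); c != null; c = superClassOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, a class can only be added after its super class, which matches the requirement that referenced classes are defined before they are used.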

Next, the input stream is set to the file specified for the data model. The read()-method initiates the parsing process as follows:

public Object read() throws CakeIOException {  
  
  XMLParser parser = new XMLSchemaBasedParser(this.getClass().getClassLoader());  
  ModelHandler handler = new ModelHandler(model);  
  handler.setFinishElements(true);  
  parser.setContentHandler(handler);  
  
  if (this.filename != null) {  
    parser.setFilename(this.filename);  
  }  
  if (this.inputStream != null) {  
    parser.setInputStream(this.inputStream);  
  }  
  
  parser.read();  
  
  return handler.getObject();  
}

The main objective here is to connect the XMLSchemaBasedParser to the ModelHandler, which implements the org.xml.sax.ContentHandler interface of the SAX parser API and handles the creation of the ModelImpl instance based on the content of the XML file. The getObject() method of the handler returns the data model.

The main methods of the ModelHandler are the implemented callbacks startElement and endElement, which are called by the SAX parser. Taken from the SAX docs for startElement: "The Parser will invoke this method at the beginning of every element in the XML document; there will be a corresponding endElement event for every startElement event (even when the element is empty)." Depending on the name of the XML tag, a different subclass of the predefined system classes (see list above) is created and added to the model. Remember that every class defined in the data model ultimately derives from a system class. As an example, let's look at an excerpt from deviationModel.xml from the CakeFlexibility project:

<AggregateClass  name="Constraint"  superClass="Aggregate">
	<Attribute  name="firstTask"  class="String"/>
	<Attribute  name="optional"  class="Boolean"/>
	<Attribute  name="status"  class="ConstraintStatus"/>
	<Attribute  name="origin"  class="ConstraintOrigin"/>
</AggregateClass>

<CollectionClass  name="ConstraintSet"  superClass="Set">
	<ElementClass  name="Constraint"/>
</CollectionClass>

<AggregateClass  name="PrecedenceConstraint"  superClass="Constraint">
	<Attribute  name="secondTask"  class="String"/>
</AggregateClass>

Here, a subclass of Aggregate with four attributes is created. Note that two of the attributes have types that are not system classes (ConstraintStatus and ConstraintOrigin); these must be defined prior to "Constraint". Next comes a "ConstraintSet", which uses the Constraint data class as its element class, and finally a subclass of Constraint with one additional attribute.
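
The callback mechanism can be tried out with the SAX parser that ships with the JDK. The following is a minimal sketch, not the ModelHandler itself: ClassTagHandlerSketch, the recorded event strings, and the wrapping Model root element are assumptions made for this example. It only records what a model handler would act on when it sees the tags from the excerpt.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative handler; it records events instead of building a data model.
class ClassTagHandlerSketch extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Dispatch on the tag name, as described for the model handler.
        switch (qName) {
            case "AggregateClass":
            case "CollectionClass":
                events.add("define " + atts.getValue("name")
                    + " extends " + atts.getValue("superClass"));
                break;
            case "Attribute":
                events.add("attribute " + atts.getValue("name") + ":" + atts.getValue("class"));
                break;
            default:
                break; // other tags are ignored in this sketch
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Called once for every startElement, even for empty elements.
        if (qName.equals("AggregateClass") || qName.equals("CollectionClass")) {
            events.add("finish " + qName);
        }
    }

    // Parses an XML string and returns the recorded events.
    static List<String> parse(String xml) {
        ClassTagHandlerSketch handler = new ClassTagHandlerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<Model>"
            + "<AggregateClass name=\"PrecedenceConstraint\" superClass=\"Constraint\">"
            + "<Attribute name=\"secondTask\" class=\"String\"/>"
            + "</AggregateClass>"
            + "</Model>";
        parse(xml).forEach(System.out::println);
    }
}
```

The class definition is announced in startElement, its attributes arrive as nested elements, and endElement marks the point at which the class can be finished.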

Similarity Model #

Within CakeInstance.start():

  if (simModel != null) {  
    // overwrites the default similarity model set in composition.xml  
    initDefaultSimModel(simModel);  
  } 

If a path to a similarity model is given, it will be loaded as the default similarity model:

private static void initDefaultSimModel(String path) {  
  SimilarityModelParserImpl modelParser = (SimilarityModelParserImpl) IOFactory.newIO(SimilarityModelParserImpl.PARSERNAME);  
  modelParser.setModelDependency(ModelFactory.getDefaultModel());  
  modelParser.setSimilarityModelToBeInitialized(SimilarityModelFactory.newSimilarityModel(SimilarityModelFactory.DEFAULT_SIM_MODEL_NAME));  
  modelParser.setInputStream(IOUtil.getInputStream(path));  
  modelParser.read();  
}

First, a SimilarityModelParserImpl instance is obtained via the IOFactory. Then the default data model is provided, which will later be used by the SimilarityModelHandler. After that, a new similarity model is created analogously to the data model initialization: a new instance of SimilarityModelImpl is created, and its reference is stored indirectly by the SimilarityModelFactory, wrapped inside an instance of SimilarityModelFactoryObjectImpl. The default similarity model can be obtained via SimilarityModelFactory.getDefaultSimilarityModel(). Note that the similarity measure cache is initialized when the SimilarityModelImpl is created. Internally, this is a static map attribute of SimilarityModelImpl that is populated with all available similarity measures (SimilarityMeasureImpl), referenced by their name.

During parsing, the occurrence of a tag name listed in the SimilarityTags interface will initiate the creation of an instance of the corresponding similarity measure with the following code:

private SimilarityMeasure createSimilarityMeasure(String simMeasureName, Attributes attributes)
    throws ClassNotFoundException, NameAlreadyExistsException, NameNotFoundException {
  String className = attributes.getValue(ATT_CLASS);
  DataClass dataClass = this.dataModel.getClass(className);
  if (dataClass == null) {
    logger.warn("Could not create similarity measure - class {} not found!", className);
    return null;
  }
  SimilarityMeasure sm = simModel.createSimilarityMeasure(simMeasureName, dataClass);  
  String name = attributes.getValue(ATT_NAME);  
  
  sm.setForceOverride(Boolean.parseBoolean(attributes.getValue(ATT_FORCEOVERRIDE)));  
  simModel.addSimilarityMeasure(sm, name);  
  
  boolean isDefault = Boolean.parseBoolean(attributes.getValue(ATT_DEFAULT));  
  if (isDefault) {  
    simModel.setDefaultSimilarityMeasure(dataClass, name);  
  }  
  
  return sm;  
}

First, the corresponding data class is obtained from the default data model. Next, the call simModel.createSimilarityMeasure(simMeasureName, dataClass) performs a lookup in the static similarityMeasureTemplateCache attribute of SimilarityModelImpl, and a new instance of the class that the entry refers to is created and returned. Note that all similarity measures have to be added to this map before this step, during the initialization of SimilarityModelImpl as mentioned above. The returned SimilarityMeasureImpl is then added to the similarity model and, if the default flag is set in the XML file, it is also added to the static classDefaultMeasure attribute of SimilarityModelImpl. An example entry within the similarity file looks like this:

<GraphAStarOne  class="NESTWorkflow"  name="GraphAStarOne"  default="false"/>
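
The lookup-then-instantiate step can be sketched with a map of factories. This is a simplified illustration under assumptions: SimilarityModelSketch, its nested Measure class, and the use of plain strings for data classes are hypothetical, and the default-measure map is kept per instance here for brevity, whereas the text above describes a static attribute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical miniature of the template lookup; not ProCAKE's actual classes.
class SimilarityModelSketch {
    // Stand-in for a similarity measure instance.
    static class Measure {
        final String templateName;
        final String dataClass;

        Measure(String templateName, String dataClass) {
            this.templateName = templateName;
            this.dataClass = dataClass;
        }
    }

    // Template name -> factory; must be filled before parsing starts.
    private static final Map<String, Function<String, Measure>> TEMPLATES = new HashMap<>();

    static {
        TEMPLATES.put("GraphAStarOne", dataClass -> new Measure("GraphAStarOne", dataClass));
    }

    private final Map<String, Measure> measuresByName = new HashMap<>();
    private final Map<String, String> defaultMeasureOfClass = new HashMap<>();

    // Looks up the template by tag name and returns a fresh instance.
    Measure createSimilarityMeasure(String templateName, String dataClass) {
        Function<String, Measure> template = TEMPLATES.get(templateName);
        if (template == null) {
            throw new IllegalArgumentException("Unknown similarity measure: " + templateName);
        }
        return template.apply(dataClass);
    }

    void addSimilarityMeasure(Measure measure, String name) {
        measuresByName.put(name, measure);
    }

    void setDefaultSimilarityMeasure(String dataClass, String name) {
        defaultMeasureOfClass.put(dataClass, name);
    }

    // Returns the name of the default measure for a data class, or null.
    String getDefaultSimilarityMeasure(String dataClass) {
        return defaultMeasureOfClass.get(dataClass);
    }
}
```

Because each call creates a new instance from the template, two entries in a similarity file that use the same tag name can be configured independently.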

Re-integration of data classes on a restart #

When working with data models, especially with provided data model files, the need to trigger a restart to reset factory values might arise. ProCAKE is capable of re-integrating DataClasses from a preceding start, so that objects from different ProCAKE instances share the same DataClass reference as long as the name property stays the same.

Note that the process of determining classes for re-integration is a name-based string comparison.

By default, a re-integration of previously existing DataClass references will take place if either:

  • The hash value of the provided model file on a start is the same as the previous one
  • No model path has been provided after a start with likewise no model file

As a result, DataClass references are not recycled on a start in the following scenarios, even if there are overlapping (i.e. name-matching) DataClass references:

  • Start with no model file -> start with model file xy
  • Start with model file xy -> start with no model file
  • Start with model file xy -> start with model file yz (with the exception of the hash values of both xy and yz being the same)

The re-integration of data classes can be omitted by providing false for the reuseDataClassReferencesAtRestart-flag on a restart.
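
The decision rules above can be condensed into a single predicate. This is a minimal sketch, not ProCAKE's implementation: ReintegrationSketch and reuseDataClasses are hypothetical names, and representing "no model file" as a null hash is an assumption of this example.

```java
import java.util.Objects;

// Hypothetical helper that mirrors the re-integration rules described above.
class ReintegrationSketch {
    // previousHash and currentHash are the hash values of the model files,
    // or null when the respective start was made without a model file.
    static boolean reuseDataClasses(String previousHash, String currentHash, boolean reuseFlag) {
        if (!reuseFlag) {
            return false; // re-integration explicitly switched off
        }
        // True for equal hashes and for "no model file" on both starts.
        return Objects.equals(previousHash, currentHash);
    }

    public static void main(String[] args) {
        System.out.println(reuseDataClasses("abc", "abc", true));
        System.out.println(reuseDataClasses(null, "abc", true));
        System.out.println(reuseDataClasses("abc", "abc", false));
    }
}
```

Two different files with identical content yield the same hash and are therefore treated like the same model file in this sketch.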