Configuration #

Configuration at build-time via XML #

Configuration via XML files provides the initial configuration for the start-up of ProCAKE.

General Configuration #

A benefit of the pattern-driven architecture of ProCAKE is extensibility. For instance, the various implementations for persistence, retrieval, and adaptation are organized in factories. Implementations can be registered and configured in an XML file (usually named composition.xml) that is read once at start-up.

Using the method CakeInstance.start, several configuration files can be provided. By default, the pre-defined composition file located under /de/uni_trier/wi2/composition.xml is used. The method CakeInstance.start(String composition) can be used as follows:

CakeInstance.start("/path/composition.xml");

or

CakeInstance.getInstance().withComposition("/path/composition.xml").run();

Data Model #

In addition to the system classes, custom-structured classes (referred to as user classes) can be defined as sub-types of system classes. At start-up, the system can initialize the default model based on a configuration file. Even though several custom models can be defined, the usual practice is to use a single model definition as default (usually named model.xml). The definition of custom user data classes in the model.xml is described on the page Data Classes. When only system data classes are used, a custom model definition is not necessary. However, some abstract system data classes such as collections require the definition of user classes. By default, ProCAKE is instantiated with an empty model. If you define custom classes in the model.xml, make sure that the name of each new class is unique and that a class is defined in the XML file before it is referenced by another class. References to classes that are defined later in the file are, however, currently supported for aggregate classes.

The method CakeInstance.start(String composition, String model, String simModel, String casebase) can be used to specify an initial data model and similarity model as follows:

CakeInstance.start("/path/composition.xml","/path/model.xml","/path/simModel.xml","/path/casebase.xml");

Defining a model and simModel parameter also requires specifying a composition and casebase parameter. However, it is possible to set parameters to null. In this event, the default files are used instead.

Similarity Model #

To allow for comparing data objects, similarity measures have to be defined for the system and user classes that are used, for example, in the cases. Analogous to the data model, several similarity models can be defined, although usually a single model (named sim.xml) is used.

To avoid defining a measure for all specific classes, measures can also be defined for super classes. The parent class of all classes is named Data.

When computing the similarity between two objects, the system first determines the least common class type of the objects, i.e., the most specific class that both objects belong to, and searches for a similarity measure that is defined for that class type. If no such measure is defined, it searches for available measures for super classes of the least common class. If no measure can be found at all, an error is thrown.
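The lookup order described here can be illustrated with a small, self-contained sketch. It is not ProCAKE code: the types ToyClass and MeasureLookupSketch and all method names are hypothetical and only mimic the described behaviour (determine the most specific common class, then climb the super classes until a registered measure is found).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a data class with a single super class.
final class ToyClass {
  final String name;
  final ToyClass parent; // null for the root class "Data"

  ToyClass(String name, ToyClass parent) {
    this.name = name;
    this.parent = parent;
  }
}

final class MeasureLookupSketch {
  // Maps a class name to the name of the measure registered for it.
  private final Map<String, String> measures = new HashMap<>();

  void register(String className, String measureName) {
    measures.put(className, measureName);
  }

  // Returns the most specific class that is a (super) class of both arguments.
  static ToyClass commonClass(ToyClass a, ToyClass b) {
    Set<String> ancestorsOfA = new HashSet<>();
    for (ToyClass c = a; c != null; c = c.parent) {
      ancestorsOfA.add(c.name);
    }
    for (ToyClass c = b; c != null; c = c.parent) {
      if (ancestorsOfA.contains(c.name)) {
        return c;
      }
    }
    throw new IllegalArgumentException("classes share no common super class");
  }

  // Starts at the common class and climbs upwards until a measure is found.
  String findMeasure(ToyClass query, ToyClass caseClass) {
    for (ToyClass c = commonClass(query, caseClass); c != null; c = c.parent) {
      String measure = measures.get(c.name);
      if (measure != null) {
        return measure;
      }
    }
    throw new IllegalStateException("no applicable similarity measure");
  }
}
```

In this sketch, a measure registered for the root class acts as a fallback for every pair of objects, which mirrors the idea of defining measures for super classes instead of for each specific class.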

A similarity measure can be set as default for a data class, so that the measure is automatically selected for a class or its sub-classes in the event that several applicable measures exist.

If no custom similarity model is given at system start, a pre-defined similarity model file located under /de/uni_trier/wi2/sim.xml is loaded with some basic similarity measures as a fallback. The definition of similarity measures in an XML file is described on page Similarity Measures.

Three pre-defined similarity measures are instantiated in that event:

  1. ObjectEqual: If query and case object are of the type Atomic, their values are checked for equality.
  2. AggregateAverage: If query and case object are of the type Aggregate, it uses the arithmetic mean to aggregate all local similarities between the contained values.
  3. TableDataClass: If none of the previous measures is applicable, this measure is used and assesses the similarity as 0.0. The only exception is the comparison of a Void object as query with any case object, for which the similarity is always 1.0.

If multiple similarity models are used, the names of their similarity measures must be distinct across all models. Several similarity model files can be passed as an array:

WriteableObjectPool pool = CakeInstance.start("/path/composition.xml","/path/model.xml",
  new String[]{"/path/simModel.xml","/path/empty_sim"},"/path/casebase.xml");

Case Bases #

ProCAKE can be initialized with an existing case base. The contained objects are parsed from the given XML file and are written to an object pool, which is returned by the CakeInstance.start method. Please note that a case base requires a corresponding model file if custom data classes have been used.

Using the casebase parameter also requires the definition of the composition parameter in the method CakeInstance.start(String composition, String casebase). If the composition parameter is set to null, the default composition configuration is used.

CakeInstance.start("/path/composition.xml","/path/casebase.xml");

In turn, if the path to a case base is set, the method returns an instance of WriteableObjectPool. This pool contains the objects that have been parsed from the given file.

InputStream Configuration #

The configuration files can also be provided as InputStreams that contain the configuration XML files.

The method CakeInstance.start(InputStream composition) can be used as follows:

String composition = "/path/composition.xml";
InputStream compositionAsInputStream = IOUtil.getInputStream(composition);

WriteableObjectPool pool = CakeInstance.start(compositionAsInputStream);

or alternatively:

String composition = "/path/composition.xml";
InputStream compositionAsInputStream = getClass().getResourceAsStream(composition);

WriteableObjectPool pool = CakeInstance.start(compositionAsInputStream);

Configuration at run-time via Environment Variables #

ProCAKE configuration values can be provided as environment variables. Environment variables always take precedence over other configuration options and overwrite them. This method of configuration is particularly useful in scenarios in which ProCAKE is packed and the program code can no longer be changed.

| Key | Value |
| --- | --- |
| PROCAKE_PATH_COMPOSITION | path to composition.xml, e.g. /de/uni_trier/wi2/composition.xml |
| PROCAKE_PATH_MODEL | path to model.xml, e.g. /de/uni_trier/wi2/model.xml |
| PROCAKE_PATH_CASEBASE | path to casebase.xml, e.g. /de/uni_trier/wi2/casebase.xml |
| PROCAKE_PATH_TRANSFORMATION | path to transformation.xml, e.g. /de/uni_trier/wi2/transformation.xml |
| PROCAKE_PATH_SIM | path to one or multiple sim.xml separated by ::, e.g. /de/uni_trier/wi2/sim.xml or /de/uni_trier/wi2/sim.xml::/de/uni_trier/wi2/other-sim.xml |
| PROCAKE_REUSE_DATACLASSES | true or false |
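The precedence rule and the :: separator can be illustrated with the following sketch. It does not show how ProCAKE reads these variables internally; the class EnvConfigSketch and its methods are hypothetical, and treating an empty variable like an unset one is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class EnvConfigSketch {
  // An environment value, if present, wins over the value configured in code.
  // Assumption: an empty value is treated like an unset variable.
  static String resolve(Map<String, String> env, String key, String configured) {
    String fromEnv = env.get(key);
    return (fromEnv != null && !fromEnv.isEmpty()) ? fromEnv : configured;
  }

  // PROCAKE_PATH_SIM may hold several paths separated by "::".
  static List<String> splitSimPaths(String value) {
    return Arrays.asList(value.split("::"));
  }

  public static void main(String[] args) {
    Map<String, String> env = System.getenv();
    String sim = resolve(env, "PROCAKE_PATH_SIM", "/de/uni_trier/wi2/sim.xml");
    System.out.println(splitSimPaths(sim));
  }
}
```

Passing the environment as a map keeps the resolution logic independent of the actual process environment, which makes it easy to try out different settings.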

Configuration at run-time via Java #

While the XML configuration is usually used for the initial instantiation of ProCAKE, the run-time configuration allows for further modifications.

General Configuration #

The pre-defined composition file located under /de/uni_trier/wi2/composition.xml contains several factory implementations whose factory objects can be used at run-time to bind or unbind implementations. Please refer to the interface cake.utils.composition.Factory for more details. For factory implementations and IO objects, we strongly recommend configuring them in the XML file so that they are registered during the start of ProCAKE. Registering instances during run-time might lead to side effects.

Data Model #

Data classes can be created dynamically during run-time via Java code. For this purpose, it is necessary to access an existing data class of ProCAKE. Existing classes can easily be referenced by using methods of the default model, which can be accessed by using ModelFactory.getDefaultModel. Every data class can be retrieved with the method getClass(String name), where name must refer to the class name of an existing class. For the system data classes, these names can be found in the class interfaces. For example, the name of the Data class can be accessed by DataClass.CLASS_NAME. For the system data classes, it is also possible to use specific methods of the model, for example getDataSystemClass().

Using this existing class, the method createSubclass(String name) can be used. The string name is the name of the subclass, which has to be unique; otherwise, an exception is thrown. The method returns an instance of the new data class and adds the class to the model. Before the class can be used, the method finishEditing() has to be called. After that, the class can no longer be edited; any attempt to do so throws an exception.

The following code lines give an example of a data class creation. The created class is a subclass of the String class:

Wiki_ConfigurationTest.java

    StringClass stringClass = ModelFactory.getDefaultModel().getClass(StringClass.CLASS_NAME);
    StringClass customStringClass = (StringClass) stringClass.createSubclass("customStringClass");
    customStringClass.finishEditing();

The instance customStringClass of the newly created subclass of StringClass can be used inside this specific code block. However, if an instance of the same class is required at some later point in time in another code block, the instance customStringClass might not be referenceable anymore. For this case, the newly created data class is also added to the model and can be retrieved later on via its unique class name.

Wiki_ConfigurationTest.java

    customStringClass = (StringClass) ModelFactory.getDefaultModel().getClass("customStringClass");
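The life cycle of a dynamically created class (unique name, registration in the model, locking via finishEditing) can be pictured with the following simplified sketch. The classes EditableClassSketch and ModelSketch are hypothetical stand-ins, not ProCAKE types, and the exception types used here are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a data class that can be locked against edits.
final class EditableClassSketch {
  final String name;
  private final List<String> attributes = new ArrayList<>();
  private boolean editable = true;

  EditableClassSketch(String name) {
    this.name = name;
  }

  void addAttribute(String attribute) {
    if (!editable) {
      throw new IllegalStateException("class " + name + " is no longer editable");
    }
    attributes.add(attribute);
  }

  // After this call, every further edit is rejected.
  void finishEditing() {
    editable = false;
  }
}

// Hypothetical stand-in for a model that keeps classes by unique name.
final class ModelSketch {
  private final Map<String, EditableClassSketch> classes = new HashMap<>();

  // Creates the class and registers it in the model right away.
  EditableClassSketch createSubclass(String name) {
    if (classes.containsKey(name)) {
      throw new IllegalArgumentException("class name already exists: " + name);
    }
    EditableClassSketch created = new EditableClassSketch(name);
    classes.put(name, created);
    return created;
  }

  // Allows retrieving the same instance later via its unique name.
  EditableClassSketch getClassByName(String name) {
    return classes.get(name);
  }
}
```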

Similarity Model #

Similarity measures can also be created or extended dynamically during run-time via Java code. For this purpose, it is necessary to access an existing similarity model of ProCAKE. A new similarity measure is created based on an existing template. A new instance can be created by specifying the name of a template and the data class the measure is responsible for. System data classes can be easily referenced by using methods of the default model, which can be accessed via ModelFactory.getDefaultModel().

After the creation of the measure, specific configurations can be made using the measure methods. All the available configurations can be found in the descriptions of the respective measures on page Similarity Measures.

Finally, the measure can be added to the similarity model using a unique name as identifier. Measures can override existing measures with the same name by using the method setForceOverride. If no override flag is set and there is already a measure with the same name, an exception is thrown.

The following gives an example instantiation of the String Equal measure:

Wiki_ConfigurationTest.java

    SimilarityModel simModel = SimilarityModelFactory.getDefaultSimilarityModel();
    SMStringEqual simMeasure = (SMStringEqual) simModel.createSimilarityMeasure(SMStringEqual.NAME,
        ModelFactory.getDefaultModel().getStringSystemClass());
    simMeasure.setCaseInsensitive();
    simMeasure.setForceOverride(true);
    simModel.addSimilarityMeasure(simMeasure, "MyCustomStringEqual");
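The effect of the override flag when adding a measure under an existing name can be pictured with this small sketch. It is a hypothetical registry, not the SimilarityModel implementation: in ProCAKE the flag is set on the measure via setForceOverride, whereas the sketch passes it as a parameter for brevity, and the exception type is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class MeasureRegistrySketch {
  private final Map<String, Object> measuresByName = new HashMap<>();

  // Adds a measure; an existing name is only replaced when forceOverride is set.
  void add(String name, Object measure, boolean forceOverride) {
    if (measuresByName.containsKey(name) && !forceOverride) {
      throw new IllegalStateException("measure name already exists: " + name);
    }
    measuresByName.put(name, measure);
  }

  Object get(String name) {
    return measuresByName.get(name);
  }
}
```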

New similarity measure templates can also be registered at run-time. For this purpose, it is required to create a Java class that implements the interface SimilarityMeasure. Example implementations of such classes can be found in all system similarity measures (e.g., SMStringEqualImpl and SMAggregateAverageImpl). The most important part of this implementation is the definition of the method compute(DataObject queryObject, DataObject caseObject, SimilarityValuator valuator), which computes a similarity from two data objects. The fully defined similarity measure can be registered in the similarity model as described above. In fact, defining a similarity measure at run-time via Java code is the only way to use fully custom measures in an application, since the configuration via XML only allows configuring existing system measures.

Factories #

The Factory classes connect the interfaces with the implementations that should be used. If an object of an interface is needed, a request is sent to the corresponding factory, which returns the object.

Besides the methods of the factory interface, each factory contains two kinds of static methods:

  • get-methods: Each method that starts with get reuses an existing factory object (e.g., a registered instance of a reader or writer). Therefore, most of these methods begin with getDefault to emphasize this.
  • new-methods: Each method that starts with new creates a new object. In some factories, it is necessary to specify the newly created object as the default object; for this purpose, a variant of the new-method with a boolean parameter exists.

The Factory interface defines four methods that are implemented by each factory implementation:

  • addParameter(AbstractParameter parameter): This method adds a parameter to the current factory.
  • boolean bind(Object implementation): This method binds an implementation to the factory. It returns true if binding was successful. It is usually called in the start-up process of ProCAKE.
  • reset(): This method has to be implemented individually by every factory in order to allow a proper restart of ProCAKE. For example, this method might reset the default factory object or other stateful variables, i.e., every variable that should be put into an initial state in order to allow a proper restart.
  • boolean unbind(Object implementation): This method unbinds an implementation from the factory. It returns true if unbinding was successful.

There are several implementations of the Factory class already integrated in ProCAKE:

  • AdaptationFactory: This factory creates several adaptation instances.
  • IOFactory: This factory contains all registered Readers, Writers, and ContentHandlers of ProCAKE. To get such a component, first the names of the available components have to be requested, and afterwards the component has to be requested by its name.
  • ModelFactory: This factory allows access to the implementations of the data model interfaces. ProCAKE contains a default model, which can be accessed by using the method getDefaultModel(). The factory can also manage several other models, which can be created by the method newModel(String name).
  • ObjectPoolFactory: This factory is used to create ObjectIds and object pools, i.e., case bases.
  • OntologyFactory: This factory is used to load an ontology from an .owl file and give it a name to be referenced from a URIClass in the model. Example loading the pizza.owl file via the composition.xml:

composition.xml

    <Factory name="ontology" class="de.uni_trier.wi2.procake.utils.ontology.OntologyFactory">
        <Implementation class="de.uni_trier.wi2.procake.utils.ontology.OntologyFactoryObjectImpl">
            <Parameter name="ontologyName" value="pizza"/>
            <Parameter name="ontologyPath" value="https://protege.stanford.edu/ontologies/pizza/pizza.owl"/>
        </Implementation>
    </Factory>

It is also possible to provide a second path as a backup to ontologyPath. If the ontology could not be loaded from the first path the backup path is used instead.

<Parameter name="ontologyPathBackup" value="/path/pizza.owl"/>

Example loading the pizza.owl file via Java:

Wiki_ConfigurationTest.java

    OntologyFactory.newOntology("pizza", "/pizza.owl");

  • RetrievalFactory: This factory creates several retriever instances. A description of the available system retrievers can be found in the section about retrieval in this wiki.
  • SimilarityModelFactory: This factory provides the implementations of the SimilarityModel interfaces. ProCAKE contains a default similarity model, which can be accessed by using the method getDefaultSimilarityModel().

Also, an interface FactoryConfiguration exists. It contains the configurations of the factories including the implementations and the corresponding parameters. It only has the method List<FactoryInformation> getFactories() that returns a list of all factories which are specified in the configuration. The class FactoryInformation contains the name of the factory class and two lists: one list contains parameters of this factory, the other list contains implementations of it.

There is also the class FactoryObjectImplementation, which must be extended by every factory object. The initialization of each factory object is executed by the CompositionManager in three steps: First, the preInit method is called with the initialization parameters for all implementations. Then, the implementation is bound to the factory. Afterwards, the postInit method is called, which establishes connections of the implementation to other factories.

ProCAKE (re-)starting process #

The start of a CakeInstance passes through the following nine steps:

  1. Check for existing CakeInstanceCache and resetting of old factories
  2. Initialisation of factories by the CompositionManager
  3. Initialisation of the data model (if provided)
  4. Initialisation of the similarity model (if provided)
  5. Optional re-integration of data classes from preceding CakeInstance
  6. CakeInstanceCache reference retainment
  7. Validation of similarity model accordance with data model
  8. Transformation configuration
  9. Case base initialisation (optionally from file)

In the following, a few key aspects are highlighted.

Composition Manager #

The CompositionManager is responsible for initialising all factories and their implementations defined in the provided composition.xml.

Excerpt from CakeInstance.start():

CompositionManager cm = new CompositionManager();  
if (composition != null) {  
   cm.setConfigurationSource(new InputSource(IOUtil.getInputStream(composition)));  
} else {  
   cm.setConfigurationSource(new InputSource(IOUtil.getInputStream(ResourcePaths.PATH_COMPOSITION)));  
}  
CakeInstanceCache newCache = cm.build();

If a path to a composition file is given, the corresponding file is used as input. Otherwise, the default composition file referenced by ResourcePaths.PATH_COMPOSITION is used.

Within CompositionManager.build():

public CakeInstanceCache build() {  
  XMLConfigurationParser parser = new XMLConfigurationParser();  
  parser.setConfigurationSource(this.configurationSource);  
  FactoryConfiguration conf = parser.getConfiguration();  
  return build(conf);  
}

The CompositionManager has the composition file parsed by the XMLConfigurationParser, which returns a FactoryConfiguration. Based on the FactoryConfiguration, the build process of the factories is initiated:

public CakeInstanceCache build(FactoryConfiguration configuration) {  
  List<FactoryInformation> factories = configuration.getFactories();  
  List<FactoryObjectImplementation> implementations = new ArrayList<>();  
  
  for (FactoryInformation factoryInformation : factories) {  
	...
  }  
  
  for (FactoryObjectImplementation implementation : implementations) {  
    implementation.postInit();  
  }  
  return new CakeInstanceCache(factories, implementations, true); 
}

The FactoryConfiguration provides a list of FactoryInformation objects, which represent all factories defined in the composition file. These are stored in the CakeInstanceCache to facilitate a rigorous reset of all factories on a potential subsequent restart, even if the restart initializes different factories. For each FactoryInformation, a Factory is initialized as follows:

Class<?> clazz = Class.forName(factoryInformation.getClassName());  
  
// create new information  
Constructor<?> declaredConstructor = clazz.getDeclaredConstructor();  
declaredConstructor.setAccessible(true);  
Factory factory = (Factory) declaredConstructor.newInstance();  
  
// reset old factory information  
factory.reset();  
  
// set factory parameters  
for (AbstractParameter factoryParameter : factoryInformation.getFactoryParameters()) {  
  factory.addParameter(factoryParameter);
}

for (FactoryImplementationInformation implementationInformation : factoryInformation.getImplementations()) {  
  ...
}  

First, an instance of the factory is obtained via reflection to access the methods of the Factory interface. Then each factory is reset, which usually clears all static members of the corresponding Factory class. Then the factory parameters are set. After that, the concrete implementations of the factory (i.e., instances of classes which derive from FactoryObjectImplementation) are initialized and bound to the factory. The FactoryInformation provides a list of FactoryImplementationInformation, based on which the FactoryObjectImplementations are initialized in three steps:

  • The preInit method is called with the initialization parameters for all implementations.
  • The implementation is bound to the factory, which usually means that the factory holds static references to its implementations. An example usage would be ModelFactory.getDefaultModel(), which, as an intermediate step, gets the stored reference to a ModelFactoryObjectImpl instance that holds a reference to the data model (or creates a new one on request). That model is then returned.
  • The postInit method is called after all implementations are bound. The implementation can now establish connections to other factories.

So for each FactoryImplementationInformation, the following code gets executed:

FactoryObjectImplementation implementation =  
    (FactoryObjectImplementation)  
        Class.forName(implementationInformation.getClassName())  
            .getDeclaredConstructor()  
            .newInstance();  
implementation.preInit(implementationInformation.getParameters());  
if (factory.bind(implementation)) {  
  ...// logging 
} else {  
  ...// logging 
}  
implementations.add(implementation);

Note that the implementation instance gets added to a list for a later call to the postInit-method (see the build-method above).
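The resulting call order (preInit and bind per implementation, postInit only after everything is bound) can be made visible with a minimal sketch. The classes NamedImplSketch and BuildOrderSketch are hypothetical and only record the order of calls; they do not reproduce the CompositionManager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical implementation that logs its life-cycle calls.
final class NamedImplSketch {
  final String name;

  NamedImplSketch(String name) {
    this.name = name;
  }

  void preInit(List<String> log) {
    log.add("preInit " + name);
  }

  void postInit(List<String> log) {
    log.add("postInit " + name);
  }
}

final class BuildOrderSketch {
  // Phase 1: preInit and bind each implementation.
  // Phase 2: call postInit once all implementations are bound.
  static List<String> build(List<NamedImplSketch> implementations) {
    List<String> log = new ArrayList<>();
    List<NamedImplSketch> bound = new ArrayList<>();
    for (NamedImplSketch implementation : implementations) {
      implementation.preInit(log);
      bound.add(implementation);
      log.add("bind " + implementation.name);
    }
    for (NamedImplSketch implementation : bound) {
      implementation.postInit(log);
    }
    return log;
  }
}
```

Deferring postInit to a second pass is what allows an implementation to look up other factories safely, because all of them have been bound by then.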

IOFactory and io components #

The IOFactory contains all registered Reader, Writer, and ContentHandler components of ProCAKE. By default, these are:

  • contentHandlers:
    • WorkflowHandler
    • AdaptionConfigHandler
    • NESTWorkflowHandler
    • DataObjectContentHandler
    • ObjectPoolHandler
  • readers:
    • ObjectSAXParser
    • TransformationConfigParser
    • SimilarityModelParser
    • Deserializer
    • StringReader
    • ObjectPoolParser
    • AdaptionConfigParser
    • ModelParser
  • writers:
    • XercesSaxWorkflowWriter
    • StringWriter
    • ObjectPoolSaxWriter
    • XercesSaxNESTWorkflowWriter
    • ObjectSaxWriter
    • Serializer
    • ModelWriter
    • SimilarityModelWriter

All listed io components can be created using the IOFactory.newIO(String). The name of the io component must be either known or requested with one of the methods

  • IOFactory.getContentHandlerNamesFor(Class),
  • IOFactory.getReaderNamesFor(Class), or
  • IOFactory.getWriterNamesFor(Class).

When invoking IOFactory.newIO(String), a copy of a pre-initialized io component is created:

public static IO newIO(String name) {  
 IO io = getIOInternalObject(name);  
 if (io == null) {  
    return null;  
 }  
 try {  
    // We create a new instance by copying the original instance cached at system start.  
    // This ensures, that all temporary references cached during last use are discarded.
    return io.copy();  
 } catch (Exception e) {  
    e.printStackTrace();  
 }  
  
  return null;  
}

With getIOInternalObject, the IOFactory checks its static maps for an entry with the corresponding name and returns a previously registered instance of an io component. The copy-method of the interface IO creates and returns a new instance of the corresponding class and sets its familyName to that of the copied instance. The implementation of this method usually overwrites the property familyName declared in the class IOImpl, from which each implementation class typically derives. The initialization of the io components happens when the IOFactory gets instantiated by the CompositionManager (see above). Every io component from the above-mentioned list is a defined implementation of the IOFactory. Each of them derives from IOImpl and therefore also from FactoryObjectImplementation, and is bound to the IOFactory. Here are two examples of the default IOFactory implementations present in the default composition.xml:


<Factory name="io_factory" class="de.uni_trier.wi2.procake.utils.io.IOFactory">
    <Implementation class="de.uni_trier.wi2.procake.data.io.xml.xerces_saxImpl.ModelReader"/>
    <Implementation class="de.uni_trier.wi2.procake.data.io.xml.xerces_saxImpl.ObjectReader">
        <Parameter name="family" value="DataObjects_CAKE_Standard"/>
    </Implementation>
</Factory>

The preInit-method of each FactoryObjectImplementation that derives from IOImpl checks whether there is a factory parameter named "family". If so, the static attribute FACTORY_PARAMETER_FAMILY declared in IOImpl is set accordingly, and the property familyName is set to its value as well. Otherwise, both are set to "undefined". Note that a call to setFamily(String) changes only the property of the instance, not the static attribute FACTORY_PARAMETER_FAMILY or its value.
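The copy-on-request behaviour of the IOFactory can be illustrated with the following sketch. IoComponentSketch and IoRegistrySketch are hypothetical classes that only mimic the described pattern: a pre-initialized instance is kept per name, and every request returns a fresh copy that carries over the family name but no temporary state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical io component with a family name and temporary state.
final class IoComponentSketch {
  private final String familyName;
  private Object input; // temporary state that must not leak between uses

  IoComponentSketch(String familyName) {
    this.familyName = familyName;
  }

  String getFamilyName() {
    return familyName;
  }

  void setInput(Object input) {
    this.input = input;
  }

  Object getInput() {
    return input;
  }

  // The copy keeps the family name but starts without temporary state.
  IoComponentSketch copy() {
    return new IoComponentSketch(familyName);
  }
}

final class IoRegistrySketch {
  private final Map<String, IoComponentSketch> prototypes = new HashMap<>();

  void register(String name, IoComponentSketch prototype) {
    prototypes.put(name, prototype);
  }

  // Returns a fresh copy of the registered instance, or null for unknown names.
  IoComponentSketch newIO(String name) {
    IoComponentSketch prototype = prototypes.get(name);
    return prototype == null ? null : prototype.copy();
  }
}
```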

Data Model #

Within CakeInstance.start():

if (model != null) {  
  // overwrites the default model set in composition.xml  
  initDefaultModel(model);  
  newCache.setModelFileHash(model);
}  

If a path to a data model is given, it will be loaded as the default model and a reference will be retained in the CakeInstanceCache. The initialisation takes place as follows:

private static void initDefaultModel(String path) {  
  ModelParserImpl modelParser = (ModelParserImpl) IOFactory.newIO(ModelParserImpl.PARSERNAME);  
  modelParser.setModelToBeInitialized(ModelFactory.newModel(ModelFactory.DEFAULT_MODEL_NAME));  
  modelParser.setInputStream(IOUtil.getInputStream(path));  
  modelParser.read();  
}

First, an io component of type ModelParserImpl is obtained by using the newIO-method of the IOFactory. Then the parser is configured to use a newly created data model (ModelImpl), whose reference gets stored in the ModelFactoryObjectImpl and can be accessed via ModelFactory.getDefaultModel(). Note that on creation of the ModelImpl instance, the system class tree gets created and added to the data model. This includes the root class (DataClassImpl) as well as the following implementations of DataClass (all of which are defined as subclasses of the root class within the data model):

  • VoidClassImpl
  • AtomicClassImpl
  • CollectionClassImpl
  • UnionClassImpl
  • AggregateClassImpl
  • IntervalClassImpl
  • AbstractWorkflowItemClassImpl
  • NESTWorkflowClassImpl
  • NESTGraphItemClassImpl

Next, the input stream is set to the file specified for the data model. The read()-method initiates the parsing process as follows:

public Object read() throws CakeIOException {  
  
  XMLParser parser = new XMLSchemaBasedParser(this.getClass().getClassLoader());  
  ModelHandler handler = new ModelHandler(model);  
  handler.setFinishElements(true);  
  parser.setContentHandler(handler);  
  
  if (this.filename != null) {  
    parser.setFilename(this.filename);  
  }  
  if (this.inputStream != null) {  
    parser.setInputStream(this.inputStream);  
  }  
  
  parser.read();  
  
  return handler.getObject();  
}

The main objective here is to connect the XMLSchemaBasedParser to the ModelHandler, which implements the org.xml.sax.ContentHandler interface of the SAX parser API and handles the creation of the ModelImpl instance based on the content of the XML file. The getObject()-method of the handler returns the data model.

The main methods of the ModelHandler are the implemented callbacks startElement and endElement, which get called by the SAX parser. Taken from the SAX docs for startElement: "The Parser will invoke this method at the beginning of every element in the XML document; there will be a corresponding endElement event for every startElement event (even when the element is empty)." Depending on the name of the XML tag, a different subclass of the predefined system classes (see list above) gets created and added to the model. Remember that all classes defined in the data model ultimately derive from a system class. As an example, let's look at an excerpt from deviationModel.xml from the CakeFlexibility project:

<AggregateClass  name="Constraint"  superClass="Aggregate">
	<Attribute  name="firstTask"  class="String"/>
	<Attribute  name="optional"  class="Boolean"/>
	<Attribute  name="status"  class="ConstraintStatus"/>
	<Attribute  name="origin"  class="ConstraintOrigin"/>
</AggregateClass>

<CollectionClass  name="ConstraintSet"  superClass="Set">
	<ElementClass  name="Constraint"/>
</CollectionClass>

<AggregateClass  name="PrecedenceConstraint"  superClass="Constraint">
	<Attribute  name="secondTask"  class="String"/>
</AggregateClass>

Here, a subclass of Aggregate with four attributes is created. Note that two of the attributes have non-system classes as their types (ConstraintStatus and ConstraintOrigin), which are defined prior to "Constraint". Then there is a "ConstraintSet", which uses the Constraint data class, and finally a subclass of Constraint with an additional attribute.
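How a SAX handler can react to such class definitions is sketched below. This is not the ModelHandler of ProCAKE: ModelSaxSketch is a hypothetical handler that uses only the standard Java SAX API and records the super class of each definition instead of creating data classes. The attribute names name and superClass are taken from the excerpt; treating every element whose name ends in "Class" and that carries a superClass attribute as a class definition is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class ModelSaxSketch extends DefaultHandler {
  // Class name -> super class name, in document order.
  final Map<String, String> superClassByName = new LinkedHashMap<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    // Elements such as ElementClass have no superClass attribute and are skipped.
    if (qName.endsWith("Class") && atts.getValue("superClass") != null) {
      superClassByName.put(atts.getValue("name"), atts.getValue("superClass"));
    }
  }

  // Parses the given XML text and returns the recorded class definitions.
  static Map<String, String> parse(String xml) {
    try {
      ModelSaxSketch handler = new ModelSaxSketch();
      SAXParserFactory.newInstance()
          .newSAXParser()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
      return handler.superClassByName;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse model XML", e);
    }
  }
}
```

Because the map preserves document order, it also shows that Constraint is encountered before the classes that refer to it.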

Model Writer #

To serialise a created model with all contained classes and associated attributes by saving it to an XML file, the following lines must be added to the composition.xml file used to call CakeInstance.start():


<Factory name="io_factory" class="de.uni_trier.wi2.procake.utils.io.IOFactory">
    <Implementation class="de.uni_trier.wi2.procake.data.io.xml.xerces_writerImpl.ModelWriter">
        <Parameter name="family" value="DataObjects_CAKE_Standard"/>
    </Implementation>
</Factory>

Once the Model Writer has been added to composition.xml, the model can be serialised by calling IOUtil.writeFile(). The first parameter is the model itself; the second parameter must be the path where the XML file is to be saved.

IOUtil.writeFile(model, filePath);

IOUtil provides functions for serialising a given data model to an XML file.

Similarity Model #

Within CakeInstance.start:

  if (simModel != null) {  
    // overwrites the default similarity model set in composition.xml  
    initDefaultSimModel(simModel);  
  } 

If a path to a similarity model is given, it will be loaded as the default similarity model:

private static void initDefaultSimModel(String path) {  
  SimilarityModelParserImpl modelParser = (SimilarityModelParserImpl) IOFactory.newIO(SimilarityModelParserImpl.PARSERNAME);  
  modelParser.setModelDependency(ModelFactory.getDefaultModel());  
  modelParser.setSimilarityModelToBeInitialized(SimilarityModelFactory.newSimilarityModel(SimilarityModelFactory.DEFAULT_SIM_MODEL_NAME));  
  modelParser.setInputStream(IOUtil.getInputStream(path));  
  modelParser.read();  
}

First, a SimilarityModelParserImpl instance is obtained via the IOFactory. Then the default data model is provided, which will later be used by the SimilarityModelHandler. After that, a new similarity model is created analogous to the data model initialization by creating a new instance of SimilarityModelImpl, whose reference gets stored indirectly by the SimilarityModelFactory, wrapped inside an instance of SimilarityModelFactoryObjectImpl. The default similarity model can be obtained by SimilarityModelFactory.getDefaultSimilarityModel(). Note that on the creation of the SimilarityModelImpl, the similarity measure cache gets initialized. Internally, this is a static map attribute of the SimilarityModelImpl, which gets populated with all available similarity measures (SimilarityMeasureImpl) referenced by their name.

During parsing, the occurrence of a tag name listed in the SimilarityTags interface will initiate the creation of an instance of the corresponding similarity measure with the following code:

private SimilarityMeasure createSimilarityMeasure(String simMeasureName, Attributes attributes)
    throws ClassNotFoundException, NameAlreadyExistsException, NameNotFoundException {
  String className = attributes.getValue(ATT_CLASS);
  DataClass dataClass = this.dataModel.getClass(className);
  if (dataClass == null) {
    logger.warn("Could not create similarity measure - class {} not found!", className);
    return null;
  }
  SimilarityMeasure sm = simModel.createSimilarityMeasure(simMeasureName, dataClass);  
  String name = attributes.getValue(ATT_NAME);  
  
  sm.setForceOverride(Boolean.parseBoolean(attributes.getValue(ATT_FORCEOVERRIDE)));  
  simModel.addSimilarityMeasure(sm, name);  
  
  boolean isDefault = Boolean.parseBoolean(attributes.getValue(ATT_DEFAULT));  
  if (isDefault) {  
    simModel.setDefaultSimilarityMeasure(dataClass, name);  
  }  
  
  return sm;  
}

First, the corresponding data class is obtained via the default data model. Next, the call simModel.createSimilarityMeasure(simMeasureName, dataClass) performs a lookup in the static similarityMeasureTemplateCache attribute of SimilarityModelImpl, and a new instance of the class that the entry refers to is created and returned. Note that all similarity measures must have been added to this map beforehand, during the initialization of SimilarityModelImpl, as mentioned above. The returned SimilarityMeasureImpl is then added to the similarity model and, if the default flag is set in the XML file, it is also added to the static classDefaultMeasure attribute of SimilarityModelImpl. An example entry within the similarity file looks like this:

<GraphAStarOne class="NESTWorkflow" name="GraphAStarOne" default="false"/>
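Because the flags are read with Boolean.parseBoolean, an attribute that is missing from the entry evaluates to false: the SAX method Attributes.getValue returns null for an absent attribute, and Boolean.parseBoolean(null) is false. The following minimal sketch reads such an entry with the JDK's SAX parser. The class MeasureEntryFlags is hypothetical, and "forceOverride" is only an assumed attribute name, since the value of ATT_FORCEOVERRIDE is not shown on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of ProCAKE.
final class MeasureEntryFlags {

  // Parses one measure entry and returns the values a SAX handler would see.
  static Map<String, Object> read(String xml) {
    Map<String, Object> entry = new LinkedHashMap<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
        entry.put("measure", qName);
        entry.put("class", attributes.getValue("class"));
        entry.put("name", attributes.getValue("name"));
        // getValue returns null for a missing attribute; parseBoolean(null) is false.
        entry.put("default", Boolean.parseBoolean(attributes.getValue("default")));
        // "forceOverride" is an assumed attribute name.
        entry.put("forceOverride", Boolean.parseBoolean(attributes.getValue("forceOverride")));
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse measure entry", e);
    }
    return entry;
  }

  public static void main(String[] args) {
    String xml = "<GraphAStarOne class=\"NESTWorkflow\" name=\"GraphAStarOne\" default=\"false\"/>";
    System.out.println(read(xml));
  }
}
```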

Similarity Model Writer #

To serialise a similarity model, including all contained similarity measures and their attributes, to an XML file, the composition file passed to CakeInstance.start() must first be extended by the following lines:


<Factory name="io_factory" class="de.uni_trier.wi2.procake.utils.io.IOFactory">
    <Implementation class="de.uni_trier.wi2.procake.data.io.xml.xerces_writerImpl.SimilarityModelWriter">
        <Parameter name="family" value="DataObjects_CAKE_Standard"/>
    </Implementation>
</Factory>

After adding the Similarity Model Writer to composition.xml, the similarity model can be serialised by calling IOUtil.writeFile(). The first parameter is the similarity model itself; the second is the path where the XML file is to be saved.

IOUtil.writeFile(similarityModel, filePath);

IOUtil contains functions for serialising a given similarity model to an XML file.

Re-integration of data classes on a restart #

When working with data models, especially with provided data model files, it may become necessary to trigger a restart in order to reset factory values. ProCAKE can re-integrate DataClasses from a preceding start, so that objects from different ProCAKE instances share the same DataClass reference as long as the name property stays the same.

Note that classes are selected for re-integration by a name-based string comparison.

By default, a re-integration of previously existing DataClass references will NOT take place.

The re-integration of data classes can be enabled by setting the reuseDataClassReferencesAtRestart flag to true on a restart, for example:

CakeInstance.getInstance().withReuseDataClassReferencesAtRestart(true).run();
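The effect of the flag can be illustrated with a toy registry that reuses references purely by name. This is a minimal sketch under stated assumptions and does not show how ProCAKE implements the feature: ReuseByNameSketch and SketchDataClass are hypothetical, and only the name-based matching and the default of "no reuse" are taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry for illustration; not part of ProCAKE.
final class ReuseByNameSketch {
  private final Map<String, SketchDataClass> previous = new HashMap<>();
  private final Map<String, SketchDataClass> current = new HashMap<>();

  // Simulates a restart: remembers the old classes, then rebuilds the model by name.
  void restart(boolean reuseReferences, String... classNames) {
    previous.clear();
    previous.putAll(current);
    current.clear();
    for (String name : classNames) {
      SketchDataClass old = previous.get(name); // plain string comparison on the name
      current.put(name, reuseReferences && old != null ? old : new SketchDataClass(name));
    }
  }

  SketchDataClass get(String name) {
    return current.get(name);
  }

  public static void main(String[] args) {
    ReuseByNameSketch model = new ReuseByNameSketch();
    model.restart(false, "GraphicsCard");
    SketchDataClass before = model.get("GraphicsCard");
    model.restart(true, "GraphicsCard");
    System.out.println("Same reference with reuse: " + (before == model.get("GraphicsCard")));
    model.restart(false, "GraphicsCard");
    System.out.println("Same reference without reuse: " + (before == model.get("GraphicsCard")));
  }
}

// Hypothetical stand-in for a data class; only the name matters here.
final class SketchDataClass {
  final String name;

  SketchDataClass(String name) {
    this.name = name;
  }
}
```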

Taxonomy Updater #

The taxonomy updater allows manually revised taxonomies to be integrated into the ProCAKE domain model. A taxonomy can be created inside an XML file, from which the model and similarity model files needed to use this taxonomy are generated.

In the following, the example of the graphics card taxonomy is used, which is also described here. Such an XML file can look like this:

combinedTaxonomy.xml

<node value="Graphics Card" weight="0.0">
    <node value="S3 Graphics Card" weight="0.5">
        <node value="S3 Virge Card" weight="0.7">
            <node value="ELSA 2000"/>
            <node value="Stealth 3D200"/>
        </node>
        <node value="S3 Trio Card" weight="0.9">
            <node value="Miro Video"/>
            <node value="VGA V64"/>
        </node>
    </node>
    <node value="MGA Graphics Card" weight="0.8">
        <node value="Matrox Mill. 220"/>
        <node value="Matrox Mystique"/>
    </node>
</node>

This file consists entirely of nodes and nested nodes. Each node must contain a value and can also have a weight.
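To make the nesting concrete, the structure can be walked with the JDK's DOM parser. The following is a minimal sketch that is independent of ProCAKE; the class TaxonomyOutline is hypothetical and only relies on the node element and its value and weight attributes as shown above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of ProCAKE.
final class TaxonomyOutline {

  // Returns one indented line per node, with the weight appended where present.
  static List<String> outline(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      List<String> lines = new ArrayList<>();
      collect(root, "", lines);
      return lines;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse taxonomy XML", e);
    }
  }

  private static void collect(Element node, String indent, List<String> lines) {
    String weight = node.hasAttribute("weight") ? " (weight " + node.getAttribute("weight") + ")" : "";
    lines.add(indent + node.getAttribute("value") + weight);
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child instanceof Element) { // skips whitespace text nodes between elements
        collect((Element) child, indent + "  ", lines);
      }
    }
  }

  public static void main(String[] args) {
    String xml = "<node value=\"Graphics Card\" weight=\"0.0\">"
        + "<node value=\"MGA Graphics Card\" weight=\"0.8\">"
        + "<node value=\"Matrox Mystique\"/>"
        + "</node></node>";
    outline(xml).forEach(System.out::println);
  }
}
```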

To read this file, a new instance of the class TaxonomyUpdater must be created. This can be done as follows:

Wiki_TaxonomyUpdaterTest.java

    TaxonomyUpdater taxonomyUpdater = new TaxonomyUpdater(
      "/de/uni_trier/wi2/procake_demos/wiki/utils/taxonomyUpdater/combinedTaxonomy.xml"
    );

Each constructor of the class TaxonomyUpdater has a String parameter pathCombined that points to the location of the XML file. A path inside the project, as in the example, is detected automatically. Paths outside the project, e.g. to the desktop, can also be specified; in that case, the string must contain the complete path.

In addition to pathCombined, two other String variables can be specified: pathSim and pathModel. These specify the paths to the output XML files. If only file names are used, the files are stored in the folder where the file with the combined taxonomy is located. For example:

Wiki_TaxonomyUpdaterTest.java

    taxonomyUpdater = new TaxonomyUpdater(
      "/de/uni_trier/wi2/procake_demos/wiki/utils/taxonomyUpdater/combinedTaxonomy.xml",
      "sim.xml",
      "model.xml"
    );

Note, however, that the output files are then stored in the target folder and not in the resources folder. If the output files are to be written to a different location, that path can be specified directly. The following example places the output files on the desktop.

Wiki_TaxonomyUpdaterTest.java

    String PATH_DESKTOP = FileSystemView.getFileSystemView().getHomeDirectory().getAbsolutePath() + separator;
    taxonomyUpdater = new TaxonomyUpdater(
      "/de/uni_trier/wi2/procake_demos/wiki/utils/taxonomyUpdater/combinedTaxonomy.xml",
      PATH_DESKTOP + "sim.xml",
      PATH_DESKTOP + "model.xml"
    );
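The rule for output locations (bare file names end up next to the combined taxonomy file, while paths with a directory part are used as given) can be expressed with java.nio.file. This is a minimal sketch of that rule only; OutputPathSketch and resolveOutput are hypothetical, the target/classes path is merely illustrative, and the actual resolution logic inside TaxonomyUpdater may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of ProCAKE.
final class OutputPathSketch {

  // A bare file name is placed next to the combined file; any other path is kept.
  static Path resolveOutput(Path combinedFile, String output) {
    Path out = Paths.get(output);
    return out.getParent() == null ? combinedFile.resolveSibling(out) : out;
  }

  public static void main(String[] args) {
    // Illustrative location of the combined file after the build.
    Path combined = Paths.get("target", "classes", "taxonomyUpdater", "combinedTaxonomy.xml");
    System.out.println(resolveOutput(combined, "sim.xml"));
    System.out.println(resolveOutput(combined, Paths.get("out", "model.xml").toString()));
  }
}
```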

Analogous to the four different taxonomy similarity measures, there are four methods that can now be used to create the taxonomy model and similarity model definitions:

  1. updateTaxonomyClassic(String className, String measureName)
  2. updateTaxonomyClassicUserWeights(String className, String innerNodeInQueryStrategy, String innerNodeInCaseStrategy, String measureName)
  3. updateTaxonomyNodeHeight(String className, String strategy, String measureName)
  4. updateTaxonomyPath(String className, double weightUp, double weightDown, String measureName)

All of them contain a String parameter className, which specifies the name of the class in the model, and a String parameter measureName, which specifies the name of the similarity measure.

The choice of method only affects sim.xml; model.xml is identical for every method.

The method call for a taxonomy classic measure looks like this:

Wiki_TaxonomyUpdaterTest.java

    taxonomyUpdater.updateTaxonomyClassic("GraphicsCard", "SMTaxonomyClassic");

After the method call, the two files sim.xml and model.xml exist in the specified path. These are complete files that can be read when CakeInstance is started, but they only contain the taxonomy classes. The paths of the model and similarity model files can be obtained with the methods getModelPath() and getSimModelPath(), for example:

Wiki_TaxonomyUpdaterTest.java

    CakeInstance.start(
      PATH_COMPOSITION,
      taxonomyUpdater.getModelPath(),
      taxonomyUpdater.getSimModelPath(),
      null
    );

To create a taxonomy classic user weights measure, the following method call can be used:

Wiki_TaxonomyUpdaterTest.java

    taxonomyUpdater.updateTaxonomyClassicUserWeights(
      "GraphicsCard",
      "optimistic",
      "optimistic",
      "SMTaxonomyClassicUserWeights"
    );

The innerNodeInQueryStrategy and the innerNodeInCaseStrategy are passed as String parameters. The values "optimistic", "pessimistic" and "average" are available for this purpose.

A taxonomy node height measure (TaxonomyNodeHeight) is created with:

Wiki_TaxonomyUpdaterTest.java

    taxonomyUpdater.updateTaxonomyNodeHeight("GraphicsCard", "optimistic", "SMTaxonomyNodeHeight");

The String parameter strategy can be set here; the same three values "optimistic", "pessimistic" and "average" are possible.
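Since the strategies are passed as plain strings, a typo is easy to miss. One option is a small enum that only admits the three documented values and produces the string form for the method calls. This is a minimal sketch; InnerNodeStrategy is a hypothetical helper and not part of ProCAKE.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of ProCAKE.
enum InnerNodeStrategy {
  OPTIMISTIC, PESSIMISTIC, AVERAGE;

  // Accepts the three documented values (case-insensitively);
  // throws IllegalArgumentException for any other string.
  static InnerNodeStrategy parse(String value) {
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  // The lower-case string form used in the method calls above.
  String asParameter() {
    return name().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(parse("optimistic").asParameter());
    try {
      parse("optimstic"); // misspelled on purpose
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

With such a helper, a call could be written as taxonomyUpdater.updateTaxonomyNodeHeight("GraphicsCard", InnerNodeStrategy.OPTIMISTIC.asParameter(), "SMTaxonomyNodeHeight").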

A taxonomy path measure can be created with the following:

Wiki_TaxonomyUpdaterTest.java

    taxonomyUpdater.updateTaxonomyPath("GraphicsCard", 1.0, 1.0, "SMTaxonomyPath");

Here, the double parameters weightUp and weightDown must be set.

The generated similarity models are shown in the description of the respective similarity measures. The corresponding model file, which is the same for each measure, is also listed there.