Getting Started

Getting Started with ProCAKE #

ProCAKE is pre-configured and can be started with a single line of code


For initializing the framework with custom configuration, please refer to the Configuration page.

The initialized domain model already contains several system classes and corresponding similarity measures. The available classes and measures can be retrieved as follows:


Create atomic objects and calculate similarity #

Objects of data classes (e.g. a String object) can be created as follows:

Model model = ModelFactory.getDefaultModel();
StringObject stringObjectA = model.createObject(StringClass.CLASS_NAME);
stringObjectA.setNativeString("hello");
StringObject stringObjectB = model.createObject(StringClass.CLASS_NAME);
stringObjectB.setNativeString("world");

For creating custom data objects, please refer to the Data Classes page.

The similarity between two data objects can be calculated using a SimilarityValuator:

SimilarityValuator simVal = SimilarityModelFactory.newSimilarityValuator();
Similarity similarity = simVal.computeSimilarity(stringObjectA, stringObjectB);

The result of the similarity calculation is returned as a Similarity object and can be printed to the console with the following line


By default, without a custom similarity model, strings will be compared using the equals method. The resulting similarity is:

ObjectEqual: hello (String) -> world (String): 0.0

At run-time, we can also create a new similarity measure. In the following, we create a measure for comparing strings using the Levenshtein distance:

SimilarityMeasure measure = SimilarityModelFactory.getDefaultSimilarityModel()
        .createSimilarityMeasure(SMStringLevenshtein.NAME, ModelFactory.getDefaultModel().getStringSystemClass());

To calculate the similarity of StringObjects using the Levenshtein distance, provide the name of the measure as an additional argument to the similarity valuator:

Similarity similarity = simVal.computeSimilarity(stringObjectA, stringObjectB, measure.getName());

If we add this measure to the similarity model, it will be automatically invoked by the similarity valuator:

SimilarityModelFactory.getDefaultSimilarityModel().addSimilarityMeasure(measure, SMStringLevenshtein.NAME);
Similarity similarity = simVal.computeSimilarity(stringObjectA, stringObjectB);

The console output of the similarity object is:

StringLevenshtein: hello (String) -> world (String): 0.19999999999999996
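
The value 0.19999999999999996 is consistent with a Levenshtein distance that is normalized by the length of the longer string: "hello" and "world" are four edits apart, so 1 - 4/5 gives 0.2 up to floating-point rounding. The following self-contained sketch reproduces this arithmetic in plain Java. It assumes this normalization and does not use ProCAKE's SMStringLevenshtein implementation; the class and method names are hypothetical.

```java
class LevenshteinSketch {

    /** Classic dynamic-programming edit distance (insert, delete, substitute; cost 1 each). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty prefix takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the longer string. */
    static double similarity(String a, String b) {
        int maxLength = Math.max(a.length(), b.length());
        if (maxLength == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLength;
    }

    public static void main(String[] args) {
        System.out.println(distance("hello", "world"));   // edit distance
        System.out.println(similarity("hello", "world")); // normalized similarity
    }
}
```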

Create aggregate objects and calculate similarity #

Let’s create more complex data objects. We first have to create a custom class as follows:

AggregateClass aggregateClass = (AggregateClass) model.getAggregateSystemClass().createSubclass("MyAggregate");
aggregateClass.addAttribute("attA", model.getStringSystemClass());
aggregateClass.addAttribute("attB", model.getIntegerSystemClass());

Then we can create objects as follows:

AggregateObject aggregateObjectA = model.createObject("MyAggregate");
StringObject stringObject = (StringObject) model.getStringSystemClass().newObject();
stringObject.setNativeString("hello world");
aggregateObjectA.setAttributeValue("attA", stringObject);
IntegerObject integerObjectA = (IntegerObject) model.getIntegerSystemClass().newObject();
// assumed setter, analogous to setNativeString; the value matches the output below
integerObjectA.setNativeInteger(42);
aggregateObjectA.setAttributeValue("attB", integerObjectA);

AggregateObject aggregateObjectB = model.createObject("MyAggregate");
aggregateObjectB.setAttributeValue("attA", model.getVoidSystemClass().newObject());
IntegerObject integerObjectB = (IntegerObject) model.getIntegerSystemClass().newObject();
integerObjectB.setNativeInteger(42);
aggregateObjectB.setAttributeValue("attB", integerObjectB);

For the similarity computation we use the pre-initialized system measures:

Similarity similarity = simVal.computeSimilarity(aggregateObjectA, aggregateObjectB);

The console output of the similarity is:

AggregateAverage:aggregate (MyAggregate) -> aggregate (MyAggregate): 0.5
	ObjectEqual:42 (Integer) -> 42 (Integer): 1.0
	DataClass:hello world (String) -> : (Void): 0.0

The output shows all computed local similarities as well as the resulting global similarity for the aggregate object. For comparing the attribute ‘attB’, the measure ‘ObjectEqual’ was used, as no specific measure for integers is defined in the similarity model. For comparing the attribute ‘attA’, the measure ‘DataClass’ was used, as this is the only available measure in the model that is capable of comparing a String value and a Void value. For computing the overall similarity, the default measure for aggregate objects, ‘AggregateAverage’, was used.
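
The global value of 0.5 matches the unweighted mean of the two local similarities, (1.0 + 0.0) / 2. A minimal plain-Java sketch of that aggregation follows. It assumes equal weights for all attributes; the class and method names are hypothetical and not part of ProCAKE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AggregateAverageSketch {

    /** Returns the unweighted mean of the local similarities, or 0.0 if there are none. */
    static double average(Map<String, Double> localSimilarities) {
        return localSimilarities.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        // Local similarities taken from the console output above.
        Map<String, Double> local = new LinkedHashMap<>();
        local.put("attB", 1.0); // 42 vs. 42
        local.put("attA", 0.0); // "hello world" vs. Void
        System.out.println(average(local)); // (1.0 + 0.0) / 2
    }
}
```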

Please note that, by default, this similarity computation is asymmetric:

Similarity similarity = simVal.computeSimilarity(aggregateObjectB, aggregateObjectA);

The console output of the similarity is:

AggregateAverage:aggregate (MyAggregate) -> aggregate (MyAggregate): 1.0
	ObjectEqual:42 (Integer) -> 42 (Integer): 1.0
	DataClass:: (Void) -> hello world (String): 1.0
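
The two outputs differ only in the direction of the String/Void comparison: with the Void value as the first argument the local similarity is 1.0, with the Void value as the second argument it is 0.0. One way to read this is that the first argument acts as a query and an unset query attribute imposes no requirement. The sketch below models that reading in plain Java with Optional. It is an assumption drawn from the outputs above, not ProCAKE's ‘DataClass’ measure, and all names are hypothetical.

```java
import java.util.Optional;

class AsymmetricSketch {

    /**
     * Asymmetric local similarity: an empty query value matches anything,
     * while a set query value cannot be satisfied by an empty case value.
     */
    static double localSimilarity(Optional<String> query, Optional<String> caseValue) {
        if (!query.isPresent()) {
            return 1.0; // nothing requested, so nothing can be missing
        }
        if (!caseValue.isPresent()) {
            return 0.0; // something requested, but the case has no value
        }
        return query.get().equals(caseValue.get()) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        Optional<String> text = Optional.of("hello world");
        Optional<String> unset = Optional.empty();
        System.out.println(localSimilarity(text, unset)); // String -> Void
        System.out.println(localSimilarity(unset, text)); // Void -> String
    }
}
```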

Input and output #

File paths in the project #

There are several useful paths for reading and writing files:

// external resources
String PATH_USER_HOME = System.getProperty("user.home") + File.separator + "custom_folder" + File.separator;
String PATH_USER_DESKTOP = FileSystemView.getFileSystemView().getHomeDirectory().getAbsolutePath() + File.separator + "custom_folder" + File.separator;

// classpath resources
String PATH_COMPOSITION = "/de/uni_trier/wi2/composition.xml";
String PATH_SIM_MODEL = "/de/uni_trier/wi2/sim.xml";

Please note that during project builds all classpath resources, i.e., all files under src/main/resources and src/test/resources, are copied to target/classes. At run-time, these files are always read from this target folder when using relative paths!

As a rule of thumb, it is recommended to read files from the classpath using the path relative to the resources folders, and to read and write files in the local file system using absolute paths. For file system paths, it is also recommended to use File.separator instead of hard-coded slashes to stay independent of the operating system; classpath resource paths, in contrast, always use forward slashes.
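
The difference between the two kinds of paths can be sketched with the Java standard library alone. The example builds an absolute file system path with File.separator and looks up a classpath resource with a leading slash. The class and method names are hypothetical, and the resource is only probed for existence, since whether it is present depends on the project setup.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

class PathSketch {

    /** Builds an absolute, OS-independent path to a folder below the user's home directory. */
    static String externalFolder(String folderName) {
        return System.getProperty("user.home") + File.separator + folderName + File.separator;
    }

    /** Checks whether a resource such as "/de/uni_trier/wi2/sim.xml" is on the classpath. */
    static boolean classpathResourceExists(String resourcePath) {
        // A leading "/" resolves the path from the classpath root; "/" is used on every OS.
        try (InputStream in = PathSketch.class.getResourceAsStream(resourcePath)) {
            return in != null;
        } catch (IOException e) {
            return false; // closing the stream failed; treat the resource as unavailable
        }
    }

    public static void main(String[] args) {
        System.out.println(externalFolder("custom_folder"));
        System.out.println(classpathResourceExists("/de/uni_trier/wi2/sim.xml"));
    }
}
```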

Readers #

ProCAKE provides several implementations for reading and writing objects. All available reader and writer implementations are registered in the ‘composition.xml’ configuration file. See the Configuration page for more details.

For instance, to read an object pool, the following method can be used:

WriteableObjectPool pool = IOUtil.readFile(filePath, WriteableObjectPool.class);

For reading an object with a specific reader, the name of the desired reader can be given as an argument as follows:

WriteableObjectPool pool = IOUtil.readFile(filePath, ObjectPoolParser.PARSERNAME);

To read a single data object, the ‘ObjectParser’ can be used:

DataObject object = IOUtil.readFile(filePath, ObjectParser.PARSERNAME);

The ‘StringReader’ can be used for reading arbitrary text files:

String stringObject = IOUtil.readFile(filePath, StringReader.READERNAME);

Writers #

To write an object pool to a file, the following method can be used:

WriteableObjectPool poolToWrite = ObjectPoolFactory.newObjectPool();
IOUtil.writeFile(poolToWrite, filePath);

A specific writer can also be used when writing an object. For this purpose, the name of the desired writer has to be given as an additional argument:

IOUtil.writeFile(poolToWrite, filePath, ObjectPoolWriterImpl.WRITERNAME);

To write a data object as XML, the ‘ObjectWriterImpl’ can be used:

IOUtil.writeFile(dataObject, filePath, ObjectWriterImpl.WRITERNAME);

To write a data object or a string to a text file, the ‘StringWriter’ can be used:

IOUtil.writeFile(dataObject, filePath, StringWriter.WRITERNAME);

Cloning and comparing objects #

ProCAKE provides particular methods for cloning data objects, i.e. DataObject copy(), and comparing them, i.e. boolean hasSameValueAsIn(DataObject object), in the class DataObject. The method copy() makes a deep copy of the data object. This means that the method is called recursively for any nested data object, e.g., list elements or aggregate attributes. The following code gives an example, where a String object is created and copied:

StringObject stringObject = (StringObject) ModelFactory.getDefaultModel().createObject(StringClass.CLASS_NAME);
stringObject.setNativeString("test");
StringObject copiedStringObject = (StringObject) stringObject.copy();

The object copiedStringObject is an instance of the same class (StringObject) and contains the same value as the original object ("test").

To check whether the values of two objects are equal, the method hasSameValueAsIn can be used. It performs no reference check but instead checks all relevant information of the two data objects for equality. What counts as relevant information differs for each data class. For instance, two integer objects are equal if their values are equal. In contrast, the test for equality of two lists (or even NEST graphs) is much more complex. Therefore, hasSameValueAsIn() makes recursive calls for nested data objects, analogous to copy(). The following code gives an example, using the objects from the example of the copy method above:

copiedStringObject.hasSameValueAsIn(stringObject);  // returns true

This call returns true because both objects contain the same value. For a String object that contains a different value, the method would return false.
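
The recursive behaviour of a deep copy and of a value-based comparison can be illustrated without ProCAKE. The sketch below uses a hypothetical Value interface with two implementations, a text value and a list of values. It is a stand-in for illustration only and does not mirror ProCAKE's DataObject class hierarchy.

```java
import java.util.ArrayList;
import java.util.List;

class CopySketch {

    /** Hypothetical stand-in for a data object; not ProCAKE's DataObject. */
    interface Value {
        Value copy();
        boolean hasSameValueAsIn(Value other);
    }

    static final class Text implements Value {
        String content;
        Text(String content) { this.content = content; }
        public Value copy() { return new Text(content); }
        public boolean hasSameValueAsIn(Value other) {
            return other instanceof Text && content.equals(((Text) other).content);
        }
    }

    static final class Items implements Value {
        final List<Value> elements = new ArrayList<>();
        public Value copy() {
            Items result = new Items();
            for (Value element : elements) {
                result.elements.add(element.copy()); // recursive call: nested objects are copied too
            }
            return result;
        }
        public boolean hasSameValueAsIn(Value other) {
            if (!(other instanceof Items)) return false;
            List<Value> others = ((Items) other).elements;
            if (elements.size() != others.size()) return false;
            for (int i = 0; i < elements.size(); i++) {
                // recursive call: compare nested values position by position
                if (!elements.get(i).hasSameValueAsIn(others.get(i))) return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Items original = new Items();
        original.elements.add(new Text("test"));
        Items copy = (Items) original.copy();
        System.out.println(copy.elements.get(0) != original.elements.get(0)); // distinct nested object
        System.out.println(copy.hasSameValueAsIn(original));                  // same values
        ((Text) copy.elements.get(0)).content = "changed";
        System.out.println(copy.hasSameValueAsIn(original));                  // values now differ
    }
}
```

Because the copy shares no nested objects with the original, changing the copied text value leaves the original untouched.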

When the hasSameValueAsIn method is called on objects that instantiate different data classes, it generally returns false. The following list describes more specific behaviours of some data classes:

  • When comparing Numeric objects, it is checked whether the values are identical. For example, an Integer object with the value 1 and a Double object with the value 1.0 are identified as equal by this method.
  • When comparing Numeric objects with String objects, it is checked whether the String object contains a numeric value. For example, an Integer object with the value 1 and a String object with the value "1" are identified as equal by this method.
  • When comparing Collection objects, it is checked whether they contain the same elements. Each element must appear at the same position in both collections (if the data structure supports positional information). For example, a List object and a Set object that both contain the same five values are identified as equal by this method. If, however, the List object contains an element more than once, the method identifies the two objects as unequal.
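
The rules in this list can be sketched with standard Java types. The helper methods below are hypothetical and only mimic the described behaviour on Number, String, List and Set. They are not ProCAKE code, and details such as how numeric strings are parsed are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CrossTypeSketch {

    /** Numeric vs. numeric: compare by value, regardless of the concrete number type. */
    static boolean sameNumericValue(Number a, Number b) {
        return Double.compare(a.doubleValue(), b.doubleValue()) == 0;
    }

    /** Numeric vs. string: equal only if the string parses to the same number (assumed parsing). */
    static boolean sameValue(Number number, String text) {
        try {
            return Double.compare(number.doubleValue(), Double.parseDouble(text.trim())) == 0;
        } catch (NumberFormatException e) {
            return false; // the string does not contain a numeric value
        }
    }

    /** List vs. set: equal only if the list has no duplicates and holds exactly the set's elements. */
    static boolean sameElements(List<?> list, Set<?> set) {
        return list.size() == set.size() && new HashSet<Object>(list).equals(set);
    }

    public static void main(String[] args) {
        System.out.println(sameNumericValue(1, 1.0));
        System.out.println(sameValue(1, "1"));
        System.out.println(sameElements(Arrays.asList(1, 2, 3), new HashSet<>(Arrays.asList(3, 2, 1))));
        System.out.println(sameElements(Arrays.asList(1, 2, 2), new HashSet<>(Arrays.asList(1, 2))));
    }
}
```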