Getting Started #

Getting Started with ProCAKE #

ProCAKE is pre-configured and can be started with a single line of code


For initializing the framework with custom configuration, please refer to the Configuration page.

The initialized domain model already contains several system classes and corresponding similarity measures. The available classes and measures can be retrieved as follows:


Create atomic objects and calculate similarity #

Objects of data classes (e.g. a String object) can be created as follows:

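Model model = ModelFactory.getDefaultModel();
// set the native values so that the objects match the console output shown further down
StringObject stringObjectA = model.createObject(StringClass.CLASS_NAME);
stringObjectA.setNativeString("hello");
StringObject stringObjectB = model.createObject(StringClass.CLASS_NAME);
stringObjectB.setNativeString("world");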

For creating custom data objects, please refer to the Data Classes page.

The similarity between two data objects can be calculated using a SimilarityValuator:

SimilarityValuator simVal = SimilarityModelFactory.newSimilarityValuator();
Similarity similarity = simVal.computeSimilarity(stringObjectA, stringObjectB);

The result of the similarity calculation is returned as a Similarity object and can be printed to the console with the following line


By default, without a custom similarity model, strings will be compared using the equals method. The resulting similarity is:

ObjectEqual: hello (String) -> world (String): 0.0

A new similarity measure can also be created at run-time. In the following, we create a measure that compares strings using the Levenshtein distance:

SimilarityMeasure measure = SimilarityModelFactory.getDefaultSimilarityModel()
    .createSimilarityMeasure(SMStringLevenshtein.NAME, ModelFactory.getDefaultModel().getStringSystemClass());

To calculate the similarity of StringObjects using the Levenshtein distance, pass the name of the measure as an additional argument to the similarity valuator:

Similarity similarity = simVal.computeSimilarity(stringObjectA, stringObjectB, measure.getName());

If we add this measure to the similarity model, it will be automatically invoked by the similarity valuator:

SimilarityModelFactory.getDefaultSimilarityModel().addSimilarityMeasure(measure, SMStringLevenshtein.NAME);
Similarity similarity = simVal.computeSimilarity(stringObjectA, stringObjectB);

The console output of the similarity object is:

StringLevenshtein: hello (String) -> world (String): 0.19999999999999996
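
The value above can be reproduced by hand. The following is a minimal, library-independent sketch, not ProCAKE's own implementation: it assumes that the measure normalizes the edit distance by the length of the longer string, which is consistent with the output shown. The class name LevenshteinSketch and its methods are hypothetical.

```java
// Sketch only, not ProCAKE code. Assumption: similarity = 1 - distance / max(length).
final class LevenshteinSketch {

    /** Classic dynamic-programming edit distance; insert, delete and substitute cost 1 each. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Normalized similarity in [0, 1]; two empty strings count as identical. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(distance("hello", "world"));   // 4
        System.out.println(similarity("hello", "world")); // 0.19999999999999996
    }
}
```

"hello" and "world" differ in four of five positions, so the distance is 4 and the similarity is 1 - 4/5. The trailing digits are a floating-point artifact of that subtraction.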

Create aggregate objects and calculate similarity #

Let’s create more complex data objects. We first have to create a custom class as follows:

AggregateClass aggregateClass = (AggregateClass) model.getAggregateSystemClass().createSubclass("MyAggregate");
aggregateClass.addAttribute("attA", model.getStringSystemClass());
aggregateClass.addAttribute("attB", model.getIntegerSystemClass());

Then we can create objects as follows:

// object A: attA = "hello world", attB = 42
AggregateObject aggregateObjectA = model.createObject("MyAggregate");
StringObject stringObject = (StringObject) model.getStringSystemClass().newObject();
stringObject.setNativeString("hello world");
aggregateObjectA.setAttributeValue("attA", stringObject);
IntegerObject integerObjectA = (IntegerObject) model.getIntegerSystemClass().newObject();
// assumption: IntegerObject offers setNativeInteger, analogous to setNativeString;
// the value 42 is taken from the console output shown further down
integerObjectA.setNativeInteger(42);
aggregateObjectA.setAttributeValue("attB", integerObjectA);

// object B: attA = Void (unspecified), attB = 42
AggregateObject aggregateObjectB = model.createObject("MyAggregate");
aggregateObjectB.setAttributeValue("attA", model.getVoidSystemClass().newObject());
IntegerObject integerObjectB = (IntegerObject) model.getIntegerSystemClass().newObject();
integerObjectB.setNativeInteger(42); // same assumption as above
aggregateObjectB.setAttributeValue("attB", integerObjectB);

For the similarity computation we use the pre-initialized system measures:

Similarity similarity = simVal.computeSimilarity(aggregateObjectA, aggregateObjectB);

The console output of the similarity is:

AggregateAverage:aggregate (MyAggregate) -> aggregate (MyAggregate): 0.5
	ObjectEqual:42 (Integer) -> 42 (Integer): 1.0
	DataClass:hello world (String) -> : (Void): 0.0

The output shows all computed local similarities as well as the resulting global similarity for the aggregate object. For comparing the attribute ‘attB’, the measure ‘ObjectEqual’ was used, as no specific measure for integers was defined in the similarity model. For comparing the attribute ‘attA’, the measure ‘DataClass’ was used, as this is the only available measure in the model that is capable of comparing a String and a Void value. For computing the overall similarity, the default measure for aggregate objects, ‘AggregateAverage’, was used.
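
To make the aggregation step concrete, the following minimal sketch computes the unweighted mean of the local similarities from the output above. It is not ProCAKE code: the class AggregateAverageSketch is hypothetical, and treating ‘AggregateAverage’ as a plain arithmetic mean is an assumption that matches the result 0.5.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only, not ProCAKE code. Assumption: the global similarity is the
// unweighted arithmetic mean of the local attribute similarities.
final class AggregateAverageSketch {

    static double average(Map<String, Double> localSimilarities) {
        if (localSimilarities.isEmpty()) {
            throw new IllegalArgumentException("at least one local similarity is required");
        }
        double sum = 0.0;
        for (double similarity : localSimilarities.values()) {
            sum += similarity;
        }
        return sum / localSimilarities.size();
    }

    public static void main(String[] args) {
        Map<String, Double> local = new LinkedHashMap<>();
        local.put("attB", 1.0); // 42 -> 42, compared with equality
        local.put("attA", 0.0); // "hello world" -> Void
        System.out.println(average(local)); // 0.5
    }
}
```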

Please note that, by default, this similarity computation is asymmetric, i.e., swapping the two arguments can change the result:

Similarity similarity = simVal.computeSimilarity(aggregateObjectB, aggregateObjectA);

The console output of the similarity is:

AggregateAverage:aggregate (MyAggregate) -> aggregate (MyAggregate): 1.0
	ObjectEqual:42 (Integer) -> 42 (Integer): 1.0
	DataClass:: (Void) -> hello world (String): 1.0
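
One rule that reproduces both outputs is sketched below: an unspecified value in the first argument imposes no requirement and yields 1.0, whereas a specified value compared against an unspecified one yields 0.0. This rule is an assumption derived from the two outputs, not a statement about how the ‘DataClass’ measure is implemented. The class AsymmetricAggregateSketch is hypothetical, and null stands in for a Void value.

```java
// Sketch only, not ProCAKE code. null stands in for a Void (unspecified) value.
final class AsymmetricAggregateSketch {

    /** Local similarity of a single attribute, evaluated from the first argument's point of view. */
    static double local(Object first, Object second) {
        if (first == null) {
            return 1.0; // assumption: nothing was requested, so anything matches
        }
        if (second == null) {
            return 0.0; // assumption: a requested value is missing on the other side
        }
        return first.equals(second) ? 1.0 : 0.0;
    }

    /** Unweighted mean of the local similarities, attribute by attribute. */
    static double aggregate(Object[] first, Object[] second) {
        if (first.length == 0 || first.length != second.length) {
            throw new IllegalArgumentException("both objects need the same, non-empty attribute list");
        }
        double sum = 0.0;
        for (int i = 0; i < first.length; i++) {
            sum += local(first[i], second[i]);
        }
        return sum / first.length;
    }

    public static void main(String[] args) {
        Object[] objectA = {"hello world", 42}; // attA, attB
        Object[] objectB = {null, 42};          // attA is unspecified
        System.out.println(aggregate(objectA, objectB)); // 0.5
        System.out.println(aggregate(objectB, objectA)); // 1.0
    }
}
```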

Input and output #

File paths in the project #

There are several useful paths for reading and writing files:

// required imports: java.io.File, javax.swing.filechooser.FileSystemView

// external resources
String PATH_USER_HOME = System.getProperty("user.home") + File.separator + "custom_folder" + File.separator;
String PATH_USER_DESKTOP = FileSystemView.getFileSystemView().getHomeDirectory().getAbsolutePath() + File.separator + "custom_folder" + File.separator;

// classpath resources
String PATH_COMPOSITION = "/de/uni_trier/wi2/composition.xml";
String PATH_SIM_MODEL = "/de/uni_trier/wi2/sim.xml";

Please note that during project builds all classpath resources, i.e., all files under src/main/resources and src/test/resources, are copied to target/classes. At run-time, these files are always read from this target folder when using relative paths!

As a rule of thumb, read files from the classpath using the path relative to the resources folders, and read and write files on the local file system using absolute paths. For file system paths, it is also recommended to use File.separator instead of hard-coded slashes to be independent of the operating system. Classpath resource paths, such as the two constants above, always use forward slashes.
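
The two access patterns can be sketched with the standard library alone. The class PathSketch and its methods are hypothetical, the folder name custom_folder is taken from the constants above, and whether a given resource actually exists on the classpath depends on the project. The sketch uses InputStream.readAllBytes, which requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: hypothetical helper class, independent of ProCAKE.
final class PathSketch {

    /** Absolute file system path; Paths.get inserts the platform's separator. */
    static Path userFolder() {
        return Paths.get(System.getProperty("user.home"), "custom_folder").toAbsolutePath();
    }

    /**
     * Reads a classpath resource as UTF-8 text. A leading "/" makes the path
     * relative to the classpath root, i.e. to the content of the resources folders.
     * Returns null if the resource does not exist.
     */
    static String readClasspathResource(String resourcePath) {
        try (InputStream in = PathSketch.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return null; // resource not on the classpath
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(userFolder());
        String simModel = readClasspathResource("/de/uni_trier/wi2/sim.xml");
        System.out.println(simModel == null ? "resource not found" : simModel);
    }
}
```

Using Paths.get for file system locations avoids manual concatenation with File.separator, while the classpath lookup keeps its forward slashes on every platform.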