Similarity Measures #

Similarity Measures in ProCAKE #

ProCAKE provides several similarity measures, which are described in more detail in their own sections.

The similarity measure hierarchy is as follows:

[Figure: Similarity measure hierarchy]

Measure parameters #

A few parameters can be set for every measure:

  • name: Every measure requires a unique name. This name is later used to select the measure for a similarity computation. If two measures share the same name and forceOverride is not set, an exception is thrown.
  • class: Every measure must be assigned a data class. The measure can only be applied to objects of this data class and its subclasses. For example, a String measure requires objects of the String class, while a numeric measure can work on Integer or Double objects. If an unsuitable class is set, for example Integer for a String measure, an exception is thrown when the measure is created.
  • default: Every measure has a default flag. If it is set to true, the measure is used as the default measure for similarity computations on the specified class whenever no measure is set explicitly for the computation. If more than one measure for the same class is marked as default, the first one is used. If the flag is not set explicitly, it defaults to true.
  • forceOverride: To override an already defined measure, this value has to be set to true. Only in this case may the name of the measure be non-unique. The default value is false.

These parameters can be set in the XML similarity file, for example:

<SMNumericMeasure name="SMNumericMeasure" class="Integer" default="true" forceOverride="false"/>

In this case, a numeric similarity measure is created. Its name must be unique because forceOverride is false. The measure can be applied to Integer objects and is the default measure for this class.

To set these parameters at runtime, the following code can be used:

// Create the measure for the Integer system class.
SMNumericMeasure smNumericMeasure = (SMNumericMeasure) simVal.getSimilarityModel().createSimilarityMeasure(SMNumericMeasure.NAME, ModelFactory.getDefaultModel().getIntegerSystemClass());
// Do not allow this measure to replace an existing measure with the same name.
smNumericMeasure.setForceOverride(false);
// Register the measure under its name.
simVal.getSimilarityModel().addSimilarityMeasure(smNumericMeasure, "SMNumericMeasure");
// Make it the default measure for Integer objects.
simVal.getSimilarityModel().setDefaultSimilarityMeasure(ModelFactory.getDefaultModel().getIntegerSystemClass(), "SMNumericMeasure");

Here, simVal refers to a SimilarityValuator (described here).
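
The interplay of name, default and forceOverride can be illustrated with a small toy registry. The following is only a sketch of the rules described in this section, not ProCAKE code: the class MeasureRegistrySketch and its methods add and defaultFor are hypothetical, and data classes are represented by plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the registration rules described above. This is NOT ProCAKE code:
// MeasureRegistrySketch, add and defaultFor are hypothetical names.
class MeasureRegistrySketch {

    private final Map<String, String> classByName = new HashMap<>();    // measure name -> data class
    private final Map<String, String> defaultByClass = new HashMap<>(); // data class -> default measure name

    // Registers a measure with explicit flags.
    void add(String name, String dataClass, boolean isDefault, boolean forceOverride) {
        if (classByName.containsKey(name) && !forceOverride) {
            throw new IllegalArgumentException("Measure name already in use: " + name);
        }
        classByName.put(name, dataClass);
        if (isDefault) {
            // The first default registered for a class wins; later ones are ignored.
            defaultByClass.putIfAbsent(dataClass, name);
        }
    }

    // Registers a measure with the documented defaults: default = true, forceOverride = false.
    void add(String name, String dataClass) {
        add(name, dataClass, true, false);
    }

    // Returns the name of the default measure for the class, or null if there is none.
    String defaultFor(String dataClass) {
        return defaultByClass.get(dataClass);
    }

    public static void main(String[] args) {
        MeasureRegistrySketch registry = new MeasureRegistrySketch();
        registry.add("SMNumericMeasure", "Integer");
        registry.add("OtherNumericMeasure", "Integer");
        System.out.println(registry.defaultFor("Integer")); // prints SMNumericMeasure

        try {
            registry.add("SMNumericMeasure", "Integer"); // same name, forceOverride = false
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        registry.add("SMNumericMeasure", "Integer", true, true); // allowed: forceOverride = true
    }
}
```

In this sketch, the second default for Integer is silently ignored, and a duplicate name is only accepted when forceOverride is true. How ProCAKE handles further edge cases, such as an override that changes the data class, is not covered here.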

Pre-initialized measures #

ProCAKE requires the definition of similarity measures for all system and user classes. It is pre-initialized with some basic measures as a fallback when the system is started with an empty similarity model.

There are three basic similarity measures:

  1. The default measure SMTableDataClass is a DataClass measure. It sets the similarity of all comparisons to 0.0, except when a Void object is compared to any other object, in which case the similarity is always 1.0.
  2. The default measure SMObjectEqual is an ObjectEqual measure. It works on Atomic objects and checks whether the query object has the same value as the case object.
  3. The default measure SMAggregateAverage is an AggregateAverage measure. It works on Aggregate objects and assigns each entry a default weight of 1.0.
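
The behavior of the three basic measures can be approximated with a few lines of plain Java. This is a sketch under several assumptions, not the ProCAKE implementation: the class FallbackMeasuresSketch is hypothetical, a Void object is represented by null, the equality check is assumed to yield 1.0 for equal values and 0.0 otherwise, an aggregate is modeled as a map from attribute names to values, and attribute values are compared with the equality check.

```java
import java.util.Map;
import java.util.Objects;

// Simplified stand-ins for the three fallback measures. NOT ProCAKE code;
// all names and the null-as-Void convention are assumptions of this sketch.
class FallbackMeasuresSketch {

    // DataClass-style fallback: 1.0 if either side is Void (null here), otherwise 0.0.
    static double dataClass(Object query, Object caseObj) {
        return (query == null || caseObj == null) ? 1.0 : 0.0;
    }

    // ObjectEqual-style check: assumed to return 1.0 for equal values, 0.0 otherwise.
    static double objectEqual(Object query, Object caseObj) {
        return Objects.equals(query, caseObj) ? 1.0 : 0.0;
    }

    // AggregateAverage-style measure: weighted average over the query's attributes,
    // with a weight of 1.0 for every attribute.
    static double aggregateAverage(Map<String, Object> query, Map<String, Object> caseObj) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (String attribute : query.keySet()) {
            double weight = 1.0; // default weight for each entry
            weightedSum += weight * objectEqual(query.get(attribute), caseObj.get(attribute));
            weightTotal += weight;
        }
        // An empty aggregate yields 0.0 in this sketch; the real behavior may differ.
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        Map<String, Object> query = Map.of("color", "red", "doors", 4);
        Map<String, Object> caseObj = Map.of("color", "red", "doors", 2);
        System.out.println(aggregateAverage(query, caseObj)); // prints 0.5
    }
}
```

With equal weights, the aggregate similarity is simply the mean of the attribute similarities: one matching attribute out of two gives 0.5.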

To avoid defining a measure for every specific class, measures can also be defined for superclasses. The parent class of all classes is named Data.

When computing the similarity between two objects, the system first determines the least common class type of the objects and searches for the most specific measure that is defined for this class type. If no such measure is defined, it searches the superclasses of the type for an available measure. If no measure can be found at all, an error is thrown.
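
The lookup described above can be sketched as a walk up the class hierarchy. The code below is a simplified illustration, not ProCAKE code: MeasureLookupSketch and its methods are hypothetical, classes are represented by their names, each class has at most one superclass, and the small hierarchy used in main is only an example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy lookup of a measure via the class hierarchy. NOT ProCAKE code;
// all names are assumptions of this sketch.
class MeasureLookupSketch {

    private final Map<String, String> parentOf = new HashMap<>();   // class -> superclass
    private final Map<String, String> measureFor = new HashMap<>(); // class -> measure name

    void addClass(String name, String parent) {
        parentOf.put(name, parent);
    }

    void setMeasure(String dataClass, String measure) {
        measureFor.put(dataClass, measure);
    }

    // Returns the most specific class that both classes share, or null if there is none.
    String leastCommonClass(String a, String b) {
        Set<String> ancestorsOfA = new HashSet<>();
        for (String c = a; c != null; c = parentOf.get(c)) {
            ancestorsOfA.add(c);
        }
        for (String c = b; c != null; c = parentOf.get(c)) {
            if (ancestorsOfA.contains(c)) {
                return c;
            }
        }
        return null;
    }

    // Starts at the least common class and moves to superclasses until a measure is found.
    String findMeasure(String queryClass, String caseClass) {
        for (String c = leastCommonClass(queryClass, caseClass); c != null; c = parentOf.get(c)) {
            String measure = measureFor.get(c);
            if (measure != null) {
                return measure;
            }
        }
        throw new IllegalStateException(
                "No similarity measure found for " + queryClass + " and " + caseClass);
    }

    public static void main(String[] args) {
        MeasureLookupSketch lookup = new MeasureLookupSketch();
        lookup.addClass("Atomic", "Data");    // Data is the root and has no superclass
        lookup.addClass("Integer", "Atomic");
        lookup.addClass("String", "Atomic");
        lookup.setMeasure("Atomic", "SMObjectEqual");
        lookup.setMeasure("Integer", "SMNumericMeasure");

        System.out.println(lookup.findMeasure("Integer", "Integer")); // prints SMNumericMeasure
        System.out.println(lookup.findMeasure("Integer", "String"));  // prints SMObjectEqual
    }
}
```

Comparing an Integer with a String falls back to the measure registered for their common superclass, while two Integer objects use the more specific measure.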