Basic

The following basic measures are implemented:

These measures can be applied to all types of data objects.

TableDataClass

SMTableDataClass is the most generic similarity measure in ProCAKE: it compares the data classes of two objects. The measure needs an upper class, because it can only compare objects of this class and its subclasses. A default similarity value can be set for all of these classes, and exceptions to this default can be defined. Each exception specifies the data classes for the query and the case together with the similarity value for this comparison. It is also possible to define whether the measure is symmetric.

The following parameters can be set for this similarity measure.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| defaultSimilarity | Similarity ([0,1]) | 0.0 | Similarity value used for every comparison that is not defined in the table. |
| symmetric | Flag (boolean) | false | Defines whether similarity computations are symmetric. If set to false, the order of query and case matters, so a separate similarity value must be defined for each direction of a comparison. |

The entries define similarity values between pairs of data classes. If the measure is symmetric, an entry for a pair of classes needs to be created only once. If it is not symmetric, every similarity value that deviates from the default must be set with respect to which class is in the query and which is in the case. If no entry is found for two classes but the classes are identical, a similarity of 1.0 is assumed. If the classes are different and no entry is found, the measure checks whether an entry exists for their superclasses. If so, the similarity value of that entry is used; otherwise, the default value is used.
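The lookup order described above can be sketched in plain Java. This is a hypothetical, self-contained illustration, not ProCAKE's implementation: classes are represented by their names, the hierarchy is a simplified map in which Void, String, and Integer are direct subclasses of Data, and the names TableDataClassSketch, addEntry, and similarity are made up for this example. The order in which superclass pairs are searched (nearest query superclass first, then nearest case superclass) is also an assumption, since the text above does not specify it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a table-based class similarity lookup.
 * Not the ProCAKE implementation; all names here are illustrative.
 */
public class TableDataClassSketch {

    // Simplified, hypothetical hierarchy: child -> parent (Data is the root).
    static final Map<String, String> PARENT = Map.of(
            "Void", "Data",
            "String", "Data",
            "Integer", "Data");

    private final Map<String, Double> entries = new HashMap<>();
    private final double defaultSimilarity;
    private final boolean symmetric;

    TableDataClassSketch(double defaultSimilarity, boolean symmetric) {
        this.defaultSimilarity = defaultSimilarity;
        this.symmetric = symmetric;
    }

    // Stores an entry keyed by the (query class, case class) pair.
    void addEntry(String queryClass, String caseClass, double value) {
        entries.put(queryClass + "->" + caseClass, value);
    }

    // Finds an entry; in symmetric mode the reversed pair is accepted as well.
    private Double lookup(String queryClass, String caseClass) {
        Double value = entries.get(queryClass + "->" + caseClass);
        if (value == null && symmetric) {
            value = entries.get(caseClass + "->" + queryClass);
        }
        return value;
    }

    double similarity(String queryClass, String caseClass) {
        // 1. An explicit entry for the two classes wins.
        Double direct = lookup(queryClass, caseClass);
        if (direct != null) {
            return direct;
        }
        // 2. Identical classes without an entry are fully similar.
        if (queryClass.equals(caseClass)) {
            return 1.0;
        }
        // 3. Assumed order: walk query superclasses (outer) and
        //    case superclasses (inner), nearest first.
        for (String q = queryClass; q != null; q = PARENT.get(q)) {
            for (String c = caseClass; c != null; c = PARENT.get(c)) {
                Double value = lookup(q, c);
                if (value != null) {
                    return value;
                }
            }
        }
        // 4. Nothing matched: fall back to the default similarity.
        return defaultSimilarity;
    }

    public static void main(String[] args) {
        TableDataClassSketch sm = new TableDataClassSketch(0.0, false);
        sm.addEntry("Void", "Data", 1.0);
        sm.addEntry("String", "String", 1.0);
        System.out.println(sm.similarity("Void", "String"));   // 1.0
        System.out.println(sm.similarity("String", "Void"));   // 0.0
        System.out.println(sm.similarity("Integer", "Integer")); // 1.0
    }
}
```

In this sketch, a Void query and a String case reach the entry for Void and Data through the superclass walk, whereas a String query and a Void case fall back to the default value, because the measure is not symmetric and no entry matches that direction.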

For example, an XML definition of this measure can look like this:


    <TableDataClass name="SMTableDataClass" class="Data" defaultSimilarity="0.0" symmetric="false">
        <Entry query="Void" case="Data" value="1.0"/>
        <Entry query="String" case="String" value="1.0"/>
    </TableDataClass>

This measure applies to all classes, because it refers to the topmost class Data. Its default similarity is 0.0, so every comparison that is not specified yields 0.0. The measure is not symmetric, so each direction of a comparison must be defined explicitly. A comparison between a Void query and a case object of any subclass of Data has the similarity 1.0. This includes a comparison between a Void query and a String case: no entry exists for this exact pair, but String is a subclass of Data and no entry exists for any class between String and Data in the hierarchy, so the entry for Void and Data is used.

To create this measure at runtime, the following code can be used:

    SMTableDataClass smTableDataClass = (SMTableDataClass) simVal.getSimilarityModel().createSimilarityMeasure(
    simVal.getSimilarityModel().addSimilarityMeasure(smTableDataClass, "SMTableDataClass");

ObjectEqual

This measure assesses the similarity between two arbitrary objects by comparing them for equality. It uses the hasSameValueAsIn method, which is implemented for every data object. For atomic objects, this method is equivalent to the equals method; for sets, lists, aggregates, and other composite objects, a deeper comparison is performed.

In the example below, atomic objects can be compared for equal values. For instance, two String objects containing the value "test" or two Integer objects with the value 123 would be considered equal by this measure.


    <ObjectEqual name="SMObjectEqual" class="Atomic"/>

To create this measure at runtime, the following code can be used:

    SMObjectEqual smObjectEqual = (SMObjectEqual) simVal.getSimilarityModel().createSimilarityMeasure(
    simVal.getSimilarityModel().addSimilarityMeasure(smObjectEqual, "SMObjectEqual");
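The equality-based behavior of this measure can be approximated in plain Java. The sketch below is hypothetical: java.util.Objects.equals stands in for ProCAKE's hasSameValueAsIn method, and it assumes that the measure returns 1.0 for equal values and 0.0 otherwise, which the text above does not state explicitly. The list and set comparisons show Java's own element-wise collection equality as an analogy for the deeper comparison of composite objects; ProCAKE's exact rules for such objects may differ.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical sketch of an equality-based similarity measure.
 * Objects.equals stands in for ProCAKE's hasSameValueAsIn method.
 */
public class ObjectEqualSketch {

    // Assumption: equal values yield 1.0, everything else yields 0.0.
    static double similarity(Object query, Object caseObject) {
        return Objects.equals(query, caseObject) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Atomic values: plain equals semantics.
        System.out.println(similarity("test", "test"));               // 1.0
        System.out.println(similarity(123, 123));                     // 1.0
        System.out.println(similarity("test", "Test"));               // 0.0
        // Java lists compare element by element, in order.
        System.out.println(similarity(List.of(1, 2), List.of(1, 2))); // 1.0
        System.out.println(similarity(List.of(1, 2), List.of(2, 1))); // 0.0
        // Java sets compare membership, ignoring order.
        System.out.println(similarity(Set.of(1, 2), Set.of(2, 1)));   // 1.0
    }
}
```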