Custom Similarity Measures

Custom measures

Creation of a custom similarity measure

This page shows an example of a very simple custom similarity measure. Only the implementation class is shown here. Usually, an interface (SMTestMeasure.java in this example) should also be provided, containing constants such as the measure name (NAME) and the Javadoc documentation.

public class SMTestMeasureImpl extends SimilarityMeasureImpl implements SMTestMeasure {

    // The measure is applicable to atomic data classes only.
    @Override
    public boolean isSimilarityFor(DataClass dataClass, String orderName) {
        return dataClass.isAtomic();
    }

    // The similarity is 1.0 if both objects are atomic, otherwise 0.0.
    @Override
    public Similarity compute(DataObject queryObject, DataObject caseObject, SimilarityValuator similarityValuator) {
        if (queryObject.isAtomic() && caseObject.isAtomic()) {
            return new SimilarityImpl(this, queryObject, caseObject, 1.0);
        }
        return new SimilarityImpl(this, queryObject, caseObject, 0.0);
    }

    // NAME is defined in the interface SMTestMeasure.
    @Override
    public String getSystemName() {
        return NAME;
    }
}

Every measure inherits from the class SimilarityMeasureImpl and has to implement the methods isSimilarityFor, compute and getSystemName.

The method getSystemName() returns the name of the similarity measure as a string, which is usually (as in the example above) defined in the corresponding interface.

The method isSimilarityFor(DataClass dataClass, String orderName) returns a boolean value indicating whether the measure can be applied to the given DataClass. Some data classes define an order of their elements, which can be referenced via the parameter orderName. In this simple example, the measure can only be applied to atomic classes.

The method compute(DataObject queryObject, DataObject caseObject, SimilarityValuator similarityValuator) computes the similarity value for the given query and case object. The SimilarityValuator passed to it can be used to access the current similarity model, for example to compute the similarity of subobjects of the query and case with other measures. The method returns an object of the class SimilarityImpl that contains the used similarity measure, the query and case objects, and the computed similarity value. In this simple example, the similarity is 1.0 if both the query and the case object are atomic, and 0.0 otherwise.
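The contract of compute, including the use of the valuator for subobjects, can be illustrated outside of ProCAKE with a small, self-contained sketch. The types Value, Atomic, ListValue, Sim, Valuator and Measure are hypothetical, simplified stand-ins for DataObject, Similarity, SimilarityValuator and SimilarityMeasure and are not the ProCAKE API; PairwiseListMeasure is likewise a hypothetical measure. The sketch shows a measure that, like SMTestMeasureImpl, returns 1.0 only for two atomic objects, and a composite measure that looks up another measure through the valuator for its elements.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for DataObject, Similarity, SimilarityValuator and
// SimilarityMeasure; they only mirror the shape of the contract, not ProCAKE's API.
interface Value {
    boolean isAtomic();
}

record Atomic(Object content) implements Value {
    public boolean isAtomic() { return true; }
}

record ListValue(List<Value> elements) implements Value {
    public boolean isAtomic() { return false; }
}

// Result object: the measure used, the compared objects and the similarity value.
record Sim(Measure measure, Value query, Value caseObject, double value) {}

interface Measure {
    Sim compute(Value query, Value caseObject, Valuator valuator);
}

// Gives a measure access to the other measures of the (simplified) similarity model.
record Valuator(Map<String, Measure> model) {
    Measure get(String name) { return model.get(name); }
}

// Counterpart of SMTestMeasureImpl: 1.0 if both objects are atomic, otherwise 0.0.
class TestMeasure implements Measure {
    public Sim compute(Value query, Value caseObject, Valuator valuator) {
        double value = query.isAtomic() && caseObject.isAtomic() ? 1.0 : 0.0;
        return new Sim(this, query, caseObject, value);
    }
}

// Hypothetical composite measure for two ListValues: compares elements at the
// same position with the measure registered as "TestMeasure" and divides the
// sum by the length of the longer list, so unmatched elements count as 0.
class PairwiseListMeasure implements Measure {
    public Sim compute(Value query, Value caseObject, Valuator valuator) {
        List<Value> q = ((ListValue) query).elements();
        List<Value> c = ((ListValue) caseObject).elements();
        int longer = Math.max(q.size(), c.size());
        if (longer == 0) {
            return new Sim(this, query, caseObject, 0.0);
        }
        Measure elementMeasure = valuator.get("TestMeasure");
        double sum = 0.0;
        for (int i = 0; i < Math.min(q.size(), c.size()); i++) {
            sum += elementMeasure.compute(q.get(i), c.get(i), valuator).value();
        }
        return new Sim(this, query, caseObject, sum / longer);
    }
}
```

Passing the valuator into compute, instead of hard-wiring the element measure, lets the composite measure use whichever measure is currently registered in the model.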

Reusability of similarity measures

One may also define whether an instance of a measure is reusable or not by overriding the corresponding method:

    @Override
    public boolean isReusable() {
        return true;
    }

Reusability may be set to false if the measure uses some sort of internal cache or state that is specific to a single invocation and must therefore not be shared between several invocations of the measure. This is the case, for instance, for SMGraphAStarImpl.java.

If reusability is set to false, the framework automatically creates a new instance for each invocation of the measure. To ensure that this new instance is correctly initialized from the previous one, including all parameter settings, please also override the following method:

    @Override
    protected void initializeBasedOn(SimilarityMeasure base) {
        super.initializeBasedOn(base);
        // copy each parameter from the previous instance
        this.setTimeout(((SMGraphAStarImpl) base).getTimeout());
        // [...]
    }

In this example (taken from SMGraphAStarImpl.java), the parameter timeout is copied from the previous instance to the new one using the corresponding getter and setter methods.
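The interplay of isReusable and initializeBasedOn can be pictured with a simplified, self-contained model. BaseMeasure, SearchMeasure and MeasureProvider are hypothetical names that only mirror the roles described above; ProCAKE's actual framework code may differ. A non-reusable measure is copied for each use, its parameter (timeout) is carried over, and its per-invocation state (steps) stays separate from the template.

```java
// Hypothetical, simplified model of the reuse mechanism; not ProCAKE code.
abstract class BaseMeasure {
    // true: the framework may share one instance across invocations.
    abstract boolean isReusable();

    // Creates an unconfigured instance of the same measure type.
    abstract BaseMeasure newInstance();

    // Copies parameter settings from an existing instance; subclasses extend this.
    protected void initializeBasedOn(BaseMeasure base) {}
}

class SearchMeasure extends BaseMeasure {
    private long timeout = 1000; // parameter: must be copied to new instances
    private int steps = 0;       // internal state of a single computation

    long getTimeout() { return timeout; }
    void setTimeout(long timeout) { this.timeout = timeout; }
    int getSteps() { return steps; }

    @Override
    boolean isReusable() { return false; }

    @Override
    BaseMeasure newInstance() { return new SearchMeasure(); }

    @Override
    protected void initializeBasedOn(BaseMeasure base) {
        super.initializeBasedOn(base);
        this.setTimeout(((SearchMeasure) base).getTimeout());
    }

    // Toy computation that mutates internal state, which is why this
    // instance must not be shared between invocations.
    double compute(String query, String caseValue) {
        steps += Math.max(query.length(), caseValue.length());
        return query.equals(caseValue) ? 1.0 : 0.0;
    }
}

final class MeasureProvider {
    // Returns the template itself if it is reusable, otherwise a fresh copy
    // initialized with the template's parameter settings.
    static BaseMeasure instanceFor(BaseMeasure template) {
        if (template.isReusable()) {
            return template;
        }
        BaseMeasure fresh = template.newInstance();
        fresh.initializeBasedOn(template);
        return fresh;
    }
}
```

Without the override of initializeBasedOn, the fresh copy would silently fall back to the default timeout of 1000.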

Implementation of custom parameters

A measure may need custom parameters, e.g., to determine whether it should perform a certain calculation.

Since such a parameter is internal state that has to be kept until the following similarity computations, reusability must be set to true, so that the configured instance itself is used for these computations. Furthermore, the measure needs a setter method to set the parameter to the desired value.
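A minimal sketch of such a parameter, loosely modeled on the setRequireEqualDataClass setter used in the next section. TypedEqualityMeasure and requireEqualType are hypothetical names, and the class is a plain Java stand-in rather than a subclass of SimilarityMeasureImpl: the boolean switches an extra type check on or off, and because the instance is reusable, a value set through the setter is still in effect when compute runs.

```java
// Hypothetical measure with one custom parameter; not ProCAKE code.
class TypedEqualityMeasure {
    private boolean requireEqualType = false; // custom parameter, off by default

    // Reusable: the configured instance itself is used for later computations,
    // so the parameter value survives until compute is called.
    boolean isReusable() { return true; }

    void setRequireEqualType(boolean requireEqualType) {
        this.requireEqualType = requireEqualType;
    }

    boolean isRequireEqualType() { return requireEqualType; }

    double compute(Object query, Object caseValue) {
        // The parameter enables an additional check before the actual comparison.
        if (requireEqualType && query.getClass() != caseValue.getClass()) {
            return 0.0;
        }
        return query.toString().equals(caseValue.toString()) ? 1.0 : 0.0;
    }
}
```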

Usage of custom parameters

Before the similarity is computed, the custom parameter can be set as desired by retrieving the measure from the similarity model and invoking the parameter's setter. With simVal as the SimilarityValuator, this can look like the following example:

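// retrieve the measure named SMNESTGraphItem.NAME for the data class of itemA and set its parameter
((SMNESTGraphItem) simVal.getSimilarityModel()
        .getSimilarityMeasure(itemA.getDataClass(), SMNESTGraphItem.NAME))
        .setRequireEqualDataClass(true);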

Adding the similarity measure to the similarity model during runtime

To use a custom measure, ProCAKE has to be started first, and the data model and similarity model of this instance have to be used. For this, the following code can be used:

CakeInstance.start(PATH_COMPOSITION, PATH_MODEL, PATH_SIM_MODEL, PATH_CASEBASE);
Model model = ModelFactory.getDefaultModel();
SimilarityValuator simVal = SimilarityModelFactory.newSimilarityValuator();
SimilarityModel similarityModel = simVal.getSimilarityModel();

For the method CakeInstance.start, the paths of the composition and the case base have to be set (usually, e.g., to "/cake/composition.xml"). The paths for the data model and the similarity model can be null. The data model can be retrieved using the ModelFactory. To retrieve the similarity model, a SimilarityValuator is first created using the SimilarityModelFactory; its method getSimilarityModel() then returns the similarity model.

Afterwards, three steps are required to use the similarity measure:

  1. A new instance of the similarity measure must be registered in the similarity model using the method registerSimilarityMeasureTemplate(SimilarityMeasure similarityMeasure). A newly constructed instance of the measure can be passed directly as the parameter.
  2. The measure is created using the method createSimilarityMeasure(String similarityMeasureName, DataClass measureDataClass) of the similarity model. Here, similarityMeasureName must be the name that is defined either in the measure class itself or in its interface. measureDataClass is the data class to which the measure can be applied (or to its subclasses). The data class is obtained from the data model, e.g., with model.getAtomicSystemClass().
  3. The created measure is added to the similarity model using its method addSimilarityMeasure(SimilarityMeasure similarityMeasure, String similarityMeasureName). Here, similarityMeasureName can be any string that is not yet used in the model.

similarityModel.registerSimilarityMeasureTemplate(new SMTestMeasureImpl());

SMTestMeasure testMeasure = (SMTestMeasure) similarityModel.createSimilarityMeasure(SMTestMeasure.NAME, model.getAtomicSystemClass());

similarityModel.addSimilarityMeasure(testMeasure, "TestMeasure");

This is the easiest way to add a newly created similarity measure to the similarity model. However, this approach does not support instantiating the measure from XML. If this is required, the approach described in the next section has to be used; otherwise, the measure can be used directly.
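The three steps (register a template, create a measure for a data class, add it under a free name) can be modeled in a few lines. This is a hypothetical, simplified sketch: SimpleSimilarityModel only borrows the method names of SimilarityModel, data classes are represented as plain strings, and ProCAKE's real implementation is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a similarity measure; not ProCAKE's interface.
interface Measure {
    String getSystemName();
    Measure newInstance();
    String getDataClass();
    void setDataClass(String dataClass);
}

class TestMeasure implements Measure {
    static final String NAME = "TestMeasure";
    private String dataClass;

    public String getSystemName() { return NAME; }
    public Measure newInstance() { return new TestMeasure(); }
    public String getDataClass() { return dataClass; }
    public void setDataClass(String dataClass) { this.dataClass = dataClass; }
}

class SimpleSimilarityModel {
    private final Map<String, Measure> templates = new HashMap<>();
    private final Map<String, Measure> measures = new HashMap<>();

    // Step 1: remember a template instance under the measure's system name.
    void registerSimilarityMeasureTemplate(Measure template) {
        templates.put(template.getSystemName(), template);
    }

    // Step 2: create a new measure of a registered type for a data class.
    Measure createSimilarityMeasure(String systemName, String dataClass) {
        Measure template = templates.get(systemName);
        if (template == null) {
            throw new IllegalArgumentException("No template registered for " + systemName);
        }
        Measure measure = template.newInstance();
        measure.setDataClass(dataClass);
        return measure;
    }

    // Step 3: make the measure available under a name that is not yet in use.
    void addSimilarityMeasure(Measure measure, String name) {
        if (measures.putIfAbsent(name, measure) != null) {
            throw new IllegalArgumentException("Name already in use: " + name);
        }
    }

    Measure getSimilarityMeasure(String name) { return measures.get(name); }
}
```

Separating the template (step 1) from the created measure (step 2) allows several measures of the same type to exist for different data classes under different names.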

Adding the similarity measure to the similarity model before runtime

Before a similarity measure can be used in the XML files, it has to be added to the XML schema cdsm.xsd (located in src/main/resources/cake/schema/). There, a new element has to be created for the measure. The schema provides the base type baseSim for similarity measures, which the new element extends. The creation of this element is shown in the following:

<xs:element name="TestMeasure">
    <xs:complexType>
        <xs:complexContent>
            <xs:extension base="baseSim"/>
        </xs:complexContent>
    </xs:complexType>
</xs:element>

After the new element has been created, it has to be added to the element SimilarityModel. This element contains a choice of elements, to which the new element can be added with the following reference:

<xs:element ref="TestMeasure"/>

Now, the new measure can be instantiated in an XML file:

<TestMeasure name="SMTestMeasure" class="Atomic"/>
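Whether an XML file matches the extended schema can be checked with the JDK's javax.xml.validation API. The sketch uses a heavily reduced, hypothetical schema: the baseSim type here declares only the attributes name and class, there is no target namespace, and the choice in SimilarityModel contains only the new element, whereas the real cdsm.xsd defines much more.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical, reduced schema; baseSim is simplified to two attributes.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:complexType name="baseSim">
            <xs:attribute name="name" type="xs:string" use="required"/>
            <xs:attribute name="class" type="xs:string" use="required"/>
          </xs:complexType>
          <xs:element name="TestMeasure">
            <xs:complexType>
              <xs:complexContent>
                <xs:extension base="baseSim"/>
              </xs:complexContent>
            </xs:complexType>
          </xs:element>
          <xs:element name="SimilarityModel">
            <xs:complexType>
              <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:element ref="TestMeasure"/>
              </xs:choice>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    private static final Schema SCHEMA = loadSchema();

    private static Schema loadSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Invalid schema", e);
        }
    }

    // Returns true if the document conforms to the schema.
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```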

Before this XML file can be read and the measure created, the measure has to be registered in the similarity model. When using the XML variant, this has to be done manually in the class SimilarityModelImpl. In its method initializeSimilarityMeasureCache(), every similarity measure of ProCAKE is instantiated and put into a cache, so that it can be reused later. To register the new measure, an instance of it has to be put into this cache under its name:

similarityMeasureTemplateCache.put(SMTestMeasure.NAME, new SMTestMeasureImpl());

Afterwards, the measure can be created using the method createSimilarityMeasure of the similarity model. For XML files, this is done in the class SimilarityModelHandler. It provides the method startElement(String uri, String localName, String qName, Attributes attributes), which is called for each XML element. The measures are identified by tags defined in the class SimilarityTags, where the tag for the new measure can be added. In the switch statement of startElement, a case is added that calls a method creating the new measure:

case TAG_TESTMEASURE -> startElementTestMeasure(attributes);

Here, TAG_TESTMEASURE refers to a tag defined in SimilarityTags. Alternatively, a string literal can be used as the case label. The method startElementTestMeasure(attributes) can look like this:

private void startElementTestMeasure(Attributes attributes) {
    this.currentSimilarityMeasure = createSimilarityMeasure(SMTestMeasure.NAME, attributes);
}

For more complex measures, attribute values can be read with the method getValue(String qName) of the interface Attributes (org.xml.sax.Attributes).

After these steps, the measure can be used in every class. Calling the registration methods of SimilarityModel at runtime, as described for the runtime approach, is then no longer required.
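The dispatch inside startElement and the use of Attributes.getValue can be tried out with the JDK's SAX parser. MeasureHandler and CreatedMeasure are hypothetical, simplified stand-ins rather than ProCAKE's SimilarityModelHandler; instead of calling createSimilarityMeasure, the sketch only records the attribute values of each TestMeasure element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical record of what the handler read for one measure element.
record CreatedMeasure(String name, String dataClass) {}

class MeasureHandler extends DefaultHandler {
    static final String TAG_TESTMEASURE = "TestMeasure"; // compile-time constant, usable as case label
    final List<CreatedMeasure> created = new ArrayList<>();

    // Called by the SAX parser for every opening tag.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        switch (qName) {
            case TAG_TESTMEASURE -> startElementTestMeasure(attributes);
            default -> { } // other elements are ignored in this sketch
        }
    }

    private void startElementTestMeasure(Attributes attributes) {
        // getValue returns null if the attribute is missing.
        created.add(new CreatedMeasure(attributes.getValue("name"), attributes.getValue("class")));
    }

    static List<CreatedMeasure> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            MeasureHandler handler = new MeasureHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.created;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse similarity model", e);
        }
    }
}
```

The factory is not namespace-aware by default, so qName carries the element name as written; a namespace-aware parser would populate localName instead.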

Writing Similarity Measures to XML Format

Similarity measures can be serialized to an XML format using the class SimilarityModelWriter.

There, a store method is provided that distinguishes the similarity measures in a switch statement by their name. For a new measure, a case has to be added that calls a method writing the measure to the XML format. This can look like:

case SMTestMeasure.NAME -> storeTestMeasure((SMTestMeasure) measure, defaultMeasure, writer);

The storeTestMeasure method can look like:

private void storeTestMeasure(SMTestMeasure testMeasure, boolean defaultMeasure, GenericXMLSchemaBasedWriter writer) throws IOException {
    // open the XML element of the measure
    writer.appendElement(PREFIX_CDSM, TAG_TESTMEASURE);
    // write the default attributes of the measure
    storeDefaultAttributes(testMeasure, defaultMeasure, writer);
    // write measure-specific attributes; this assumes the measure provides the getter used here
    writer.addAttribute(ATT_DEFAULT_SIMILARITY_MEASURE, testMeasure.getDefaultSimilarityMeasure());
    writer.finishElement();
}

There is also a sort method that takes a list of similarity measures and sorts them by their type. To extend this method with a custom similarity measure, the following steps need to be added to the sort method:

  1. Create a list for the desired similarity measure:
     List<SimilarityMeasure> testMeasureList = new ArrayList<>();
  2. In the loop, add an if-statement that adds the similarity measure to the created list:
     if (measure instanceof SMTestMeasure) {
         testMeasureList.add(measure);
     }
  3. Add the created list to the sorted list called goal:
     goal.addAll(testMeasureList);
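Put together, the three steps amount to grouping the measures by type and concatenating the groups in a fixed order. The sketch below is a hypothetical, self-contained illustration: Measure, NumericMeasure and TestMeasure are stand-in types, and the catch-all group for other measures is an assumption of this sketch, not necessarily how ProCAKE's sort method treats unknown types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not ProCAKE's measure classes.
interface Measure {}
class NumericMeasure implements Measure {}
class TestMeasure implements Measure {}

final class MeasureSorter {
    // Groups measures by type and concatenates the groups in a fixed order;
    // within each group, the original order is preserved.
    static List<Measure> sort(List<Measure> measures) {
        List<Measure> numericMeasureList = new ArrayList<>();
        List<Measure> testMeasureList = new ArrayList<>(); // step 1: list for the custom measure
        List<Measure> otherList = new ArrayList<>();
        for (Measure measure : measures) {
            if (measure instanceof NumericMeasure) {
                numericMeasureList.add(measure);
            } else if (measure instanceof TestMeasure) { // step 2: collect the custom measure
                testMeasureList.add(measure);
            } else {
                otherList.add(measure);
            }
        }
        List<Measure> goal = new ArrayList<>();
        goal.addAll(numericMeasureList);
        goal.addAll(testMeasureList); // step 3: append the custom group
        goal.addAll(otherList);
        return goal;
    }
}
```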

Usage of the new similarity measure

To use this measure for a similarity computation, the method computeSimilarity(DataObject queryObject, DataObject caseObject, String similarityMeasureName) of the SimilarityValuator can be used. Here, similarityMeasureName must be the name that was used in the method addSimilarityMeasure or in the name attribute in the XML file. (If the measure was added before runtime, ProCAKE has to be started before the computation.)

This can look like:

Similarity sim = simVal.computeSimilarity(testObject1, testObject2, "TestMeasure");

The objects testObject1 and testObject2 must have been created beforehand; in this example, they must be atomic objects.