Uris

URI #

This page contains the following content:

graph TD; URIMeasures[URI Measures] --> PairMeasures[Pair Measures]; PairMeasures --> StructuralPathMeasures[Structural Path Measures]; StructuralPathMeasures --> Rada; StructuralPathMeasures --> WuPalmer[Wu and Palmer]; StructuralPathMeasures --> Slimani; PairMeasures --> FeatureBasedMeasures[Feature Based Measures]; FeatureBasedMeasures --> MaedcheStaab[Maedche and Staab]; FeatureBasedMeasures --> Sanchez; URIMeasures --> GroupMeasures[Group measures]; GroupMeasures --> DirectGroup[Direct Group]; GroupMeasures --> IndirectGroup[Indirect Group]; URIMeasures --> IsEquivalent[Is Equivalent]; URIMeasures --> Path; URIMeasures --> AggregateAverage[Aggregate Average];

Ontology-based similarity measures can be divided according to the entities being compared. Pairs of entities are compared directly using pairwise similarity measures, which are in turn subdivided into individual classes. To compare multiple entities in the form of two sets, group-wise similarity measures are used, which can also be further subdivided.

Example #

For a better understanding of the different similarity measures, the Pizza ontology is used to illustrate them. A small excerpt can be seen below.

graph TD; Food -- type --> IceCream; Food -- subClassOf --> PizzaTopping; Food -- subClassOf --> Pizza; Pizza -- type --> InterestingPizza; Pizza -- type --> MeatyPizza; Pizza -- type --> CheesyPizza; Pizza -- type --> VegetarianPizza; Pizza -- subClassOf --> NamedPizza; NamedPizza -- type --> American; NamedPizza -- type --> FourSeasons; NamedPizza -- type --> Margherita; Food -- subClassOf --> PizzaBase; PizzaBase -- type --> DeepPanBase; PizzaBase -- type --> ThinAndCrispyBase;

rdf:type: The subject is an instance of a class.

rdfs:subClassOf: The subject is a subclass of a class.

The creation of URI objects is described here.

Pairwise measures #

This section describes the similarity measures for pairs of entities from an ontology. It is based on the description of Harispe et al. 1. The similarity measures refer primarily to the pairwise comparison of concepts in an ontology. However, they can also be used for instances by adapting the relations under consideration. The pairwise similarity measures can be divided into further, subordinate classes, which pursue different approaches to the similarity computation. The structural path measures deal with the length of paths between entities. In feature-based measures, a set of connected broader entities is considered and similarity is calculated via the ratio of shared features. The information-content measures use the information content of each entity to determine similarity. The structural path measures and the feature-based measures are described in more detail, with example similarity measures, in the following subsections.

The measures in this section are based on the comparison of two entities of an ontology. There are two different approaches that can be used to perform the comparison: feature based and path based.

Structural path measures #

The structural path measures determine similarity based on the connections between concepts through specific relations, such as the inheritance hierarchy. The distance, measured as the number of edges, is used in the similarity calculation to obtain a similarity value at a semantic level. To determine the shortest path between two entities \(q\) and \(c\), the function \(sp(q,isa*,c)\) can be used. Within simple taxonomies, this function determines the shortest path using the isa relation and its inverse, which is expressed by the asterisk. In graphs with a unique root, the shortest path can instead be defined via the Least Common Ancestor (\(LCA\)), without traversing inverse relations. The \(LCA\) is the lowest common parent node of both entities, so each entity reaches it by following the relation upwards. A special case for this kind of similarity computation is multiple inheritance, which can be created, for example, by a reasoning process in the ontology. Multiple inheritance can result in several potential \(LCA\)s, from which a unique \(LCA\) must be selected. In principle, the most specific \(LCA\), i.e. the one with the largest distance to the root, can be chosen. If several potential \(LCA\)s have the same distance to the root, the one with the shortest distance to both entities under consideration can be chosen.

The following measures calculate the similarity between query and case based on the shortest path between query and case entity in an ontology.
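The LCA lookup described above can be sketched for the single-inheritance Pizza excerpt as follows. This is only an illustration: the class `LcaSketch`, the `PARENT` map and the method names are hypothetical and not part of the framework, and both `type` and `subClassOf` edges are treated as hierarchy edges. Multiple inheritance is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LcaSketch {
    // Child -> parent edges of the Pizza excerpt (hypothetical in-memory encoding).
    static final Map<String, String> PARENT = Map.of(
        "IceCream", "Food",
        "PizzaTopping", "Food",
        "Pizza", "Food",
        "PizzaBase", "Food",
        "CheesyPizza", "Pizza",
        "MeatyPizza", "Pizza",
        "NamedPizza", "Pizza",
        "DeepPanBase", "PizzaBase");

    // Path from a node up to the root, starting with the node itself.
    static List<String> pathToRoot(String node) {
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = PARENT.get(n)) {
            path.add(n);
        }
        return path;
    }

    // First node on q's upward path that also lies on c's upward path.
    static String lca(String q, String c) {
        List<String> cPath = pathToRoot(c);
        for (String n : pathToRoot(q)) {
            if (cPath.contains(n)) {
                return n;
            }
        }
        return null; // no common ancestor
    }

    // Number of edges from node up to ancestor, or -1 if it is not an ancestor.
    static int stepsUp(String node, String ancestor) {
        return pathToRoot(node).indexOf(ancestor);
    }

    public static void main(String[] args) {
        String lca = lca("CheesyPizza", "IceCream");
        System.out.println(lca + ": " + stepsUp("CheesyPizza", lca) + " + " + stepsUp("IceCream", lca));
    }
}
```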

Rada #

Based on the \(LCA\), the shortest path between two concepts can also be expressed as the Rada distance \(dist_{Rada}(q,c)\), which is shown in the following formula.

\(dist_{Rada}(q, c) = sp(q, LCA(q, c)) + sp(c, LCA(q, c))\)

The similarity function \(sim_{Rada}(q,c) \rightarrow (0,1]\) according to Rada et al. 2, which converts the distance into a similarity greater than \(0\) and at most \(1\), is presented in the formula below.

\(sim_{Rada}(q, c) = \frac{1}{1 + dist_{Rada}(q, c)}\)

In this context, the function \(LCA(q,c)\) determines the entity of the \(LCA\) within the inheritance hierarchy.

sim.xml

    <OntologySpRada name="SM_Rada" class="URI">
        <OntoRelation name="subClassOf"/>
        <OntoRelation name="type"/>
    </OntologySpRada>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologySpRada simMeasureRada = (SMOntologySpRada) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologySpRada.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simMeasureRada.addRelation(PROPERTY_RDFS_SUBCLASSOF);
    simMeasureRada.addRelation(PROPERTY_RDF_TYPE);
    simVal.getSimilarityModel().addSimilarityMeasure(simMeasureRada, "SM_Rada");

The parameter “OntoRelation” can be defined multiple times. PROPERTY_RDFS_SUBCLASSOF is a constant that represents the string http://www.w3.org/2000/01/rdf-schema#subClassOf and defines the subClassOf relation for the Rada similarity measure.

Example:

We now compare CheesyPizza with IceCream based on the Pizza ontology to illustrate the Rada distance and the Rada similarity. The first step is to identify the \(LCA\), in this case the node Food. Starting from this common parent node, there are exactly 2 edges to reach CheesyPizza and 1 edge to reach IceCream. Substituting the \(LCA\) into the formula yields the following.

  • q = CheesyPizza
  • c = IceCream
\(dist_{Rada}(q, c) = sp(q, Food) + sp(c, Food)\)

Replacing the shortest paths with their edge counts leads to the following Rada distance.

\(dist_{Rada}(q, c) = 2 + 1 = 3\)

In the second step, the Rada distance (3) is inserted into the similarity formula to obtain the Rada similarity.

\(sim_{Rada}(q, c) = \frac{1}{1 + 3} = \frac{1}{4} = 0.25\)

Thus, the Rada similarity for CheesyPizza (query) and IceCream (case) is 0.25.
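The arithmetic of this example can be reproduced with a few lines of plain Java. The class `RadaSketch` and its methods are hypothetical helpers for illustration only and are unrelated to the `SMOntologySpRada` class of the framework. The edge counts to the LCA are passed in directly.

```java
final class RadaSketch {
    // Rada distance: edges from the query to the LCA plus edges from the case to the LCA.
    static int distance(int stepsQueryToLca, int stepsCaseToLca) {
        return stepsQueryToLca + stepsCaseToLca;
    }

    // Converts the distance into a similarity in (0, 1].
    static double similarity(int distance) {
        return 1.0 / (1 + distance);
    }

    public static void main(String[] args) {
        // CheesyPizza is 2 edges below Food, IceCream is 1 edge below Food.
        int dist = distance(2, 1);
        System.out.println(dist + " -> " + similarity(dist));
    }
}
```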

Wu and Palmer #

Beyond simple structural similarities, more extensive similarity measures exist that build on the length of the shortest path. By including further information from the ontology, they try to consider additional aspects of the entities in the similarity computation. For this purpose, Wu and Palmer 3 include, besides the distance according to Rada, the depth of the \(LCA\) in the inheritance hierarchy. The depth captures the specificity of the \(LCA\). The corresponding similarity function \(sim_{WP}(q, c)\) according to Wu and Palmer is shown in the following formula.

\(sim_{WP}(q, c) = \frac{2 * d(LCA(q,c))}{2 * d(LCA(q, c)) + dist_{Rada}(q, c)}\)

The function \(d(q) = sp(q,isa,r)\) determines the depth of the query entity in relation to the root \(r\) of the inheritance hierarchy.

sim.xml

    <OntologySpWuPalmer name="SM_WuPalmer" class="URI"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologySpWuPalmer simMeasureWP = (SMOntologySpWuPalmer) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologySpWuPalmer.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simVal.getSimilarityModel().addSimilarityMeasure(simMeasureWP, "SM_WuPalmer");

Example:

To illustrate this measure, we compare CheesyPizza with MeatyPizza, which also shows the influence of the depth of the \(LCA\) in the inheritance hierarchy. In the given example the root \(r\) is the node Food, and the \(LCA\) of CheesyPizza and MeatyPizza is the node Pizza. The examples on this page count depth in nodes, so the root Food has depth 1 and Pizza has a depth of 2. The Rada distance between the two pizzas is \(1 + 1 = 2\).

  • q = CheesyPizza
  • c = MeatyPizza
\(sim_{WP}(q, c) = \frac{2 * 2}{(2 * 2) + 2} = 0.667\)
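As a quick check of the formula, the following sketch evaluates it for a given LCA depth and Rada distance. `WuPalmerSketch` is a hypothetical helper, not the framework's `SMOntologySpWuPalmer`. The depth is taken as an input, so the counting convention stays with the caller.

```java
final class WuPalmerSketch {
    // sim_WP = 2 * d(LCA) / (2 * d(LCA) + dist_Rada)
    static double similarity(int lcaDepth, int radaDistance) {
        return (2.0 * lcaDepth) / (2.0 * lcaDepth + radaDistance);
    }

    public static void main(String[] args) {
        // CheesyPizza vs. MeatyPizza: LCA depth 2, Rada distance 2.
        System.out.println(similarity(2, 2));
    }
}
```

A deeper, more specific LCA pushes the value towards 1 for the same Rada distance.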

Slimani #

As an extension of the similarity measure of Wu and Palmer, Slimani et al. 4 also integrate the neighborhood relationships of entities into the computation. The goal is to weight direct inheritance between entities higher than a neighborhood. For this purpose, a penalty factor is added, which distinguishes whether the two entities are in a neighborhood or a direct inheritance relationship. The corresponding similarity function \(sim_{Slimani}(q,c)\) and the penalty factor \(pf(q,c)\) are shown in the following formulas. Within \(pf(q,c)\), the parameter \(\lambda \in \{0,1\}\) indicates whether the entities are in a neighbor relationship ( \(\lambda = 1\) ) or a direct inheritance relationship ( \(\lambda = 0\) ).

\(sim_{Slimani}(q,c)=sim_{WP}(q,c) * pf(q,c)\)

with

\(pf(q,c)= \left\{\begin{array}{ll} 1, & \text{if } \lambda = 0 \\ {(|d(q)-d(c)|+1)}^{ -1 } , & \text{if } \lambda = 1 \\ \end{array}\right.\)

sim.xml

    <OntologySpSlimani name="SM_Slimani" class="URI"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologySpSlimani simMeasureSl = (SMOntologySpSlimani) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologySpSlimani.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simVal.getSimilarityModel().addSimilarityMeasure(simMeasureSl, "SM_Slimani");

Example:

To visualize Slimani’s similarity measure and the influence of the new penalty parameter lambda, the two nodes CheesyPizza and PizzaTopping can be chosen to represent a neighborhood (λ = 1).

  • q = CheesyPizza
  • c = PizzaTopping
\(dist_{Rada}(q, c) = 3\)

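Besides the Rada distance, the Wu and Palmer similarity must also be calculated. The \(LCA\) of the two nodes is the root Food, which has depth 1 in this example.

\(sim_{WP}(q, c) = \frac{2 * 1}{(2 * 1) + 3} = \frac{2}{5} = 0.4\)

After these two steps, the penalty factor \(pf(q,c)\) for the neighborhood of the two nodes is calculated from the depths \(d(q) = 3\) and \(d(c) = 2\).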

\(pf(q,c) = {(|3 - 2|+1)}^{ -1 } = 0.5\)

Finally, the penalty factor of 0.5 is inserted into the formula to obtain the Slimani similarity.

\(sim_{Slimani}(q,c)= 0.4 * 0.5 = 0.2\)
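The three steps of this example can be combined in a short sketch. `SlimaniSketch` and its parameters are hypothetical and only mirror the formulas above. The boolean `neighbors` stands for \(\lambda = 1\), and all depths and distances are supplied by the caller.

```java
final class SlimaniSketch {
    // Wu and Palmer similarity from the LCA depth and the Rada distance.
    static double wuPalmer(int lcaDepth, int radaDistance) {
        return (2.0 * lcaDepth) / (2.0 * lcaDepth + radaDistance);
    }

    // Penalty factor: 1 for direct inheritance, (|d(q) - d(c)| + 1)^-1 for neighbors.
    static double penalty(int depthQ, int depthC, boolean neighbors) {
        return neighbors ? 1.0 / (Math.abs(depthQ - depthC) + 1) : 1.0;
    }

    static double similarity(int lcaDepth, int radaDistance, int depthQ, int depthC, boolean neighbors) {
        return wuPalmer(lcaDepth, radaDistance) * penalty(depthQ, depthC, neighbors);
    }

    public static void main(String[] args) {
        // CheesyPizza (depth 3) vs. PizzaTopping (depth 2): LCA depth 1, Rada distance 3, neighbors.
        System.out.println(similarity(1, 3, 3, 2, true));
    }
}
```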

Feature based measures #

Feature-based similarity computation represents another approach to determine a semantic similarity within an ontology between two entities. Basically, this approach extracts a set of features for each entity to be compared from the ontology and computes a similarity based on the number of common as well as distinct features. The various similarity measures differ in the selection of features as well as the use of ratios of common and distinct features. The computation of a similarity value can be based on classical approaches from the set-based similarity computation, so-called set-based measures.

The following measures calculate the similarity between query and case based on the equal and different features of the query and case entity in an ontology. Features are determined by the respective ancestors in the ontology.

The configuration of the measures is done via the InstanceOntologyOrderPredicate of the URIClass of the case and query. The predicate offers the following parameters:

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| root node | URI (String) | none | Defines the root node which is used in combination with the relations of the predicate to span a sub-ontology which is then used by the similarity measure in its calculations. Example: "http://www.w3.org/2002/07/owl#Thing". |
| relations | Set of Strings | none | Defines the relations that should be used to collect the ancestors of the given entities. For example {"rdfs:subClassOf", "rdf:type"}. |

The function \(A(x)\) returns the set of all ancestors of the entity \(x\) .

Maedche and Staab #

Maedche and Staab 5 use all ancestors of an entity in the inheritance hierarchy as its features for assessing the similarity, which is a very common approach. To form a similarity value, they use the Jaccard index, which comes from the field of set-based similarity computation. The similarity function \(sim_{MS}\) according to Maedche and Staab is shown in the following.

\(sim_{MS}(q, c) = \frac{\vert A(q) \ \cap \ A(c) \vert}{\vert A(q) \ \cup \ A(c) \vert}\)

sim.xml

    <OntologyFbMaedcheStaab name="SMMaedcheStaabPair" class="URI"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyFbMS simMeasureMS = (SMOntologyFbMS) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyFbMS.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simVal.getSimilarityModel().addSimilarityMeasure(simMeasureMS, "SMMaedcheStaabPair");

Example: In this example the query is DeepPanBase and the case is CheesyPizza. First, determine all ancestors of the two entities.

  • DeepPanBase(q): PizzaBase, Food
  • CheesyPizza(c): Pizza, Food

After determining the ancestors of query and case, the intersection and the union of the two sets are formed.

  • Intersection: Food
  • Union: PizzaBase, Food, Pizza

Substituting the sets into the formula gives the following.

\(sim_{MS}(q, c) = \frac{\vert \{Food\} \vert}{\vert \{PizzaBase, Food, Pizza\} \vert}\)

Replacing the sets with their sizes leads to a Maedche and Staab similarity of 0.333.

\(sim_{MS}(q, c) = \frac{1}{3} \approx 0.333\)
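The Jaccard index over two ancestor sets can be sketched with standard Java collections. `MaedcheStaabSketch` is a hypothetical helper and does not reflect how the framework collects ancestors. Returning 0 for two empty sets is an assumption made here to avoid a division by zero.

```java
import java.util.HashSet;
import java.util.Set;

final class MaedcheStaabSketch {
    // |A(q) ∩ A(c)| / |A(q) ∪ A(c)|
    static double jaccard(Set<String> ancestorsQ, Set<String> ancestorsC) {
        Set<String> intersection = new HashSet<>(ancestorsQ);
        intersection.retainAll(ancestorsC);
        Set<String> union = new HashSet<>(ancestorsQ);
        union.addAll(ancestorsC);
        // Assumption: two empty ancestor sets yield a similarity of 0.
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> deepPanBase = Set.of("PizzaBase", "Food");
        Set<String> cheesyPizza = Set.of("Pizza", "Food");
        System.out.println(jaccard(deepPanBase, cheesyPizza));
    }
}
```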

Sanchez #

A more complex calculation of the relationship between the shared and distinct features is used by Sanchez et al. 6. In their similarity function \(sim_{Sanchez}(q, c)\) , they consider the difference between the different sets instead of the set union. The function is shown in the following formula.

\(sim_{Sanchez}(q, c) = \log_2\left(1 + \frac{\vert A(q) \ \setminus \ A(c) \vert + \vert A(c) \ \setminus \ A(q) \vert}{\vert A(q) \ \setminus \ A(c) \vert + \vert A(c) \ \setminus \ A(q) \vert + \vert A(q) \ \cap \ A(c) \vert}\right)\)

sim.xml

    <OntologyFbSanchez name="SMSanchezPair" class="URI"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyFbSanchez simMeasureSanchez = (SMOntologyFbSanchez) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyFbSanchez.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simVal.getSimilarityModel().addSimilarityMeasure(simMeasureSanchez, "SMSanchezPair");

Example: For a better understanding of the Sanchez similarity we use the same two entities as before and define their ancestors.

  • DeepPanBase(q): PizzaBase, Food
  • CheesyPizza(c): Pizza, Food

After that, the set differences and the intersection are determined.

  • Set difference A(q) \ A(c): PizzaBase
  • Set difference A(c) \ A(q): Pizza
  • Intersection: Food

Substituting these sets into the formula gives the following.

\(sim_{Sanchez}(q, c) = \log_2\left(1 + \frac{\vert \{PizzaBase\} \vert + \vert \{Pizza\} \vert}{\vert \{PizzaBase\} \vert + \vert \{Pizza\} \vert + \vert \{Food\} \vert}\right)\)

Replacing the sets with their sizes leads to a Sanchez value of 0.737.

\(sim_{Sanchez}(q, c) = \log_2\left(1 + \frac{2}{3}\right) \approx 0.737\)
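The formula as printed on this page can be evaluated with the following sketch. `SanchezSketch` is a hypothetical helper that only mirrors the formula and makes no claim about the framework's implementation. Returning 0 when both ancestor sets are empty is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

final class SanchezSketch {
    // Elements of a that are not contained in b.
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // log2(1 + distinct / (distinct + shared)), as in the formula on this page.
    static double value(Set<String> ancestorsQ, Set<String> ancestorsC) {
        int distinct = difference(ancestorsQ, ancestorsC).size()
            + difference(ancestorsC, ancestorsQ).size();
        Set<String> shared = new HashSet<>(ancestorsQ);
        shared.retainAll(ancestorsC);
        int total = distinct + shared.size();
        if (total == 0) {
            return 0.0; // assumption: both ancestor sets are empty
        }
        return Math.log(1.0 + (double) distinct / total) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(value(Set.of("PizzaBase", "Food"), Set.of("Pizza", "Food")));
    }
}
```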

Group measures #

The following section and its subsections are based on the description of Harispe et al. 1. The group-wise similarity measures can again be divided into two approaches. In the first approach, the direct group measures, the individual entities are compared using classical set-based approaches. In the second approach, the indirect group measures, pairs of entities are formed between the two sets and compared using a pairwise similarity measure. The resulting similarity values are combined into one value by an aggregation function.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| measure | String | none | Sets the set-based measure by its name from the similarity model. |

DirectGroup #

The direct group measures can use classical set-based approaches that determine similarity based on an equality of the individual elements. In this context, for example, the Jaccard index, as used in the feature-based similarity measures of Maedche and Staab, can be used. Instead of determining the corresponding feature set, the passed sets of entities can be used directly, as shown in the formula below.

\(sim_{DirJac}(Q,C)=\frac{ |Q \cap C| }{ |Q \cup C| }\)

Here, \(Q\) and \(C\) are both collections.

sim.xml

    <OntologyDirectGroup name="SM_DirectGroupMS" class="Set" measure="SMMaedcheStaabPair"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyDirectGroup simDirectGroupMS = (SMOntologyDirectGroup) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyDirectGroup.NAME,
      ModelFactory.getDefaultModel().getSetSystemClass()
    );
    simDirectGroupMS.setSimMeasure("SMMaedcheStaabPair");
    simVal.getSimilarityModel().addSimilarityMeasure(simDirectGroupMS, "SMDirectMSGroup");

Analogously, the similarity calculation for sets of entities can also be applied to the feature-based similarity measure of Sanchez et al.

    <OntologyDirectGroup name="SM_DirectGroup" class="Set" measure="SMSanchezPair"/>

Example:

  • Q = {MeatyPizza, DeepPanBase}
  • C = {DeepPanBase, ThinAndCrispyBase}
\(sim_{DirJac}(Q,C)=\frac{ |\{DeepPanBase\}| }{ |\{MeatyPizza, DeepPanBase, ThinAndCrispyBase\}| }\)

Replacing the sets with their sizes leads to the following value.

\(sim_{DirJac}(Q,C)=\frac{ 1 }{ 3 } \approx 0.333\)

IndirectGroup #

The indirect group measures attempt to determine similarity between sets via the use of pairwise similarity measures. For example, pair formation can be performed in the form of a set product by forming all possible pair combinations between the two sets. Then, a pairwise similarity measure can be used for pairs. The resulting set of pairwise similarities can be combined into a similarity value by using an aggregation function. For example, the average value can be formed over all pairwise similarities, as seen in the following formula.

\( sim_{IndirAvg}(Q,C)=\frac{ \sum_{ q \in Q }^{} \sum_{ c \in C } sim_{Pair}(q,c) }{ |Q| * |C| } \)

sim.xml

    <ObjectEqual name="SMObjectEquals" class="URI"/>

    <OntologyIndirectGroup name="SMIndirectGroupAvg" class="Set" measure="SMObjectEquals" aggFunction="AVG"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyIndirectGroup simIndirectGroupAvg = (SMOntologyIndirectGroup) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyIndirectGroup.NAME,
      ModelFactory.getDefaultModel().getSetSystemClass()
    );
    simIndirectGroupAvg.setSimMeasure("SMObjectEquals");
    simIndirectGroupAvg.setAggFunction(OntologyIndirectMeasureAggEnum.AVG);
    simVal.getSimilarityModel().addSimilarityMeasure(
      simIndirectGroupAvg, "SMIndirectGroupAvg"
    );

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| aggFunction | OntologyIndirectMeasureAggEnum | none | Defines the aggregation function for the calculated similarities. |

Example:

  • Q = {MeatyPizza, DeepPanBase}
  • C = {VegetarianPizza, ThinAndCrispyBase}

For simplification, we use the Rada similarity as the measure for the pairwise similarity calculation. The first step is to define all combinations between the two sets:

  • (MeatyPizza, VegetarianPizza), (MeatyPizza, ThinAndCrispyBase), (DeepPanBase, VegetarianPizza), (DeepPanBase, ThinAndCrispyBase)

The second step is to calculate the Rada similarity for every pair:

  • \(sim_{Rada}(MeatyPizza, VegetarianPizza) = \frac{1}{3} \approx 0.333\)
  • \(sim_{Rada}(MeatyPizza, ThinAndCrispyBase) = \frac{1}{5} = 0.2\)
  • \(sim_{Rada}(DeepPanBase, VegetarianPizza) = \frac{1}{5} = 0.2\)
  • \(sim_{Rada}(DeepPanBase, ThinAndCrispyBase) = \frac{1}{3} \approx 0.333\)

Summing these values and dividing by \(|Q| * |C|\) gives the final result:

\( sim_{IndirAvg}(Q,C) = \frac{ 0.333 + 0.2 + 0.2 + 0.333 }{ 4 } = \frac{1.066}{4} \approx 0.267 \)
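The averaging over all pairs can be sketched as follows, with the Rada similarity plugged in as the pairwise measure. Everything in `IndirectGroupSketch` is hypothetical: the `PARENT` map encodes only the part of the Pizza excerpt needed here, node names follow the spelling of the ontology excerpt, and both sets are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

final class IndirectGroupSketch {
    // Child -> parent edges (hypothetical in-memory encoding of the excerpt).
    static final Map<String, String> PARENT = Map.of(
        "Pizza", "Food",
        "PizzaBase", "Food",
        "MeatyPizza", "Pizza",
        "VegetarianPizza", "Pizza",
        "DeepPanBase", "PizzaBase",
        "ThinAndCrispyBase", "PizzaBase");

    static List<String> pathToRoot(String node) {
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = PARENT.get(n)) {
            path.add(n);
        }
        return path;
    }

    // Rada similarity: i edges from q to the LCA, j edges from c to the LCA.
    static double rada(String q, String c) {
        List<String> qPath = pathToRoot(q);
        List<String> cPath = pathToRoot(c);
        for (int i = 0; i < qPath.size(); i++) {
            int j = cPath.indexOf(qPath.get(i));
            if (j >= 0) {
                return 1.0 / (1 + i + j);
            }
        }
        return 0.0; // assumption: no common ancestor means similarity 0
    }

    // Average of the pairwise similarities over the set product Q x C.
    static double average(Set<String> queries, Set<String> cases,
                          ToDoubleBiFunction<String, String> pairMeasure) {
        double sum = 0.0;
        for (String q : queries) {
            for (String c : cases) {
                sum += pairMeasure.applyAsDouble(q, c);
            }
        }
        return sum / (queries.size() * cases.size());
    }

    public static void main(String[] args) {
        System.out.println(average(
            Set.of("MeatyPizza", "DeepPanBase"),
            Set.of("VegetarianPizza", "ThinAndCrispyBase"),
            IndirectGroupSketch::rada));
    }
}
```

Passing the pairwise measure as a function keeps the aggregation independent of the measure, which matches the separate `measure` and `aggFunction` parameters of the configuration.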

IsEquivalent #

Equivalences can be represented in ontologies by so-called equivalence relations, such as the OWL relation “sameAs” between instances ( \(isEquivalent(u,v) \rightarrow \{true,false\}\) ). In addition to the relations from OWL, further domain-specific relations can express an equivalence. Therefore, the function isEquivalent(q, c) allows a set of relations to be defined that are interpreted as equivalence relations. The function checks whether one of the defined relations exists between the entities q and c in the ontology. Additionally, the relations can be treated as transitive, in which case an indirect connection between the two entities via one relation type also counts.

This measure calculates the similarity between query and case based on the existence of specific relations between the query and case entity in an ontology. Different relations can be defined, and the existence check can also be performed transitively (transitive=“true”). If the query and case entity are connected by one of the defined relations, the similarity is 1.0, otherwise 0.
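The existence check, with and without transitivity, can be sketched as a simple graph search. `IsEquivalentSketch` is a hypothetical illustration and not the framework's `SMOntologyIsEquivalent`. It assumes a single relation stored as a subject-to-objects map and checks only the direction from query to case, which the source does not specify.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IsEquivalentSketch {
    // Hypothetical subClassOf edges: subject -> objects.
    static final Map<String, Set<String>> SUB_CLASS_OF = Map.of(
        "NamedPizza", Set.of("Pizza"),
        "Pizza", Set.of("Food"));

    // Returns 1.0 if c is reachable from q via the relation, otherwise 0.0.
    static double similarity(Map<String, Set<String>> relation, String q, String c, boolean transitive) {
        if (!transitive) {
            // Only a direct edge from q to c counts.
            return relation.getOrDefault(q, Set.of()).contains(c) ? 1.0 : 0.0;
        }
        // Breadth-first search along the relation, starting at q.
        Deque<String> open = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        open.add(q);
        while (!open.isEmpty()) {
            String node = open.poll();
            for (String next : relation.getOrDefault(node, Set.of())) {
                if (next.equals(c)) {
                    return 1.0;
                }
                if (seen.add(next)) {
                    open.add(next);
                }
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        System.out.println(similarity(SUB_CLASS_OF, "NamedPizza", "Food", false));
        System.out.println(similarity(SUB_CLASS_OF, "NamedPizza", "Food", true));
    }
}
```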

sim.xml

    <OntologyIsEquivalent name="SimIsEquivalentSubClass" class="URI">
        <OntoEquivalenceRelation relation="subClassOf" transitive="true"/>
    </OntologyIsEquivalent>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyIsEquivalent simIsEquivalentSubClassOf = (SMOntologyIsEquivalent) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyIsEquivalent.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simIsEquivalentSubClassOf.addEquivalenceRelation("rdfs:subClassOf", true);
    simVal.getSimilarityModel().addSimilarityMeasure(
      simIsEquivalentSubClassOf, "SimIsEquivalentSubClass"
    );

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| relation | String | none | The name of the relation. |
| transitive | Boolean | false | Defines whether the relation should be considered transitive. |

Path #

This measure is analogous to the Taxonomy Path measure, but is applicable to URIs. To use this measure, the ontology of the URIClass of the query and the case has to be restricted with an Instance Ontology Order Predicate. This is necessary as the calculation of the OntologyPath measure is height based and thus needs a distinct root node which an unrestricted ontology does not necessarily provide.

Caching: The measure uses several caches during its similarity value calculations to speed up the process: hierarchy height, node parents, intersection cache. These caches are stored in the InstanceOntologyOrderPredicate as they are dependent on the root node URI and the relations which are defined in the predicate.

The similarity is calculated from the path between query and case as follows:

\(sim = \frac{maxPathWeighted - nodePathWeighted }{maxPathWeighted}\)

The used maxPathWeighted is defined as:

\( maxPathWeighted= weightUp * \#maxStepsUpward + weightDown * \#maxStepsDownward \)

The used nodePathWeighted is defined as:

\(nodePathWeighted = weightUp * \#stepsUpward + weightDown * \#stepsDownward\)

sim.xml

    <OntologyPath name="SMOntoPath" class="URI" up="0.7" down="0.7" default="false"/>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyPath smOntoPath = (SMOntologyPath) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyPath.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    smOntoPath.setWeightUp(0.7);
    smOntoPath.setWeightDown(0.7);
    simVal.getSimilarityModel().addSimilarityMeasure(smOntoPath, "SMOntoPath");

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| up | double [0.0, 1.0] | 1 | Defines the weight for a step upwards in the path between two nodes. Must be between 0 and 1, otherwise the value will be rounded automatically. |
| down | double [0.0, 1.0] | 1 | Defines the weight for a step downwards in the path between two nodes. Must be between 0 and 1, otherwise the value will be rounded automatically. |

Example: To calculate the similarity between the nodes VegetarianPizza and PizzaTopping, the steps to reach one node from the other are counted. These are 2 steps up and 1 step down in the taxonomy. The similarity depends on the chosen weights for up and down.

If the weight for up is 0.2 and the weight for down is 0.3, both weights will be normalized to 0.4 for up and 0.6 for down. Using these weights, the value for maxPathWeighted is \( maxPathWeighted= 0.4 * 2 + 0.6 * 2 = 2\) . The value for nodePathWeighted is \( nodePathWeighted = 0.4 * 2 + 0.6 * 1 = 1.4\) . Finally, the similarity is \(sim = \frac{2 - 1.4}{2} = 0.3\) .
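The calculation in this example can be retraced with the sketch below. `PathSketch` is a hypothetical helper, not the framework's `SMOntologyPath`. It assumes that the normalization divides each weight by the sum of both weights, which reproduces the values 0.4 and 0.6 from the example, and it takes the step counts as inputs.

```java
final class PathSketch {
    static double similarity(double weightUp, double weightDown,
                             int stepsUp, int stepsDown,
                             int maxStepsUp, int maxStepsDown) {
        // Assumption: weights are normalized so that they add up to 1.
        double sum = weightUp + weightDown;
        double up = weightUp / sum;
        double down = weightDown / sum;
        double maxPathWeighted = up * maxStepsUp + down * maxStepsDown;
        double nodePathWeighted = up * stepsUp + down * stepsDown;
        return (maxPathWeighted - nodePathWeighted) / maxPathWeighted;
    }

    public static void main(String[] args) {
        // VegetarianPizza -> PizzaTopping: 2 steps up, 1 step down, at most 2 steps each way.
        System.out.println(similarity(0.2, 0.3, 2, 1, 2, 2));
    }
}
```

Because the weights appear in both the numerator and the denominator, scaling both by the same factor does not change the result. Only their ratio matters.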

AggAvg #

To determine a global similarity value from the calculated local similarities, various aggregation functions can be used. These functions are already known in the area of CBR and can be transferred to the global similarity computation in this concept. The following function shows a global similarity measure that determines the average of the local similarities.

\( sim_{global}(q,c)=\frac{ \sum_{ r \in R_{r} }^{} sim_{local}(Q_{r},C_{r}, sim_{r}) }{ |R_{r}| } \)

This formula shows that the global similarity measure can be determined by aggregating all local similarity measures. To do this, all local similarities are calculated, summed, and then divided by the number of local similarities \((|R_{r}|)\). \(Q_{r}\) and \(C_{r}\) are the query and the case of one particular local similarity. By adding a weight \(w_{r} \in [0, 1]\) for each relevant relation, the relevance of the individual local similarities can be expressed in the similarity calculation. Such a global similarity measure is shown below and represents a weighted mean.

\( sim_{global}(q,c)={ \sum_{ r \in R_{r} }^{} w_{r} * sim_{local}(Q_{r},C_{r}, sim_{r})} \)

All weights of the different local similarity measures must add up to 1.

\( { \sum_{ r \in R_{r} }^{} w_{r} = 1 } \)
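The weighted mean can be sketched as follows. `WeightedAverageSketch` is a hypothetical helper. It divides by the sum of the weights so that weights such as 1.0 and 2.0 effectively add up to 1. Whether the framework normalizes weights in this way is not stated here, so this is an assumption of the sketch.

```java
final class WeightedAverageSketch {
    // Weighted mean of local similarities, normalized by the sum of the weights.
    static double weightedMean(double[] localSimilarities, double[] weights) {
        if (localSimilarities.length != weights.length) {
            throw new IllegalArgumentException("one weight per local similarity is required");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weightedSum += weights[i] * localSimilarities[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        // Two relations with hypothetical local similarities 1.0 and 0.5, weighted 1.0 and 2.0.
        System.out.println(weightedMean(new double[] {1.0, 0.5}, new double[] {1.0, 2.0}));
    }
}
```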

Furthermore, simple aggregation functions that determine the minimum or maximum local similarity value can be used. These and further existing aggregation functions are summarized under the general operator \(\Phi\). The \(\Phi\) operator can be used in the global similarity calculation of this concept with the following similarity measure.

\( sim_{global}(q,c)= {\Phi_{r \in R_{r}} ( sim_{local}(Q_{r},C_{r}, sim_{r}))} \)

sim.xml

    <OntologyAggregateAvg name="AggAvg" class="URI">
        <OntoAggWeight relation="wi2:hasName" measure="SMString" weight="1.0"/>
        <OntoAggWeight relation="wi2:hasTopping" measure="SMEquals" weight="2.0"/>
    </OntologyAggregateAvg>

At runtime, the measure can be defined as follows:

Wiki_URITest.java

    SMOntologyAggAvg simMeasureAggAvg = (SMOntologyAggAvg) simVal.getSimilarityModel().createSimilarityMeasure(
      SMOntologyAggAvg.NAME,
      ModelFactory.getDefaultModel().getURISystemClass()
    );
    simMeasureAggAvg.setRelation("hasName", "SMString", 1.0);
    simMeasureAggAvg.setRelation("hasTopping", "SMEquals", 2.0);
    simVal.getSimilarityModel().addSimilarityMeasure(simMeasureAggAvg, "AggAvg");

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| relation | String | none | Defines the name of the relation to be inferred. |
| measure | String | none | Defines the similarity measure to be used. |
| weight | double | none | Defines the weight of the relation. |
| sparql | String | none | SPARQL string (relationQuery). |
| sparqlParamName | String | none | Defines the name of the parameter used in relationQuery (relationQueryParamName). |

The parameter sparql defines a relationQuery that contains, for example, the PREFIX declarations for the RDF types and the ontology as well as the query itself. The query language, SPARQL, is similar to SQL. The parameter sparqlParamName defines the name of the parameter used in the relationQuery. For the pizza ontology, this can be “?pizza”.

Comparison of pairwise URI Measures #

For a comparison of the pairwise similarity measures, a small excerpt of the pizza ontology is used:

graph TD; Food --> IceCream; Food --> Pizza; Pizza --> NamedPizza; Pizza --> InterestingPizza; NamedPizza --> American; NamedPizza --> FourSeasons;

| Pair (Query, Case) | Rada | WuPalmer | Slimani | MaedcheStaab | Sanchez |
|---|---|---|---|---|---|
| (NamedPizza, American) | 0.25 | 0.4 | 0.2 | 0.666 | 0.333 |
| (NamedPizza, Pizza) | 0.333 | 0.5 | 0.5 | 0.333 | 0.666 |


  1. Harispe, Sébastien, et al. Semantic similarity from natural language and ontology analysis. Synthesis Lectures on Human Language Technologies 8.1 (2015): 1-254.

  2. Rada, Roy, et al. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics 19.1 (1989): 17-30.

  3. Wu, Zhibiao, and Martha Palmer. Verb semantics and lexical selection. arXiv preprint cmp-lg/9406033 (1994).

  4. Slimani, Thabet, B. Ben Yaghlane, and Khaled Mellouli. A new similarity measure based on edge counting. World Academy of Science, Engineering and Technology 23.2006 (2006): 34-38.

  5. Maedche, Alexander, and Steffen Staab. Comparing ontologies-similarity measures and a comparison study. AIFB, 2001.

  6. Sánchez, David, et al. Ontology-based semantic similarity: A new feature-based approach. Expert Systems with Applications 39.9 (2012): 7718-7728.