This page contains the following content:

graph TD; URIMeasures[URI Measures] --> PairMeasures[Pair Measures]; PairMeasures --> StructuralPathMeasures[Structural Path Measures]; StructuralPathMeasures --> Rada; StructuralPathMeasures --> WuPalmer[Wu and Palmer]; StructuralPathMeasures --> Slimani; PairMeasures --> FeatureBasedMeasures[Feature Based Measures]; FeatureBasedMeasures --> MaedcheStaab[Maedche and Staab]; FeatureBasedMeasures --> Sanchez; URIMeasures --> GroupMeasures[Group measures]; GroupMeasures --> DirectGroup[Direct Group]; GroupMeasures --> IndirectGroup[Indirect Group]; URIMeasures --> IsEquivalent[Is Equivalent]; URIMeasures --> Path; URIMeasures --> AggregateAverage[Aggregate Average];

Ontology-based similarity measures can be divided according to the entities being compared. Pairs of entities are compared directly using pairwise similarity measures, which are in turn subdivided into several classes. To compare multiple entities in the form of two sets, groupwise similarity measures are used, which can also be subdivided further.

Pairwise measures #

This section describes the similarity measures for pairs of entities from an ontology in more detail and presents some similarity functions. The description is fundamentally based on Harispe et al. 1. The similarity measures primarily refer to the pairwise comparison of concepts in an ontology, but they can also be used for instances by adapting the relations under consideration. The pairwise similarity measures can be divided into further, subordinate classes, which pursue different approaches to the similarity computation. Structural path measures deal with the length of paths between entities. Feature-based measures consider a set of connected broader entities and compute similarity from the ratio of shared features. Information-content measures use the information content of each entity to determine similarity. The structural path measures and the feature-based measures are described in more detail, with example similarity functions, in the following subsections.

The measures in this section are based on the comparison of two entities of an ontology. There are two different approaches that can be used to perform the comparison: feature based and path based.

Structural path measures #

The structural path measures determine similarity based on how concepts are connected through specific relations, such as the inheritance hierarchy. The distance, measured as the number of edges, is used in the similarity calculation to obtain a similarity value at a semantic level. To determine the shortest path between two entities \(q\) and \(c\), the function \(sp(q,isa*,c)\) can be used. Within simple taxonomies, this function determines the shortest path using the isa relation and its inverse relation, which is expressed by the asterisk. To avoid relying on inverse relations, and for use in graphs with a unique root, the shortest path can instead be defined via the Least Common Ancestor (\(LCA\)). The \(LCA\) is the lowest common parent node of both entities; the path between them then consists of the upward path from each entity to this node. Multiple inheritance, which can be created for example by a reasoning process in the ontology, is a special case for this kind of similarity computation. It can result in several potential \(LCA\)s, from which a unique \(LCA\) must be selected. In principle, the most specific \(LCA\), that is, the one with the largest distance to the root, is chosen. If several potential \(LCA\)s have an identical distance to the root, the one with the shortest distance to both entities under consideration is chosen.
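
The selection rule for the \(LCA\) under multiple inheritance can be sketched as follows. This is a minimal illustration, not the implementation behind the measures on this page: the toy taxonomy `PARENTS`, the helper `up_distances` and the function `lca` are hypothetical names, and depth is assumed to be the shortest isa path to a unique root.

```python
from collections import deque

# Hypothetical toy taxonomy: each entity maps to its direct isa parents.
PARENTS = {
    "Thing": [],
    "Vehicle": ["Thing"],
    "Toy": ["Thing"],
    "Car": ["Vehicle"],
    "Truck": ["Vehicle"],
    "ToyCar": ["Car", "Toy"],  # multiple inheritance
}

def up_distances(entity, parents):
    """Shortest number of isa edges from entity to itself and each ancestor."""
    dist = {entity: 0}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist

def lca(q, c, parents, root):
    """Pick the deepest common ancestor; break ties by the smallest
    summed distance to q and c (assumes every node reaches root)."""
    dq, dc = up_distances(q, parents), up_distances(c, parents)
    common = set(dq) & set(dc)
    depth = {n: up_distances(n, parents)[root] for n in common}
    return max(common, key=lambda n: (depth[n], -(dq[n] + dc[n])))

print(lca("ToyCar", "Truck", PARENTS, "Thing"))
```

In this toy taxonomy, `ToyCar` and `Truck` share the ancestors `Vehicle` and `Thing`, and the deeper one, `Vehicle`, is selected.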

The following measures calculate the similarity between query and case based on the shortest path between query and case entity in an ontology.

Rada #

Based on the \(LCA\), the shortest path between two concepts can also be expressed as the Rada distance \(dist_{Rada}(q,c)\), which is shown in the following formula.

\(dist_{Rada}(q, c) = sp(q, LCA(q, c)) + sp(c, LCA(q, c))\)

The similarity function \(sim_{Rada}(q,c) \rightarrow (0,1]\) according to Rada et al. 2, which converts the distance function into a similarity between \(0\) and \(1\), is presented in the formula below.

\(sim_{Rada}(q, c) = \frac{1}{1 + dist_{Rada}(q, c)}\)

In this context, the function \(LCA(q,c)\) determines the entity of the \(LCA\) within the inheritance hierarchy.

<OntologySpRada name="SM_Rada" class="">
    <OntoRelation name=""/>
</OntologySpRada>
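
The two Rada formulas can be sketched in a few lines of Python. The taxonomy `PARENTS` and all function names are hypothetical, and the sketch assumes a single-inheritance tree, where the common ancestor that minimises the summed distance is also the \(LCA\).

```python
from collections import deque

# Hypothetical toy taxonomy: each entity maps to its direct isa parents.
PARENTS = {
    "Thing": [],
    "Vehicle": ["Thing"],
    "Car": ["Vehicle"],
    "Truck": ["Vehicle"],
    "Bike": ["Thing"],
}

def up_distances(entity, parents):
    """Shortest number of isa edges from entity to itself and each ancestor."""
    dist = {entity: 0}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist

def dist_rada(q, c, parents):
    """sp(q, LCA) + sp(c, LCA); in a tree the minimum over all common
    ancestors is reached at the LCA."""
    dq, dc = up_distances(q, parents), up_distances(c, parents)
    return min(dq[n] + dc[n] for n in set(dq) & set(dc))

def sim_rada(q, c, parents):
    """Map the distance into the interval (0, 1]."""
    return 1.0 / (1.0 + dist_rada(q, c, parents))

print(dist_rada("Car", "Truck", PARENTS), sim_rada("Car", "Truck", PARENTS))
```

Here `Car` and `Truck` are two edges apart via `Vehicle`, so the similarity is 1/3, while identical entities have distance 0 and similarity 1.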

Wu and Palmer #

More extensive similarity measures extend the simple structural similarities while still building on the length of the shortest path. By including further information from the ontology, they try to take additional aspects of the entities into account in the similarity computation. For this purpose, Wu and Palmer 3 include the depth of the \(LCA\) in the inheritance hierarchy in addition to the distance according to Rada. The depth expresses the specificity of the \(LCA\). The corresponding similarity function \(sim_{WP}(q, c)\) according to Wu and Palmer is shown in the following formula.

\(sim_{WP}(q, c) = \frac{2 * d(LCA(q,c))}{2 * d(LCA(q, c)) + dist_{Rada}(q, c)}\)

The function \(d(q) = sp(q,isa,r)\) determines the depth of the query entity in relation to the root \(r\) of the inheritance hierarchy.

<OntologySpWuPalmer name="SM_WuPalmer" class="OntologyRef"/>
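
A minimal sketch of the Wu and Palmer formula, assuming a single-inheritance tree stored as a hypothetical `PARENT` mapping and depth counted in edges to the root. The value returned when both entities are the root (where the formula is 0/0) is an assumption of this sketch.

```python
# Hypothetical toy taxonomy: each entity maps to its single isa parent.
PARENT = {
    "Thing": None,
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "Bike": "Thing",
}

def path_to_root(x):
    """List of nodes from x up to the root, x first."""
    path = [x]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def sim_wp(q, c):
    pq, pc = path_to_root(q), path_to_root(c)
    lca = next(n for n in pq if n in pc)      # first shared node walking up from q
    d_lca = len(path_to_root(lca)) - 1        # d(LCA): edges from LCA to root
    dist = pq.index(lca) + pc.index(lca)      # dist_Rada(q, c)
    if d_lca == 0 and dist == 0:
        return 1.0                            # assumption: q == c == root
    return 2 * d_lca / (2 * d_lca + dist)

print(sim_wp("Car", "Truck"), sim_wp("Car", "Bike"))
```

With depth counted in edges, two entities whose only common ancestor is the root get similarity 0, because \(d(LCA(q,c)) = 0\).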

Slimani #

As an extension of the similarity measure of Wu and Palmer, Slimani et al. 4 also integrate the neighborhood relationships of entities into the computation. The goal is to weight direct inheritance between entities higher than a neighborhood relationship. For this purpose, a penalty factor is added, which distinguishes whether the two entities are in a neighborhood or a direct inheritance relationship. The corresponding similarity function \(sim_{Slimani}(q,c)\) and the penalty factor \(pf(q,c)\) are shown in the following formulas. Within \(pf(q,c)\), the parameter \(\lambda \in \{0,1\}\) indicates whether the entities are in a neighbor relationship (\(\lambda = 1\)) or a direct inheritance relationship (\(\lambda = 0\)).

\(sim_{Slimani}(q,c)=sim_{WP}(q,c) * pf(q,c)\)

\(pf(q,c)= \left\{\begin{array}{ll} 1, & \text{if } \lambda = 0 \\ {(|d(q)-d(c)|+1)}^{ -1 } , & \text{if } \lambda = 1 \\ \end{array}\right.\)

<OntologySpSlimani name="SM_Slimani" class="URI"/>
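
The penalty factor can be illustrated with the sketch below. It assumes a single-inheritance tree in a hypothetical `PARENT` mapping and interprets \(\lambda = 0\) as "one entity lies on the other's path to the root"; how the actual measure determines \(\lambda\) is not stated on this page, so this reading is an assumption.

```python
# Hypothetical toy taxonomy: each entity maps to its single isa parent.
PARENT = {
    "Thing": None,
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "SportsCar": "Car",
}

def path_to_root(x):
    """List of nodes from x up to the root, x first."""
    path = [x]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def sim_wp(q, c):
    pq, pc = path_to_root(q), path_to_root(c)
    lca = next(n for n in pq if n in pc)
    d_lca = len(path_to_root(lca)) - 1
    dist = pq.index(lca) + pc.index(lca)
    if d_lca == 0 and dist == 0:
        return 1.0  # assumption: both entities are the root
    return 2 * d_lca / (2 * d_lca + dist)

def pf(q, c):
    pq, pc = path_to_root(q), path_to_root(c)
    if q in pc or c in pq:          # assumed meaning of lambda = 0
        return 1.0
    depth_q, depth_c = len(pq) - 1, len(pc) - 1
    return 1.0 / (abs(depth_q - depth_c) + 1)   # lambda = 1

def sim_slimani(q, c):
    return sim_wp(q, c) * pf(q, c)

print(sim_slimani("SportsCar", "Truck"), sim_slimani("SportsCar", "Vehicle"))
```

`SportsCar` and `Truck` are neighbors at different depths, so their Wu and Palmer value of 0.4 is halved, whereas `SportsCar` and its ancestor `Vehicle` keep their unpenalised value of 0.5.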

Feature based measures #

Feature-based similarity computation is another approach to determining the semantic similarity between two entities within an ontology. This approach extracts a set of features from the ontology for each entity to be compared and computes a similarity based on the number of common and distinct features. The various similarity measures differ in the selection of features and in how they relate common and distinct features to each other. The similarity value can be computed with classical approaches from set-based similarity computation, so-called set-based measures.

The following measures calculate the similarity between query and case based on the equal and different features of the query and case entity in an ontology. Features are determined by the respective ancestors in the ontology.

The configuration of the measures is done via the InstanceOntologyOrderPredicate of the URIClass of the case and query. The predicate offers the following parameters:

| Parameter | Type/Range | Default Value | Description |
| --- | --- | --- | --- |
| root node | URI (String) | none | Defines the root node which is used in combination with the relations of the predicate to span a sub-ontology which is then used by the similarity measure in its calculations. Example: "http://www.w3.org/2002/07/owl#Thing" |
| relations | Set of Strings | none | Defines the relations that should be used to collect the ancestors of the given entities. For example {"rdfs:subClassOf", "rdf:type"} |

The function \(A(x)\) returns the set of all ancestors of the entity \(x\) .
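
How the two parameters could interact when collecting \(A(x)\) is sketched below. The triple list, the prefixed names and the function `ancestors` are hypothetical; the sketch assumes that the entity itself is not part of its ancestor set and that traversal stops at the root node, neither of which is confirmed by this page.

```python
# Hypothetical ontology fragment as (subject, relation, object) triples.
TRIPLES = [
    ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:Vehicle", "rdfs:subClassOf", "owl:Thing"),
    ("ex:myCar", "rdf:type", "ex:Car"),
    ("ex:myCar", "ex:ownedBy", "ex:Alice"),
]

def ancestors(entity, triples, relations, root):
    """Collect everything reachable from entity via the given relations,
    without expanding past the root node."""
    seen = set()
    stack = [entity]
    while stack:
        node = stack.pop()
        if node == root:
            continue  # the root closes the sub-ontology
        for subject, relation, obj in triples:
            if subject == node and relation in relations and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

print(ancestors("ex:myCar", TRIPLES, {"rdfs:subClassOf", "rdf:type"}, "owl:Thing"))
```

With only `rdfs:subClassOf` configured, the instance `ex:myCar` would have no ancestors in this fragment, which illustrates why the relation set matters when instances are compared.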

Maedche and Staab #

Maedche and Staab 5 define the features of an entity as all of its ancestors in the inheritance hierarchy, which is a very common approach. To form a similarity value, they use the Jaccard index, which comes from the field of set-based similarity computation. The similarity function \(sim_{MS}\) according to Maedche and Staab is shown in the following. The function \(A(q)\) determines the set of all ancestors of the query entity \(q\).

\(sim_{MS}(q, c) = \frac{\vert A(q) \ \cap \ A(c) \vert}{\vert A(q) \ \cup \ A(c) \vert}\)

<OntologyFbMaedcheStaab name="SMJaccard" class="Set"/>
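
The Jaccard computation on ancestor sets can be sketched as follows. The ancestor sets are hypothetical and passed in directly, and returning 0 when both sets are empty is an assumption of this sketch.

```python
def sim_ms(a_q, a_c):
    """Jaccard index of two ancestor sets A(q) and A(c)."""
    union = a_q | a_c
    if not union:
        return 0.0  # assumption: no features at all means no similarity
    return len(a_q & a_c) / len(union)

# Hypothetical ancestor sets for two entities.
a_car = {"Car", "Vehicle", "Thing"}
a_truck = {"Truck", "Vehicle", "Thing"}

print(sim_ms(a_car, a_truck))
```

The two sets share two of four distinct features, so the similarity is 0.5.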

Sanchez #

Sanchez et al. 6 use a more complex calculation of the relationship between shared and distinct features. In their similarity function \(sim_{Sanchez}(q, c)\), they use the set differences between the two ancestor sets instead of the set union. The function is shown in the following formula.

\(sim_{Sanchez}(q, c) = \log_2\left(1 + \frac{\vert A(q) \ \setminus \ A(c) \vert + \vert A(c) \ \setminus \ A(q) \vert}{\vert A(q) \ \setminus \ A(c) \vert + \vert A(c) \ \setminus \ A(q) \vert + \vert A(q) \ \cap \ A(c) \vert}\right)\)

<OntologyFbSanchez name="SM_Sanchez" class=""/>
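
The expression above can be evaluated with the following sketch, which takes hypothetical ancestor sets as input. Evaluated literally, the expression is 0 for identical ancestor sets and 1 for disjoint ones, so it behaves like a dissimilarity; whether the implemented measure applies a complement such as 1 minus this value is not stated here, and the sketch deliberately returns the raw value.

```python
import math

def sanchez_value(a_q, a_c):
    """Evaluate log2(1 + distinct / (distinct + shared)) as written above."""
    distinct = len(a_q - a_c) + len(a_c - a_q)
    shared = len(a_q & a_c)
    if distinct + shared == 0:
        return 0.0  # assumption: both ancestor sets are empty
    return math.log2(1 + distinct / (distinct + shared))

# Hypothetical ancestor sets for two entities.
a_car = {"Car", "Vehicle", "Thing"}
a_truck = {"Truck", "Vehicle", "Thing"}

print(sanchez_value(a_car, a_truck))
```

For the example sets, two features are distinct and two are shared, so the value is \(\log_2(1.5)\).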

Group measures #

The following section and its subsections are fundamentally based on the description of Harispe et al. 1. The groupwise similarity measures can again be divided into two approaches. In the first approach, the direct group measures, the individual entities are compared using classical set-based approaches. In the second approach, the indirect group measures, pairs of entities are formed between the two sets and compared using a pairwise similarity measure. The resulting similarity values are combined into one value by an aggregation function.

DirectGroup #

The direct group measures can use classical set-based approaches that determine similarity based on an equality of the individual elements. In this context, for example, the Jaccard index, as used in the feature-based similarity measures of Maedche and Staab, can be used. Instead of determining the corresponding feature set, the passed sets of entities can be used directly, as shown in the formula below.

\(sim_{DirJac}(Q,C)=\frac{ |Q \cap C| }{ |Q \cup C| }\)

Here, \(Q\) and \(C\) are both collections.

<OntologyDirectGroup name="SM_DirectGroup" class="Set" measure="SM_MaedcheStaab"/>

Analogously, the similarity calculation for sets of entities can also be applied to the feature-based similarity measure of Sanchez et al.

<OntologyDirectGroup name="SM_DirectGroup" class="Set" measure="SM_Sanchez"/>

IndirectGroup #

The indirect group measures, on the other hand, determine the similarity between sets by using pairwise similarity measures. For example, the pairing can be done in the form of a set product by forming all possible pair combinations between the two sets. A pairwise similarity measure is then applied to each pair. The resulting set of pairwise similarities is combined into a single similarity value by an aggregation function. For example, the average can be taken over all pairwise similarities, as seen in the following formula.

\( sim_{IndirAvg}(Q,C)=\frac{ \sum_{ q \in Q } \sum_{ c \in C } sim_{Pair}(q,c) }{ |Q| * |C| } \)

<OntologyIndirectGroup name="SMPort" class="Set" measure="SMPortSemantic" aggFunction="Avg"/>
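
The average aggregation over the set product can be sketched as follows. The function names and the exact-match pair measure are hypothetical stand-ins for any pairwise measure, and returning 0 for an empty set is an assumption of this sketch.

```python
def sim_indir_avg(Q, C, sim_pair):
    """Average of sim_pair over all pairs (q, c) of the set product Q x C."""
    if not Q or not C:
        return 0.0  # assumption: an empty set has no similarity to anything
    total = sum(sim_pair(q, c) for q in Q for c in C)
    return total / (len(Q) * len(C))

def exact_match(q, c):
    """Hypothetical pairwise measure: 1.0 for identical entities, else 0.0."""
    return 1.0 if q == c else 0.0

print(sim_indir_avg({"a", "b"}, {"b", "c"}, exact_match))
```

Of the four pairs in the example, only one matches, so the aggregated similarity is 0.25. Passing the pairwise measure as a function mirrors the `measure` attribute in the configuration above, which names the pairwise measure to apply.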

IsEquivalent #

Calculates the similarity between query and case based on the existence of specific relations between the query and case entity in an ontology. Different relations can be defined, and the existence check can also be performed transitively. If the query and case entity are connected by one of the defined relations, the similarity is 1.0, otherwise 0.

<OntologyIsEquivalent name="" class="">
    <OntoEquivalenceRelation relation=""/>
</OntologyIsEquivalent>
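
The existence check, with and without transitivity, can be sketched as follows. The triples and the function `is_equivalent` are hypothetical, and the sketch assumes that relations are followed only in the direction from the query to the case entity, which this page does not specify.

```python
# Hypothetical ontology fragment as (subject, relation, object) triples.
TRIPLES = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "owl:sameAs", "ex:c"),
    ("ex:a", "ex:knows", "ex:d"),
]

def is_equivalent(q, c, triples, relations, transitive=False):
    """Return 1.0 if c is reachable from q via the given relations, else 0.0.
    Without transitive=True only a single direct edge counts."""
    frontier, seen = [q], {q}
    while frontier:
        node = frontier.pop()
        for subject, relation, obj in triples:
            if subject != node or relation not in relations:
                continue
            if obj == c:
                return 1.0
            if transitive and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return 0.0

print(is_equivalent("ex:a", "ex:c", TRIPLES, {"owl:sameAs"}, transitive=True))
```

In this fragment, `ex:a` and `ex:c` are only connected through `ex:b`, so the check succeeds with `transitive=True` and fails without it.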

Path #

This measure is analogous to the Taxonomy Path measure, but is applicable to URIs. To use this measure, the ontology of the URIClass of the query and the case has to be restricted with an Instance Ontology Order Predicate. This is necessary as the calculation of the OntologyPath measure is height based and thus needs a distinct root node which an unrestricted ontology does not necessarily provide.

Caching: The measure uses several caches during its similarity value calculations to speed up the process: hierarchy height, node parents, intersection cache. These caches are stored in the InstanceOntologyOrderPredicate as they are dependent on the root node URI and the relations which are defined in the predicate.

<OntologyPath name="SMOntoPath" class="OntoOperator" up="0.7" down="0.7" default="false"/>

AggAvg #

<OntologyAggregateAvg name="SMPortSemantic" class="URI">
    <OntoAggWeight relation="wi2:hasName" weight="1.0" measure="SMString"/>
    <OntoAggWeight relation="wi2:hasPortType" weight="2.0" measure="SMEquals"/>
</OntologyAggregateAvg>

  1. Harispe, Sébastien, et al.: Semantic similarity from natural language and ontology analysis. Synthesis Lectures on Human Language Technologies 8.1 (2015): 1-254. ↩︎

  2. Rada, Roy, et al. Development and application of a metric on semantic nets. IEEE transactions on systems, man, and cybernetics 19.1 (1989): 17-30. ↩︎

  3. Wu, Zhibiao, and Martha Palmer. Verb semantics and lexical selection. arXiv preprint cmp-lg/9406033 (1994). ↩︎

  4. Slimani, Thabet, B. Ben Yaghlane, and Khaled Mellouli. A new similarity measure based on edge counting. World academy of science, engineering and technology 23.2006 (2006): 34-38. ↩︎

  5. Maedche, Alexander, and Steffen Staab. Comparing ontologies-similarity measures and a comparison study. AIFB, 2001. ↩︎

  6. Sánchez, David, et al. Ontology-based semantic similarity: A new feature-based approach. Expert systems with applications 39.9 (2012): 7718-7728. ↩︎