Numeric #

This page contains the following content:

The following measures are implemented for numeric types (Integer, Double) according to the definitions by Ralph Bergmann¹. Except for the Fuzzy measure, these measures are distance-based and therefore, these distance-based measures can also be applied to the chronological types (Time, Timestamp, Date).

The use of the numeric measures for chronological types is analog to the use of the numeric types. However, it is not possible to compare chronological objects of different types with each other.

Restricting value ranges #

It is possible to restrict the possible value ranges for numeric and chronological types. All here listed similarity measures can work with the restriction. For further information, have a look at Restricting Value Ranges.

Distance value #

For all measures except the Fuzzy measure, the similarity between two numbers (here: query value \(q\) and case value \(c\) ) is usually calculated by using the so called distance between those two numbers. In the formulas the letter \(d\) stands for distance. The smaller \(d\) is, the more similar are the two values. \(d=0\) will lead to the similarity value \(sim=1\) .

The formula for the calculation of \(d\) is \(d(q,c) = |q-c|\) . Some measures use normalized version (Exponential and Sigmoid) \(d_{norm}(q,c) = \frac{d(q,c)}{max-min}\) . It does not matter whether the objects are of numeric or chronological type. The values of \(min\) and \(max\) depend on whether \(q\) and \(c\) are of type Integer, Double or Chronological:

case 1 (query and case are of type Integer): \(max-min = Integer.MAX\_VALUE - Integer.MIN\_VALUE\) .
case 2 (query and case are of type Double): In this case \(min = Double.MIN\_VALUE\) and \(max = Double.MAX\_VALUE\) . However, since \(max - min\) is larger than \(Double.MAX\_VALUE\) , the value could not be represented by the Double type in Java. Therefore, the result is defined as \(max-min = Double.MAX\_VALUE\) .
case 3 (query and case are of the Chronological type): In this case the minimum and maximum points in time based on \(min = Long.MIN\_VALUE\) and \(max = Long.MAX\_VALUE\) are used that can be mapped in Java are used. For all three inherited classes Date, Time and Timestamp are \(min\) and \(max\) the same.

Strategies for numeric measures #

In ProCAKE there are three asymmetric strategies for the numeric measures, which influence the similarity value:

CaseHigherThanQuery: If the case value is higher than the query value, the similarity will always be \(0.0\) .
QueryHigherThanCase: If the query value is higher than the case value, the similarity will always be \(0.0\) .
None: This is the default strategy. When using this one, no asymmetric zero out will happen.

The following parameters exist for all numeric similarity measures.

Parameter	Type/Range	Default Value	Description
strategy	Strategy (String)	`none`	The parameter is used to set the strategy for the similarity computation. Three different types exists: CaseHigherThanQuery, QueryHigherThanCase and none.

The values can be set in the XML file. For example:

<NumericMeasure name="NumericMeasure" class="Double" alpha="6" asymmetricZeroOutStrategy="query_higher_than_case"/>

Otherwise, the strategy can be set during runtime by using the method setAsymmetricStrategy(ZERO_OUT_STRATEGIES strategy). This can look like:

smNumericMeasure.setAsymmetricStrategy(ZERO_OUT_STRATEGIES.CASEHIGHERTHANQUERY);

Linear #

In most cases the similarity between two numeric values can be described by a linear function. The idea is that the similarity decreases linearly in an interval \([min,max]\) with the increase of difference between the two values.

If the distance is smaller than the minimum, the value for the similarity is \(1.0\) . If the distance is higher than the maximum, the value for the similarity is \(0.0\) . For the values between, the similarity is calculated by the formula \(\frac{max-d(q,c)}{max-min}\) .

Formalized, the similarity calculation works as follows:

\(sim(q,c) = \begin{cases} 1 &\text{, if } d(q,c) \lt min \\ \frac{max-d(q,c)}{max-min} &\text{, if } min \leq d(q,c) \leq max \\ 0 &\text{, if } d(q,c) \gt max \end{cases}\)

The following graph shows the similarity depending on the calculated distance.

abstract_linear_smaller

The following parameters can be set for this similarity measure.

Numeric parameters:

Parameter	Type/Range	Default Value	Description
min	Bound (int)	`0.0`	The parameter is used to set the minimum bound of the interval.
max	Bound (int)	`Double.MAX_VALUE`/`Integer.MAX_VALUE`/Highest upper bound (interval range)	The parameter is used to set the maximum bound of the interval. If it wasn’t set, a warning is issued by the logger when the measure is initialized.

Chronological parameters:

Parameter	Type/Range	Default Value	Description
min	Bound (int)	`DateObject.MIN_VALUE`/`TimeObject.MIN_VALUE`/`TimestampObject.MIN_VALUE`	The parameter is used to set the minimum bound of the interval.
max	Bound (int)	`DateObject.MAX_VALUE`/`TimeObject.MAX_VALUE`/`TimestampObject.MAX_VALUE` /Highest upper bound (interval range)	The parameter is used to set the maximum bound of the interval. If it wasn’t set, a warning is issued by the logger when the measure is initialized.

The values for \(min\) and \(max\) can be set, when creating the measure. Either in the sim.xml:

sim.xml

    <NumericLinear name="SMNumericLinear" class="Double" min="0" max="10"/>

Or during runtime:

Wiki_NumericTest.java

    SMNumericLinear smNumericLinear = (SMNumericLinear) simVal.getSimilarityModel().createSimilarityMeasure(
      SMNumericLinear.NAME,
      ModelFactory.getDefaultModel().getDoubleSystemClass()
    );
    try {
      smNumericLinear.setMaximum(10);
      smNumericLinear.setMinimum(0);
    } catch (InvalidAttributeValueException exception) {
      exception.printStackTrace();
    }
    simVal.getSimilarityModel().addSimilarityMeasure(smNumericLinear, "SMNumericLinear");

Here it must be noted that the methods setMinimum and setMaximum can each throw an InvalidAttributeValueException. Therefore, these must be handled with a try-catch clause.

If no value is set for \(min\) , it is set to the default value of \(0.0\) . Furthermore, it is not possible to set \(min\) to a value below \(0.0\) .

If no value is set for \(max\) , the maximum value for the data class of case and query is chosen. So, if they are double objects, the maximum Double value is set as maximum. If they are integer objects, the maximum Integer value is set as maximum.

It is also possible, to use this measure with numeric objects, that contain an interval range. If no value for \(max\) is set, the highest possible distance is used. This is calculated as the distance of the highest upper bound and the smallest lower bound.

Threshold #

Threshold functions should be used, if the contribution of an attribute to the utility is binary in the sense that up to a certain difference the case is useful and beyond it is not useful.

There is also the Threshold value \(\theta\) , which has to be set. The value for the similarity depends on the \(\theta\) value. If the distance \(d(q,c)\) is smaller than \(\theta\) , the similarity is \(1.0\) . Otherwise, the similarity is \(0.0\) .

The formula for the similarity calculation is as follows:

\(sim(q,c) = \begin{cases} 1 &\text{, if } d(q,c) \lt \theta \\ 0 &\text{, if } d(q,c) \geq \theta \end{cases}\)

The following figure illustrates the similarity function:

abstract_threshold_smaller

The following parameters can be set for this similarity measure.

Parameter	Type/Range	Default Value	Description
threshold	threshold (double)	`1.0`	The parameter expects a Double value given, that defines a threshold value \(\theta\) . If the distance \(d(q,c)\) is smaller than \(\theta\) , the similarity is \(1.0\) . Otherwise, the similarity is \(0.0\) .

The similarity measure can be defined in the sim.xml like below:

sim.xml

    <NumericThreshold name="SMNumericThreshold" class="Double" threshold="1"/>

To create this measure during runtime, use the following code:

Wiki_NumericTest.java

    SMNumericThreshold smNumericThreshold = (SMNumericThreshold) simVal.getSimilarityModel().createSimilarityMeasure(
      SMNumericThreshold.NAME,
      ModelFactory.getDefaultModel().getDoubleSystemClass()
    );
    smNumericThreshold.setThreshold(1.0);
    simVal.getSimilarityModel().addSimilarityMeasure(smNumericThreshold, "SMNumericThreshold");

Fuzzy #

Note that Fuzzy is not applicable for chronological types, because it is not a distance-based measure, so it can only be used for numerics.

This measure is providing fuzzy similarity functionality, according to the paper by Ahmed et al.² (see p.13 for a visual description). A spread is defined, which allows triangle-areas on those values. If the areas of the compared values overlap, this indicates a similarity. \(max(Overlap/area1, Overlap/area2)\) is then the similarity.

A complete description, how the fuzzy similarity is computed in detail, can be found here.

The following parameters can be set for this similarity measure.

Parameter	Type/Range	Default Value	Description
spread	Length (double)	`1.0`	The parameter expects a Double value given, that specifies the length of a line the x-axis, which depends on the x-value of the point. A detailed explanation can be found here.

The similarity measure can be defined in the sim.xml like below:

sim.xml

    <NumericFuzzy name="SMNumericFuzzy" class="Double" spread="0.5"/>

To create this measure during runtime, use the following code:

Wiki_NumericTest.java

    SMNumericFuzzy smNumericFuzzy = (SMNumericFuzzy) simVal.getSimilarityModel().createSimilarityMeasure(
      SMNumericFuzzy.NAME,
      ModelFactory.getDefaultModel().getDoubleSystemClass()
    );
    smNumericFuzzy.setSpread(0.5);
    simVal.getSimilarityModel().addSimilarityMeasure(smNumericFuzzy, "SMNumericFuzzy");

Exponential #

One possibility to describe the similarity between two numeric values is by an exponential function. The semantics of such a similarity function is that little differences between the two values cause a big decrease of similarity. The larger the parameter \(\alpha\) , the faster is the decrease. Make sure that \(\alpha > 1\) .

\(sim(q,c) = e^{d_{norm}(q,c) \cdot (-\alpha)}\)

The following graph shows the similarity values for two different values of \(\alpha\) , namely \(\alpha_1\) and \(\alpha_2\) with \(\alpha_1 < \alpha_2\) . As already said, the sim value decreases faster the larger \(\alpha\) is.

abstract_exponential_smaller

The following parameters can be set for this similarity measure.

Parameter	Type/Range	Default Value	Description
alpha	Growth (double)	`1.0`	The parameter expects an Integer value given, that controls the growth of the exponential function for the similarity. The larger alpha is, the faster the function grows.

The similarity measure can be defined in the sim.xml like below:

sim.xml

    <NumericExponential name="SMNumericExponential" class="Double" alpha="6"/>

To create this measure during runtime, use the following code:

Wiki_NumericTest.java

    SMNumericExponential smNumericExponential = (SMNumericExponential) simVal.getSimilarityModel().createSimilarityMeasure(
      SMNumericExponential.NAME,
      ModelFactory.getDefaultModel().getDoubleSystemClass()
    );
    smNumericExponential.setAlpha(6.0);
    simVal.getSimilarityModel().addSimilarityMeasure(smNumericExponential, "SMNumericExponential");

Sigmoid #

Sigmoid functions can also be used to compare numeric values. The parameter \(\theta \geq 0\) specifies the distance value at which the similarity is \(0.5\) . The parameter \(\alpha>0\) specifies the steepness of the similarity decrease: the smaller \(\alpha\) the steeper is the decrease.

\(sim(q,c) = \frac{1}{e^{\frac{d_{norm}(q,c)-\theta}{\alpha}}+1}\)

The following graph shows the similarity values for two different values of \(\alpha\) , namely \(\alpha_1\) and \(\alpha_2\) with \(\alpha_1 \lt \alpha_2\) . As already said, the sim value decreases more steeply the larger \(\alpha\) is.

abstract_sigmoid_smaller

The following parameters can be set for this similarity measure.

Parameter	Type/Range	Default Value	Description
alpha	Growth (double)	`1.0`	The parameter expects a Double value given, that specifies the steepness of the similarity decrease. The smaller alpha, the steeper is the decrease.
theta	Growth (double)	`1.0`	The parameter is used to set the distance value, at which the similarity is `0.5`.

The similarity measure can be defined in the sim.xml like below:

sim.xml

    <NumericSigmoid name="SMNumericSigmoid" class="Double" alpha="0.1" theta="0.5"/>

To create this measure during runtime, use the following code:

Wiki_NumericTest.java

    SMNumericSigmoid smNumericSigmoid = (SMNumericSigmoid) simVal.getSimilarityModel().createSimilarityMeasure(
      SMNumericSigmoid.NAME,
      ModelFactory.getDefaultModel().getDoubleSystemClass()
    );
    smNumericSigmoid.setAlpha(0.1);
    smNumericSigmoid.setTheta(0.5);
    simVal.getSimilarityModel().addSimilarityMeasure(smNumericSigmoid, "SMNumericSigmoid");

Ralph Bergmann: Experience Management: Foundations, Development Methodology, and Internet-Based Applications. Lecture Notes in Computer Science 2432, Springer 2002, ISBN 3-540-44191-3 ↩︎
Mobyen Uddin Ahmed, Shahina Begum, Peter Funk, Ning Xiong, Bo von Schéele: Case-based Reasoning for Diagnosis of Stress using Enhanced Cosine and Fuzzy Similarity. Tran. CBR 1(1): 3-19 (2008) ↩︎