
# Numeric #

The following measures are implemented for numeric types (Integer, Double) according to the definitions by Ralph Bergmann [1].

## Distance value #

For some measures, the similarity between two numbers (here: query value $$q$$ and case value $$c$$ ) is calculated from the so-called distance between those two numbers. In the formulas, the letter $$d$$ stands for this distance. The smaller $$d$$ is, the more similar the two values are. A distance of $$d=0$$ leads to the similarity value $$sim=1$$ .

The formula for the calculation of $$d$$ is $$d(q,c) = |q-c|$$ . Some measures use a normalized version $$d_{norm}(q,c) = \frac{d(q,c)}{max-min}$$ . The values of $$min$$ and $$max$$ depend on whether $$q$$ and $$c$$ are of type Integer or Double:

• case 1 (query and case are of type Integer): $$max-min = Integer.MAX\_VALUE - Integer.MIN\_VALUE$$ .

• case 2 (query and case are of type Double): In this case $$min = Double.MIN\_VALUE$$ and $$max = Double.MAX\_VALUE$$ . However, since $$max - min$$ is larger than $$Double.MAX\_VALUE$$ , the value cannot be represented by the Double type in Java. Therefore, the result is defined as $$max-min = Double.MAX\_VALUE$$ .
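The two distance definitions above can be sketched in plain Java as follows. This is a minimal illustration, not ProCAKE's implementation: the class name `NumericDistanceSketch` and its method names are hypothetical and only mirror the formulas for $$d$$ and $$d_{norm}$$ . The Integer range is computed in double arithmetic, because `Integer.MAX_VALUE - Integer.MIN_VALUE` would overflow the `int` type.

```java
// Illustrative sketch only; names are hypothetical and not part of ProCAKE.
final class NumericDistanceSketch {

    // Range for Integer values: MAX_VALUE - MIN_VALUE, computed as double
    // because the difference does not fit into an int.
    static final double INTEGER_RANGE =
            (double) Integer.MAX_VALUE - (double) Integer.MIN_VALUE;

    // Range for Double values, defined as Double.MAX_VALUE in the text above.
    static final double DOUBLE_RANGE = Double.MAX_VALUE;

    // d(q,c) = |q - c|
    static double distance(double q, double c) {
        return Math.abs(q - c);
    }

    // d_norm(q,c) for Integer values
    static double normalizedIntegerDistance(int q, int c) {
        return Math.abs((double) q - (double) c) / INTEGER_RANGE;
    }

    // d_norm(q,c) for Double values
    static double normalizedDoubleDistance(double q, double c) {
        return distance(q, c) / DOUBLE_RANGE;
    }

    public static void main(String[] args) {
        System.out.println(distance(3.0, 5.0));
        System.out.println(normalizedIntegerDistance(0, 1000));
        System.out.println(normalizedDoubleDistance(0.0, 1000.0));
    }
}
```

Because both ranges are very large, the normalized distance of typical values is very small, which is worth keeping in mind when choosing parameters for measures based on $$d_{norm}$$ .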

## Strategies for numeric measures #

In ProCAKE there are three asymmetric strategies for the numeric measures, which influence the similarity value:

1. CaseHigherThanQuery: If the case value is higher than the query value, the similarity will always be $$0.0$$ .
2. QueryHigherThanCase: If the query value is higher than the case value, the similarity will always be $$0.0$$ .
3. None: This is the default strategy. With this strategy, no asymmetric zero-out is applied.

The following parameters exist for all numeric similarity measures.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| strategy | Strategy (String) | none | The parameter is used to set the strategy for the similarity computation. Three different types exist: CaseHigherThanQuery, QueryHigherThanCase and none. |

The values can be set in the XML file. For example:

<NumericMeasure name="NumericMeasure" class="Double" alpha="6" asymmetricZeroOutStrategy="query_higher_than_case"/>


Alternatively, the strategy can be set at runtime by using the method setAsymmetricStrategy(ZERO_OUT_STRATEGIES strategy). For example:

smNumericMeasure.setAsymmetricStrategy(ZERO_OUT_STRATEGIES.CASEHIGHERTHANQUERY);
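The effect of the three strategies on an already computed similarity value can be sketched as follows. This is an illustration under assumptions, not ProCAKE's code: the enum `Strategy` and the method `applyStrategy` are hypothetical stand-ins that only model the behavior described in the list above.

```java
// Illustrative sketch only; ProCAKE's own type is ZERO_OUT_STRATEGIES.
final class ZeroOutSketch {

    // Hypothetical enum modelling the three strategies described above.
    enum Strategy { NONE, CASE_HIGHER_THAN_QUERY, QUERY_HIGHER_THAN_CASE }

    // Returns 0.0 if the chosen strategy zeroes out this query/case pair,
    // otherwise returns the similarity computed by the underlying measure.
    static double applyStrategy(Strategy strategy, double q, double c, double baseSimilarity) {
        switch (strategy) {
            case CASE_HIGHER_THAN_QUERY:
                return c > q ? 0.0 : baseSimilarity;
            case QUERY_HIGHER_THAN_CASE:
                return q > c ? 0.0 : baseSimilarity;
            default:
                return baseSimilarity; // NONE: no asymmetric zero-out
        }
    }

    public static void main(String[] args) {
        System.out.println(applyStrategy(Strategy.CASE_HIGHER_THAN_QUERY, 1.0, 2.0, 0.8));
        System.out.println(applyStrategy(Strategy.NONE, 1.0, 2.0, 0.8));
    }
}
```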


## Linear #

In most cases, the similarity between two numeric values can be described by a linear function. The idea is that, within an interval $$[min,max]$$ , the similarity decreases linearly as the distance between the two values increases.

If the distance is smaller than the minimum, the value for the similarity is $$1.0$$ . If the distance is higher than the maximum, the value for the similarity is $$0.0$$ . For the values between, the similarity is calculated by the formula $$\frac{max-d(q,c)}{max-min}$$ .

Formalized, the similarity calculation works as follows:

$$sim(q,c) = \begin{cases} 1 &\text{, if } d(q,c) \lt min \\ \frac{max-d(q,c)}{max-min} &\text{, if } min \leq d(q,c) \leq max \\ 0 &\text{, if } d(q,c) \gt max \end{cases}$$
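The piecewise definition above can be sketched in plain Java as follows. The class and method names are hypothetical and not part of ProCAKE; the sketch additionally assumes $$0 \leq min < max$$ so that the division is well defined.

```java
// Illustrative sketch only; names are hypothetical and not part of ProCAKE.
final class LinearSimilaritySketch {

    // Linear similarity over the distance d(q,c) = |q - c|.
    static double linear(double q, double c, double min, double max) {
        if (min < 0.0 || max <= min) {
            throw new IllegalArgumentException("expected 0 <= min < max");
        }
        double d = Math.abs(q - c);
        if (d < min) {
            return 1.0;                 // closer than the lower bound
        }
        if (d > max) {
            return 0.0;                 // farther than the upper bound
        }
        return (max - d) / (max - min); // linear decrease in between
    }

    public static void main(String[] args) {
        System.out.println(linear(0.0, 5.0, 0.0, 10.0));
    }
}
```

With $$min=0$$ and $$max=10$$ , a distance of $$5$$ lies in the middle of the interval and therefore yields a similarity of $$0.5$$ .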

[Figure: similarity as a function of the calculated distance]

The following parameters can be set for this similarity measure.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| min | Bound (int) | 0.0 | The parameter is used to set the minimum bound of the interval. |
| max | Bound (int) | Double.MAX_VALUE / Integer.MAX_VALUE / highest upper bound (interval range) | The parameter is used to set the maximum bound of the interval. If it is not set, the logger issues a warning when the measure is initialized. |

The values for $$min$$ and $$max$$ can be set when creating the measure, either in the sim.xml:

<NumericLinear name="SMNumericLinear" class="Double" min="0" max="10"/>


or during runtime:

SMNumericLinear smNumericLinear = (SMNumericLinear) simVal.getSimilarityModel().createSimilarityMeasure(SMNumericLinear.NAME, ModelFactory.getDefaultModel().getDoubleSystemClass());
try {
    smNumericLinear.setMaximum(10);
    smNumericLinear.setMinimum(0);
} catch (InvalidAttributeValueException exception) {
    exception.printStackTrace();
}


Note that the methods setMinimum and setMaximum can each throw an InvalidAttributeValueException, so the calls must be wrapped in a try-catch block.

If no value is set for $$min$$ , it is set to the default value of $$0.0$$ . Furthermore, it is not possible to set $$min$$ to a value below $$0.0$$ .

If no value is set for $$max$$ , the maximum value of the data class of case and query is chosen: the maximum Double value for Double objects and the maximum Integer value for Integer objects.

It is also possible to use this measure with numeric objects that contain an interval range. If no value for $$max$$ is set in this case, the highest possible distance is used, which is the distance between the highest upper bound and the smallest lower bound.

## Threshold #

Threshold functions should be used if the contribution of an attribute to the utility is binary, in the sense that the case is useful up to a certain difference and not useful beyond it.

The measure requires a threshold value $$\theta$$ , on which the similarity depends. If the distance $$d(q,c)$$ is smaller than $$\theta$$ , the similarity is $$1.0$$ . Otherwise, the similarity is $$0.0$$ .

The formula for the similarity calculation is as follows:

$$sim(q,c) = \begin{cases} 1 &\text{, if } d(q,c) \lt \theta \\ 0 &\text{, if } d(q,c) \geq \theta \end{cases}$$
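As a minimal sketch, the threshold function above can be written in plain Java like this. The class and method names are hypothetical and not part of ProCAKE. Note that a distance exactly equal to $$\theta$$ already yields $$0.0$$ , because the comparison is strict.

```java
// Illustrative sketch only; names are hypothetical and not part of ProCAKE.
final class ThresholdSimilaritySketch {

    // Returns 1.0 if d(q,c) = |q - c| is strictly smaller than theta, else 0.0.
    static double threshold(double q, double c, double theta) {
        double d = Math.abs(q - c);
        return d < theta ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(threshold(1.0, 1.5, 1.0));
        System.out.println(threshold(1.0, 2.0, 1.0));
    }
}
```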

[Figure: illustration of the threshold similarity function]

The following parameters can be set for this similarity measure.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| threshold | threshold (double) | 1.0 | The parameter expects a Double value that defines the threshold value $$\theta$$ . If the distance $$d(q,c)$$ is smaller than $$\theta$$ , the similarity is $$1.0$$ . Otherwise, the similarity is $$0.0$$ . |

The similarity measure can be defined in the sim.xml like below:

<NumericThreshold name="SMNumericThreshold" class="Double" threshold="1"/>


To create this measure during runtime, use the following code:

SMNumericThreshold smNumericThreshold = (SMNumericThreshold) simVal.getSimilarityModel().createSimilarityMeasure(SMNumericThreshold.NAME, ModelFactory.getDefaultModel().getDoubleSystemClass());
smNumericThreshold.setThreshold(1.0);


## Fuzzy #

This measure provides fuzzy similarity functionality according to the paper by Ahmed et al. [2] (see p. 13 for a visual description). A spread is defined, which places a triangular area on each of the compared values. If the areas of the compared values overlap, this indicates a similarity. The similarity is then $$max(Overlap/area1, Overlap/area2)$$ .

A complete description of how the fuzzy similarity is computed in detail can be found here.

The following parameters can be set for this similarity measure.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| spread | Length (double) | 1.0 | The parameter expects a Double value that specifies the length of a line on the x-axis, which depends on the x-value of the point. A detailed explanation can be found here. |

The similarity measure can be defined in the sim.xml like below:

<NumericFuzzy name="SMNumericFuzzy" class="Double" spread="0.5"/>


To create this measure during runtime, use the following code:

SMNumericFuzzy smNumericFuzzy = (SMNumericFuzzy) simVal.getSimilarityModel().createSimilarityMeasure(SMNumericFuzzy.NAME, ModelFactory.getDefaultModel().getDoubleSystemClass());


## Exponential #

Another way to describe the similarity between two numeric values is an exponential function. The semantics of such a similarity function is that small differences between the two values cause a large decrease in similarity. The larger the parameter $$\alpha$$ , the faster the decrease. Make sure that $$\alpha > 1$$ .

$$sim(q,c) = e^{d_{norm}(q,c) \cdot (-\alpha)}$$
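The formula above can be sketched in plain Java as follows. The class and method names are hypothetical and not part of ProCAKE; the method takes the already normalized distance $$d_{norm}(q,c)$$ as input.

```java
// Illustrative sketch only; names are hypothetical and not part of ProCAKE.
final class ExponentialSimilaritySketch {

    // sim = e^(d_norm * (-alpha)), where dNorm is the normalized distance.
    static double exponential(double dNorm, double alpha) {
        return Math.exp(dNorm * -alpha);
    }

    public static void main(String[] args) {
        // The same distance yields a lower similarity for the larger alpha.
        System.out.println(exponential(0.5, 2.0));
        System.out.println(exponential(0.5, 6.0));
    }
}
```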

[Figure: similarity values for two different values of $$\alpha$$ , namely $$\alpha_1$$ and $$\alpha_2$$ with $$\alpha_1 < \alpha_2$$ ; the similarity decreases faster the larger $$\alpha$$ is]

The following parameters can be set for this similarity measure.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| alpha | Growth (double) | 1.0 | The parameter expects a Double value that controls how fast the exponential similarity function decreases. The larger alpha is, the faster the similarity decreases. |

The similarity measure can be defined in the sim.xml like below:

<NumericExponential name="SMNumericExponential" class="Double" alpha="6"/>


To create this measure during runtime, use the following code:

SMNumericExponential smNumericExponential = (SMNumericExponential) simVal.getSimilarityModel().createSimilarityMeasure(SMNumericExponential.NAME, ModelFactory.getDefaultModel().getDoubleSystemClass());
smNumericExponential.setAlpha(6.0);


## Sigmoid #

Sigmoid functions can also be used to compare numeric values. The parameter $$\theta \geq 0$$ specifies the distance value at which the similarity is $$0.5$$ . The parameter $$\alpha>0$$ specifies the steepness of the similarity decrease: the smaller $$\alpha$$ is, the steeper the decrease.

$$sim(q,c) = \frac{1}{e^{\frac{d_{norm}(q,c)-\theta}{\alpha}}+1}$$
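The sigmoid formula above can be sketched in plain Java as follows. The class and method names are hypothetical and not part of ProCAKE; the method takes the already normalized distance $$d_{norm}(q,c)$$ as input and assumes $$\alpha > 0$$ .

```java
// Illustrative sketch only; names are hypothetical and not part of ProCAKE.
final class SigmoidSimilaritySketch {

    // sim = 1 / (e^((d_norm - theta) / alpha) + 1)
    static double sigmoid(double dNorm, double theta, double alpha) {
        if (alpha <= 0.0) {
            throw new IllegalArgumentException("alpha must be greater than 0");
        }
        return 1.0 / (Math.exp((dNorm - theta) / alpha) + 1.0);
    }

    public static void main(String[] args) {
        // At dNorm == theta the exponent is 0, so the similarity is 0.5.
        System.out.println(sigmoid(0.5, 0.5, 0.1));
        // Beyond theta, a smaller alpha gives a lower similarity (steeper decrease).
        System.out.println(sigmoid(0.6, 0.5, 0.1));
        System.out.println(sigmoid(0.6, 0.5, 0.05));
    }
}
```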

[Figure: similarity values for two different values of $$\alpha$$ , namely $$\alpha_1$$ and $$\alpha_2$$ with $$\alpha_1 \lt \alpha_2$$ ; the similarity decreases more steeply the smaller $$\alpha$$ is]

The following parameters can be set for this similarity measure.

| Parameter | Type/Range | Default Value | Description |
|---|---|---|---|
| alpha | Growth (double) | 1.0 | The parameter expects a Double value that specifies the steepness of the similarity decrease. The smaller alpha is, the steeper the decrease. |
| theta | Growth (double) | 1.0 | The parameter is used to set the distance value at which the similarity is 0.5. |

The similarity measure can be defined in the sim.xml like below:

<NumericSigmoid name="SMNumericSigmoid" class="Double" alpha="0.1" theta="0.5"/>


To create this measure during runtime, use the following code:

SMNumericSigmoid smNumericSigmoid = (SMNumericSigmoid) simVal.getSimilarityModel().createSimilarityMeasure(SMNumericSigmoid.NAME, ModelFactory.getDefaultModel().getDoubleSystemClass());
smNumericSigmoid.setAlpha(0.1);
smNumericSigmoid.setTheta(0.5);