Intervals

Interval

Both the query value and the case value are intervals. We denote the query interval by \(x_i = [x_{ilb},x_{iub}]\), the case interval by \(y_i = [y_{ilb},y_{iub}]\), and their possible intersection by \(z_i = [z_{ilb},z_{iub}]\), each defined by a lower bound and an upper bound. The similarity between two intervals depends not only on the intervals themselves but also on a given local similarity measure \(sim_{Ai}\). This local similarity measure could, for instance, be the numeric linear measure.

The case interval can be interpreted as stating that a valid value exists somewhere in this range, without specifying where. It cannot be assumed that valid values lie in the intersection of the query and case intervals. Therefore, three strategies must be distinguished:

  • The optimistic strategy assumes that valid points are in the intersection. With the imprecise query, one point is requested from all the possible points of the precise case. Therefore, only the following two situations have to be distinguished:

    • \(z_i \neq \emptyset\): The local similarity is one, because an arbitrary element of \(z_i\) or \(z_i\) itself can be returned. In other words, the maximum similarity is returned because the intervals intersect.
    • \(z_i = \emptyset\): The local similarity must consider the distance from \(x_{ilb}\) to \(y_{iub}\) and from \(x_{iub}\) to \(y_{ilb}\), so that \(sim^*_{Ai}(x_i,y_i) = \max\left(sim_{Ai}(x_{ilb},y_{iub}), sim_{Ai}(x_{iub},y_{ilb})\right)\). The similarity thus decreases as the distance between the query and case intervals increases.

    In summary, the local similarity measure is defined as:

    \(sim^*_{Ai}(x_i, y_i) = \begin{cases} 1 & \text{if } z_i \neq \emptyset \\ \max\left(sim_{Ai}(x_{ilb}, y_{iub}),\ sim_{Ai}(x_{iub}, y_{ilb})\right) & \text{otherwise} \end{cases}\)
  • The pessimistic strategy assumes that no valid point is in the intersection, which leads to the local similarity:

    \(sim^*_{Ai}(x_i, y_i) = \underset{x_{is}\in x_i,\ y_{it}\in y_i}{\min}\{sim_{Ai}(x_{is},y_{it})\}\)
  • The average strategy calculates the probability that a valid point is in the intersection \(z_i\), e.g. by calculating the relation between the intersection size and the case interval size. Consequently, the similarity measure is defined as:

    \(sim^*_{Ai}(x_i, y_i) = \begin{cases} \frac{|y_{iub}-y_{ilb}|}{|z_{iub}-z_{ilb}|} & \text{if } z_i \neq \emptyset \\ 0 & \text{otherwise} \end{cases}\)
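The optimistic and pessimistic definitions above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the library's implementation: the class IntervalStrategies and its method names are hypothetical, intervals are treated as closed (touching bounds count as an intersection), and the local measure is passed in as a plain function. For the pessimistic case, the sketch additionally assumes that the local measure depends only on the distance |x - y| and never increases as that distance grows, so the minimum over all point pairs is reached at one of the two outer bound pairings.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical helper (not a library class): optimistic and pessimistic interval similarity. */
final class IntervalStrategies {

    private IntervalStrategies() {}

    /** True if the closed intervals [xLb, xUb] and [yLb, yUb] share at least one point. */
    static boolean intersects(double xLb, double xUb, double yLb, double yUb) {
        return Math.max(xLb, yLb) <= Math.min(xUb, yUb);
    }

    /** Optimistic: 1 if the intervals intersect, otherwise the better of the two bound pairings. */
    static double optimistic(double xLb, double xUb, double yLb, double yUb,
                             DoubleBinaryOperator sim) {
        if (intersects(xLb, xUb, yLb, yUb)) {
            return 1.0;
        }
        return Math.max(sim.applyAsDouble(xLb, yUb), sim.applyAsDouble(xUb, yLb));
    }

    /**
     * Pessimistic: minimum local similarity over all point pairs.
     * Assumption: sim depends only on |x - y| and never increases with it, so the
     * minimum lies at the most distant pair of bounds and only two pairings are checked.
     */
    static double pessimistic(double xLb, double xUb, double yLb, double yUb,
                              DoubleBinaryOperator sim) {
        return Math.min(sim.applyAsDouble(xLb, yUb), sim.applyAsDouble(xUb, yLb));
    }
}
```

Passing the local measure as a function keeps the strategies independent of any particular \(sim_{Ai}\); a measure that is not monotone in the distance would need a different search for the pessimistic minimum.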

For example, using this measure on the intervals \(Q = [0,2]\) and \(C = [1,3]\), the intersection is \(z = [1,2]\). Using the optimistic strategy, the similarity is 1.0, because there is an intersection. Using the pessimistic strategy, the lowest similarity value over all pairs of points from the two intervals is taken. In this example, a Numeric Linear measure with min = 0 and max = 5 is used as the local measure. The lowest similarity is \(sim_{Ai}(0,3) = 0.4\), so this value is returned. Using the average strategy, the ratio \(\frac{|y_{iub}-y_{ilb}|}{|z_{iub}-z_{ilb}|} = \frac{3-1}{2-1} = 2\) is computed, so the similarity is 1.0.

Using this measure on the intervals \(Q = [0,1]\) and \(C = [2,3]\), there is no intersection. Using the optimistic strategy, the Numeric Linear measure from above is applied to the interval bounds, which gives \(sim_{Ai}(x_{ilb}, y_{iub}) = sim_{Ai}(0,3) = 0.4\) and \(sim_{Ai}(x_{iub}, y_{ilb}) = sim_{Ai}(1,2) = 0.8\). The maximum value, 0.8, is returned as the overall similarity. Using the pessimistic strategy with the same Numeric Linear measure, the similarity is \(sim_{Ai}(0,3) = 0.4\). Using the average strategy, the similarity is 0.0, because there is no intersection.
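The point similarities used in both examples, 0.4 and 0.8, are consistent with a linear local measure of the form \(1 - \frac{|x - y|}{max - min}\). The following minimal sketch shows that form. It is inferred from the numbers in the examples and is an assumption, not the library's Numeric Linear implementation; the class name LinearSimilarity and the clamping at 0 are hypothetical.

```java
/** Hypothetical sketch of a linear numeric similarity; not the library implementation. */
final class LinearSimilarity {

    private final double min;
    private final double max;

    LinearSimilarity(double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        this.min = min;
        this.max = max;
    }

    /** Similarity 1 at distance 0, falling linearly to 0 at distance (max - min). */
    double compute(double x, double y) {
        double sim = 1.0 - Math.abs(x - y) / (max - min);
        // Assumption: distances larger than the value range are clamped to similarity 0.
        return Math.max(0.0, sim);
    }
}
```

With min = 0 and max = 5, `new LinearSimilarity(0, 5).compute(0, 3)` evaluates to approximately 0.4 and `compute(1, 2)` to approximately 0.8, matching the values used above.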

The following parameters can be set for this similarity measure.

Parameter | Type | Default Value | Description
strategy | Strategy (String) | optimistic | Sets the strategy for the similarity computation. The case interval only states that a valid value exists somewhere in the range, without specifying where, so it cannot be assumed that valid values lie in the intersection of the query and case intervals. Therefore, the three strategies optimistic, pessimistic and average can be used.

This measure can be defined in the similarity model as follows:

<Interval name="SMInterval" class="IntervalDataClass" strategy="optimistic"/>

Here, the class IntervalDataClass must refer to an Interval class. Instead of optimistic, average or pessimistic can also be chosen as the strategy.

// Create the interval measure for the interval system class
SMIntervalImpl smInterval = (SMIntervalImpl) simVal.getSimilarityModel()
        .createSimilarityMeasure(SMInterval.NAME, ModelFactory.getDefaultModel().getIntervalSystemClass());
// Bind the measure to the data class, select the strategy, and register it under its name
smInterval.setDataClass(ModelFactory.getDefaultModel().getClass("IntervalDataClass"));
smInterval.setStrategy(Strategy.OPTIMISTIC);
simVal.getSimilarityModel().addSimilarityMeasure(smInterval, "SMInterval");