# Interval #

Both the query and the case values are intervals. We name the query interval \(x_i = [x_{ilb},x_{iub}]\), the case interval \(y_i = [y_{ilb},y_{iub}]\), and a possible intersection \(z_i = [z_{ilb},z_{iub}]\), each defined by a lower bound and an upper bound. The similarity between two intervals depends not only on the intervals themselves, but also on a given local similarity measure \(sim_{Ai}\). This local similarity measure could, for instance, be the numeric linear measure.

The interval in the case can be interpreted as follows: a valid value exists somewhere in this range, but its exact position is unknown. It cannot be assumed that valid values lie in the intersection of the query and case intervals. Therefore, three strategies must be distinguished:

- The **optimistic** strategy assumes that valid points are in the intersection. With the imprecise query, one point is requested from all the possible points of the precise case. Therefore, only the following two situations have to be distinguished:
  - \(z_i \neq \emptyset\): The local similarity is one, because an arbitrary element of \(z_i\) or \(z_i\) itself can be returned. In other words, the maximum similarity is returned because there is some kind of intersection.
  - \(z_i = \emptyset\): The local similarity must consider the distance from \(x_{ilb}\) to \(y_{iub}\) and from \(x_{iub}\) to \(y_{ilb}\), so that \(sim^*_{Ai}(x_i,y_i) = \max\left(sim_{Ai}(x_{ilb},y_{iub}), sim_{Ai}(x_{iub},y_{ilb})\right)\). Thereby, the similarity decreases with an increasing distance between the query and case intervals.

  Summarizing, the local similarity measure can be defined as:

  \(sim^*_{Ai}(x_i, y_i) = \begin{cases} 1 & \text{, if } z_i \neq \emptyset \\ \max\left(sim_{Ai}(x_{ilb}, y_{iub}), sim_{Ai}(x_{iub}, y_{ilb})\right) & \text{, otherwise} \end{cases}\)

- The **pessimistic** strategy assumes that no valid point is in the intersection, which leads to the local similarity:

  \(sim^*_{Ai}(x_i, y_i) = \underset{x_{is}\in x_i,\, y_{it}\in y_i}{\min}\{sim_{Ai}(x_{is},y_{it})\}\)

- The **average** strategy calculates the probability that a valid point is in the intersection \(z_i\), e.g. by calculating the relation between the intersection size and the case interval size. Consequently, the similarity measure is defined as:

  \(sim^*_{Ai}(x_i, y_i) = \begin{cases} \frac{|y_{iub}-y_{ilb}|}{|z_{iub}-z_{ilb}|} & \text{, if } z_i \neq \emptyset \\ 0 & \text{, otherwise} \end{cases}\)

For example, consider the intervals \(Q = [0,2]\) and \(C = [1,3]\), whose intersection is \(z = [1,2]\). Using the *optimistic* strategy, the similarity is *1.0*, because there is an intersection. Using the *pessimistic* strategy, the lowest similarity value between all points of both intervals is taken. In this example, a Numeric Linear measure with *min = 0* and *max = 5* is used as the local measure. The lowest similarity is \(sim_{Ai}(0,3) = 0.4\), so this value is returned. Using the *average* strategy, the ratio of the case interval size to the intersection size is \(\frac{3-1}{2-1} = 2\); since a similarity cannot exceed 1, the similarity is *1.0*.
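The value \(0.4\) can be retraced by hand. As a sketch, assuming the Numeric Linear measure has the form \(sim_{Ai}(a,b) = 1 - \frac{|a-b|}{max - min}\) (this form is not stated in this section, but it reproduces the values given here), the pessimistic result for the two most distant points is:

\(sim_{Ai}(0,3) = 1 - \frac{|0-3|}{5-0} = 1 - 0.6 = 0.4\)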

Using this measure on the intervals \(Q = [0,1]\) and \(C = [2,3]\), there is no intersection. Using the *optimistic* strategy, the *NumericLinear* measure from above is applied to the opposing bounds, which yields \(sim_{Ai}(x_{ilb}, y_{iub}) = sim_{Ai}(0,3) = 0.4\) and \(sim_{Ai}(x_{iub}, y_{ilb}) = sim_{Ai}(1,2) = 0.8\). The maximum value *0.8* is chosen and returned as the overall similarity. Using the *pessimistic* strategy, again with the *NumericLinear* measure, the similarity is \(sim_{Ai}(0,3) = 0.4\). Using the *average* strategy, the similarity is *0.0*, because there is no intersection.
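The three strategies and both worked examples can be retraced with a small standalone program. The following is only a minimal sketch, not the library's implementation: the class `IntervalSimilaritySketch` and all of its methods are hypothetical names, the local measure is assumed to be a linear measure of the form `1 - |a - b| / (max - min)`, the pessimistic minimum is evaluated at the interval bounds (which assumes the local measure decreases with distance), and the average strategy follows the formula as written above, limited to 1.0 as in the first example.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch of the three interval strategies; not the library's own code.
final class IntervalSimilaritySketch {

    // Assumed linear local measure on [min, max]: 1 - |a - b| / (max - min).
    static DoubleBinaryOperator numericLinear(double min, double max) {
        return (a, b) -> Math.max(0.0, 1.0 - Math.abs(a - b) / (max - min));
    }

    // True if [xLb, xUb] and [yLb, yUb] share at least one point.
    static boolean intersects(double xLb, double xUb, double yLb, double yUb) {
        return Math.max(xLb, yLb) <= Math.min(xUb, yUb);
    }

    // Optimistic: 1 if the intervals intersect, otherwise the better of the
    // two similarities between opposing bounds.
    static double optimistic(DoubleBinaryOperator sim,
                             double xLb, double xUb, double yLb, double yUb) {
        if (intersects(xLb, xUb, yLb, yUb)) {
            return 1.0;
        }
        return Math.max(sim.applyAsDouble(xLb, yUb), sim.applyAsDouble(xUb, yLb));
    }

    // Pessimistic: minimum similarity over all point pairs. Assumption: the
    // local measure decreases with distance, so checking the bounds suffices.
    static double pessimistic(DoubleBinaryOperator sim,
                              double xLb, double xUb, double yLb, double yUb) {
        double lowest = 1.0;
        for (double x : new double[] {xLb, xUb}) {
            for (double y : new double[] {yLb, yUb}) {
                lowest = Math.min(lowest, sim.applyAsDouble(x, y));
            }
        }
        return lowest;
    }

    // Average: case interval size divided by intersection size, as in the
    // formula above, limited to 1.0; 0 if there is no intersection.
    static double average(double xLb, double xUb, double yLb, double yUb) {
        if (!intersects(xLb, xUb, yLb, yUb)) {
            return 0.0;
        }
        double zWidth = Math.min(xUb, yUb) - Math.max(xLb, yLb);
        if (zWidth == 0.0) {
            return 1.0; // assumption: a single shared point yields the capped value
        }
        return Math.min(1.0, (yUb - yLb) / zWidth);
    }

    public static void main(String[] args) {
        DoubleBinaryOperator sim = numericLinear(0, 5);
        // First example: Q = [0,2], C = [1,3]
        System.out.println("optimistic:  " + optimistic(sim, 0, 2, 1, 3));
        System.out.println("pessimistic: " + pessimistic(sim, 0, 2, 1, 3));
        System.out.println("average:     " + average(0, 2, 1, 3));
        // Second example: Q = [0,1], C = [2,3]
        System.out.println("optimistic:  " + optimistic(sim, 0, 1, 2, 3));
        System.out.println("pessimistic: " + pessimistic(sim, 0, 1, 2, 3));
        System.out.println("average:     " + average(0, 1, 2, 3));
    }
}
```

Passing the local measure as a `DoubleBinaryOperator` keeps the strategies independent of the concrete measure, mirroring the role of \(sim_{Ai}\) in the formulas.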

The following parameters can be set for this similarity measure.

Parameter | Type | Default Value | Description |
---|---|---|---|
strategy | Strategy (String) | optimistic | Sets the strategy used for the similarity computation. A valid value exists somewhere in the case interval, but its exact position is unknown, and it cannot be assumed to lie in the intersection of the query and case intervals. Therefore, one of the three strategies optimistic, pessimistic, or average can be used. |

This measure can be defined in the *similarity model* as follows:

```
<Interval name="SMInterval" class="customIntervalClass" strategy="optimistic"/>
```

Here, the class *customIntervalClass* must refer to an *Interval class*. The description of the Interval class also shows the data classes required for this example. Instead of *optimistic*, the strategies *average* and *pessimistic* can also be chosen. The same measure can also be created programmatically:

```
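// Create the measure for the data class "customIntervalClass" (simVal is assumed to be an existing similarity valuator).
SMInterval smInterval = (SMInterval) simVal.getSimilarityModel().createSimilarityMeasure(
    SMInterval.NAME, ModelFactory.getDefaultModel().getClass("customIntervalClass"));
// Select the strategy, then register the measure under the name "SMInterval".
smInterval.setStrategy(Strategy.OPTIMISTIC);
simVal.getSimilarityModel().addSimilarityMeasure(smInterval, "SMInterval");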
```