All Superinterfaces:
SimilarityMeasure
All Known Subinterfaces:
SMAggregateAverage, SMAggregateEuclidian, SMAggregateKMaximum, SMAggregateKMinimum, SMAggregateMaximum, SMAggregateMinimum, SMAggregateMinkowski, SMAggregateWeighted
All Known Implementing Classes:
SMAggregateAverageImpl, SMAggregateEuclidianImpl, SMAggregateImpl, SMAggregateKMaximumImpl, SMAggregateKMinimumImpl, SMAggregateMaximumImpl, SMAggregateMinimumImpl, SMAggregateMinkowskiImpl, SMAggregateWeightedImpl

public interface SMAggregate extends SimilarityMeasure
Abstract interface that collects all similarity measures for AggregateClasses.

Global similarity measures are defined by applying an aggregation function Φ to the local similarity values. The simple similarity measures for numeric attributes can be generalized easily to aggregation functions. Such aggregation functions are defined by determining

  • a basic aggregation function and
  • a weight model that determines weights $\omega = (\omega_1, \ldots, \omega_n)$ such that $0 \le \omega_i \le 1$ and $\sum_{i=1}^{n} \omega_i = 1$

The default weight is 1.0 for all attributes. To ensure that $\sum_{i=1}^{n} \omega_i = 1$, all weights are normalized automatically at runtime.
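
The normalization step and a weighted aggregation can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `WeightedAggregationSketch` and its methods `normalize` and `weightedAverage` are hypothetical names, and the weighted average stands in for an arbitrary basic aggregation function.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the documented API.
public class WeightedAggregationSketch {

    /** Scales non-negative weights so that they sum to 1. */
    public static double[] normalize(double[] weights) {
        double sum = Arrays.stream(weights).sum();
        if (sum <= 0.0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        return Arrays.stream(weights).map(w -> w / sum).toArray();
    }

    /** Weighted average of the local similarities, using normalized weights. */
    public static double weightedAverage(double[] localSims, double[] weights) {
        if (localSims.length != weights.length) {
            throw new IllegalArgumentException("one weight per local similarity is required");
        }
        double[] normalized = normalize(weights);
        double result = 0.0;
        for (int i = 0; i < localSims.length; i++) {
            result += normalized[i] * localSims[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] localSims = {0.8, 0.4, 1.0}; // local similarity per attribute
        double[] weights = {1.0, 1.0, 2.0};   // raw weights, not yet normalized
        System.out.println(Arrays.toString(normalize(weights)));
        System.out.println(weightedAverage(localSims, weights));
    }
}
```

With the default weight of 1.0 for every attribute, normalization yields equal weights of 1/n, so the weighted average reduces to the plain mean of the local similarities.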

Aggregate measures can be defined in the XML file sim.xml. This requires that an aggregate class has been created in the XML file model.xml; the measure definition references that class. The measure also needs an arbitrary name. In the inner tags, weights for the individual attributes can be defined. The aggregate classes Average, Euclidian and Minkowski need explicit weights; otherwise the similarity will always be 1.0. For the other classes, every attribute gets the same weight if no weights are defined.

For example, an aggregate measure can look like:

     <AggregateMinimum name="AggregateMinimumDataflowWeighted" class="DataflowElement" default="false">
          <AggWeight att="name" weight="0.5"/>
     </AggregateMinimum>
 
Author:
Rainer Maximini
  • Field Details

    • DEFAULT_IGNORE_NULL_ATTRIBUTES_IN_QUERY

      static final boolean DEFAULT_IGNORE_NULL_ATTRIBUTES_IN_QUERY
      Default value for ignoring null attribute values in the query (treating them as void): true.
    • LOG_ATTRIBUTE_NAME_NOT_FOUND

      static final String LOG_ATTRIBUTE_NAME_NOT_FOUND
    • LOG_ATTRIBUTE_NOT_FOUND

      static final String LOG_ATTRIBUTE_NOT_FOUND
    • PROPERTY_USER_WEIGHT

      static final String PROPERTY_USER_WEIGHT
      The query case can contain user weights $w_u$ that are stored in the properties and are accessible with this key. The weight $w$ for an attribute is the product of $w_u$ and $w_c$, the weight defined for the class.
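
The combination $w = w_u \cdot w_c$ can be illustrated with a short sketch. It is not the library's code: the class `UserWeightSketch` and the method `effectiveWeights` are hypothetical, user weights are modeled as a plain map rather than as query-case properties, and treating a missing user weight as 1.0 is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of the documented API.
public class UserWeightSketch {

    /** Combines class weights w_c with optional user weights w_u as w = w_u * w_c. */
    public static Map<String, Double> effectiveWeights(Map<String, Double> classWeights,
                                                       Map<String, Double> userWeights) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : classWeights.entrySet()) {
            // Assumption: an attribute without a user weight keeps its class weight (w_u = 1.0).
            double userWeight = userWeights.getOrDefault(entry.getKey(), 1.0);
            result.put(entry.getKey(), userWeight * entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> classWeights = new LinkedHashMap<>();
        classWeights.put("name", 0.5); // w_c, as it might be defined for the class
        classWeights.put("type", 1.0);

        Map<String, Double> userWeights = new LinkedHashMap<>();
        userWeights.put("name", 2.0);  // w_u supplied with the query

        System.out.println(effectiveWeights(classWeights, userWeights));
    }
}
```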
  • Method Details

    • isIgnoreNullAttributesInQuery

      boolean isIgnoreNullAttributesInQuery()
      Returns:
      true if null attribute values in the query are ignored.
    • setIgnoreNullAttributesInQuery

      void setIgnoreNullAttributesInQuery(boolean ignoreNullAttributesInQuery)
      Parameters:
      ignoreNullAttributesInQuery - true if null attribute values in the query should be ignored, false otherwise.
    • getSimilaritiesToUse

      HashMap getSimilaritiesToUse()
      Returns:
      The defined names of the SimilarityMeasures that should be used for the elements.
    • getSimilarityToUse

      String getSimilarityToUse(String attributeName)
      Parameters:
      attributeName - The name of the attribute for which the specific similarity should be returned.
      Returns:
      The name of the similarity measure that should be used for this attribute. Can be null.
    • setSimilarityToUse

      void setSimilarityToUse(String attName, String similarityMeasure)
      In general, the element objects of the collection are compared using their default similarity measure. In some situations, however, it can be necessary to use a different similarity measure for the elements of a collection. For this purpose, a similarity measure name can be specified that is used instead. A similarity measure with that name should exist for each DataObject; otherwise, the comparison of those objects is ignored.

      In summary:

      • If similarityMeasure is null, the default measures of the objects are used. This is the default behaviour.
      • If similarityMeasure is the name of a similarity measure, a similarity measure with this name must exist for each data class whose objects can occur in the collection. Note that this also includes the common superclasses of the objects.
      Parameters:
      attName - The name of the element object.
      similarityMeasure - The name of the similarity measure that should be used for the elements.
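
The override-with-fallback behaviour described for this method can be sketched with a small stand-in class. This is an illustration under assumptions, not the actual implementation: `SimilarityToUseSketch`, its `resolve` helper, and the measure names used in `main` are hypothetical, and removing the entry on a null name is only one way to model "use the default measure".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not the library's implementation.
public class SimilarityToUseSketch {

    // Maps an attribute name to the name of the similarity measure to use for it.
    private final Map<String, String> similaritiesToUse = new HashMap<>();

    /** Registers a measure name for an attribute; null restores the default behaviour. */
    public void setSimilarityToUse(String attName, String similarityMeasure) {
        if (similarityMeasure == null) {
            similaritiesToUse.remove(attName);
        } else {
            similaritiesToUse.put(attName, similarityMeasure);
        }
    }

    /** Returns the registered measure name for the attribute, or null if none is set. */
    public String getSimilarityToUse(String attName) {
        return similaritiesToUse.get(attName);
    }

    /** Hypothetical helper: picks the override if present, else the default measure name. */
    public String resolve(String attName, String defaultMeasureName) {
        String override = getSimilarityToUse(attName);
        return override != null ? override : defaultMeasureName;
    }

    public static void main(String[] args) {
        SimilarityToUseSketch sketch = new SimilarityToUseSketch();
        System.out.println(sketch.resolve("name", "defaultMeasure"));
        sketch.setSimilarityToUse("name", "customMeasure");
        System.out.println(sketch.resolve("name", "defaultMeasure"));
    }
}
```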
    • getSimilaritiesToUse

      String getSimilaritiesToUse(String attributeName)
      Parameters:
      attributeName - The name of the element object.
      Returns:
      The defined name of the SimilarityMeasure that should be used for the element.