Interface SMStringTermCount

    All Superinterfaces:
    SimilarityMeasure, SMString

    All Known Implementing Classes:
    SMStringTermCountImpl

    public interface SMStringTermCount
    extends SMString
    Compares two strings using the Term Count algorithm. The result depends on the delimiter used to split the strings into terms.
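    Because only the number of terms matters, the same string can yield different counts under different delimiters. The following is a minimal sketch, not the code of SMStringTermCountImpl. The class TermCounter and its method countTerms are hypothetical. It assumes that the delimiter is a literal string rather than a regular expression, and that empty pieces, such as those produced by "a--b", are not counted as terms.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the SMStringTermCount API. */
public class TermCounter {

    /**
     * Counts the terms in s. Assumptions: the delimiter is taken literally
     * (Pattern.quote) and empty pieces are not counted as terms.
     */
    public static int countTerms(String s, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int count = 0;
        for (String piece : s.split(Pattern.quote(delimiter))) {
            if (!piece.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The same string yields different term counts under different delimiters.
        System.out.println(countTerms("String-Term-Count", "-")); // 3
        System.out.println(countTerms("String-Term-Count", " ")); // 1
    }
}
```

    Pattern.quote is used because String.split expects a regular expression. Without it, a delimiter such as "." or "|" would be misinterpreted.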

    The Term Count is a measure of the similarity between two strings, which we will refer to as the query and the case. It counts the terms in the query and in the case and compares these two numbers; the terms themselves are not compared. For example:

    • If the delimiter is "-", the query is "String-Term-Count" and the case is "An-other-example", both strings contain three terms. The term counts are identical, so the similarity is 1.
    • If the delimiter is "-", the query is "String-Term-Count" (three terms) and the case is "Another-example" (two terms), the counts differ by one term. The similarity is 1 minus this difference divided by the larger of the two counts: 1 - 1/3 ≈ 0.667.

    Similarity

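    The similarity between query and case is defined as sim(query, case) = 1 - abs(length(query) - length(case)) / max(length(query), length(case)), where length(x) denotes the number of terms in x after splitting it at the delimiter, not its number of characters.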
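    The formula can be sketched as follows. This is an illustration under stated assumptions, not the code of SMStringTermCountImpl. The class TermCountSimilarity and its methods are hypothetical. Terms are counted as in a literal-delimiter split that skips empty pieces. The formula is undefined when neither string contains a term (the maximum is 0), so returning 1.0 in that case is an assumption.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of the Term Count similarity; not the library implementation. */
public class TermCountSimilarity {

    // Counts non-empty terms; the delimiter is taken literally (assumption).
    public static int countTerms(String s, String delimiter) {
        int count = 0;
        for (String piece : s.split(Pattern.quote(delimiter))) {
            if (!piece.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // sim = 1 - abs(length(query) - length(case)) / max(length(query), length(case))
    public static double similarity(String query, String caseValue, String delimiter) {
        int q = countTerms(query, delimiter);
        int c = countTerms(caseValue, delimiter);
        int max = Math.max(q, c);
        if (max == 0) {
            return 1.0; // assumption: two strings without terms count as identical
        }
        return 1.0 - (double) Math.abs(q - c) / max;
    }

    public static void main(String[] args) {
        // Reproduces the two examples from the description above.
        System.out.println(similarity("String-Term-Count", "An-other-example", "-")); // 1.0
        double sim = similarity("String-Term-Count", "Another-example", "-");
        System.out.println(String.format(Locale.ROOT, "%.3f", sim)); // 0.667
    }
}
```

    The cast to double before the division matters. Integer division would truncate 1 / 3 to 0 and report a similarity of 1.0 for the second example.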

    Author:
    Rainer Maximini