Dependency Guided Retrieval

Dependency-Guided Retrieval #

This page contains the following content:

In ProCAKE, the Dependency-Guided Retrieval (DGR) approach is implemented. This is based on the paper “Considering Inter-Case Dependencies During Similarity-Based Retrieval in Process-Oriented Case-Based Reasoning”. The DGR is an approach to integrate inter-case dependencies into the retrieval phase of CBR. It was developed for process-oriented cases, but can also be used in general. The background is described in the paper, on this page the application in ProCAKE is presented. In the demo project there is an example application for the DGR approach. The domain used there comes from the paper and is presented in it.

Definition of dependencies in XML #

Knowledge about dependencies is created in ProCAKE using XML. A separate schema exists for this purpose. A corresponding file is divided into three parts:

  1. Dependency Types: Here the different types of dependency are created.
  2. Dependency Similarities: Similarities between the different dependency types are defined here.
  3. Dependencies: In this part the actual dependencies are created.

A corresponding file starts with the DependencyModel tag and is outlined as follows:

<DependencyModel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cake.wi2.uni-trier.de/xml/dependency" xsi:schemaLocation="http://cake.wi2.uni-trier.de/xml/dependency dependency.xsd">
   ...
</DependencyModel>

Definition of Dependency Types #

When defining DependencyTypes, basically one tag is used for each type. In this, the type is given a name which must be unique to identify it. If type names are used twice, an exception is thrown. There is also an attribute graph_item_information_required which is a boolean value. This is set to false by default. In this case, we are dealing with dependencies that exist between two cases and only require the cases themselves as attributes. If the value is set to true, the similarity type will only work for process-oriented cases. In this case, a dependency requests the corresponding graph items as attributes in addition to the cases. Further, references to the semantic descriptions can optionally be set.

When defining a DependencyType it is possible to enter a description of the dependency within the tag. This is used for documentation and traceability of the dependency, in the actual application this is not used.

An XML definition of DependencyTypes can look like this:

dependencies.xml

    <DependencyTypes>
        <DependencyType name="AlternativeRecipe">
            Dependency, that points to one or more recipes that are suitable alternatives to the current recipe.
        </DependencyType>
        <DependencyType name="CommonIngredient" graph_item_information_required="true">
            Dependency, that describes when two recipes use the same ingredients.
        </DependencyType>
    </DependencyTypes>

In this example, two similarities have been created: AlternativeRecipe and CommonIngredient. The corresponding descriptions can be found in the code snippet. The dependency CommonIngredient can only be used with NEST graphs because of the required graph item information.

Definition of Dependency Similarities #

To specify the similarities between dependencies, the XML tag DependencySimilarities is used. Within this tag, a single tag is used to specify the similarity values for one dependency type to another. The types are specified with the attributes source and target respectively. Both must refer to predefined dependency types, otherwise an exception is thrown. As a further attribute value exists, which specifies the similarity value. This must be given as a double value.

With the definition of the similarities always the direction is to be considered, these are in principle not symmetrical. For all dependency type similarities that are not defined in the XML, the similarity is basically 0.0. The only exception is dependencies that point to themselves. Here the similarity is 1.0 unless another value is defined.

An example XML definition might look like this:

dependencies.xml

    <DependencySimilarities>
        <DependencySimilarity source="AlternativeRecipe" target="CommonIngredient" value="0.0"/>
        <DependencySimilarity source="CommonIngredient" target="AlternativeRecipe" value="0.1"/>
    </DependencySimilarities>

Here the similarities of the dependency types defined above are defined. The similarity of AlternativeRecipe to CommonIngredient is 0.0, so this would not necessarily need to be defined in the XML (but it can still be for clarity). In the reverse direction, however, the similarity is 0.1, so this is not symmetric.

Definition of Dependencies #

Finally, the actual dependencies can be defined. For this the XML tag Dependencies is used, in which again each dependency is created with its own tag. This contains as attribute type, which in turn must refer to an already created type. For this, the name is again used as identifier. If an attempt is made to create a dependency of an inexistent type, an exception is thrown. In addition, there are two attributes sourceCase and targetCase, which contain as parameters the IDs of the cases between which the dependency exists. The cases must be stored with this ID in the case base, otherwise an exception is thrown.

If it was specified in the definition of a dependency type that information about graph items is required, the attributes sourceCaseItem and targetCaseItem must also be specified. Both must refer to the respective graph items of the respective cases that are responsible for the similarity. The IDs must exist again like this in the case base, otherwise an exception is thrown. Additionally, the attributes sourceCaseItemSemanticDescriptor and targetCaseItemSemanticDescriptor can be set, which point to the IDs of the semantic descriptions of the set graph items. These fields are optional because the semantic descriptions often do not have an ID, but can be queried by the Graph Items.

Dependencies are not symmetrical. If a dependency is to exist in both directions, it must also be explicitly defined in both directions. There are dependency types, which are per se always symmetrical (like in the example CommonIngredient), nevertheless also these must be modeled in both directions.

An example of an XML definition can look like this:

dependencies.xml

    <Dependencies>
        <Dependency type="AlternativeRecipe" sourceCase="W01" targetCase="W02"/>
        <Dependency type="CommonIngredient" sourceCase="W01" targetCase="W02" sourceCaseItem="DATA_110" targetCaseItem="DATA_118"/>
        <Dependency type="CommonIngredient" sourceCase="W02" targetCase="W01" sourceCaseItem="DATA_118" targetCaseItem="DATA_110"/>
    </Dependencies>

Here, on the one hand, the dependency AlternativeRecipe is created from the first to the second case. This goes only in one direction. For example, this can be the case, if the dependency describes a vegetarian alternative to a recipe. Two further dependencies of the type CommonIngredient are created, which point from the first to the second case and vice versa. Since the type specified that graph item information is required, the IDs of the respective items (data nodes in this case) are also specified. The references to the semantic descriptions are omitted.

Usage of the approach #

The Dependency-Guided Retrieval approach is used in the same way as the standard retriever in ProCAKE. First, an instance of ProCAKE must be started and it must be ensured that a case base has also been read in. Then the DependencyModel can be read in via a method of the CakeInstance class. The method initDefaultDependencyModel(String dependencyModelPath, WriteableObjectPool casebase) requires on the one hand the valid path to the XML file, which should be structured as described above, and the casebase as an object. The call looks like this:

Wiki_DependencyGuidedRetrievalTest.java

    DependencyModel dependencyModel = CakeInstance.initDefaultDependencyModel(PATH_DEPENDENCY_MODEL_DEPENDENCY_GUIDED_RETRIEVAL, casebase);

It is important that in the XML file of the dependency model the same IDs are used for the objects as in the casebase. Otherwise, an exception will be thrown.

The dependency model must be created in any case, even if it is not always used afterwards. Sometimes other classes access the model directly in the background. If it is empty, exceptions are thrown and the retrieval cannot be performed.

After reading in the dependency model, a specific DependencyRetriever must be created. This must receive a reference to the case base analogous to the normal Retriever. Additionally, a parameter can be set using the method setResultsNumber(int resultsNumber) (this is analogous to the parameter ms from the paper). This parameter specifies the minimum number of dependent cases that must be returned as the result of the retrieval. This will be discussed in more detail below. A corresponding call of the retriever can look as follows:

Wiki_DependencyGuidedRetrievalTest.java

    DependencyRetriever dependencyRetriever = new DependencyRetrieverImpl();
    dependencyRetriever.setObjectPool(casebase);
    dependencyRetriever.setResultsNumber(5);
    dependencyRetriever.setAlpha(0.5);

By means of the method setAlpha(double alpha) an additional weighting parameter is set, which must be in range [0.0, 1.0]. This parameter is used in the later similarity calculation and indicates to which part the dependency similarities and the similarities between the data objects are involved in the overall similarity. This will be explained in more detail later in the formula.

After creating the retriever, a query object can be created via the newQuery() method analogous to the standard retriever. This is an instance of the DependencyQuery class. The call looks like this:

Wiki_DependencyGuidedRetrievalTest.java

    DependencyQuery dependencyQuery = dependencyRetriever.newQuery();

A dependency query, unlike a traditional query, contains several individual data objects as well as knowledge about the dependencies between them. The data objects can be arbitrary objects, but they can be of the same type. They are added to the dependency query using the addQueryObject(DataObject queryObject) method. Below is an example of a dependency query that contains two NEST graphs as query objects.

Wiki_DependencyGuidedRetrievalTest.java

    // Create query object1
    NESTWorkflowObject queryObject1 = objectUtil.createNESTWorkflowObject();
    NESTWorkflowModifier modifier = queryObject1.getModifier();
    NESTDataNodeObject queryObject1DataNode = modifier.insertNewDataNode(objectUtil.createDataSemantic("rice"));

    // Create query object2
    NESTWorkflowObject queryObject2 = objectUtil.createNESTWorkflowObject();
    NESTWorkflowModifier modifier2 = queryObject2.getModifier();
    NESTDataNodeObject queryObject2DataNode = modifier2.insertNewDataNode(objectUtil.createDataSemantic("rice"));

    // Add data objects to query
    dependencyQuery.addQueryObject(queryObject1);
    dependencyQuery.addQueryObject(queryObject2);

Here, the object objectUtil is used as an instance of the class DataObjectUtils to create simplified data objects. As with standard retrieval, the individual query objects can be arbitrarily detailed. The subsequent similarity calculation for the individual objects is performed using the standard calculation in the CBR.

After the individual query objects have been added to the dependency query, the dependencies still have to be added. These are added using the addDependency(Dependency dependency) method. Here, a new instance of the DependencyImpl class must be created for each call. The class has three different constructors. All of them refer by name to a DependencyType, which must be defined beforehand in the XML, as well as to the source and the target object. The following constructors exist:

  1. DependencyImpl(String dependencyType, DataObject sourceCase, DataObject targetCase): This constructor can be used for all dependencies that do not have any additional graph item information. If an appropriate boolean flag was set in the XML and this constructor is used, an exception is thrown.
  2. DependencyImpl(String dependencyType, DataObject sourceCase, DataObject targetCase, NESTGraphItemObject sourceCaseItem, NESTGraphItemObject targetCaseItem): This constructor is for dependencies that exist only between NEST graphs and require graph item information. The additional objects sourceCaseItem and targetCaseItem must be set. There is the special case that they are set to null. This means that a dependency of that particular type must be present, but the corresponding graph items between which it is present do not matter.
  3. DependencyImpl(String dependencyType, DataObject sourceCase, DataObject targetCase, NESTGraphItemObject sourceCaseItem, NESTGraphItemObject targetCaseItem, DataObject sourceCaseItemSemanticDescriptor, DataObject targetCaseItemSemanticDescriptor): This constructor is analogous to the second constructor, but additionally contains references to the semantic descriptions. If the second constructor is used, these links are automatically set in the background if semantic descriptions are available.

Adding dependencies to a dependency query can look like this:

Wiki_DependencyGuidedRetrievalTest.java

    dependencyQuery.addDependency(new DependencyImpl("AlternativeRecipe", queryObject2, queryObject1));
    dependencyQuery.addDependency(new DependencyImpl("CommonIngredient", queryObject1, queryObject2, queryObject1DataNode, queryObject2DataNode));
    dependencyQuery.addDependency(new DependencyImpl("CommonIngredient", queryObject2, queryObject1, null, null));

In this example, three dependencies are defined in the query, which are required by the later case. The first dependency is at the case level. The second dependency is on concrete data objects. Here it is checked later whether the data objects contain the respective value (here the string rice within an aggregate object of the class DataSemantic) at these places. With the third dependency these objects are set to null. Thus, only one dependency of this type must be present, but the concrete objects do not matter. (In this concrete example, the third dependency will always be contained if the second is contained, since the second is symmetric and thus the dependency exists in the reverse direction).

After creating the dependency query, the perform(DependencyQuery dependencyQuery) method can be called analogously to the standard retriever, so that the retrieval runs in the background. Here a DependencyRetrievalResultList is returned, which contains individual DependencyRetrievalResults. This can look as follows:

Wiki_DependencyGuidedRetrievalTest.java

    DependencyRetrievalResultList list = dependencyRetriever.perform(dependencyQuery);

A DependencyRetrievalResult contains a list of combined cases. These were created in the background during the retrieval process. A distinction is made between full and partial matches (see paper). A full match contains exactly the required dependencies as specified in the dependency query. Thus, unless otherwise defined, the combined similarity between the dependencies is 1.0. For partial matches, the similarity between dependencies is the average of the individual dependency similarities. The Full Matches function here as a kind of heuristic. The retriever was given a minimum number of results in advance. If so many or more results were found, only the full matches are used. Otherwise, partial matches are also calculated, which requires more computing time, since the best possible mapping of the dependencies to each other is searched for. It is also possible that too few dependent cases are found despite the use of full and partial matches. In this case, a warning is issued and the retrieval is continued with the reduced number. In the DependencyRetrievalResult, in addition to the list of combined cases, there is a DependencySimilarity that contains the calculated similarity value for the case combination. This is calculated according to the following formula:

\(sim(Q_Q, dc) = \alpha \cdot sim_D(Q_Q,dc) + (1 - \alpha) \cdot \displaystyle \sum_{i=1}^{m} sim(q_i,c_i)\)

Here, \(Q_Q\) describes the single queries contained in the dependency query and \(dc\) a single combined dependent case. \(\alpha\) points to the weight factor that was set when the retriever was created. \(sim_D(Q_Q,dc)\) describes the Dependency Similarity, which was calculated as an average of the individual Dependency Similarities. \(\displaystyle \sum_{i=1}^{m} sim(q_i,c_i)\) is the average of all calculated local similarities between the individual data objects.