Retrieval

Retrieval #

This page contains the following content:

Basic Information #

ProCAKE implements various retrieval approaches that use the similarity measures as defined in the similarity model.

A new instance of a retrieval algorithm is created using the method newRetriever(String retrieverName) from the factory RetrievalFactory. The names of the retrievers provided by ProCAKE can be obtained from the interface SystemRetrievers.

For example, an instance of the linear retriever can be created as follows:

Retriever retriever = RetrievalFactory.newRetriever(SystemRetrievers.LINEAR_RETRIEVER);

The case base for the retrieval is given to the retriever in form of an object pool (instance of ReadableObjectPool):

retriever.setObjectPool(pool);

To perform a retrieval, a query must be provided. This query can be created using the retriever method newQuery(). Every query is an object of a data class. The query object can be set as follows:

Query query = retriever.newQuery();
query.setQueryObject(queryObject);
query.setRetrieveCases(true);

It is important to note that similarity measures are defined for all data classes used when creating the query object.

The query object is also used for specifying parameters for the retrieval. For example, the number of the results returned by the retriever can be limited using the method setNumberOfResults(int number). Another parameter is a threshold for the similarity between the query and an object to include this object to the result list. It is also possible to state whether the objects that were retrieved should be returned. It can be set with the method setRetrieveCases(boolean retrieveCases) and checked with isRetrieveCases(). By default, only the IDs of retrieved objects are returned. The default values of the parameters can be obtained from the corresponding interface Query.

To run the retrieval, the retriever method perform(Query query) is used. The method returns a RetrievalResultList, which contains RetrievalResult objects:

RetrievalResultList retrievalResultList = retriever.perform(query);

For iterating the results, a for-each-loop can be used:

for (RetrievalResult result : retrievalResultList) {
   doSomething(result);
}

Retrieval Approaches #

Linear #

The LinearRetriever performs a k-NN search by computing all similarities between a given query and each case in the case base. For the calculation, the cases of the case base are iterated one by one sequentially.

A LinearRetriever can be created as follows:

Retriever linearRetriever = RetrievalFactory.newRetriever(SystemRetrievers.LINEAR_RETRIEVER);

Parallel Linear #

The ParallelLinearRetriever performs a k-NN search by computing all similarities between a given query and each case in the case base. For this calculation, it uses all available computing cores of the respective CPU. Therefore, each worker thread (usually one worker per CPU core) extracts a certain number of cases from the case base and computes their similarities to the query. This is repeated until all cases of the case base are processed. An ParallelLinearRetriever instance can be created analogous to the LinearRetriever. However, to use the specific methods of the ParallelLinearRetriever, an explicit cast as an object of this class is necessary. This is done as follows:

ParallelLinearRetriever parallelLinearRetriever = (ParallelLinearRetriever) RetrievalFactory.newRetriever(SystemRetrievers.PARALLEL_LINEAR_RETRIEVER);

The user can define a number of parameters that control the retriever’s behavior:

  • setSorting (boolean sorting): The retriever sorts the elements of the case base based on their complexity before the retrieval. This can be useful when comparing NEST workflow graphs by prioritizing bigger cases before smaller cases, in order to maintain a high utilization of CPU cores towards the end of the retrieval. Although the ParallelLinearRetriever is implemented for case bases that contain arbitrary data objects, the sorting method can only be used for case bases that contain NEST workflow graphs.
  • setTaskSize (int taskSize): The taskSize specifies the amount of cases that each worker extracts from the case base during a single computation step. For instance, setting this number to 1 means that only one case is extracted by a single worker from the case base at a time. Analogously, a value of 10 means that ten cases are extracted in parallel and the similarity of all ten cases is computed before the next cases are extracted.

Graph A* Parallel #

The GraphAStarParallelRetriever is a retriever specifically tailored to the retrieval of cases represented as NEST workflow graphs. It applies the A* search in parallel for the entire case base. For more information, please refer to the paper Similarity Assessment and Efficient Retrieval of Semantic Workflows.

Obviously, the retriever can be only applied to graph-based cases and further requires an existing definition of an A* similarity measure in the similarity model.

A GraphAStarParallelRetriever instance can be created as follows:

Retriever graphAStarParallelRetriever = RetrievalFactory.newRetriever(SystemRetrievers.GRAPH_ASTAR_PARALLEL_RETRIEVER);

MAC/FAC #

The different retrieval possibilities in ProCAKE also include a generic way for using MAC/FAC retrievers. A MAC/FAC retrieval is two-staged, where the first phase, i.e., the MAC phase, uses a lightweight similarity measure to filter out (presumably) irrelevant cases and the second phase, i.e. the FAC phase, computes the final similarities for the candidates resulting from the MAC phase. This way, not all cases have to be evaluated during the FAC phase, which, assuming the similarity measure used in the MAC phase is computationally inexpensive, results in reduced retrieval times. However, since the MAC phase might disregard highly similar cases according to the similarity measure of the FAC phase, the time of the retrieval is reduced at the expense of its quality.

For the implementation in ProCAKE, this means that a MAC/FAC retriever is build-up from two regular retrievers, i.e., the class Retriever<>. In order to create a custom MAC/FAC retriever from two regular retrievers, a subclass of AbstractMACFACRetriever has to be used. Therefore, RetrievalFactory has the method createNewMACFACRetriever(Retriever,Retriever) which takes as arguments the MAC retriever and the FAC retriever and returns a MAC/FAC retriever that is configured to use those retrievers. The returned retriever is an instance of an anonymous subclass of AbstractMACFACRetriever that can be used like a normal retriever by calling the method perform(Query).

Unlike normal retrievers, MAC/FAC retrievers use a special kind of query, i.e., MACFACQuery. A MACFACQuery stores additional information that can be used during a MAC/FAC retrieval. Currently, this is the filterSize and the minMACSimilarity. The filterSize controls the number of results that are (at maximum) taken from the MAC phase to being evaluated in the FAC phase. Thus, the parameter controls how many cases are evaluated by the computationally complex similarity measure of the FAC phase. The minMACSimilarity also controls the number of cases to evaluate by the FAC similarity measure, but this time, by setting a similarity threshold for cases. If both values are set, then the more restrictive one is applied, i.e., the parameter that results in less result cases for the MAC phase.

In the text before, it is shown how MAC/FAC retrievers can be defined by setting the MAC retriever and the FAC retriever at runtime. It is also possible to define MAC/FAC retrievers at compile time by creating subclasses of AbstractMACFACRetriever where MAC retriever and FAC retriever are both fixed. An example of such a retriever is the FeatureMACFACRetriever that uses a similarity measure on graph features (see this paper) as an inexpensive MAC similarity measure. The FAC retriever is the standard LinearRetriever. If you want to instantiate your custom MAC/FAC retriever at runtime, make sure to register it in the composition.xml.