Optimization of join operators in Search Computing
Search Computing systems gather data from sources on the web that capture different aspects of a user query. To compute the result presented to the user, the retrieved data need to be joined. Joins in this context differ from the traditional relational setting in several respects: accessing data sources is costly, since they are typically remote; data sources can be accessed only according to limited patterns, i.e. some inputs need to be provided; the returned items are often ranked according to a score; and the output is returned in pages of results.
In the Search Computing model, query execution is performed according to plans in which data sources may be accessed either sequentially or in parallel. This gives rise to two different kinds of joins: the parallel join (the sources are accessed independently and their results are then matched) and the pipe join (the output of one data source is used as input for the other).
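The difference between the two join kinds can be illustrated with a minimal Python sketch. It is not the Search Computing implementation: the sample data, the page size, the function names (`fetch_pages`, `pipe_join`, `parallel_join`, `hotels_by_city`) and the choice of counting cost as the number of service calls are all assumptions made for illustration.

```python
from itertools import islice

PAGE_SIZE = 2  # hypothetical page size of every source

# Hypothetical ranked results of two sources (already sorted by score).
concerts = [
    {"artist": "A", "city": "Milan", "score": 0.9},
    {"artist": "B", "city": "Rome", "score": 0.8},
    {"artist": "C", "city": "Milan", "score": 0.5},
]
hotels = [
    {"hotel": "H1", "city": "Milan", "score": 0.95},
    {"hotel": "H2", "city": "Turin", "score": 0.7},
    {"hotel": "H3", "city": "Rome", "score": 0.6},
]

def fetch_pages(items, page_size=PAGE_SIZE):
    """Yield items page by page, mimicking a paged remote source."""
    it = iter(items)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

def hotels_by_city(city):
    """Hypothetical source with a limited access pattern: city is a required input."""
    return [h for h in hotels if h["city"] == city]

def pipe_join(outer_items, inner_service, key):
    """Pipe join: each outer result is passed as input to the inner source."""
    results, calls = [], 0
    for page in fetch_pages(outer_items):
        calls += 1  # one call per outer page (assumed cost unit)
        for outer in page:
            calls += 1  # one inner call per outer item
            results += [(outer, inner) for inner in inner_service(outer[key])]
    return results, calls

def parallel_join(left_items, right_items, key):
    """Parallel join: both sources are paged independently and matched on key."""
    results, calls = [], 0
    left_seen, right_seen = [], []
    left_pages, right_pages = fetch_pages(left_items), fetch_pages(right_items)
    while True:
        left_page = next(left_pages, None)
        right_page = next(right_pages, None)
        if left_page is None and right_page is None:
            break
        if left_page:
            calls += 1
            # Match the new left page against all right items seen so far.
            results += [(l, r) for l in left_page for r in right_seen if l[key] == r[key]]
            left_seen += left_page
        if right_page:
            calls += 1
            # Match the new right page against all left items seen so far.
            results += [(l, r) for r in right_page for l in left_seen if l[key] == r[key]]
            right_seen += right_page
    return results, calls

pipe_results, pipe_calls = pipe_join(concerts, hotels_by_city, "city")
par_results, par_calls = parallel_join(concerts, hotels, "city")
```

In this toy setting both strategies produce the same pairs but with a different number of calls, which is the kind of trade-off a cost-aware optimizer would have to weigh. Scores are carried along but not used here; a score-aware strategy would also decide which source to page next.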
In this context we propose to investigate the following problems:
- Cost-aware optimization of pipe joins.
- Score- and cost-aware optimization of parallel joins.
- Analysis of cost models that account for parallelism during join execution.