Step 1: Search Computing Theory and Optimization


The first step of the SeCo project will take place during the first 18 months of the project. It will focus on search computing theory and the statistical and optimization methods (items A-C of the synopsis). The results of the PRIN project described in part B1 of the state of the art will evolve into a first nucleus of solid results proving the feasibility of the approach. Problems to be covered will grow in complexity, from simple situations featuring the combination of two search services to broader situations open to an arbitrary number of services. In Step 1 we will consider static scenarios, but the environment will gradually evolve to include dynamic scenarios in the third step and adaptation in the fourth step.

  1. KIGS. Step 1 will produce, as key intermediate goal KIG1, the definition of a search computing infrastructure, supporting basic operations for registering services, for registering queries, for expressing queries and their formal properties, for associating them with execution strategies, for computing the cost of each execution strategy and for determining the global ranking by combining local rankings of each individual search service.
  2. RESULTS. In achieving KIG1, we will produce:
    R1: State of the art on search computing infrastructure, available at month 9.
    R2: Report on search computing infrastructure, available at month 18.
    P1: Execution environment prototype, a first prototype implementation of the execution environment, which will allow us to test the service integration approaches upon statically defined search services over different domains, available at month 18. This step will advance the state-of-the-art as reviewed in sections A, B1 and C4.
  3. MEASURES. Metrics for measuring the success of this step will be the number and heterogeneity of registered services, the goodness of fit of cost models (correspondence between modelled and actual costs), and the precision and recall of information retrieved with given queries according to the execution strategies.
  4. RESOURCES. Step 1 will require 20% of SeCo resources.
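
As an illustration of two items above, the sketch below merges the local rankings of individual search services into a global ranking (the last operation named in KIG1) and computes precision and recall (named under MEASURES). It is a minimal sketch under stated assumptions, not the SeCo method: the weighted reciprocal-rank scoring rule, the function names `combine_rankings` and `precision_recall`, the service names, and the sample item identifiers are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def combine_rankings(local_rankings, weights=None):
    """Merge per-service ranked lists into one global ranking.

    Scoring rule (an assumption, not taken from the proposal): each item
    receives weight / rank from every service that returns it, and the
    contributions are summed.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for service, ranked_items in local_rankings.items():
        w = weights.get(service, 1.0)
        for rank, item in enumerate(ranked_items, start=1):
            scores[item] += w / rank
    # Sort by descending score; break ties by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical local rankings from two registered services.
local = {
    "service_a": ["x", "y", "z"],
    "service_b": ["y", "w", "x"],
}
global_ranking = combine_rankings(local)
print(global_ranking)
print(precision_recall(global_ranking[:2], relevant={"x", "y", "q"}))
```

Passing a `weights` mapping lets one service count more than another, which is one simple way a cost or quality estimate could influence the global ranking.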


Step 2: Search Computing Languages and Methods

The second step of the SeCo project will take place between months 7 and 30, in partial overlap with Step 1. It concerns the definition of queries upon search services through language abstractions and description formalisms (items D-E of the synopsis). Languages will enable the development of search computing applications. Therefore, emphasis on language abstraction will gradually evolve into service computing engineering (item K of the synopsis), i.e., a set of methodologies, tools and best practices helping developers set up software applications which benefit from search computing.

  1. KIGS. Step 2 will produce the key intermediate goals KIG2 and KIG3.
    KIG2 is the definition of search computing languages, ranging from highly abstract, declarative languages down to detailed, imperative languages, with clear relationships between them (in terms of expressive power and translation requirements).
    KIG3 is the definition of search computing methods, consisting of a clear definition of the steps that must be followed by service integration developers for building service integration queries, using the specialized languages defined above and assisted by design tools.
  2. RESULTS. In achieving KIG2 and KIG3, we will produce:
    R3: Report on search computing languages, available at month 24.
    R4: Report on search computing methods, available at month 30.
    P2: Prototype supporting specialized languages and design tools: languages and methods defined in this step will use as execution environment the infrastructure developed as prototype P1, which will be suitably enhanced to meet the new requirements introduced at Step 2, yielding the prototype P2, available at month 30. At this stage it will be possible to start focused experimentation of search computing with users and collect their feedback, although this will be the main focus of Step 4. This step will advance the state-of-the-art as reviewed in sections A and C.
  3. MEASURES. The success of Step 2 will be measured through semi-formal metrics which are classic in language design and software engineering, expressing how developers are supported in expressing queries, testing and debugging them, monitoring their performance and using these parameters as feedback.
  4. RESOURCES. This step will require 30% of SeCo resources. The end of Step 2 (at month 30) is a significant milestone, marking the transition from an initial phase of SeCo dedicated to basic research in infrastructure, languages and methods, to a subsequent phase which is more interdisciplinary and exploratory. Reports R1-R4 will collectively constitute the midterm report. Adjustments to the work plan, if required, could take place at this point.


Step 3: Higher-Order and Semantic Search

The third step of SeCo will take place between months 30 and 54. It is the most innovative step and presents the highest risks in the SeCo project; therefore research on this topic will start earlier and will be conducted by dedicated PhD students and post-docs, who will be enrolled during the first year of the project and will contribute to it until the end of the project. With the addition of dynamic choice of search services, we will attack the higher-order ranking problem, i.e. the problem of dynamically ranking search engines and then selecting search services at query time (item H of the synopsis); this entails collecting information about search service usage, by monitoring search performance. We will also augment search computing effectiveness with increasingly sophisticated semantic injections, by using vocabularies, taxonomies, and ontologies within selected domains (item G of the synopsis). Step 3 will be conducted through experiments which will address specific domains (such as bio-informatics, health, urban management, and entertainment); from these experiments, we will try to abstract general methods and techniques.

  1. KIGS. Step 3 will produce the key intermediate goals KIG4 and KIG5.
    KIG4 achieves multi-level ranking, where the ranking between sources and/or searched elements uses object-based models, with nodes associated with objects and interrelated by semantically weighted links. The ingredients of this step will include computing object similarities and their clustering. User behaviour monitoring and source reputation as observed at Step 4 will contribute to object weights.
    KIG5 achieves semantically enriched search computing, supporting dynamic search services selection. This in turn requires their characterization, both in terms of semantics (understanding what the service is doing) and efficiency/effectiveness (understanding how the service is performing).
  2. RESULTS. In achieving KIG4 and KIG5, we will produce:
    R5: Report on higher-order search computing, available at month 48.
    R6: Report on semantically enriched search computing, available at month 54.
    P3: Step 3 will contribute to the final prototype P3, available at month 54, by integrating into prototype P2 higher-order search and semantic components. P3 will support semantic search service descriptions at registration time and search service discovery at query execution time; as for semantic descriptions, we will use existing vocabularies, taxonomies, and ontologies, focusing on a few (from one to three) selected domains. This step will advance the state-of-the-art as reviewed in sections B2 and C2, with a specific focus on search services.
  3. MEASURES. The main measure of success in this step will be the trade-off analysis and comparison between the qualities of prototype P2 and the components progressively added into P3. We expect an increase of effectiveness (measured in terms of improved search result hits) but also a decrease of performance (measured in terms of response time and required computing resources; we will define when performance losses are acceptable).
  4. RESOURCES. This step will require 25% of SeCo resources. Note that statistical models and ranking based upon probabilistic inference are the subject of Step 1 and use search service profiles which are made available at service registration (as a finite set of parameters). Higher-order ranking and semantic reasoning are the subject of Step 3 and are used for dynamically determining the search services and search objects which best match the user query, by means of weighted resource graphs, semantic descriptions of services, and existing domain-specific vocabularies, taxonomies, and ontologies. Thus, the two aspects address different phases of the SeCo project and apply in separate stages of a given search application. In the first stage, higher-order ranking and semantic methods are applied to determine the search space by identifying the sources and retrieving some of the relevant objects. In the second stage, statistical models and ranking based upon probabilistic inference are applied to the retrieved objects. If necessary, the two stages can be repeated, broadening the search space and retrieving additional objects.
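
The two-stage process described in the RESOURCES note can be sketched as a loop. This is a minimal sketch under loud assumptions, not the SeCo design: the function name `two_stage_search`, the per-source scores, the term-overlap scoring, and the sample data are all hypothetical. The source score stands in for higher-order ranking (stage 1) and the term overlap stands in for the statistical ranking of retrieved objects (stage 2).

```python
def two_stage_search(query_terms, services, min_results=3, initial_k=1):
    """Alternate source selection (stage 1) and object ranking (stage 2).

    `services` maps a service name to (source_score, objects), where
    `objects` is a list of (object_id, text) pairs. Both scores are toy
    stand-ins introduced for illustration only.
    """
    # Order sources by their (hypothetical) score, best first.
    ordered = sorted(services, key=lambda s: (-services[s][0], s))
    k = initial_k
    while True:
        # Stage 1: pick the top-k sources and retrieve their objects.
        selected = ordered[:k]
        candidates = [obj for s in selected for obj in services[s][1]]
        # Stage 2: score retrieved objects by query-term overlap.
        scored = []
        for obj_id, text in candidates:
            overlap = len(set(query_terms) & set(text.split()))
            if overlap:
                scored.append((overlap, obj_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Stop when enough objects were found or no source is left.
        if len(scored) >= min_results or k >= len(ordered):
            return selected, [obj_id for _, obj_id in scored]
        k += 1  # Broaden the search space and repeat both stages.


# Hypothetical registered services with toy content.
services = {
    "hotels": (0.9, [("h1", "rome hotel center"), ("h2", "milan hotel")]),
    "events": (0.6, [("e1", "rome concert"), ("e2", "rome opera hotel")]),
    "flights": (0.2, [("f1", "rome flight")]),
}
selected, results = two_stage_search(["rome", "hotel"], services)
print(selected)
print(results)
```

With this sample data the first pass over the single best source yields too few objects, so the loop widens the search space to a second source before returning.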


Step 4: Search Computing in Context

The fourth step of SeCo will be conducted during months 30-54, in parallel with Step 3. It concerns the study of human-computer interfaces and of individual and social interaction with search computing systems (items F and J of the synopsis); we will build experimental settings, where individual and collective behaviour will be subject to analysis and verification.

  1. KIGS. Step 4 will produce the key intermediate goals KIG6 and KIG7.
    KIG6 is a human-computer interface for search computing, concerned with improving end-user interaction by acting upon a variety of aspects, ranging from visual interfaces, to interaction paradigms, to result ranking and visualization, to analyzing the users’ reactions to new search methods (with psychological and pedagogical concerns).
    KIG7 is the provision of adaptive search computing supporting individuals and communities. We will study the individual and collective experience of searching; we will use social analytics algorithms for extracting search-relevant knowledge from users' behavior (e.g., popularity of sources / documents / items) and for recommending sources/items. Individual and collective rankings will contribute to the characterization of search services and therefore will be added as parameters used for the dynamic selection discussed in Step 3. For both KIG5 and KIG6, we intend to reuse results which are made available by specialized research communities, and tailor them to the requirements of search computing.
  2. RESULTS. In achieving KIG6 and KIG7, we will produce:
    R7: Report on search computing interfaces, available at month 42.
    R8: Report on search computing adaptive components, available at month 48.
    P3: Step 4 will contribute to the final prototype P3, available at month 54, by integrating into prototype P2 improved user interfaces and its extension with adaptation mechanisms based on personalization and community interaction.
  3. MEASURES. Results will be measured with specific experiments, which need to be planned, performed, and validated according to methods which are accepted in the HCI and adaptation communities. The main measure will concern trade-off analysis and comparison between prototype P2 and the progressive additions into P3 for improving the human-computer interaction and for supporting adaptation to individuals and to communities. Comparisons with other systems will also be conducted, to determine the advantages introduced in general by search computing and more specifically by the SeCo platform.
  4. RESOURCES. This step will require 15% of SeCo resources.


Step 5: Search Computing Evolution

At the conclusion of the SeCo project, we will consider the use of new hardware architectures for a deployment within powerful computing infrastructures (computing clouds, item I of the synopsis) and the study of economic and legal issues related to business models for search computing (items L and M of the synopsis). The fifth step of SeCo will be conducted in the final part of the project (months 42-60); however, the study of the economic and security impact of search computing will start earlier and be conducted throughout the SeCo project, so as to make sure that the technology being invented can be supported by feasible business models and does not infringe any protection requirements. For achieving both goals, we will interact with external contributors, who will help us in defining a first exploratory approach to these issues, possibly hinting at research directions that could lead to follow-up research proposals after SeCo; we expect to raise the interest of other scientific communities, who could in turn route their interests towards search computing. In the last 6 months of the project, resources will also be dedicated to extensive dissemination of all research results, to experimentation with prototype P3, and to the delivery of R11, “Final plan for the use and dissemination of foreground”.

  1. KIGS. Step 5 will produce the key intermediate goals KIG8 and KIG9.
    KIG8 is the design of a high-performance search computing system, producing a first-level design of a high-performance infrastructure for search computing (we will not produce prototype implementations for Step 5 in the context of SeCo).
    KIG9 covers the economic and legal aspects of search computing, including the study of business models which will foster the diffusion of a search service economy, and the legal and protection/privacy aspects involved in deploying and using search services.
  2. RESULTS. In achieving KIG8 and KIG9, we will produce:
    R9: Report on high performance search computing, available at month 60.
    R10: Report on economic and legal aspects of search computing, available at month 60.
    R11: Final plan for the use and dissemination of foreground, available at month 60.
  3. MEASURES. Measuring Step 5 will be harder due to its exploratory nature. Performance aspects can be simulated based upon either analytical or stochastic models. Economic and legal aspects can be measured by means of articles appearing in specialized literature.
  4. RESOURCES. This step will require 10% of SeCo resources. Reports R5-R10 will collectively constitute the final report.