TU München - Fakultät für Informatik
Lehrstuhl III: Datenbanksysteme
Technische Universität München

The QueryFlow Project

Motivation and Introduction

Virtual electronic market places and virtual enterprises have become important applications for query processing [Jhi00]. Building a scalable virtual B2B market place with hundreds or thousands of participating suppliers requires highly flexible, distributed query processing capabilities. Architecting an electronic market place as a data warehouse, by integrating the data from all participating enterprises in one centralized data repository, incurs severe problems.

We propose a reference architecture for building scalable and dynamic market places and a framework for evaluating so-called HyperQueries in such an environment. HyperQueries are essentially query evaluation sub-plans "sitting behind" hyperlinks. This way, the electronic market place can be built as an intermediary between the client and the providers, which execute the sub-queries referenced via hyperlinks. The hyperlinks are embedded as attribute values within data objects of the intermediary's database. Retrieving such a virtual object automatically initiates the execution of the referenced HyperQuery in order to materialize the entire object. Instead of replicating the data at the intermediary, only the hyperlink is embedded, so sensitive data remains under the full control of the data providers.
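The materialization of a virtual object can be illustrated with a minimal sketch. It is not taken from the QueryFlow code base (which is written in Java); Python is used here only for brevity. The function name materialize and the callback execute_hyperquery are hypothetical, and the callback merely stands in for the remote evaluation of a sub-plan at the provider's site.

```python
from typing import Any, Callable, Dict


def materialize(obj: Dict[str, Any],
                execute_hyperquery: Callable[[str], Any]) -> Dict[str, Any]:
    """Return a copy of obj in which every hq:// attribute value has been
    replaced by the result of the HyperQuery it references.

    execute_hyperquery is a hypothetical stand-in for the remote call to
    the data provider; the stored object itself is left unchanged.
    """
    return {
        key: (execute_hyperquery(value)
              if isinstance(value, str) and value.startswith("hq://")
              else value)
        for key, value in obj.items()
    }
```

In this sketch the intermediary stores only the hyperlink; the actual value exists only in the materialized copy handed to the client.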

An Overview of the QueryFlow System

We give a short overview of the basic architecture of our implementation, the QueryFlow system [1]. The architecture is intended as a reference architecture for building open, extensible, and scalable electronic B2B market places. Furthermore, we demonstrate the execution of queries in such an environment. Figure 1 sketches how hyperlinks of the market place refer to HyperQueries at the remote hosts. The remote hosts can implement the HyperQueries in different ways: first, a remote host can state an SQL query; second, a complex business application or even human input can be used via special wrappers; and finally, the remote host can delegate the query to further hosts.

Figure 1: HyperQueries are referenced by Hyperlinks

Architecture of the QueryFlow System

We propose a reference architecture for building scalable electronic market places. During the implementation of our prototype system, we paid special attention to relying on standardized (or proposed) protocols: XML [BPSMM00] and XML Schema [XML00] for all data being processed, SQL for querying data [2], X.509 certificates [HFPS99] and XML Signature [XML01] for authentication, and HTTP [FGM+99] and SOAP [BEK+00] for exchanging data between multiple hosts. Figure 2 depicts the basic components of the system.

Figure 2: The Architecture of the QueryFlow System
A full description of these components can be found in [KW01b].

Processing HyperQueries in the QueryFlow System

We demonstrate the HyperQuery technique with a scenario from the car manufacturing industry. We assume a hierarchical supply chain of suppliers and sub-contractors. A typical e-procurement process for covering unscheduled production demands is to query a market place for the needed products and to select among the incoming offers by price, terms of delivery, available quantity, etc. The price of the needed products can vary with customer/supplier-specific sales discounts, the quantity of materials to be provided, duties, plant utilization, etc.

In traditional distributed query processing systems, such a query can only be executed if a global schema exists or all local databases are replicated at the market place. In an environment where hundreds of suppliers participate in a market place, one global query that integrates the sub-queries for all participants would be too complex and error-prone.

Following our approach, the suppliers register their products at the market place in which they want to participate and specify, by hyperlinks, the sub-plans that compute the price information at their sites. These hyperlinks to sub-plans are embedded as virtual attributes into the tables of the market place. For instance, hq://supplier1.com/Price?ProdID=1255 refers to supplier1.com and requests that the remote host evaluate the sub-plan named Price for ProdID=1255.
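To make the structure of such a hyperlink concrete, the following sketch splits the example link into remote host, sub-plan name, and parameters. It assumes only the URL-like shape shown in the example; the type HyperLink and the function parse_hyperlink are hypothetical names, and Python is used for illustration although QueryFlow itself is implemented in Java.

```python
from typing import Dict, NamedTuple
from urllib.parse import parse_qs, urlsplit


class HyperLink(NamedTuple):
    host: str               # remote host that owns the sub-plan
    sub_plan: str           # name of the HyperQuery sub-plan
    params: Dict[str, str]  # parameters passed to the sub-plan


def parse_hyperlink(link: str) -> HyperLink:
    """Split an hq:// hyperlink into host, sub-plan name, and parameters."""
    parts = urlsplit(link)
    if parts.scheme != "hq" or not parts.hostname:
        raise ValueError(f"not a HyperQuery hyperlink: {link!r}")
    # parse_qs returns a list per key; this sketch keeps the first value only.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return HyperLink(parts.hostname, parts.path.lstrip("/"), params)
```

For the example above, parse_hyperlink("hq://supplier1.com/Price?ProdID=1255") yields the host supplier1.com, the sub-plan Price, and the parameter ProdID with value 1255.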

The SQL-like query

select   p.ProductDescription, c.Supplier, c.Price 
from     NeededProducts p,  Catalog@MarketPlace c 
where    p.ProductDescription = c.ProductDescription 
order by p.ProductDescription, c.Price 
expires  Friday, May 18, 2001 5:00:00 PM CET
returns the prices and suppliers of all needed products. Query execution stops no later than the time given in the expires clause; only the results gathered by then are considered.
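The effect of the expires clause can be sketched as a simple cut-off followed by the ordering requested in the query. This is an illustration under stated assumptions, not the QueryFlow implementation: the Offer record, its arrived timestamp, and the function results_at_expiry are hypothetical, and time zones are ignored for simplicity.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Offer(NamedTuple):
    product: str       # p.ProductDescription
    supplier: str      # c.Supplier
    price: float       # c.Price, as computed by the supplier's HyperQuery
    arrived: datetime  # hypothetical: when the result reached the initiator


def results_at_expiry(offers: Iterable[Offer], expires: datetime) -> List[Offer]:
    """Keep only offers gathered by the deadline, ordered as in the query."""
    on_time = [offer for offer in offers if offer.arrived <= expires]
    # Mirrors "order by p.ProductDescription, c.Price".
    return sorted(on_time, key=lambda offer: (offer.product, offer.price))
```

An offer that arrives after the deadline is simply dropped; the client still receives a complete, ordered answer over whatever was gathered in time.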

When evaluating these hyperlinks, our QueryFlow system distinguishes between two modes. In hierarchical mode (Figure 3(a)), the initiator of a HyperQuery is in charge of collecting the processed data. In broadcast mode (Figure 3(b)), data objects are routed directly to the query initiator. The basic patterns for the two modes are shown in Figure 4(a)/(b), where the smaller boxes represent HyperQueries and the Dispatch operator is responsible for routing objects to the HyperQuery given by the hyperlink. The decision as to which processing mode is used rests, with some restrictions, solely with the initiator of a HyperQuery. Thus, the initiator determines whether the results are sent directly to the client or whether the initiator itself collects the objects processed by the HyperQueries. Within the execution of a single query, both processing modes can be mixed and nested to obtain more complex, multi-level scenarios. HyperQueries may therefore be arbitrarily complex and may involve sub-contractors, too.

As our system is written in Java and provides secure extensibility, user-defined operators can be integrated into the query plan and are loaded on demand. Thus, special wrappers for accessing legacy systems, applications, JDBC databases, XML data sources, or even human input can be used within the HyperQueries. All our operators are tuned for processing mass data and are pipelined, and the communication operators work push-based, i.e., objects are sent to the next sub-plan when the local processing is finished. A detailed description of HyperQuery processing, including security, optimization issues, and implementation details, can be found in [KW01a, KW01b].
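The role of the Dispatch operator and the difference between the two modes can be sketched in a strongly simplified form. The functions dispatch and result_destination, the attribute name Price, and the host labels used with them are hypothetical; the sketch ignores pipelining, security, and the restrictions on mode selection described in [KW01a, KW01b].

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple
from urllib.parse import urlsplit

Route = Tuple[str, str]  # (remote host, sub-plan name)


def dispatch(objects: Iterable[Dict[str, Any]],
             link_attr: str = "Price") -> Dict[Route, List[Dict[str, Any]]]:
    """Group objects by the HyperQuery that their hyperlink attribute names."""
    routes: Dict[Route, List[Dict[str, Any]]] = defaultdict(list)
    for obj in objects:
        parts = urlsplit(obj[link_attr])
        routes[(parts.hostname, parts.path.lstrip("/"))].append(obj)
    return dict(routes)


def result_destination(mode: str, hyperquery_initiator: str, client: str) -> str:
    """Name the host that receives the objects processed by a HyperQuery."""
    if mode == "hierarchical":
        return hyperquery_initiator  # the initiator collects the results
    if mode == "broadcast":
        return client                # results go directly to the client
    raise ValueError(f"unknown processing mode: {mode!r}")
```

Because the destination is chosen per HyperQuery, a supplier that delegates to a sub-contractor can pick a mode independently of the market place, which is how mixed and nested scenarios arise.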

(a) Hierarchical Processing (b) Broadcast Processing
Figure 3: Execution Traces
(The dashed (red) lines indicate the flow of control and intermediate results; the solid (black) lines indicate the flow of result objects.)

(a) Hierarchical Mode (b) Broadcast Mode
Figure 4: Patterns for HyperQuery Execution

The execution of the query in our QueryFlow system is demonstrated here.

People Involved in QueryFlow

Results

The following publications are available:

The work on the QueryFlow system is still ongoing.

References

BEK+00 D. Box, D. Ehnebuske, G. Kakivaya, A. Layman, N. Mendelsohn, H. F. Nielsen, S. Thatte, and D. Winer.
Simple Object Access Protocol (SOAP) 1.1.
http://www.w3.org/TR/SOAP, May 2000.
BKK+01 R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, and K. Stocker.
ObjectGlobe: Ubiquitous query processing on the Internet.
The VLDB Journal: Special Issue on E-Services, 10(3):48-71, August 2001.
BPSMM00 T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler.
Extensible Markup Language (XML) 1.0 (Second Edition).
http://www.w3.org/XML/, October 2000.
CFR+01 D. Chamberlin, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu.
XQuery: A Query Language for XML.
http://www.w3.org/TR/xquery/, June 2001.
FGM+99 R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee.
Hypertext Transfer Protocol -- HTTP/1.1.
ftp://ftp.isi.edu/in-notes/rfc2616.txt, June 1999.
HFPS99 R. Housley, W. Ford, W. Polk, and D. Solo.
Internet X.509 Public Key Infrastructure Certificate and CRL Profile.
http://www.rfc-editor.org/rfc/rfc2459.txt, January 1999.
Jhi00 A. Jhingran.
Moving up the food chain: Supporting E-Commerce Applications on Databases.
ACM SIGMOD Record, 29(4):50-54, December 2000.
KW01a A. Kemper and C. Wiesner.
HyperQueries: Dynamic Distributed Query Processing on the Internet.
In Proc. of the Conf. on Very Large Data Bases (VLDB), pages 551-560, Rome, Italy, September 2001.
KW01b A. Kemper and C. Wiesner.
HyperQueries: Dynamic Distributed Query Processing on the Internet.
Technical report, Universität Passau, Fakultät für Mathematik und Informatik, October 2001.
XML00 XML Schema, April 2000.
http://www.w3.org/xml/Schema.
XML01 XML Signature, August 2001.
http://www.w3.org/TR/2001/PR-xmldsig-core-20010820/.


[1] The name of our system was derived from query processing and workflow systems, because processing queries with HyperQueries bears some similarity to processing distributed workflows by routing documents to the appropriate tasks.

[2] We plan to support XQuery [CFR+01], too.


Lehrstuhl für Datenbanksysteme
Last modified: 25.05.2005 at 14:34:43