Fakultät für Informatik TU München - Fakultät für Informatik
Lehrstuhl III: Datenbanksysteme
Technische Universität München



Overview:

We present the design of ObjectGlobe, a distributed and open query processor. Today, data is published on the Internet via Web servers which have, if at all, very localized query processing capabilities. The goal of the ObjectGlobe project is to establish an open market place in which data and query processing capabilities can be distributed and used by any kind of Internet application. Furthermore, ObjectGlobe integrates cycle providers (i.e., machines) which carry out query processing operators. The overall picture is to make it possible to execute virtually any kind of query operator on any machine and any kind of data on the Internet. One of the main challenges in the design of such an open system is to ensure security. We discuss the ObjectGlobe security requirements, show how basic components such as the optimizer and runtime system need to be extended, and present the results of performance experiments that assess the additional cost for secure distributed query processing.

Introduction

The World Wide Web has made it very easy and cheap for people and organizations all over the world to exchange data. Today, virtually everybody can publish a document by generating HTML (or XML) and placing it on some Web server; likewise, it is more or less standard to make data stored in relational (or other) databases publicly available on the Web by establishing form-based interfaces and by using CGI scripts or Servlets. WWW clients can retrieve individual documents by a simple "click" and they can get specific information from a database (behind the Web server) by filling out a form. In other words, WWW clients today can easily execute "point queries" (i.e., given a URL, return the document) and they can execute queries that can be handled by a single database behind a Web server.

The goal of the ObjectGlobe project is twofold. First, we would like to create an infrastructure that makes it as easy to distribute query processing capabilities (i.e., query operators) as it is to publish data and documents on the Web today. Second, we would like to enable clients to execute complex queries which involve the execution of operators from multiple providers at different sites and the retrieval of data and documents from multiple data sources. In contrast to Applets, all query operators should be able to interact in a distributed query plan and it should be possible to move query operators to arbitrary sites, including sites which are near the data. The only requirement we make is that all query operators must be written in Java and conform to the secure interfaces of ObjectGlobe.

We believe that our ObjectGlobe system can help to develop new application scenarios and new ways in which people and organizations interact on the Internet. An organization, for instance, could outsource all or part of its data processing to specialized providers on the Internet. As another example, WWW clients can query the Web and carry out different operations on different data sources. Providers could charge for data and new query operators. A data provider (e.g., a car dealer) could also be interested in participating in ObjectGlobe in order to supply its product catalog for free. Furthermore, ObjectGlobe can serve as an experimental platform in order to test new distributed query processing techniques; a researcher could implement a new distributed join method and test its performance with real data using our demo installation in Passau and other sites.

In some sense, the ObjectGlobe system can be seen as a distributed query processor. ObjectGlobe has a lookup service (i.e., a meta-data repository) which registers all data sources, operators, and machines on which queries can be executed. The lookup service is used by the ObjectGlobe optimizer in order to discover relevant resources for a query. The optimizer generates a query evaluation plan with the goal of executing the query at as little cost as possible. This plan is then initiated and executed by the execution engine. The design of all of these components has been addressed in previous work. Jini, for example, has a related lookup service [Wal99], and projects like Mariposa [SAL+96] or Garlic [HKWY97] (to name just two) have recently studied wide-area distributed query processing. What makes the ObjectGlobe system special is its "brutal" openness, which in principle allows any kind of (Java) operation to be executed on any machine and on any kind of data. One particular issue that needs to be addressed in this kind of system is "security", that is, how to protect data (and other resources) from unauthorized access. Another challenge is to ensure scalability in the number of sites. In this paper, we describe the approaches we have chosen to address these challenges and give some initial performance results obtained using our system. The development of techniques for "schema integration" in a distributed and heterogeneous environment is not the target of our work because this has been addressed in other work (e.g., [SL90]); we assume that all data is in a standard format (e.g., relational or XML) or wrapped [RS97]. Although "selling" services is one of the main motivations for the ObjectGlobe project, the system does not require a particular business model; many different business models can be implemented on top of ObjectGlobe.

   
Overview of the ObjectGlobe System

The goal of the ObjectGlobe project is to distribute powerful query processing capabilities (including those found in traditional database systems) across the Internet. The idea is to create an open market place for three kinds of suppliers: data providers supply data, function providers offer query operators to process the data, and cycle providers are contracted to execute query operators. ObjectGlobe enables applications to execute complex queries which involve the execution of operators from multiple function providers at different sites (cycle providers) and the retrieval of data and documents from multiple data sources. In this section, we will outline how such queries are processed, give an example, and discuss the security requirements of the system.


  
Overview


Figure 1: Processing a Query in ObjectGlobe

Query Processing in ObjectGlobe

Processing a query in ObjectGlobe involves four major steps, which are illustrated in Figure 1:

1. Lookup: In this phase, the ObjectGlobe lookup service is queried to find relevant data sources, cycle providers, and function providers that might be useful for executing the query. In addition, the lookup service provides the authorization data to determine what resources may be accessed by the user (or application) who initiates the query and what other restrictions apply for processing the query.
2. Optimize: Based on the information obtained from the lookup service, a cost-based query optimizer compiles a low-cost and valid (as far as user privileges are concerned) query execution plan. This plan is annotated with site information indicating on which cycle provider each operator is executed and from which function provider the external query operators involved in the plan are loaded.
3. Plug: The generated plan is distributed to the cycle providers and the external query operators are loaded and instantiated at each cycle provider. Furthermore, the communication paths (i.e., sockets) are established.
4. Execute: The plan is executed following an iterator model [Gra93]. In addition to the external query operators provided by function providers, ObjectGlobe has built-in query operators for selection, projection, join, union, nesting, unnesting, and sending and receiving data. If necessary, communication is encrypted and authenticated. Furthermore, the execution of the plan is monitored in order to intervene and possibly halt the execution of the whole plan in case of failures.

The whole system is written in Java for two reasons. (Currently, the optimizer is written in C++, but we are planning to rewrite it in Java.) First, Java is portable so that ObjectGlobe can be installed with very little effort; in particular, cycle providers which need to install the ObjectGlobe core functionality can very easily join an ObjectGlobe system. The only requirement is that a site runs a Java virtual machine. Second, Java provides secure extensibility. Like ObjectGlobe itself, external query operators are written in Java; they are loaded on demand (from function providers) and executed at cycle providers in their own Java "sandbox" (see Section "Security" for details). To provide an external query operator, a simple uniform interface must be implemented; in addition, the new external query operator must be registered in the lookup service. Likewise, data providers and cycle providers must register in the lookup service before they can participate.
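
The iterator model and the idea of a uniform operator interface can be illustrated with a minimal Java sketch. The names TupleIterator, ListScan, Select, and IteratorDemo are hypothetical and are not the actual ObjectGlobe interfaces; tuples are plain Object[] arrays for brevity, and the in-memory scan merely stands in for a real data source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical uniform operator interface in the style of the iterator model.
interface TupleIterator {
    void open();
    Object[] next();   // returns null when the input is exhausted
    void close();
}

// Leaf operator: scans an in-memory collection (stand-in for a data source).
final class ListScan implements TupleIterator {
    private final List<Object[]> rows;
    private int pos;

    ListScan(List<Object[]> rows) { this.rows = rows; }

    public void open() { pos = 0; }
    public Object[] next() { return pos < rows.size() ? rows.get(pos++) : null; }
    public void close() { }
}

// Selection: pulls tuples from its input and passes on those that qualify.
final class Select implements TupleIterator {
    private final TupleIterator input;
    private final Predicate<Object[]> predicate;

    Select(TupleIterator input, Predicate<Object[]> predicate) {
        this.input = input;
        this.predicate = predicate;
    }

    public void open() { input.open(); }

    public Object[] next() {
        Object[] t;
        while ((t = input.next()) != null) {
            if (predicate.test(t)) {
                return t;
            }
        }
        return null;
    }

    public void close() { input.close(); }
}

final class IteratorDemo {
    // Drives a plan to completion: open, pull all tuples, close.
    static List<Object[]> drain(TupleIterator plan) {
        List<Object[]> out = new ArrayList<>();
        plan.open();
        try {
            Object[] t;
            while ((t = plan.next()) != null) {
                out.add(t);
            }
        } finally {
            plan.close();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> cars = Arrays.asList(
            new Object[]{"A", 12000}, new Object[]{"B", 30000}, new Object[]{"C", 8000});
        TupleIterator plan = new Select(new ListScan(cars), t -> (Integer) t[1] < 15000);
        for (Object[] t : drain(plan)) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

Because every operator exposes the same three methods, an external operator written against such an interface can be placed anywhere in a plan without the surrounding operators knowing where it came from.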

ObjectGlobe supports a nested relational data model; this way, relational, object-relational, and XML data sources can very easily be integrated. Other data formats (e.g., HTML), however, can be integrated by the use of wrappers that transform the data into the required nested relational format; wrappers are treated by the system as external query operators. As shown in Figure 1, XML is used as a data exchange format between the individual ObjectGlobe components. Part of the ObjectGlobe philosophy is that the individual ObjectGlobe components can be used separately; XML is used so that the output of every component can easily be visualized and modified. For example, a user can browse through the lookup service in order to find interesting functions which he/she might want to use in the query. Furthermore, a user can look at and change the plan generated by the optimizer.
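
To make the nested relational model more concrete, the following Java sketch shows an unnest operation over tuples whose attributes may themselves hold relations. It is only an illustration under stated assumptions: the class NestedRelations, the map-based tuple representation, and the sample attribute names are hypothetical, and the text does not describe how ObjectGlobe represents tuples internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; a tuple is a map from attribute name to value,
// and a nested relation is a List of such tuples stored as an attribute value.
final class NestedRelations {

    // Builds a tuple from alternating attribute names and values.
    static Map<String, Object> tuple(Object... namesAndValues) {
        Map<String, Object> t = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            t.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return t;
    }

    // Unnest: emits one flat tuple per element of the list-valued attribute.
    // Tuples whose attribute is not a list are passed through unchanged.
    static List<Map<String, Object>> unnest(List<Map<String, Object>> input, String attr) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> outer : input) {
            Object nested = outer.get(attr);
            if (!(nested instanceof List)) {
                out.add(outer);
                continue;
            }
            for (Object element : (List<?>) nested) {
                Map<String, Object> flat = new LinkedHashMap<>(outer);
                flat.remove(attr);
                if (element instanceof Map) {
                    // Nested tuple: copy its attributes next to the outer ones.
                    for (Map.Entry<?, ?> e : ((Map<?, ?>) element).entrySet()) {
                        flat.put(String.valueOf(e.getKey()), e.getValue());
                    }
                } else {
                    // Atomic element: keep it under the original attribute name.
                    flat.put(attr, element);
                }
                out.add(flat);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> dealer = tuple(
            "dealer", "A",
            "cars", Arrays.asList(
                tuple("model", "coupe", "price", 30000),
                tuple("model", "van", "price", 22000)));
        for (Map<String, Object> t : unnest(Arrays.asList(dealer), "cars")) {
            System.out.println(t);
        }
    }
}
```

A flat relational table is simply the special case in which no attribute holds a list, which is one way to see why relational sources fit this model without conversion.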

   
An Example

To illustrate query processing in ObjectGlobe, let us consider the example shown in Figure 2. In this example, there are two data providers, A and B, and one function provider. We assume that the data providers also operate as cycle providers so that the ObjectGlobe system is installed on the machines of A and B. Furthermore, the client (not shown in Figure 2) can act as a cycle provider in this example. Data provider A supplies two data collections, a relational table R and some other collection S which needs to be transformed (i.e., wrapped) for query processing. Data provider B has a (nested) relational table T. The function provider supplies two relevant query operators: a wrapper (wrap_S) to transform S into nested relational format and a compression algorithm (thumbnail) to apply on an image attribute of T.


  
Figure 2: Distributed Query Processing with ObjectGlobe

Figure 3 shows a graphical representation with the most important annotations of the XML query plan for this example. (The real XML plan is given in Appendix A.) In particular, Figure 3 shows the host, source, and codeBase annotations. The host annotation of an operator indicates at which machine (i.e., cycle provider) the operator is executed; e.g., the display operator is executed at the client. The source annotation of a scan iterator indicates which collection is to be read. The codeBase annotation indicates from which function provider an external query operator is loaded. The scan, display, and join operators are built in and therefore have no codeBase annotation.
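
The three annotations can be pictured as fields of a plan node. The Java sketch below is an assumption-laden illustration, not the ObjectGlobe plan format: the classes PlanNode and PlanDemo, the function provider URL, and the join order and join placement are hypothetical, since the actual plan is given in Figure 3 and Appendix A and is not reproduced here. Only the operator names, the collections R, S, and T, and the placement of display at the client come from the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plan node carrying the host, source, and codeBase annotations.
final class PlanNode {
    final String operator;
    final String host;       // cycle provider that executes the operator
    final String source;     // collection to read; null unless this is a scan
    final String codeBase;   // function provider; null for built-in operators
    final List<PlanNode> inputs;

    PlanNode(String operator, String host, String source, String codeBase,
             PlanNode... inputs) {
        this.operator = operator;
        this.host = host;
        this.source = source;
        this.codeBase = codeBase;
        this.inputs = Arrays.asList(inputs);
    }

    static PlanNode scan(String source, String host) {
        return new PlanNode("scan", host, source, null);
    }

    static PlanNode builtIn(String operator, String host, PlanNode... inputs) {
        return new PlanNode(operator, host, null, null, inputs);
    }

    static PlanNode external(String operator, String host, String codeBase,
                             PlanNode input) {
        return new PlanNode(operator, host, null, codeBase, input);
    }

    boolean isExternal() { return codeBase != null; }

    // Groups external operators by the host that has to load them.
    Map<String, List<String>> externalOperatorsByHost() {
        Map<String, List<String>> byHost = new TreeMap<>();
        collect(this, byHost);
        return byHost;
    }

    private static void collect(PlanNode node, Map<String, List<String>> byHost) {
        if (node.isExternal()) {
            byHost.computeIfAbsent(node.host, h -> new ArrayList<>()).add(node.operator);
        }
        for (PlanNode input : node.inputs) {
            collect(input, byHost);
        }
    }
}

final class PlanDemo {
    static PlanNode examplePlan() {
        String functionProvider = "http://functions.example/";  // hypothetical URL
        PlanNode wrappedS = PlanNode.external("wrap_S", "A", functionProvider,
                                              PlanNode.scan("S", "A"));
        PlanNode joinRS = PlanNode.builtIn("join", "A", PlanNode.scan("R", "A"), wrappedS);
        PlanNode thumbT = PlanNode.external("thumbnail", "B", functionProvider,
                                            PlanNode.scan("T", "B"));
        // Assumed placement: the final join runs at the client.
        PlanNode joinAll = PlanNode.builtIn("join", "client", joinRS, thumbT);
        return PlanNode.builtIn("display", "client", joinAll);
    }

    public static void main(String[] args) {
        System.out.println(examplePlan().externalOperatorsByHost());
    }
}
```

Grouping the external operators by host is the kind of information a plug step would need in order to load wrap_S at A and thumbnail at B; the built-in send and receive operators between sites are omitted from this sketch.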


  
Figure 3: The Annotated Query Execution Plan

   
Security Requirements in ObjectGlobe

Security is one of the crucial issues in an open and distributed system like ObjectGlobe. ObjectGlobe provides the infrastructure to deal with the following security issues:

Protection of Cycle and Data Providers:
It has to be ensured that the resources of the cycle and data providers are protected from (possibly malicious) external operators loaded from unknown function providers. Based on the Java security model, all query operators are therefore executed in a protected area, a so-called sandbox (see Section "Security").

Secure Communication:
The communication streams between the ObjectGlobe servers have to be protected against unauthorized access and modification. We adhere to the well-established standards (SSL and TLS) for encrypting and digitally signing messages between ObjectGlobe engines.

Authentication:
ObjectGlobe supports a flexible authentication policy. Users and applications that only access free and publicly available resources can remain anonymous; no authentication is required. If a user accesses a resource that charges and accepts electronic money, the user can also stay anonymous, and the electronic money is shipped as part of the "plug" step. Authentication is only required when providers need it for authorization or accounting purposes; in this case, users must provide certificates or other authentication information (e.g., passwords). Cycle providers can also require external operators to be authenticated in order to restrict the function providers they accept; e.g., to execute only code originating from trusted sources within the same company or Intranet.

Authorization:
Some providers constrain the access or use of their resources to particular user groups. In general, providers apply their own autonomous authorization policy and control the execution of, say, query operators at their site themselves. In order to generate valid query execution plans and avoid failures at execution time, ObjectGlobe must know about these authorization constraints and register them in its lookup service.
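
How registered authorization constraints could keep the optimizer from considering resources a user may not access can be sketched as follows. This is a simplified illustration, assuming group-based constraints only; the classes Resource and LookupService, the group name "partners", and the convention that an empty group set means "publicly available" are hypothetical and not taken from the ObjectGlobe implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical registry entry: a resource plus the user groups allowed to use it.
final class Resource {
    final String name;
    final String kind;                // "data", "function", or "cycle"
    final Set<String> allowedGroups;  // empty set means publicly available

    Resource(String name, String kind, String... allowedGroups) {
        this.name = name;
        this.kind = kind;
        this.allowedGroups = new HashSet<>(Arrays.asList(allowedGroups));
    }

    boolean mayBeUsedBy(Set<String> userGroups) {
        return allowedGroups.isEmpty() || !Collections.disjoint(allowedGroups, userGroups);
    }
}

// Hypothetical lookup service that filters by kind and by authorization.
final class LookupService {
    private final List<Resource> registry = new ArrayList<>();

    void register(Resource resource) { registry.add(resource); }

    // Returns the names of all resources of the given kind that the user may access.
    List<String> find(String kind, Set<String> userGroups) {
        List<String> names = new ArrayList<>();
        for (Resource r : registry) {
            if (r.kind.equals(kind) && r.mayBeUsedBy(userGroups)) {
                names.add(r.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        LookupService lookup = new LookupService();
        lookup.register(new Resource("R", "data"));              // public
        lookup.register(new Resource("T", "data", "partners"));  // restricted
        Set<String> anonymous = Collections.emptySet();
        Set<String> partner = new HashSet<>(Arrays.asList("partners"));
        System.out.println(lookup.find("data", anonymous));
        System.out.println(lookup.find("data", partner));
    }
}
```

Filtering at lookup time means the optimizer only ever sees candidates that can appear in a valid plan; the providers would still enforce their own policy at execution time, as described above.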

Last modified: 25.05.2005 at 14:34:37