Overview
We present the design of ObjectGlobe, a
distributed and open query processor. Today, data is published on the
Internet via Web servers which have, at best, very localized query
processing capabilities. The goal of the ObjectGlobe project is to
establish an open market place in which data and query
processing capabilities can be distributed and used by any kind of
Internet application. Furthermore, ObjectGlobe integrates cycle
providers (i.e., machines) which carry out query processing
operators. The overall picture is to make it possible to execute
virtually any kind of query operator on any machine and any kind of
data on the Internet. One of the main challenges in the design of
such an open system is to ensure security. We discuss the ObjectGlobe
security requirements, show how basic components such as the optimizer
and runtime system need to be extended, and present the results of
performance experiments that assess the additional cost for secure
distributed query processing.
The World Wide Web has made it very easy and cheap for people and
organizations all over the world to exchange data. Today,
virtually everybody can publish a document by generating HTML
(or XML) and placing it on some Web server; likewise, it is
more or less standard to make data stored in relational (or other)
databases publicly available on the Web by establishing form-based
interfaces and by using CGI scripts or Servlets. WWW clients can
retrieve individual documents by a simple ``click'' and they can get
specific information from a database (behind the Web server) by
filling out a form. In other words, WWW clients today can easily
execute ``point queries'' (i.e., given a URL, return the document) and they
can execute queries that can be handled by a single database behind a
Web server.
The goal of the ObjectGlobe project is twofold. First, we would
like to create an infrastructure that makes it as easy to distribute
query processing capabilities (i.e., query operators) as it is
to publish data and documents on the Web today. Second, we would like
to enable clients to execute complex queries which involve the
execution of operators from multiple providers at different sites and
the retrieval of data and documents from multiple data sources. In
contrast to Applets, all query operators should be able to interact in
a distributed query plan and it should be possible to move query
operators to arbitrary sites, including sites which are near
the data. The only requirement we make is that all query operators
must be written in Java and conform to the secure interfaces of
ObjectGlobe.
We believe that our ObjectGlobe system can help to develop new
application scenarios and new ways in which people and organizations
interact on the Internet. An organization, for instance, could
outsource all or part of its data processing to specialized providers
on the Internet. As another example, WWW clients can query the
Web and carry out different operations on different data sources.
Providers could charge for data and new query operators. A data
provider (e.g., a car dealer) could also be interested in
participating in ObjectGlobe in order to supply its product catalog
for free. Furthermore, ObjectGlobe can serve as an experimental
platform in order to test new distributed query processing techniques;
a researcher could implement a new distributed join method and test
its performance with real data using our demo installation in Passau
and other sites.
In some sense, the ObjectGlobe system can be seen as a distributed
query processor. ObjectGlobe has a lookup service (i.e., a meta-data
repository) which registers all data sources, operators, and machines
on which queries can be executed. The lookup service is used by the
ObjectGlobe optimizer in order to discover relevant resources for a
query. The optimizer generates a query evaluation plan with the goal
of executing the query at as little cost as possible. This plan is
then initiated and executed by the execution engine. The design of
all of these components has been addressed in previous work. Jini,
for example, has a related lookup service [Wal99], and projects like Mariposa [SAL+96]
or Garlic [HKWY97] (to name just two)
have recently studied wide-area distributed query processing. What
makes the ObjectGlobe system special is its ``brutal'' openness, which
in principle allows any kind of (Java) operation to be executed on any
machine and on any kind of data. One particular issue that needs to
be addressed in this kind of system is ``security'' and how to protect
data (and other resources) from unauthorized access. Another
challenge is to ensure scalability in the number of sites. In this
paper, we will describe the approaches we have chosen to address these
challenges and give some initial performance results obtained using
our system. The development of techniques for ``schema integration''
in a distributed and heterogeneous environment is not the target of
our work because this has been addressed in other work (e.g., [SL90]); we assume that all data is in a
standard format (e.g., relational or XML) or
wrapped [RS97]. Although ``selling''
services is one of the main motivations for the ObjectGlobe project,
the system does not require a particular business model; many
different business models can be implemented on top of ObjectGlobe.
Overview of the ObjectGlobe System
The goal of the ObjectGlobe project is to distribute powerful query
processing capabilities (including those found in traditional database
systems) across the Internet. The idea is to create an open market
place for three kinds of suppliers: data providers supply data,
function providers offer query operators to process the data,
and cycle providers are contracted to execute query operators.
ObjectGlobe enables applications to execute complex
queries which involve the execution of operators from multiple
function providers at different sites (cycle providers) and the
retrieval of data and documents from multiple data sources. In this
section, we will outline how such queries are processed, give an
example, and discuss the security requirements of the system.
Figure 1: Processing a Query in ObjectGlobe
Processing a query in ObjectGlobe involves four major steps. These
four steps are illustrated in Figure 1:
- 1. Lookup: In this phase, the ObjectGlobe lookup service is
queried to find relevant data sources, cycle providers, and
function providers that might be useful for executing the query. In
addition, the lookup service provides the authorization data
needed to determine which resources may be accessed by the user (or
application) who initiates the query and what other restrictions
apply to processing the query.
- 2. Optimize: Using the information obtained from the lookup service,
a cost-based query optimizer compiles a low-cost and valid (as far as user
privileges are concerned) query execution plan. This
plan is annotated with site information indicating on which cycle
provider each operator is executed and from which function provider
the external query operators involved in the plan are loaded.
- 3. Plug: The generated plan is distributed to the cycle providers and
the external query operators are loaded and instantiated at each cycle
provider. Furthermore, the communication paths (i.e., sockets) are
established.
- 4. Execute: The plan is executed following an iterator
model [Gra93]. In addition to the external
query operators provided by function providers, ObjectGlobe has built-in query operators for selection, projection, join,
union, nesting, unnesting, and sending and receiving data. If
necessary, communication is encrypted and authenticated.
Furthermore, the execution of the plan is monitored so that the system
can intervene and, if necessary, halt the execution of the whole plan in case
of failures.
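The iterator model used in the Execute step can be illustrated with a minimal sketch. The interface and class names below (TupleIterator, ListScan, Select) are hypothetical and are not ObjectGlobe's actual API; the sketch only assumes the classic open/next/close protocol of [Gra93], with an in-memory list standing in for a data source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical iterator-model interface; not ObjectGlobe's actual API.
interface TupleIterator {
    void open();
    Object[] next();   // returns null when the input is exhausted
    void close();
}

// Leaf operator: scans an in-memory collection (stands in for a data source).
class ListScan implements TupleIterator {
    private final List<Object[]> rows;
    private Iterator<Object[]> cursor;

    ListScan(List<Object[]> rows) { this.rows = rows; }

    public void open() { cursor = rows.iterator(); }
    public Object[] next() { return cursor.hasNext() ? cursor.next() : null; }
    public void close() { cursor = null; }
}

// Selection: pulls tuples from its child and passes on those that match.
class Select implements TupleIterator {
    private final TupleIterator child;
    private final Predicate<Object[]> predicate;

    Select(TupleIterator child, Predicate<Object[]> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    public void open() { child.open(); }
    public Object[] next() {
        Object[] t;
        while ((t = child.next()) != null) {
            if (predicate.test(t)) return t;
        }
        return null;
    }
    public void close() { child.close(); }
}

class IteratorModelDemo {
    // Drives the root operator: open once, pull until exhausted, then close.
    static List<Object[]> run(TupleIterator root) {
        List<Object[]> out = new ArrayList<>();
        root.open();
        try {
            for (Object[] t = root.next(); t != null; t = root.next()) out.add(t);
        } finally {
            root.close();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> r = Arrays.asList(
            new Object[]{1, "a"}, new Object[]{2, "b"}, new Object[]{3, "c"});
        TupleIterator plan = new Select(new ListScan(r), t -> (Integer) t[0] >= 2);
        for (Object[] t : run(plan)) System.out.println(Arrays.toString(t));
    }
}
```

Because every operator exposes the same three calls, operators can be composed freely; in a distributed plan, a pair of send and receive operators would sit between operators placed on different cycle providers.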
The whole system is written in Java for two
reasons. (Currently, the optimizer is written in C++, but we
are planning to rewrite it in Java.) First, Java is
portable so that ObjectGlobe can be installed with very little
effort; in particular, cycle providers which need to install the
ObjectGlobe core functionality can very easily join an
ObjectGlobe system. The only requirement is that a site runs a Java
virtual machine. Second, Java provides secure extensibility. Like
ObjectGlobe itself, external query operators are written in Java, they
are loaded on demand (from function providers), and they are executed
at cycle providers in their own Java ``sandbox'' (more details in
Section `Security'). To provide an external query operator, a simple
uniform interface must be implemented; in addition, the new external
query operator must be registered in the lookup service. Likewise,
data providers and cycle providers must register in the lookup service
before they can participate.
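The registration requirement can be pictured with a small in-memory stand-in for the lookup service. Everything in this sketch is an assumption made for illustration: the class LookupRegistry, its methods, and the host name and URL are placeholders, and the actual lookup service is a meta-data repository whose interface is not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the lookup service.
class LookupRegistry {
    enum Kind { DATA_SOURCE, OPERATOR, CYCLE_PROVIDER }

    static final class Entry {
        final Kind kind;
        final String name;
        final String location; // site, code base, or host (illustrative only)

        Entry(Kind kind, String name, String location) {
            this.kind = kind;
            this.name = name;
            this.location = location;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    // Registers a resource; only registered resources can take part in queries.
    void register(Kind kind, String name, String location) {
        entries.put(name, new Entry(kind, name, location));
    }

    // Returns all registered resources of one kind, in registration order.
    List<Entry> find(Kind kind) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries.values()) {
            if (e.kind == kind) result.add(e);
        }
        return result;
    }

    boolean isRegistered(String name) { return entries.containsKey(name); }
}

class RegistrationDemo {
    static LookupRegistry example() {
        LookupRegistry lookup = new LookupRegistry();
        // Placeholder host name and URL; not real ObjectGlobe sites.
        lookup.register(LookupRegistry.Kind.CYCLE_PROVIDER, "A", "hostA.example.org");
        lookup.register(LookupRegistry.Kind.DATA_SOURCE, "R", "hostA.example.org");
        lookup.register(LookupRegistry.Kind.OPERATOR, "thumbnail",
                        "http://functions.example.org/ops/");
        return lookup;
    }

    public static void main(String[] args) {
        LookupRegistry lookup = example();
        for (LookupRegistry.Entry e : lookup.find(LookupRegistry.Kind.OPERATOR)) {
            System.out.println(e.name + " from " + e.location);
        }
    }
}
```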
ObjectGlobe supports a nested relational data model; this way,
relational, object-relational, and XML data sources can very
easily be integrated. Other data formats (e.g., HTML)
can be integrated by the use of wrappers that transform the
data into the required nested relational format; wrappers are treated
by the system as external query operators. As shown in Figure 1, XML is used as a data
exchange format between the individual ObjectGlobe components. Part
of the ObjectGlobe philosophy is that the individual ObjectGlobe
components can be used separately; XML is used so that the
output of every component can easily be visualized and modified. For
example, a user can browse through the lookup service in order to find
interesting functions which he/she might want to use in the query.
Furthermore, a user can look at and change the plan generated by the
optimizer.
An Example
To illustrate query
processing in ObjectGlobe, let us consider the example shown in
Figure 2. In this example,
there are two data providers, A and B, and one function
provider. We assume that the data providers also operate as cycle
providers so that the ObjectGlobe system is installed on the machines
of A and B. Furthermore, the client (not shown in
Figure 2) can act as a cycle
provider in this example. Data provider A supplies two data
collections, a relational table R and some other collection
S which needs to be transformed (i.e., wrapped) for query
processing. Data provider B has a (nested) relational table
T. The function provider supplies two relevant query operators:
a wrapper (wrap_S) to transform S into nested
relational format and a compression algorithm (thumbnail) to
be applied to an image attribute of T.
Figure 2: Distributed Query Processing with ObjectGlobe
Figure 3 shows a graphical
representation with the most important annotations of the XML
query plan for this example. (The real XML plan is given in
Appendix A.) In
particular, Figure 3
shows the host, source, and codeBase
annotations. The host annotation of an operator indicates at
which machine (i.e., cycle provider) the operator is executed; e.g.,
the display operator is executed at the client. The
source annotation of a scan iterator indicates which
collection is to be read. The codeBase annotation indicates
from which function provider an external query operator is read.
The scan, display, and join operators are built in, so
they do not have a codeBase annotation.
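To make the three annotations concrete, the following sketch models an annotated plan as a small tree in Java. It is not the XML plan of Appendix A: the class PlanNode, the placeholder code base URL, the shape of the tree, and the hosts chosen for the joins are all assumptions made for illustration. Only the facts stated above are taken from the example (display runs at the client, scans carry a source, and wrap_S and thumbnail carry a codeBase).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory form of an annotated plan node; the real plan is XML.
class PlanNode {
    final String operator;
    final String host;      // cycle provider that executes the operator
    final String source;    // collection read by a scan, otherwise null
    final String codeBase;  // function provider of an external operator, otherwise null
    final List<PlanNode> children;

    PlanNode(String operator, String host, String source, String codeBase,
             PlanNode... children) {
        this.operator = operator;
        this.host = host;
        this.source = source;
        this.codeBase = codeBase;
        this.children = Arrays.asList(children);
    }

    // Built-in operators carry no codeBase annotation.
    boolean isExternal() { return codeBase != null; }

    // Collects every operator that must be loaded from a function provider.
    void collectExternal(List<String> out) {
        if (isExternal()) out.add(operator + "@" + host + " <- " + codeBase);
        for (PlanNode c : children) c.collectExternal(out);
    }
}

class AnnotatedPlanDemo {
    static final String FP = "http://functions.example.org/"; // placeholder code base

    // Assumed plan shape: join placement and hosts are illustrative guesses.
    static PlanNode examplePlan() {
        PlanNode scanR = new PlanNode("scan", "A", "R", null);
        PlanNode wrapS = new PlanNode("wrap_S", "A", null, FP,
                                      new PlanNode("scan", "A", "S", null));
        PlanNode thumb = new PlanNode("thumbnail", "B", null, FP,
                                      new PlanNode("scan", "B", "T", null));
        PlanNode joinRS = new PlanNode("join", "A", null, null, scanR, wrapS);
        PlanNode joinAll = new PlanNode("join", "client", null, null, joinRS, thumb);
        return new PlanNode("display", "client", null, null, joinAll);
    }

    public static void main(String[] args) {
        List<String> external = new ArrayList<>();
        examplePlan().collectExternal(external);
        for (String line : external) System.out.println(line);
    }
}
```

Walking the tree for nodes with a codeBase yields exactly the operators that each cycle provider has to fetch during the ``plug'' step.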
Figure 3: The Annotated Query Execution Plan
Security Requirements in ObjectGlobe
Security is one of the crucial issues in an open and distributed
system like ObjectGlobe. ObjectGlobe provides the infrastructure to
deal with the following security issues:
- Protection of Cycle and Data Providers:
- It has to be ensured that
the resources of the cycle and data providers are protected from
(possibly malicious) external operators loaded from unknown function
providers. Based on the Java security model, all query operators
are therefore executed in a protected area, a so-called sandbox
(Section Security).
- Secure Communication:
- The communication streams between the
ObjectGlobe servers have to be protected against unauthorized access
and modification. We adhere to the well-established standards (SSL
and TLS) for encrypting and digitally signing messages between
ObjectGlobe engines.
- Authentication:
- ObjectGlobe supports a flexible
authentication policy. Users and applications that only access free and
publicly available resources can be anonymous and no
authentication is required. If a user accesses a resource that
charges and accepts electronic money, then the user can again stay
anonymous and the electronic money is shipped as part of the
``plug'' step. Authentication is only required for authorization
or accounting purposes of providers; in this case, users must
provide certificates or other authentication information (e.g.,
passwords). Cycle providers can also require external operators
to be authenticated in order to restrict whose code they run; e.g., to execute only
code originating from trusted sources within the same company or Intranet.
- Authorization:
- Some providers restrict access to or
use of their resources to particular user groups. In general, providers
apply their own autonomous authorization policy and control the
execution of, say, query operators
at their site themselves. In order to generate valid query
execution plans and avoid failures at execution time, ObjectGlobe
must know about these authorization constraints and register them in
its lookup service.
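As a rough illustration of why authorization constraints are registered in the lookup service, the sketch below filters the registered resources by the groups a user belongs to before any plan is built. The classes Resource and AuthorizationFilter and the group name "dealers" are hypothetical; the sketch assumes a simple group-based policy, whereas providers may apply arbitrary policies of their own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical registered resource with an optional access constraint.
class Resource {
    final String name;
    final Set<String> allowedGroups; // empty set means publicly available

    Resource(String name, String... allowedGroups) {
        this.name = name;
        this.allowedGroups = new HashSet<>(Arrays.asList(allowedGroups));
    }

    boolean mayBeUsedBy(Set<String> userGroups) {
        if (allowedGroups.isEmpty()) return true; // public: anonymous use is fine
        return !Collections.disjoint(allowedGroups, userGroups);
    }
}

class AuthorizationFilter {
    // Keeps only the resources the user may access, so that the optimizer
    // never places an unauthorized resource in a plan.
    static List<String> usable(List<Resource> registered, Set<String> userGroups) {
        List<String> names = new ArrayList<>();
        for (Resource r : registered) {
            if (r.mayBeUsedBy(userGroups)) names.add(r.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Resource> registered = Arrays.asList(
            new Resource("R"),              // public collection
            new Resource("T", "dealers"));  // restricted to an assumed group
        Set<String> anonymous = Collections.emptySet();
        Set<String> dealer = new HashSet<>(Arrays.asList("dealers"));
        System.out.println(usable(registered, anonymous));
        System.out.println(usable(registered, dealer));
    }
}
```

Filtering at lookup time does not replace the providers' own checks at execution time; it only avoids generating plans that would fail there.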
Chair of Database Systems (Lehrstuhl für Datenbanksysteme)
Last modified:
25.05.2005 at 14:34:37