Meta Data Management
In this section, we describe the ObjectGlobe lookup service that finds relevant resources for a query and the
parser and optimizer that try to find a good plan to execute a query.
Lookup Service
The lookup service plays the same role in ObjectGlobe as the catalog or meta-data management of a traditional query
processor. Every provider must register its services before it can
participate in ObjectGlobe.
The ObjectGlobe parser and optimizer consult the lookup service in
order to find all relevant resources to execute a query and get
statistics. Furthermore, end users can use the lookup service to
browse through the meta-data and search for available query
capabilities and data sources for their applications. In some sense,
therefore, ObjectGlobe's lookup service can also be seen as a search engine for data and query processing
capabilities; it also bears some similarity to the Jini lookup
service [Wal99] or with X.500 [CCI89] and LDAP
directory services [WHK97].
The difference is that the ObjectGlobe lookup service is geared to the
particular requirements of an open and secure distributed query processor.
The ObjectGlobe lookup service records the following information:
- data provider: each collection of objects stored by a data
provider and the attributes of each collection are recorded
by the lookup service. Each collection is associated with a theme; for example, www.HotelBook.com and www.HotelGuide.com
provide two different collections associated with the theme
hotel. A collection can be seen as a horizontal partition, but
two collections of the same theme may have different attributes.
In addition, all iterators that
can be used to scan through a collection are recorded. Furthermore,
the lookup service records whether a collection provided by a data
provider is a replica (i.e., mirror) of a collection provided by
some other data provider.
- cycle provider: the CPU power, size of main memory, and
temporary disk space of each cycle provider are recorded.
- function provider: the name and signature of each query
operator are recorded. Furthermore, the requirements in terms of CPU
speed, size of main memory, and disk space to execute each query
operator are kept by the lookup service. ObjectGlobe differentiates
between iterators like join or display and transformers such as thumbnail. (In
addition, ObjectGlobe also has special categories for predicates and aggregate functions.) Any kind of
function, however, is automatically wrapped by ObjectGlobe into
an iterator, so we ignore these distinctions in this paper and use
the words function and query operator
interchangeably for the general concept.
- statistics: the lookup service stores any available
information that helps the optimizer to estimate the cost (in $ and
in response time) of a plan; e.g., histograms to estimate the
selectivity of simple (i.e., non-external) predicates, latency
and bandwidth of the interconnects between two cycle providers,
typical load of cycle providers as a function of time, and possibly
the URL of functions that can be used to estimate the cost of
query operators.
- authorization information: the lookup service maintains access control lists that store which data may be processed at
which cycle provider and by which query operator. It is also
possible to restrict the execution of specific query operators at
certain cycle providers. Following the ObjectGlobe authorization
model, it is possible to specify positive and negative
authorizations [RBKW91]. Also, it is possible to group collections,
functions, and cycle providers into "authorization classes" in
order to reduce the overhead of maintaining and processing this
information in the lookup service.
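The interplay of positive authorizations, negative authorizations, and authorization classes can be sketched as follows. This is only an illustration, not the ObjectGlobe implementation: the names `Authorization` and `AccessControl` are hypothetical, and the conflict rule (a negative authorization overrides a positive one, and anything not explicitly granted is denied) is an assumption that the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Authorization:
    """One access-control entry (hypothetical layout).

    Each of the three fields names either a single resource or an
    authorization class that groups several resources.
    """
    collection: str
    cycle_provider: str
    operator: str
    positive: bool  # True = grant, False = explicit denial


@dataclass
class AccessControl:
    # authorization class name -> names of its members
    classes: Dict[str, Set[str]] = field(default_factory=dict)
    entries: List[Authorization] = field(default_factory=list)

    def _matches(self, pattern: str, name: str) -> bool:
        # A pattern matches the resource itself or any member of the class.
        return pattern == name or name in self.classes.get(pattern, set())

    def allowed(self, collection: str, cycle_provider: str, operator: str) -> bool:
        applicable = [
            e for e in self.entries
            if self._matches(e.collection, collection)
            and self._matches(e.cycle_provider, cycle_provider)
            and self._matches(e.operator, operator)
        ]
        # Assumed policy: any negative entry wins; otherwise at least one
        # positive entry is required (closed-world default).
        if any(not e.positive for e in applicable):
            return False
        return any(e.positive for e in applicable)
```

Grouping resources into classes keeps the number of entries small: a single positive entry for a class of cycle providers covers all its members, and an individual member can still be excluded with one negative entry.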
As a concrete example, Appendix B shows an
RDF document that a data provider could use to register a
hotel collection. It is important to keep in mind that all
providers are autonomous and have their own local authorization
policies. The meta-data kept in
the lookup service mirrors that information and, thus, this meta-data
can be outdated or incomplete. It is possible, for instance, that a data
provider revokes the permission of some cycle providers to process its
data without notifying the lookup service; as a result, the
execution of an ObjectGlobe query might fail due to an authorization
violation at execution time. ObjectGlobe relies on data, function,
and cycle providers notifying the lookup service if important
meta-data changes. If a plan fails due to stale meta-data in the
lookup service, all the relevant meta-data is invalidated so that
providers that do not update their meta-data do not participate in the
ObjectGlobe federation.
As an alternative, [CZH+99] propose a time-to-live scheme; in that scheme, providers must periodically
contact the lookup service if they want to participate in the federation.
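The two ways of handling stale meta-data described above, invalidation after a failed plan and a time-to-live scheme, can be contrasted in a small sketch. The class `LeaseRegistry` and its methods are hypothetical names chosen for illustration; neither ObjectGlobe nor [CZH+99] is claimed to work exactly this way.

```python
import time
from typing import Callable, Dict, List


class LeaseRegistry:
    """Hypothetical registry in which a provider's meta-data expires
    unless the provider renews its registration in time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._expires_at: Dict[str, float] = {}

    def register(self, provider: str) -> None:
        """Register a provider, or renew an existing registration."""
        self._expires_at[provider] = self.clock() + self.ttl

    def invalidate(self, provider: str) -> None:
        """Drop a provider's meta-data, e.g. after a plan failed
        because the recorded meta-data turned out to be stale."""
        self._expires_at.pop(provider, None)

    def active(self) -> List[str]:
        """Providers whose registration has not yet expired."""
        now = self.clock()
        return sorted(p for p, t in self._expires_at.items() if t > now)
```

With explicit invalidation, a provider is removed only after a query has already failed; with a time-to-live, silent providers drop out on their own, at the price of periodic renewal traffic.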
The lookup service is currently implemented on top of a relational
database system. Meta-data (i.e., RDF documents) are mapped to binary
tables as described in [FK99]. Search requests are
translated into SQL join queries. This translation is not one-to-one
as the lookup service hides the details of how the meta-data is
stored. Clients of the lookup service, for example, can ask for all
cycle providers that are allowed to process objects of a specific
collection; the lookup service will answer such a query considering
all groups of cycle providers as well as all positive and negative
authorizations. Translating search requests into SQL queries
is quite involved (albeit conceptually straightforward) and describing all the
details is beyond the scope of this paper.
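To make the idea of binary tables and SQL join queries more tangible, the following sketch stores two meta-data properties in two-column tables and answers a simple search request with a join. It uses SQLite purely for illustration. The table names (`theme`, `attribute`), the column names, the collection identifiers ending in `/hotels`, and the attribute values are all assumptions; the actual schema and translation rules of the lookup service are not described here.

```python
import sqlite3
from typing import List


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with one binary table per property.

    Each table holds (subject, value) pairs, where the subject is a
    registered collection (hypothetical identifiers).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE theme     (subject TEXT, value TEXT);
        CREATE TABLE attribute (subject TEXT, value TEXT);
    """)
    conn.executemany("INSERT INTO theme VALUES (?, ?)", [
        ("www.HotelBook.com/hotels", "hotel"),
        ("www.HotelGuide.com/hotels", "hotel"),
    ])
    conn.executemany("INSERT INTO attribute VALUES (?, ?)", [
        ("www.HotelBook.com/hotels", "name"),
        ("www.HotelBook.com/hotels", "price"),
        ("www.HotelGuide.com/hotels", "name"),
        ("www.HotelGuide.com/hotels", "rating"),
    ])
    return conn


def collections_with(conn: sqlite3.Connection,
                     theme: str, attribute: str) -> List[str]:
    """Search request: all collections of a theme that offer an attribute.

    The request is answered by joining the two binary tables on the subject.
    """
    rows = conn.execute(
        """
        SELECT t.subject
        FROM theme AS t
        JOIN attribute AS a ON a.subject = t.subject
        WHERE t.value = ? AND a.value = ?
        ORDER BY t.subject
        """,
        (theme, attribute),
    )
    return [row[0] for row in rows]
```

A request that also has to respect authorization classes and negative authorizations would add further joins over the corresponding tables, which is where the translation becomes more intricate.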
Currently, the lookup service is implemented as a centralized
component of the ObjectGlobe system. Obviously, a centralized lookup
service can easily become a bottleneck of the whole system.
Therefore, we are currently working towards a distributed/hierarchical
lookup service (similar to the hierarchical name server we described
in [EKK97]) with domain-specific
replication and caching of meta-data.
Lehrstuhl für Datenbanksysteme
Last modified:
25.05.2005 at 14:34:37