Fakultät für Informatik TU München - Fakultät für Informatik
Lehrstuhl III: Datenbanksysteme
Technische Universität München
Home  |  Personen  |  Forschung  |  Lehre  |  Sonstiges  | 

The Window Algorithm

Introduction

The Window Algorithm was developed to accommodate the demand for a method to calculate the exact result of the BestmatchJoin on streaming data. Thus, the two main characteristics of the Window Algorithm have already been mentioned:

The second characteristic distincts the Window Algorithm from approximative algorithms for calculating the BestmatchJoin. Whereas approximative methods in general do not generate all correct result tuples and can furthermore generate "false" tuples, i.e. tuples that are not correct result tuples, the Window Algorithm generates all correct result tuples of the BestmatchJoin and only those.

How can this be achieved? Of course, to be able to calculate the exact result of the BestmatchJoin on streaming data, certain preconditions and constraints must hold. First of all, the streaming input data must be sorted in ascending order in one dimension (i.e. in one attribute). Secondly, only those pairs of tuples are accepted in the join result whose join attribute values have a distance of no more than a given epsilon value from each other. This is why the calculated BestmatchJoin is called "constrained BestmatchJoin". The epsilon values can be different for each join dimension (i.e. pair of corresponding join attributes from the two input streams).
We shall limit our considerations to the constrained left-outer BestmatchJoin (LOBMJE) here. Other variants of the join work similarly.


Description of the Algorithm

Simply put, the Window Algorithm works like this: At the beginning, two data windows are initialized, one for each input stream. Both data windows have identical upper- and lowerbounds. Initially, they are set to zero and to the epsilon value of the sorted dimension respectively. Then the windows are filled with all input tuples that fit into them, that means, whose sorted attribute values lie within the window boundaries. Then the contents of the two data windows are joined using any appropriate join implementation and partial order for determining if one pair of tuples dominates another. Hereby generated pairs of potential result tuples are written to a temporary buffer. This must be done because at a later stage, better pairs might be generated and the before generated pair must then be removed from the buffer. Hence, potential result tuples cannot be written to the join output immediately but must be buffered. After that, the data windows are updated like this: All tuples that lie on the lowerbound of the left data window are removed from the window. Then, the window boundaries of both data windows are updated following one of four possible cases:

Now, all tuples in the right data window that lie below the new lowerbound are removed. Also, all pairs of potential result tuples in the temporary buffer whose left tuples lie below the new lowerbound (i.e. whose values of the sorted attribute are smaller than the new lowerbound) are removed from the buffer and written to the output as they cannot be dominated by a better pair anymore (no more pair will be generated in the according epsilon environment). Then the data windows are filled again with all tuples from the input streams that fit into the windows with the new boundaries. Notice that this works fine and the data windows always move up along the sorted dimension because the tuples in the input streams are sorted in ascending order in that dimension. In the last of the above cases, the contents of the two data windows are then again joined at the beginning. In all other cases however, only parts of the data windows are joined to avoid the generation of duplicates. Here, only the "old" part of the left data window (the tuples in the window that already were in the window prior to the last update) is joined with the "new" part of the right data window (the tuples that were written into the window during the last update step). Furthermore, the "new" part of the left data window is joined with the whole right data window ("old" and "new" part together). This is shown in the pictures below.
All this loops until the first of the above four cases occurs and the join computation ends having generated all correct result tuples of the constrained left-outer BestmatchJoin (and only those).

LOBMJE1 LOBMJE2
LOBMJE3 LOBMJE4
Figure 1: Join computation for the constrained left-outer BestmatchJoin


People


Lehrstuhl für Datenbanksysteme
Letzte Änderung: 25.05.2005 um 14:37:47