While an instance of a service is running, log files are generated for that service. The services run in a common scenery, where each service may be spread over multiple instances. In the case considered here, the log files contain the load data for CPU and memory. Because knowing the cyclic behaviour of a service can help when computing an allocation, the log history of each service is analysed. The goal of my approach is to extract the cyclic behaviour from the log files and to save the resulting cyclic patterns for a later calculation of a well-fitting allocation in the services' scenery.
The following figure takes a look at the semantic behaviour of the pattern analysis. The given charts are snapshots taken at the end of the described step.
The following image gives a short overview of the process. Each element refers to a detailed description below. The blue boxes denote the user interface, the green boxes the preparation process, the orange box the filtering process and the yellow boxes the actual extraction process.
To start an analysis, an instance of the class ServiceAnalysis has to be created. The possible constructors are documented using JavaDoc. Depending on the given parameters, different steps are taken to prepare the data. The details of the preparation are described in subsequent sections.
The common way to get an instance of this class is to call the constructor with just the parameter of type ServiceDescription. Using this constructor, all necessary preparation steps are taken automatically.
Comments on exceptions that might occur during the analysis can be found in the section "Exception handling".
Before the actual extraction can be done, the data must be prepared. Currently, the database directory contains many single files, which must be combined to extract the data. There is one log file for every combination of service host and service. In order to get the data for a specific service, all log files of this service, regardless of the host it runs on, must be combined. This procedure is described below.
First the database directory will be opened. The location is defined in the config file internal.xml. If the database directory cannot be accessed or is not present, an IOException is thrown.
The needed data are filtered out of the directory listing. All files with a filename like ServiceAdvisor-[serviceID]-[hostname].csv are needed. The filtering is done via ServiceLogFileFilter which is an implementation of java.io.FileFilter.
To adapt the filtering process, a new file filter simply has to be written and included in the current service analysis.
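As an illustration of what such a replacement filter might look like, the following is a minimal sketch of a java.io.FileFilter that matches the file name scheme described above. The class name ServiceLogNameFilter and its constructor are hypothetical and are not the project's ServiceLogFileFilter; how a filter is registered with the service analysis is not shown here.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a custom log file filter; not the project's ServiceLogFileFilter. */
class ServiceLogNameFilter implements FileFilter {
    private final Pattern namePattern;

    ServiceLogNameFilter(String serviceId) {
        // Assumption: the service ID is matched literally and the host name is any non-empty text.
        this.namePattern = Pattern.compile(
                "ServiceAdvisor-" + Pattern.quote(serviceId) + "-(.+)\\.csv");
    }

    @Override
    public boolean accept(File file) {
        // Only the file name is inspected, so the check does not touch the file system.
        return namePattern.matcher(file.getName()).matches();
    }
}
```

A filter like this could be passed to File.listFiles(FileFilter) on the database directory to obtain the log files of one service across all hosts.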
After filtering the data, the log files are parsed. The objects generated during the parsing process are referred to as "input series" below. A set of input series corresponding to just one service is referred to as "multiple input series".
For every basic load file, the following three statistical parameters are needed:
The density is calculated as [number of values] / Δt, with Δt = [ending point] - [starting point]. This means that for Δt = 0, the density is -∞.
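The density rule can be written down in a few lines. The sketch below assumes timestamps are given as long values in some fixed unit; the class and method names are illustrative only.

```java
class SeriesDensity {
    /**
     * Density of an input series: number of values divided by the covered time span.
     * Returns negative infinity for a zero-length span, following the convention in the text.
     */
    static double density(int valueCount, long start, long end) {
        long deltaT = end - start;
        if (deltaT == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        return (double) valueCount / deltaT;
    }
}
```

One consequence of the -∞ convention is that a degenerate series can never supply the maximum when the densities of several input series are compared.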
Having these parameters, we can proceed to the next step and generate a skeleton for combining all log files of a service.
The equidistant timestamps are referred to as the skeleton. They are calculated using the following procedure. The first timestamp is the earliest starting point of the multiple input series, the last timestamp is the latest starting point. Between these two timestamps, as many time points are placed as are needed for the density of the skeleton to equal the maximum of the densities of the multiple input series.
This means that every input series is sampled at equidistant intervals. If a queried timestamp is not present in the input series, the maximum of the values at the left-adjacent and the right-adjacent timestamps is returned.
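The skeleton construction and the sampling rule might be sketched as follows. This is only an illustration under assumptions: an input series is modelled as a TreeMap from timestamp to load, the first and last skeleton timestamps are passed in by the caller, and a query outside the series falls back to the single available neighbour (the text does not say what happens in that case).

```java
import java.util.Map;
import java.util.TreeMap;

class SkeletonSketch {
    /** Equidistant timestamps from first to last (inclusive) at roughly the given density. */
    static long[] skeleton(long first, long last, double density) {
        // Number of intervals implied by the density; at least one so that both ends are included.
        int intervals = Math.max(1, (int) Math.round((last - first) * density));
        long[] timestamps = new long[intervals + 1];
        for (int i = 0; i <= intervals; i++) {
            timestamps[i] = first + Math.round((double) (last - first) * i / intervals);
        }
        return timestamps;
    }

    /** Load at a timestamp; falls back to the larger value of the two neighbouring samples. */
    static double sample(TreeMap<Long, Double> series, long timestamp) {
        Double exact = series.get(timestamp);
        if (exact != null) return exact;
        Map.Entry<Long, Double> left = series.floorEntry(timestamp);
        Map.Entry<Long, Double> right = series.ceilingEntry(timestamp);
        if (left == null && right == null) throw new IllegalArgumentException("empty series");
        // Assumption: outside the series only one neighbour exists, so that one is used.
        if (left == null) return right.getValue();
        if (right == null) return left.getValue();
        return Math.max(left.getValue(), right.getValue());
    }
}
```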
Now that the timestamps in the resulting time series are known, each input series is queried for the load at each of these points. The base load given in the service description is then subtracted. This is done with every input series at every timestamp of the skeleton. The result is an additive superposition of the input series, referred to as the "load series".
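A minimal sketch of this superposition is given below. It assumes that each input series is available as a function from timestamp to sampled load and that the base load is subtracted once per input series and timestamp; both are illustrative modelling choices, not the project's actual types.

```java
import java.util.List;
import java.util.function.LongToDoubleFunction;

class SuperpositionSketch {
    /**
     * Adds up the loads of all input series at every skeleton timestamp,
     * subtracting the base load from each sampled value first.
     */
    static double[] superpose(long[] skeleton, List<LongToDoubleFunction> inputSeries, double baseLoad) {
        double[] loadSeries = new double[skeleton.length];
        for (int i = 0; i < skeleton.length; i++) {
            for (LongToDoubleFunction series : inputSeries) {
                loadSeries[i] += series.applyAsDouble(skeleton[i]) - baseLoad;
            }
        }
        return loadSeries;
    }
}
```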
Returning this result gives a holistic view of the load of a service spanning multiple instances.
At this point, there is just one object, which combines all the logs of a specific service. We will reference this object as a "load series" in further sections.
By default, no filtering is used, but the user can specify a Filter object to be used during the process. The load series is then filtered using this object, and the result of the filtering process is used for further processing.
At the moment, there are two filters implemented, a NullFilter, which does nothing, and a LowPassFilter, which has a standard low pass filter behaviour.
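To make the idea of pluggable filters concrete, here is a small sketch. The interface LoadFilter and both classes are hypothetical stand-ins, not the project's Filter, NullFilter or LowPassFilter; in particular, the text does not say which low-pass design is used, so simple exponential smoothing is swapped in here as an assumption.

```java
/** Hypothetical filter contract; the project's Filter type may look different. */
interface LoadFilter {
    double[] apply(double[] loadSeries);
}

/** Passes the load series through unchanged (returns a copy). */
class PassThroughFilter implements LoadFilter {
    public double[] apply(double[] loadSeries) {
        return loadSeries.clone();
    }
}

/** First-order low-pass (exponential smoothing); alpha in (0, 1], smaller means smoother. */
class SmoothingLowPassFilter implements LoadFilter {
    private final double alpha;

    SmoothingLowPassFilter(double alpha) {
        if (alpha <= 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in (0, 1]");
        this.alpha = alpha;
    }

    public double[] apply(double[] loadSeries) {
        double[] out = new double[loadSeries.length];
        for (int i = 0; i < loadSeries.length; i++) {
            // y[i] = y[i-1] + alpha * (x[i] - y[i-1]); the first sample is passed through.
            out[i] = (i == 0) ? loadSeries[0] : out[i - 1] + alpha * (loadSeries[i] - out[i - 1]);
        }
        return out;
    }
}
```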
In this section, the actual extraction process is described. The preceding steps only prepare the data for it. The core procedures, the Fast Fourier Transform (FFT) and the iteration steps using covariances, are discussed first. After that, the pattern is cut out of the load series.
This is the core step. Using the Fast Fourier Transform (FFT), the start value for all following steps is calculated. The complex FFT returns a frequency spectrum of the given load series. The position of the maximum on the f-axis (see above) is the base frequency. This maximum corresponds to a maximum in the Fourier coefficients. The position of the first maximum in the Fourier coefficients is the number of cycles in the load series.
The FFT is implemented such that it returns an array of Fourier coefficients, so the position found is an array index and therefore an integer. To get a more precise value, the next step iterates over the covariance matrix.
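The search for the dominant coefficient can be illustrated as follows. Note that this sketch uses a naive discrete Fourier transform rather than an FFT; it yields the same coefficients, only more slowly, and keeps the example self-contained. Restricting the search to indices 1 to n/2 is an assumption made here to skip the constant part and the mirrored upper half of the spectrum.

```java
class BaseFrequencySketch {
    /** Index (1..n/2) of the largest Fourier coefficient, i.e. the integer number of cycles. */
    static int dominantCycleCount(double[] x) {
        int n = x.length;
        int best = 0;
        double bestPower = -1;
        // k = 0 is skipped (constant part); indices above n/2 mirror the lower ones for real input.
        for (int k = 1; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            if (power > bestPower) {
                bestPower = power;
                best = k;
            }
        }
        return best;
    }
}
```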
This step gets the non-integer maximum of the frequency spectrum. The FFT sets the starting point. Now there are a few functions to evaluate:

The function I(λ) returns the intensity of the harmonic oscillation with frequency λ of the series, adjusted by the mean.
The first iteration step evaluates I(FFT), I(FFT - [step]) and I(FFT + [step]), where FFT denotes the result of the previous step. The argument that yields the maximum is then taken as the input of the next iteration step, and the step is halved. At the moment, these steps are repeated ten times.
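The iteration can be sketched as below. Since the exact definition of I(λ) is not reproduced in this text, the sketch assumes a standard periodogram of the mean-adjusted series, with λ measured in cycles per series length so that an FFT index can be used directly as the start value. The initial step width is likewise an assumption and is left as a parameter.

```java
class FrequencyRefinementSketch {
    /** Periodogram-style intensity of the mean-adjusted series at lambda cycles per series length. */
    static double intensity(double[] x, double lambda) {
        int n = x.length;
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;
        double c = 0, s = 0;
        for (int t = 0; t < n; t++) {
            double angle = 2 * Math.PI * lambda * t / n;
            c += (x[t] - mean) * Math.cos(angle);
            s += (x[t] - mean) * Math.sin(angle);
        }
        return (c * c + s * s) / n;
    }

    /** Evaluates I at the estimate and one step to either side, keeps the best, halves the step. */
    static double refine(double[] x, double start, double initialStep, int iterations) {
        double best = start;
        double step = initialStep;
        for (int i = 0; i < iterations; i++) {
            double argMax = best;
            double max = intensity(x, best);
            for (double candidate : new double[] {best - step, best + step}) {
                double value = intensity(x, candidate);
                if (value > max) {
                    max = value;
                    argMax = candidate;
                }
            }
            best = argMax;
            step /= 2;
        }
        return best;
    }
}
```

With ten iterations and an initial step of 0.5, the estimate can move by at most about one index in either direction, which covers the gap between two neighbouring FFT indices.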
To cut off the starting phase, which might lead to incorrect pattern data, we need to find the first so-called valley point.

The process of finding the valley point for the l-th cycle is described using the figure above. Starting at x_lT, the measuring point whose index is closest to lT, the next local extremum x_E is searched for. Inside the interval [lT, E], the point S_l with the maximum distance δ between the load series and the straight line defined by (lT, x_lT) and (E, x_E) is searched for. The point found is the valley point.
The load series between 0 and the first valley point S_1 is discarded.
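The valley point search might look like the following sketch. It works on sample indices, treats a strict local maximum or minimum as an extremum, and measures the vertical distance to the line; for a fixed line this selects the same point as the perpendicular distance. These details and the names are assumptions, since the original figure is not available here.

```java
class ValleyPointSketch {
    /**
     * Index of the valley point for the l-th cycle, or -1 if no local extremum follows.
     * cycleLength is the pattern length T measured in samples.
     */
    static int valleyPoint(double[] x, double cycleLength, int l) {
        int start = (int) Math.round(l * cycleLength);   // sample index closest to l*T
        if (start < 0 || start >= x.length) return -1;

        // Next local extremum after the start index (strict maximum or minimum).
        int extremum = -1;
        for (int i = start + 1; i < x.length - 1; i++) {
            boolean isMax = x[i] > x[i - 1] && x[i] > x[i + 1];
            boolean isMin = x[i] < x[i - 1] && x[i] < x[i + 1];
            if (isMax || isMin) {
                extremum = i;
                break;
            }
        }
        if (extremum < 0) return -1;

        // Straight line through (start, x[start]) and (extremum, x[extremum]).
        double slope = (x[extremum] - x[start]) / (extremum - start);
        int valley = start;
        double maxDistance = -1;
        for (int i = start; i <= extremum; i++) {
            double onLine = x[start] + slope * (i - start);
            double distance = Math.abs(x[i] - onLine);
            if (distance > maxDistance) {
                maxDistance = distance;
                valley = i;
            }
        }
        return valley;
    }
}
```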
Now that the starting-up phase has been discarded, the pattern can be cut out. This can be done using two different methods. The criterion for choosing a method is the number of cycles inside the load series. The first method is used if at least four cycles are found and is described in the first section; the second method is used if fewer than four cycles are found and is described in the second section.
Using this method, the pattern length has to be averaged first. This is done by calculating all valley points of the load series. The distances between adjacent valley points are averaged and taken as a more precise pattern length. Now that all cycles are known, they can be averaged by building an additive superposition and then dividing the superposition by the number of cycles. The last cycle must not be part of the superposition, because it may be incomplete.
Given this method, it is easy to see why at least four cycles are required. The first cycle is discarded as the starting-up phase and the last as a potentially incomplete cycle, so at least two cycles must remain to be averaged.
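The averaging could be sketched like this. The sketch assumes the valley points are given as ascending sample indices, rounds the averaged pattern length to whole samples, and treats the stretch starting at the last valley point as the potentially incomplete last cycle; the names are illustrative.

```java
class CycleAveragingSketch {
    /** Mean distance between adjacent valley points, rounded to whole samples. */
    static int averagePatternLength(int[] valleyPoints) {
        int spans = valleyPoints.length - 1;
        if (spans < 1) throw new IllegalArgumentException("need at least two valley points");
        // The sum of adjacent distances telescopes to last minus first.
        return (int) Math.round((double) (valleyPoints[spans] - valleyPoints[0]) / spans);
    }

    /**
     * Averages all cycles that start at a valley point, except the one starting at the
     * last valley point, which may be incomplete.
     */
    static double[] averagePattern(double[] loadSeries, int[] valleyPoints) {
        int length = averagePatternLength(valleyPoints);
        double[] pattern = new double[length];
        int cycles = 0;
        for (int c = 0; c < valleyPoints.length - 1; c++) {
            int from = valleyPoints[c];
            if (from + length > loadSeries.length) break;  // guard against running past the data
            for (int j = 0; j < length; j++) pattern[j] += loadSeries[from + j];
            cycles++;
        }
        if (cycles == 0) throw new IllegalArgumentException("no complete cycle available");
        for (int j = 0; j < length; j++) pattern[j] /= cycles;
        return pattern;
    }
}
```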
This is the simplest way. The first occurrence of the pattern is taken and returned. No other processing is done.
The following sections describe what can be done once patterns are extracted. The current implementation makes it possible to load and save patterns using an XML file. The XML file and the object inside the JVM can easily be synchronized.
The patterns are saved in a file called patterns.xml. This file contains all patterns that are calculated, one pattern for each service. The XML file is wrapped as a Patterns object inside the JVM. This class has methods to synchronize the object with the XML file, either to reload the XML file or to store the current object state as an XML file.
The Patterns object is implemented as a singleton. The instance will be created during the parse process of the service descriptions.
The PatternDescription class provides methods to get data out of the calculated patterns. All calculations respect the cyclic character of a pattern, so querying a value in the pattern's past or future is supported. The pattern is appended as many times as needed to return the queried value. If a value between two sampling points is queried, it is computed by linear interpolation. This also works in the past and the future.
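The cyclic lookup with linear interpolation can be illustrated as follows. The sketch assumes equidistant sampling points, with sample i located at time i * spacing, and wraps the last sample around to the first; the method name valueAt is hypothetical and not taken from PatternDescription.

```java
class CyclicPatternSketch {
    /**
     * Value of a cyclic pattern at time t, where sample i lies at time i * spacing.
     * Times before 0 or beyond one pattern length wrap around; values between samples
     * are interpolated linearly.
     */
    static double valueAt(double[] pattern, double spacing, double t) {
        int n = pattern.length;
        double period = n * spacing;
        double wrapped = ((t % period) + period) % period;   // maps any t into [0, period)
        double position = wrapped / spacing;
        int left = (int) Math.floor(position) % n;
        int right = (left + 1) % n;                            // the last sample connects to the first
        double fraction = position - Math.floor(position);
        return pattern[left] + fraction * (pattern[right] - pattern[left]);
    }
}
```

Wrapping the time instead of physically appending copies of the pattern gives the same values while keeping memory use constant.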
There are two hierarchies of exceptions which can be thrown during the analysis. The first hierarchy consists of the exceptions of type ServiceAnalysisException. The second is rooted at PatternDescriptionException.
A ServiceAnalysisException is thrown during the preparation and the actual analysis. An exception of this type is thrown if a general problem occurred, such as an empty database directory or log files that contain no data or no usable data. A subclass denotes a more specific problem that could be handled by the application. For example, it is possible to get rid of a FilterException by using the standard NullFilter.
The subtree consists of the following exceptions:
ParserException
NoSuitableParserException
FilterException
NoCalculationPossibleException
NoPatternException is the only subtype of PatternDescriptionException. It is thrown if no pattern could be found in the pattern storage. This exception indicates that no pattern is present in the whole system. A useful way to handle it is to calculate a new pattern for the given service description.
Furthermore, IOExceptions can be thrown if file access is not possible. In such cases, check that all necessary files exist and have the correct permissions set.
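The suggested recovery from a NoPatternException could be structured as in the following sketch. The exception classes here are simplified stand-ins that only borrow the names from the text; their constructors, the map-based storage and the PatternLookupSketch class are assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Stand-ins named after the exceptions in the text; the real classes may differ. */
class PatternDescriptionException extends Exception {
    PatternDescriptionException(String message) { super(message); }
}

class NoPatternException extends PatternDescriptionException {
    NoPatternException(String serviceId) { super("no pattern stored for service " + serviceId); }
}

class PatternLookupSketch {
    // Hypothetical in-memory pattern storage, keyed by service ID.
    private final Map<String, double[]> storage = new HashMap<>();

    /** Throws NoPatternException when nothing is stored for the service. */
    double[] find(String serviceId) throws NoPatternException {
        double[] pattern = storage.get(serviceId);
        if (pattern == null) throw new NoPatternException(serviceId);
        return pattern;
    }

    /** Recovery path: compute, store and return a fresh pattern when none is found. */
    double[] findOrCalculate(String serviceId, Function<String, double[]> analysis) {
        try {
            return find(serviceId);
        } catch (NoPatternException e) {
            double[] fresh = analysis.apply(serviceId);
            storage.put(serviceId, fresh);
            return fresh;
        }
    }
}
```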
There are a few possible extensions that can be done:
The first two extensions can be done easily using the already existing interfaces. The last two extensions require more or less extensive changes throughout the source code.