Mining Association Rules in Time Series Data
Discovering structure in multiple streams of data is an important problem. A
stream is a sequence of values produced over time. In a time series, an event
is an important occurrence, and what counts as an event depends on the goal of
the time series data mining task. A dependency is an unexpectedly frequent or
infrequent co-occurrence of events over time. It indicates that an event on one
stream is related to events on other streams, even though those streams may
appear to be independent of the first. Stock prices are a good example of such
dependencies: the rise and fall of some stock prices can cause the price of
another stock to rise and fall. If we analyze multiple streams of stock price
time series and discover the dependencies among them, those dependencies can
help us decide on a better time to buy stocks. Such dependencies can be
expressed as rules. In our case, these dependencies are called motion
association rules. Strong dependencies capture structure in the streams because
they indicate a relationship between the constituent patterns; that is,
occurrences of those patterns are not independent. The association rule
discovery problem usually translates into finding all sets of dependency
patterns that satisfy a pre-specified minimum threshold on support, and then
post-processing them to find the interesting rules. Sets that meet the
threshold are called frequent. An association rule usually predicts the
occurrence of some other set of dependencies with a certain degree of
confidence.
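
The support and confidence measures mentioned above can be illustrated with a
minimal sketch. It assumes that the streams have already been turned into one
set of event labels per time window; the labels (`A_up`, `B_up`, ...) and the
toy data are hypothetical and only serve the illustration.

```python
from typing import FrozenSet, Iterable, List


def support(windows: List[FrozenSet[str]], pattern: Iterable[str]) -> float:
    """Fraction of time windows in which every event of `pattern` occurs."""
    wanted = frozenset(pattern)
    if not windows:
        return 0.0
    hits = sum(1 for w in windows if wanted <= w)
    return hits / len(windows)


def confidence(windows: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent = frozenset(antecedent)
    base = support(windows, antecedent)
    if base == 0.0:
        return 0.0
    return support(windows, antecedent | frozenset(consequent)) / base


# Hypothetical toy data: events observed on two streams, pooled per time window.
windows = [
    frozenset({"A_up", "B_up"}),
    frozenset({"A_up", "B_up"}),
    frozenset({"A_up", "B_down"}),
    frozenset({"A_down", "B_down"}),
]

rule_support = support(windows, {"A_up", "B_up"})           # 2 of 4 windows
rule_confidence = confidence(windows, {"A_up"}, {"B_up"})   # 2 of the 3 A_up windows
```

A rule such as "A_up then B_up" would be kept only if `rule_support` meets the
chosen minimum support; its confidence is then reported with the rule.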
• Parallel Algorithms for Mining Association Rules in Time Series Data
Finding how current and past values in the data streams relate to future values
is quite useful. However, these data sets are high-dimensional and enormous in
size, which can result in a very large number of mined dependencies. This
strongly motivates the need for efficient parallel algorithms. In this
research, we introduced a parallel algorithm to discover dependencies in a
large amount of motion data. For example, an association rule discovered from
motion data about walking is “when the right hand is up, the left hand and left
knee are down”. Because motion data is multi-stream data of 3-D time series,
and the amount of data is huge and costly, we introduce a method of extracting
a sequence of symbols from the time series data using segmentation and
clustering. To reduce the search space and speed up the process, we investigate
a technique for grouping the time series data. Experimental results on a
shared-memory multiprocessor system justify the use of parallel techniques for
mining huge amounts of data in the time series domain. In our present research,
we are concentrating on developing new algorithms that do not rely on the
grouping technique, as it may be a constraint for other data sets in time
series data mining.
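
The idea of turning a time series into a sequence of symbols by segmentation
and clustering can be sketched as follows. This is only an illustration under
stated assumptions, not the method used in this research: segments have a fixed
width, each segment is described by its end-to-end slope, and the slopes are
clustered with a plain one-dimensional k-means. All function names are
hypothetical.

```python
from typing import List


def segment_slopes(series: List[float], width: int) -> List[float]:
    """Cut `series` into fixed-width segments; return each end-to-end slope."""
    slopes = []
    for start in range(0, len(series) - width + 1, width):
        seg = series[start:start + width]
        slopes.append((seg[-1] - seg[0]) / (width - 1))
    return slopes


def kmeans_1d(values: List[float], k: int, iterations: int = 20) -> List[float]:
    """Plain Lloyd's k-means on scalars; returns the sorted cluster centres."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres over the sorted values.
    centres = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre.
        centres = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return sorted(centres)


def symbolize(series: List[float], width: int, k: int,
              alphabet: str = "abcdefghij") -> str:
    """Map each segment to the letter of its nearest slope cluster."""
    if width < 2 or not 2 <= k <= len(alphabet):
        raise ValueError("need width >= 2 and 2 <= k <= len(alphabet)")
    slopes = segment_slopes(series, width)
    if not slopes:
        raise ValueError("series is shorter than one segment")
    centres = kmeans_1d(slopes, k)
    return "".join(
        alphabet[min(range(k), key=lambda c: abs(s - centres[c]))]
        for s in slopes
    )


# Hypothetical stream: rise, flat, fall, rise.
stream = [0, 1, 2, 3, 3, 3, 3, 3, 3, 2, 1, 0, 0, 1, 2, 3]
symbols = symbolize(stream, width=4, k=3)   # "cbac": c = rising, b = flat, a = falling
```

Each stream then becomes a short string of symbols, and rule discovery works on
these strings rather than on the raw 3-D values.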
• Parallel Mining of Sequential Patterns in Temporal Multi-Stream
Note that the approach above requires the number of combinations to be
specified before the discovery process starts. That is, before discovery
begins, the algorithm must be told how many combinations the rules can be
discovered from. From the data mining point of view, this is an important
limitation, because data mining algorithms should not be developed for one
specific system or one particular multi-stream data problem; they should be
general for problems in the domain. This motivates us to develop a new
algorithm with that data mining goal in mind. In the new approach, we do not
want to be concerned with the number of combinations: the aim is to find rules
for any number of combinations. Hence, the main problem is how to discover, in
minimum time, all the sequences of body parts that take part in any kind of
motion performed by a human. To achieve the objective of parallel data mining,
i.e. to speed up the discovery process, we introduce a second parallel
algorithm that mines sequences of body parts. Because the sequences of body
parts that perform motions in the motion database are deep (that is, long),
determining the associations among those body parts is complex and very
time-consuming. The lattice-based approach decomposes the original search space
into smaller pieces, termed suffix-based classes, which can be processed
independently in main memory, taking advantage of shared-memory multiprocessors
(SMP). The decomposition is applied recursively within each parent class to
produce even smaller classes at the next level. As future research, we aim to
develop newer algorithms.
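
The recursive decomposition into suffix-based classes can be sketched in a few
lines. This is a simplified illustration, not the algorithm of this research:
it only groups already known sequences by a shared suffix and refines each
class with a longer suffix at the next level. The body-part labels and function
names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]


def suffix_classes(sequences: List[Seq], suffix_len: int) -> Dict[Seq, List[Seq]]:
    """Group sequences that share their last `suffix_len` items."""
    classes: Dict[Seq, List[Seq]] = defaultdict(list)
    for seq in sequences:
        if len(seq) >= suffix_len:
            classes[seq[len(seq) - suffix_len:]].append(seq)
    return dict(classes)


def decompose(sequences: List[Seq], suffix_len: int = 1,
              max_suffix_len: int = 2) -> Dict[Seq, List[Seq]]:
    """Recursively split each parent class into smaller child classes."""
    leaves: Dict[Seq, List[Seq]] = {}
    for suffix, members in suffix_classes(sequences, suffix_len).items():
        longer = [s for s in members if len(s) > suffix_len]
        if suffix_len < max_suffix_len and len(longer) > 1:
            # Sequences exactly as long as the suffix cannot be refined further.
            short = [s for s in members if len(s) == suffix_len]
            if short:
                leaves[suffix] = short
            leaves.update(decompose(longer, suffix_len + 1, max_suffix_len))
        else:
            leaves[suffix] = members
    return leaves


# Hypothetical sequences of body parts taking part in a motion.
sequences = [
    ("r_hand", "l_knee", "l_foot"),
    ("r_elbow", "l_knee", "l_foot"),
    ("r_hand", "r_elbow", "l_foot"),
    ("l_knee", "r_hand"),
    ("l_foot", "r_hand"),
]
classes = decompose(sequences)
```

Because no sequence belongs to two leaf classes, each class could be handed to
a different processor and mined without coordination with the others.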
Association rule mining in a large transactional database is an important
problem in the field of knowledge discovery. When a database is partitioned
among several shared-nothing machines, the problem can be addressed by a
distributed data mining algorithm. Several such algorithms have been reported
in the literature. They are able to discover a good number of rules from the
data sets, but their main weakness is that they do not scale well with the
number of sites or locations. These algorithms are mainly concerned with
reducing the number of database scans. This reduction is necessary because the
database is usually very large and is stored in secondary memory. Two main
things therefore need to be considered in solving the problem: parallelizing
disk I/O for independent parallel computations, and communicating among the
parts of the database located on different machines. Parallelizing I/O is
itself a relatively easy task, as the algorithms above show, but the main
difficulty remains communication complexity. Our research mainly aims at
developing an algorithm that is both highly communication-efficient and
scalable. Furthermore, this research considers time series data for this
purpose.
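
To make the communication cost concrete, the sketch below simulates one
well-known scheme from the literature, count distribution, in which every site
counts candidates in its own partition and the sites then exchange counts. It
is an assumption for illustration, not the algorithm developed in this
research; the sites are simulated in a single process and all names are
hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]


def local_counts(partition: List[Set[str]], size: int) -> Counter:
    """Count every `size`-item subset in one site's local transactions."""
    counts: Counter = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), size):
            counts[itemset] += 1
    return counts


def global_frequent(partitions: List[List[Set[str]]], size: int,
                    min_support: float) -> Tuple[Dict[Itemset, int], int]:
    """Return the globally frequent itemsets and the number of counts exchanged."""
    # Step 1: each site scans only its own partition (no communication).
    per_site = [local_counts(p, size) for p in partitions]
    # Step 2: summing the counters stands in for the exchange between sites.
    total: Counter = Counter()
    for counts in per_site:
        total.update(counts)
    n_transactions = sum(len(p) for p in partitions)
    frequent = {s: c for s, c in total.items()
                if c / n_transactions >= min_support}
    # Every site has to send each of its local counts at least once.
    exchanged = sum(len(counts) for counts in per_site)
    return frequent, exchanged


# Hypothetical database split over two shared-nothing sites.
partitions = [
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b"}, {"b", "c"}],
]
frequent, exchanged = global_frequent(partitions, size=2, min_support=0.5)
```

In this scheme the local scans are independent, while the number of exchanged
counts grows with both the number of candidates and the number of sites, which
is where the scalability problem described above comes from.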
• Efficient Parallel Programming Techniques in a CLUMPS for Mining of Time Series Data
The Message Passing Interface (MPI) has been widely used for the past several
years as a parallel programming interface for clusters of workstations,
commodity clusters, distributed computing systems, etc. However, communication
overhead is a major factor here and can limit the achievable speedup. On a
cluster of SMPs (CLUMPS), programming with MPI alone may not sustain its
advantage. Recently, OpenMP, a new API standard, has been gaining popularity
due to its ease of use and efficiency in SMP programming. This research aims to
develop new techniques that combine MPI and OpenMP for data mining. MPI can be
used for communication among the SMPs in the CLUMPS, while OpenMP can be used
for the computation within each SMP.
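
The two-level structure described above can be mimicked in a few lines of
Python. This is only a structural sketch and uses neither MPI nor OpenMP: the
SMP nodes are simulated one after another in a single process (where MPI ranks
would run), a thread pool plays the role of the OpenMP threads inside a node,
and the final sum plays the role of an MPI reduction. All names are
hypothetical, and Python threads are used here for structure, not for speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_in_chunk(chunk: List[str], symbol: str) -> int:
    """Work done by one thread: count a symbol in its share of the data."""
    return sum(1 for s in chunk if s == symbol)


def node_count(node_data: List[str], symbol: str, threads: int) -> int:
    """Inner level: split one node's data among threads (the OpenMP role)."""
    step = max(1, -(-len(node_data) // threads))  # ceiling division
    chunks = [node_data[i:i + step] for i in range(0, len(node_data), step)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda c: count_in_chunk(c, symbol), chunks))


def cluster_count(per_node_data: List[List[str]], symbol: str,
                  threads: int = 2) -> int:
    """Outer level: one partial result per node, then a sum (the MPI role)."""
    partials = [node_count(data, symbol, threads) for data in per_node_data]
    return sum(partials)


# Hypothetical symbol sequences held by three SMP nodes.
per_node_data = [list("abca"), list("aab"), list("ccc")]
total_a = cluster_count(per_node_data, "a")   # 2 + 2 + 0 = 4
```

Only one number per node crosses the node boundary, while all fine-grained
sharing happens inside a node; this is the division of labour that the
combination of MPI and OpenMP is meant to exploit.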