Finding “outliers,” or exceptions, is useful in applications such as detecting credit card fraud and telephone calling card fraud, analyzing the performance statistics of professional athletes, and exploring satellite and medical images, among many others in which exceptional patterns may need special attention. Outliers may point to surprising or suspicious activities: they are extreme or relatively extreme values or observations (or subsets of values or observations) that appear to be inconsistent with the remainder of the data set.

The sheer volume of data grows larger day by day. For example, many companies already maintain data warehouses measured in terabytes. Similarly, scientific data is reaching gigantic proportions. Scientists have traditionally been able to deal with small datasets containing very few attributes, but dataset size and the number of dimensions have proven to be key obstacles to data analysis, especially when the data cannot fit in memory. Implementing data mining ideas in high-performance parallel and distributed environments is therefore becoming crucial for ensuring system scalability and interactivity as data continues to grow inexorably in size and complexity. Like other data mining tasks, mining outliers from a huge database is a complex and time-consuming problem. Using parallel and distributed processing can significantly reduce the total time required for the discovery process.

In this research, we aim to develop an efficient distributed algorithm for detecting outliers in physically distributed computing systems.