Parallel and Distributed Data Mining: An Introduction

The explosive growth of data collection in business and scientific fields has made it necessary to analyze these data and mine useful knowledge from them. Data mining refers to the entire process of extracting useful and novel patterns or models from large datasets. Because of the sheer size of the data and the amount of computation involved, high-performance computing is an essential component of any successful large-scale data mining application. This chapter presents a survey of large-scale parallel and distributed data mining algorithms and systems, serving as an introduction to the rest of this volume. It also discusses the issues and challenges that must be overcome in designing and implementing successful tools for large-scale data mining.

Author information

Mohammed J. Zaki, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180