What is Hadoop !
Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
What is MapReduce?
Simplified Data Processing on Large Clusters
- MapReduce is a set of code and infrastructure for parsing and building large data sets. A map function generates a key/value pair from the input data and this data is then reduced by a function to merges all values associated with equivalent keys. Programs are automatically parallelized and executed on a run-time system which manages partitioning the input data, scheduling execution and managing communication including recovery from machine failures
- This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system!