Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
Google's MapReduce Programming Model -- Revisited
[go: Go Back, main page]

Google's MapReduce Programming Model -- Revisited

Status
Draft, 22 January 2006

Author
Ralf Lämmel

Abstract
MapReduce is a programming model (and an associated implementation) used at Google for processing large amounts of input data (such as millions of documents) and generating keyed or indexed query results (such as the indexes used for Google's web search). The programming model is stunningly simple, and it allows the programmer to largely abstract from the parallel and distributed execution of the data-processing computations. Parallel execution is promoted by the assumed skeleton of MapReduce computations. Load balancing, network performance and fault tolerance are taken care of by the MapReduce implementation.

We revisit the MapReduce programming model in an attempt to provide a rigorous description of the model. We focus on the key abstraction for MapReduce computations; this abstraction is parameterized by the problem-specific ingredients for data extraction and reduction. We use Haskell as a lightweight specification language to capture the essence of MapReduce computations in a succinct, executable and strongly typed manner. Our study substantiates that clarity, generality and correctness of designs (or the presentations thereof) are easily improved, if modest functional programming skills are put to work.

Keywords
Data processing, MapReduce, parallel programming, distributed programming, typed functional programming, map, reduce, list homomorphism, fold, unfold, Haskell.

Bibtex entry
@unpublished{MapReduce,
 author = "Ralf L{\"a}mmel",
 title = "{Google's MapReduce Programming Model -- Revisited}",
 year = 2006,
 month = "22~" # jan,
 note = {Draft; Online since 2 January, 2006; 26 pages}
}

Links



(C) Ralf Lämmel