Hadoop, Amazon Elastic MapReduce and Eclipse
Presented by Seth Ladd
Riding on the excitement from April's presentation on Eclipse and Amazon EC2, we'll be taking a look at a powerful way to take advantage of all that processing potential. Ever wondered how Google crawls and processes the entire web? With incredible amounts of data (on the order of exabytes) being generated across the world, how can one process, analyze, and derive value from it all? At the May meeting of the Eclipse Users' Group, we'll take a look at Hadoop, the open source MapReduce and distributed file system framework. MapReduce is a distributed data processing paradigm, originally popularized by Google, designed to use clusters of commodity machines to crunch very large data sets. Hadoop is an open source implementation of MapReduce, along with supporting systems and frameworks, that you can use right now on your own machines or on Amazon EC2. We'll also look at Eclipse-based tools that can help you write MapReduce programs. In addition, we'll take a look at Pig, a high-level language for writing Hadoop MapReduce jobs.
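For a feel of the paradigm before the meeting, here is a minimal in-memory sketch of the classic word count example. It is plain Python rather than Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in a single process, whereas a real framework would spread the map and reduce calls across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, the step a framework
    performs between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

Because each map call looks at one record and each reduce call looks at one key, both phases can in principle run in parallel on separate machines, which is the property the paradigm relies on.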
Held on 05/26/2009, IBM Honolulu office, 5:30pm
Q & A
Put your questions here! If you know the answer to any of the questions, by all means, feel free to share your knowledge.