##Elastic MapReduce Example

To run this example, first download the Enron data set and put the messages.bson file into a bucket on S3.

Export the variable S3_BUCKET with the name of the S3 bucket to use as the location for code and input/output files.

Set the environment variables AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID to your AWS credentials.
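
As a minimal sketch of the setup described above, the three variables can be exported and checked before running any of the scripts. The bucket name and credential values are placeholders, and `check_emr_env` is a hypothetical helper that is not part of this repository.

```shell
# Placeholder values (assumptions): replace with your own bucket and credentials.
export S3_BUCKET="my-emr-example-bucket"
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"

# Hypothetical helper: report every required variable that is unset or empty.
check_emr_env() {
    missing=0
    for name in S3_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_emr_env && echo "environment OK"
```

Running the check first avoids submitting a job that fails later because a credential or the bucket name was never set.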

###Files

#####update_s3.sh

Run this file to place the necessary .jar files into your S3 bucket and make them readable.

This script requires s3cp, which you can install using gem install s3cp or get from here.
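
A quick way to confirm the prerequisite before running update_s3.sh is to check that `s3cp` is on the `PATH`. This is only a sketch; `have_tool` is a hypothetical helper name, and the install hint repeats the `gem install s3cp` command mentioned above.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool s3cp; then
    echo "s3cp found"
else
    echo "s3cp not found; try: gem install s3cp" >&2
fi
```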

#####emr-bootstrap.sh

This file runs on each node in the cluster to download dependencies, such as the MongoDB Java driver and the mongo-hadoop core code, and place them in a location on Hadoop's classpath.

#####run_emr_job.sh

This script submits the job to Elastic MapReduce. It requires elastic-mapreduce-ruby.
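
Taken together, the files above suggest a two-step workflow on your own machine: upload the .jar files, then submit the job (emr-bootstrap.sh runs on the cluster nodes, not locally). The wrapper below is a sketch under several assumptions: the scripts are in the current directory, they can be run with `bash`, and `run_example` and `DRY_RUN` are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper: run the local scripts in order, stopping on failure.
# With DRY_RUN=1 it only prints the steps instead of executing them.
run_example() {
    for script in update_s3.sh run_emr_job.sh; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would run: ./$script"
        else
            # Assumption: the scripts are bash-compatible.
            bash "./$script" || return 1
        fi
    done
}

# Preview the steps; set DRY_RUN=0 to actually run them.
DRY_RUN=1
run_example
```

Previewing with `DRY_RUN=1` first is a cheap way to confirm the order of steps before anything is uploaded or billed.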

###Source code

The Java source code for this example can be found in the directory /examples/enron in the source repo.