Stoat is an Apache Hadoop computing facility running Cloudera CDH5.4 with the following components:
Hadoop is an open-source software framework for distributed storage and distributed processing of large datasets. You can find more information about Hadoop from the [[https://hadoop.apache.org/][Apache Software Foundation]].
Cloudera is a software company that provides Apache Hadoop-based software, support, and services. Cloudera's Hadoop distribution (CDH) includes several Apache-licensed open-source projects that work with Hadoop.
Requesting an account
Please contact us to be added to the stoat group.
Accessing the cluster
To access the cluster, you need to be on campus or connected to the General Campus VPN.
If you want to monitor your jobs with a web browser, you will need to configure your web browser to use a ProxyServer.
Use SSH to initiate a session to the login node for the cloud cluster:
From the login node you can launch your jobs.
Submit a job
Run the following command in a Linux shell on the login node:
hadoop jar YourJar.jar YourClass CommandLineArguments
hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar pi 10 10
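If you submit jobs often, a small wrapper can catch a mistyped jar path before Hadoop starts. The following is only a sketch: submit_job and the DRY_RUN variable are hypothetical names invented for this example, not part of Hadoop or of the Stoat setup. It only assumes that the hadoop command is on your PATH on the login node.

```shell
# Hypothetical helper (not part of Hadoop or Stoat): check that the jar
# exists, then hand everything to "hadoop jar".
# Set DRY_RUN=1 to print the command instead of running it.
submit_job() {
    local jar="$1"
    shift
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "hadoop jar $jar $*"
    else
        hadoop jar "$jar" "$@"
    fi
}
```

For example, DRY_RUN=1 submit_job /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar pi 10 10 prints the command that would be run, and dropping DRY_RUN=1 submits the job.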
Spark (YARN Client Mode)
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client $SPARK_HOME/lib/spark-examples.jar 100
Monitoring your jobs
You can monitor applications at http://rm.stoat.pdl.local.cmu.edu:8088/
(Note that you have to configure a ProxyServer.)
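Applications can probably also be checked from a shell on the login node with the standard yarn command-line client, which avoids the browser proxy. This is a sketch under the assumption that the yarn client on the login node is configured for this cluster. The function names and the YARN_CMD variable are hypothetical, introduced only for this example.

```shell
# Assumption: the "yarn" client is on PATH on the login node.
# YARN_CMD is a hypothetical override, useful for trying the functions
# without a cluster (e.g. YARN_CMD=echo).
YARN_CMD="${YARN_CMD:-yarn}"

# List applications that are currently running.
list_running_apps() {
    "$YARN_CMD" application -list -appStates RUNNING
}

# Show the status report for one application ID.
app_status() {
    "$YARN_CMD" application -status "$1"
}
```

Call list_running_apps to find the application ID of your job, then pass that ID to app_status.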