
Stoat Cluster

About Stoat

Stoat is an Apache Hadoop computing facility running Cloudera CDH5.4 with the following components:
  • HDFS
  • Hive
  • Oozie
  • Sqoop2
  • YARN
  • ZooKeeper

About Hadoop

Hadoop is an open-source software framework for distributed storage and distributed processing of large datasets. You can find more information about Hadoop from the Apache Software Foundation.

About Cloudera

Cloudera is a software company that provides Apache Hadoop-based software, support, and services. Cloudera's Hadoop distribution (CDH) includes several Apache-licensed open-source projects that work with Hadoop.

Getting Started

Requesting an account

Please contact us to be added to the stoat group.

Accessing the cluster

In order to access the cluster, you need to be on campus or connected to the General Campus VPN.

Proxy Server

If you want to monitor your jobs with a web browser, you will need to configure your web browser to use a ProxyServer.

Use SSH to initiate a session to the login node for the cloud cluster:


From the login node you can launch your jobs.

Submit a job

Run the following command in a Linux shell on the login node:

hadoop jar YourJar.jar YourClass CommandLineArguments

Example Jobs:

hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar pi 10 10
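
The general form above (jar, then class or program name, then arguments) can be wrapped in a small shell function that catches a missing jar before anything is sent to the cluster. This is only a sketch: submit_job is a hypothetical helper name, not something provided by Hadoop or by the Stoat login node, and it assumes the hadoop command is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop or the Stoat setup).
# Usage: submit_job <jar> <class-or-program> [args...]
submit_job() {
    # Require at least the jar and the class/program name.
    if [ "$#" -lt 2 ]; then
        echo "usage: submit_job <jar> <class-or-program> [args...]" >&2
        return 2
    fi
    local jar="$1"
    # Fail early if the jar path does not point at a regular file.
    if [ ! -f "$jar" ]; then
        echo "error: jar not found: $jar" >&2
        return 1
    fi
    # Log the command to stderr, then run it with arguments passed through unchanged.
    echo "+ hadoop jar $*" >&2
    hadoop jar "$@"
}
```

With the function loaded in your shell, the example job would be submitted as: submit_job /opt/cloudera/parcels/CDH/jars/hadoop-examples.jar pi 10 10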

Spark (YARN Client Mode)

source /etc/spark/conf/
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client $SPARK_HOME/lib/spark-examples.jar 100

Monitoring your jobs

You can monitor applications with a web browser (note that you have to configure a ProxyServer first).
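
Applications can usually also be checked from the login node's shell using the standard yarn command-line client. The sketch below assumes that the yarn client is on your PATH there, and that log aggregation is enabled for the logs command to return anything. The function names are hypothetical wrappers chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: command-line monitoring from the login node.
# The function names are hypothetical; the yarn subcommands are the stock CLI.

# List applications that are currently running.
list_running_apps() {
    yarn application -list -appStates RUNNING
}

# Print the status report for one application ID.
app_status() {
    yarn application -status "$1"
}

# Fetch the aggregated logs of a finished application
# (assumes log aggregation is enabled on the cluster).
app_logs() {
    yarn logs -applicationId "$1"
}
```

The application ID is the one printed when the job is submitted; for example, app_status application_1500000000000_0001 (a made-up ID) would print the report for that application.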
Topic revision: r4 - 17 Aug 2017, MitchFranzos1 - This page was cached on 03 Mar 2018 - 08:02.
