Folks willing to share interesting datasets are encouraged to add a page here that describes the dataset, how to interpret it and where to find it. Main.GarthGi...
FAQ: How do I check the quota for my allocated storage? Answer: HDFS Storage: To check the quota for your home directory use dfs repquota as follows. Note: Raw ...
FAQ: How do I log on to the cluster? Answer: Use SSH to initiate a session to the login node for the cloud cluster: ssh shell.disc.pdl.cmu.local From the login node ...
FAQ: How do I transfer large files between the Cloud Cluster HDFS and a host on campus that is outside the cluster? Answer: $ Transferring into HDFS: To copy ...
FAQ: How to submit Hadoop jobs to the Cloud Cluster Answer: Run the following command in the Linux shell on the login node: hadoop jar YourJar.jar YourClass CommandLi...
FAQ: How do I access the status page for the jobs? Answer: * First, configure your browser to access the cluster through the proxy server. See CloudFaqBrowserP...
FAQ: How do I configure memory parameters for my mapreduce jobs? Answer: Memory Parameters: There are several memory parameters configurable for users: * map...
Monitoring Jobs in the Cloud Cluster from Web Browser * Set up your browser's proxy as explained in the Cloud Faq Job Status Page. * Click on the hadoop...
FAQ: What happens when I exceed the allocated storage space? Answer: The answer depends on the type of storage being used. Home Directory: For your home directory...
Cloud Cluster Users Meeting Date: 2010 10 19 Cluster Characteristics * Data Intensive Computing Cluster: Designed for large scale data processing workloads tha...
Quick Start Guide for the Cloud Cluster After having applied for an account and received your account information: 1 Log on to the cluster 1 Submit jobs fr...
Cloud Cluster External Shared Storage The cloud cluster has 13 nodes of external storage, running a PVFS2 filesystem. This storage space will be available for all...
FAQ: Is the data in the cloud cluster backed up? Answer: No. The data in the cluster is not backed up. You should make your own copies of the data in order to ...
FAQ: What is the "distributed cache" feature provided by Hadoop, and how can my application use it? Answer: Some jobs require each Map task to read in one or mor...
Reference Manuals and Links for Open Cloud Cluster Hardware and Software NOTE: Please help improve this page by adding manuals and references for the Open Cloud...