FAQ: How do I transfer large files between the Cloud Cluster HDFS and a host on campus that is outside the cluster?

Answer:

Transferring into HDFS
To copy a file into the cluster's HDFS use the following command from the computer containing the source file:
ssh shell.disc.pdl.cmu.local hadoop dfs -put - /path/on/hdfs < /local/path/to/source/bigfile
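For very large files it can pay to compress the stream in transit. A minimal sketch, assuming gzip is available on both ends and that you are willing to store the file compressed in HDFS (the .gz destination name below is a hypothetical example):

```shell
# Compress on the fly while streaming into HDFS; the file is stored gzipped.
# Paths are hypothetical -- substitute your own source file and HDFS destination.
gzip -c /local/path/to/source/bigfile | \
  ssh shell.disc.pdl.cmu.local "hadoop dfs -put - /path/on/hdfs/bigfile.gz"
```

To read it back uncompressed later, pipe the download through gunzip on the way out.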

Into local FS
To copy a file into the cluster's local file system, use the scp command. If your username on both opencloud and your local machine is userX, and you want to transfer "/home/userX/myfile.txt" into path/to/directory under your cluster home directory, run the following on your local machine:
scp /home/userX/myfile.txt userX@shell.disc.pdl.cmu.local:path/to/directory/myfile.txt
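To move a whole directory rather than a single file, one option is to stream a tar archive over ssh, which can be faster than per-file copies when there are many small files. A sketch under the same assumptions as above (userX; the directory name "mydata" and the destination path are hypothetical examples):

```shell
# Pack the directory locally, stream it over ssh, and unpack on the cluster side.
# "mydata" and "path/to/directory/" are hypothetical -- substitute your own.
tar -czf - mydata | \
  ssh userX@shell.disc.pdl.cmu.local "tar -xzf - -C path/to/directory/"
```

scp -r also copies directories; the tar pipeline simply bundles everything into one compressed stream.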

Transferring out of HDFS
To get a file from HDFS, run the following command on the computer where the destination file is to be stored:
ssh shell.disc.pdl.cmu.local hadoop dfs -get /path/in/hdfs - > /local/path/to/destination/bigfile
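After a large transfer it is worth confirming the copy is intact. One way, assuming md5sum is installed on both ends, is to compare a checksum of the HDFS copy against the local copy:

```shell
# Checksum the file as streamed out of HDFS...
ssh shell.disc.pdl.cmu.local "hadoop dfs -cat /path/in/hdfs" | md5sum
# ...and checksum the local destination; the two digests should match.
md5sum /local/path/to/destination/bigfile
```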

Out of local FS
To copy a file from the cluster's local file system to your computer, use the scp command. If your username on both opencloud and your local machine is userX, and you want to transfer "/h/userX/myfile.txt" to your local machine's home directory, run the following on your local machine:
scp userX@shell.disc.pdl.cmu.local:/h/userX/myfile.txt ~/


Topic revision: 12 Jul 2013, MitchFranzos