The PDL Narwhal cluster is a "bare metal" computing facility used for systems research. Researchers are allocated physical nodes (not virtual machines) for their experiments and have complete remote control of the nodes while the experiment is running. The Narwhal cluster uses a locally customized version of the Emulab
testbed software from the University of Utah to manage its nodes. Note that Narwhal contains more than one type of node, so users must specify a node type when allocating machines (see table below).
Nodes in PDL Emulab clusters can be allocated and managed either with command-line tools or by writing Emulab NS script files and uploading them to the cluster (normally done through the web interface). Documentation for both mechanisms can be found here:
Narwhal Node Types
The following node types are available in Narwhal:

| Node Class | NS Syntax | Description |
| --- | --- | --- |
| rr | add-desire rr | IBM Blade Server LS21 - 4 cores AMD Opteron, 16 GiB RAM, 73.6 GB disk |
| big | add-desire big | 2013 Intel donation - 80 cores, 1 TiB RAM |
| qsi | add-desire qsi | 2013 Intel donation - 40 cores, 256 GiB RAM |
| sus | add-desire sus | Former PRObE SUSITNA nodes - 64 cores, 128 GiB RAM, Nvidia Tesla K20c, 40 GbE and FDR10 InfiniBand |
| | | Same as above but lacks 40 GbE |
For more about the PDL network, take a look at our general Getting Started
RR node network
Two separate physical networks are provided for each RR node. Each node has two 1 GbE ports, one connected to each physical network. One port is used for the Emulab Control Plane, the other for the Emulab Data Plane. There is some oversubscription on both networks: 14:6 at the access layer and 24:20 at the distribution layer, as illustrated below.
PDL Emulab installations do not enable or use Emulab's built in mailing list functionality (some generic Emulab documentation refers to this feature, please ignore it).
- You do not have to apply for a Narwhal account: your PDL account will get you access to Narwhal.
- Projects on Narwhal are still managed through Emulab.
- Narwhal nodes are on a private network. All access from Narwhal nodes to the Internet must go through a proxy. The ops node runs a simple web proxy on port 8888, and the PDL central Proxy Servers should also be available.
- You must specify a node type when allocating nodes. This can be done by using a node-specific "makebed" script (e.g. "rr-makebed") or by adding a "desire" to your NS script (for example: $node add-desire rr 1.0).
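As a sketch, the NS-script route above might look like the following, written out from the shell for convenience. The node name (n0) is illustrative; the OS image and the add-desire line follow the examples used elsewhere on this page.

```shell
# Sketch: write a minimal NS script requesting one rr node. The node name
# n0 is an illustrative placeholder; the add-desire line matches the
# node-type syntax shown above.
cat > narwhal-rr.ns <<'EOF'
set ns [new Simulator]
source tb_compat.tcl

set n0 [$ns node]
tb-set-node-os $n0 UBUNTU14-64-PROBE
$n0 add-desire rr 1.0

$ns run
EOF
```

The resulting narwhal-rr.ns file would then be uploaded through the web interface as usual.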
In Narwhal, there are multiple types of storage:
- Your home directory. Your Narwhal home directory is small and not part of your normal home directory. This is because Emulab manages SSH keys for you. Do not directly modify your SSH keys in $HOME/.ssh or your SSL keys in $HOME/.ssl.
- Project directories (/proj).
- Local storage. Different node classes have different local storage; please see #Narwhal_Node_Types for more information about the different node classes.
You are free to use local storage however you want. Common uses are installing additional programs on the boot/root filesystem or creating additional partitions in the free space on the drive. You can create new partitions on the disk, but it is recommended that you not modify the existing entries in the partition table.
Your home directory and /proj directories are provided via NFS from an external filer. This is a shared resource and should be treated as such. Please be aware that writes to your home directory from hundreds of nodes could result in a tremendous amount of bandwidth and overhead for the filer. Be careful that output, e.g. a core dump, does not unintentionally get written to your home directory. Make sure you set your working directory to a local directory, like /tmp. E.g.
cd /tmp && ~/run_experiment
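The pattern above can be made explicit: keep the working directory and raw output on local disk, then copy only the results you want to keep into persistent storage. In the sketch below, DEST and the echo are placeholders (a real run would copy into a /proj directory and run your actual experiment).

```shell
# Sketch: run with a local working directory so core dumps and scratch
# files never land in the NFS home directory. DEST stands in for a
# /proj project directory; the echo stands in for ~/run_experiment.
OUT=/tmp/exp-output
DEST=/tmp/proj-placeholder/results     # in practice: a directory under /proj
mkdir -p "$OUT" "$DEST"
cd "$OUT"                              # local disk, not the NFS home
echo "experiment output" > run.log     # stand-in for ~/run_experiment
cp run.log "$DEST/"                    # keep only what you need persistently
```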
PDL AutoFS Maps
Datasets requiring large amounts of fast network-based storage should use the Panasas storage cluster. This cluster has the following characteristics:
- distributed storage cluster with 5 director blades and 50 storage blades
- two access protocols
- NFS to a director blade (which proxies requests to storage blades)
- DirectFlow - clients talk directly to the 50 storage nodes (experiment hosts must be running the OS image named
UBUNTU14-64-PROBE or a derivative to use DirectFlow)
- shared, not backed up. This storage resource may be used for large data sets; however, it must be treated as volatile scratch space whose contents may not persist over a long period of time.
- high performance, particularly when using DirectFlow
- This space can be used within an experiment either via NFS or DirectFlow:
- via NFS: run
/share/testbed/bin/linux-autofs-ldap to install and configure autofs. The panasas realm (pana) is available under
- via DirectFlow
- Install DirectFlow
sudo dpkg -i /share/testbed/misc/panfs-3.13.0-33-generic-6.0.1-1133721.61.ul_1404_x86_64.deb
- Mount the filesystem: mount.panfs panfs://10.25.0.200:global /panfs
The Panasas realm (pana) will be available in
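Since DirectFlow only works once the panfs mount is in place, it can be worth checking for it from a job script before writing data. A small sketch, where the /panfs mount point matches the mount.panfs command above:

```shell
# Sketch: verify a mount point is active before writing data to it.
# mountpoint(1) is from util-linux and succeeds only for an active mount.
check_mount() {
  mountpoint -q "$1"
}

if check_mount /panfs; then
  echo "panfs mounted at /panfs"
else
  echo "panfs not mounted; run the DirectFlow steps above" >&2
fi
```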
Selecting and customizing an OS image
A number of OS images are ready for general use. To view the list of operating systems available on the cluster, click the "List ImageIDs" item in the "Experimentation" pulldown menu. Use the table to determine the name of the operating system image you want to use. For example, UBUNTU14-64-PROBE is a 64-bit version of Ubuntu 14.04 Linux.
To customize an OS image:
- create a single-node experiment with the base operating system image installed
- log in to the node and get a root shell
- use a package manager (apt-get, yum, etc.) to install/configure software
- manually add any local software to the image
- return to the web interface
- navigate to your experiment and select the physical node at the bottom under "Reserved Nodes"
- click "Create a Disk Image"
- name and describe your intended new image
- shut down and delete the single-node experiment
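The install/configure steps above are often easiest to capture as a small provisioning script that you copy to the node and run as root, so the same customization can be replayed later. A sketch, where the package names and the /tmp/mytool path are illustrative placeholders:

```shell
# Sketch: record the image-customization steps as a provisioning script.
# Package names and /tmp/mytool are illustrative placeholders.
cat > provision.sh <<'EOF'
#!/bin/sh -e
apt-get update
apt-get install -y build-essential git        # package-manager installs
install -m 0755 /tmp/mytool /usr/local/bin/   # manually added local software
EOF
chmod +x provision.sh
```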
Interacting with your nodes
Most users will use ssh to interact with their nodes. Using the ProxyCommand configuration, you can use the PDL ProxyServer
to transparently proxy ssh connections to your nodes.
Configure SSH on your workstation: Add the following to your ~/.ssh/config:
ProxyCommand ssh proxy.pdl.cmu.edu -W %h:%p
Now ssh (including scp, etc.) will work transparently through the proxy server as long as you use the .local fqdn of your nodes (e.g. pnn.cdh5.disc.narwhal.pdl.local.cmu.edu).
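A bare ProxyCommand line applies to every host you ssh to, so it is usually scoped to a Host pattern. A sketch, where the pattern is an assumption based on the .local node names and should be adjusted to match your own nodes:

```
# Assumption: the Host pattern below matches the .local node names;
# adjust it for your own experiments.
Host *.narwhal.pdl.local.cmu.edu
    ProxyCommand ssh proxy.pdl.cmu.edu -W %h:%p
```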
Hints, FAQs and common problems
- Node names/fqdns are reported ending in .edu, but they also exist under .local for DNS lookup from other clusters (e.g. Emulab reports node names as pnn.cdh5.disc.narwhal.pdl.cmu.edu, but they also exist as pnn.cdh5.disc.narwhal.pdl.local.cmu.edu).
- If you ask for all the available nodes, your experiment is more likely to see a node failure. Node failures are, by default, fatal--the experiment swap-in will fail.
- Where available, applications that require high performance networking in order to perform well should use the clusters' data network for IO rather than the control network.
- Boss monitors all allocated nodes to ensure they are being used. If an experiment sits idle for too long, it will be forcibly swapped out so that others can use the nodes. Boss also limits the overall duration of an experiment.
- Local data is destroyed on swapout. Intermediate data can (and should) be stored on local disks, but it must be copied to persistent storage (e.g. /proj) before the nodes are released or the experiment is swapped out due to max duration or idle-swap; data on local disks is not preserved after an experiment is swapped out.