
Narwhal Cluster

The PDL Narwhal cluster is a "bare metal" computing facility used for systems research. Researchers are allocated physical nodes (not virtual machines) for their experiments and have complete remote control of the nodes while the experiment is running. The Narwhal cluster uses a locally customized version of the Emulab testbed software from the University of Utah to manage its nodes. Note that Narwhal contains more than one type of node, so users must specify a node type when allocating machines (see the table below).

Quick Start

Nodes in a PDL Emulab cluster can be allocated and managed either with command-line tools or by writing Emulab NS script files and uploading them to the cluster (normally done through the web interface). Documentation for both mechanisms can be found here:

Narwhal Node Types

The following node types are available in Narwhal:

Node Class | Quantity | NS Syntax | Description
RR  | 392 | add-desire rr  | IBM Blade Server LS21 - 4 cores AMD Opteron, 16 GiB RAM, 73.6 GB disk
BIG | 1   | add-desire big | 2013 Intel donation - 80 cores, 1 TiB RAM
QSI | 2   | add-desire qsi | 2013 Intel donation - 40 cores, 256 GiB RAM
SUS | 34  | add-desire sus | Former PRObE SUSITNA nodes - 64 cores, 128 GiB RAM, Nvidia Tesla K20c, 40 GbE and FDR10 InfiniBand
SUS | 34  | add-desire sus; add-desire nofge | Same as above, but lacking 40 GbE
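
As a minimal sketch of how these desires are used in an NS script (the node variable is an illustration only; UBUNTU14-64-PROBE is the image mentioned later on this page), requesting a single RR-class node might look like:

    set ns [new Simulator]
    source tb_compat.tcl

    # Request one node and attach the "rr" desire so it is placed on an RR-class machine.
    set node1 [$ns node]
    $node1 add-desire rr 1.0
    tb-set-node-os $node1 UBUNTU14-64-PROBE

    $ns run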

Network

For more about the PDL network, take a look at our general Getting Started guide.

RR node network

Two separate physical networks are provided for each RR node. Each node has two 1 GbE ports, each connected to one of the physical networks. One port is used for the Emulab Control Plane; the other is used for the Emulab Data Plane. Both networks are somewhat oversubscribed: 14:6 at the access layer and 24:20 at the distribution layer, as illustrated below.

RR_Network.png

Mailing List

PDL Emulab installations do not enable or use Emulab's built-in mailing list functionality (some generic Emulab documentation refers to this feature; please ignore it).

Localizations/Notes

  • You do not have to apply for a Narwhal account: your PDL account will get you access to Narwhal.
  • Projects on Narwhal are still managed through Emulab.
  • Narwhal nodes are on a private network. All access from Narwhal nodes to the Internet must go through a proxy. The ops node runs a simple web proxy on port 8888, and the PDL central Proxy Servers should also be available (see the example after this list).
  • You must specify a node type when allocating nodes. This can be done either by using a node-specific "makebed" script (e.g. "rr-makebed") or by adding a "desire" to your NS script (for example: $node add-desire rr 1.0).
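
For example, to let tools on a node fetch files from the Internet through the ops web proxy, you can export the standard proxy environment variables (the host name "ops" below is an assumption; use the ops node's actual name or address):

    export http_proxy=http://ops:8888
    export https_proxy=http://ops:8888
    wget https://example.com/some-file.tar.gz   # example download through the proxy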

Review the PDL GettingStarted Guide.

Understanding storage

In Narwhal, there are multiple types of storage.

  • Your home directory. Your Narwhal home directory is small and is not part of your normal home directory; this is because Emulab manages SSH keys for you. Do not directly modify your SSH keys in $HOME/.ssh or your SSL keys in $HOME/.ssl.
  • Project directories (/proj).
  • Local storage. Different node classes have different local storage. Please see #Narwhal_Node_Types for more information about the different node classes.

You are free to use local storage however you want. Common uses are installing additional programs on the boot/root filesystem or creating additional partitions in the free space on the drive. You can create new partitions on the disk, but it is recommended that you not modify the existing partitions in the partition table.
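
As an illustration only (the device name, free-space boundaries, and new partition number are all assumptions; check your node's actual layout first and leave the existing partitions alone), creating and mounting an extra partition might look like:

    lsblk                                                 # inspect the current layout
    sudo parted /dev/sda unit GB print free               # locate the unused space
    sudo parted /dev/sda mkpart primary ext4 80GB 100%    # start/end values are examples
    sudo mkfs.ext4 /dev/sda5                              # the new partition number is an assumption
    sudo mkdir -p /mnt/scratch && sudo mount /dev/sda5 /mnt/scratch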

Network Storage

Your home directory and the /proj directories are provided via NFS from an external filer. This is a shared resource and should be treated as such. Please be aware that writes to your home directory from hundreds of nodes can generate a tremendous amount of bandwidth and overhead on the filer. Be careful that output, e.g. a core dump, does not unintentionally get written to your home directory. Make sure you set your working directory to a local directory, such as /tmp, before running your experiment (e.g. cd /tmp && ~/run_experiment).
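
A minimal sketch (run_experiment stands in for whatever you actually run):

    # Work from local scratch space so logs and core dumps stay off the NFS filer.
    mkdir -p /tmp/expt && cd /tmp/expt
    ~/run_experiment > run.log 2>&1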

PDL AutoFS Maps
  • In order to make PDL automount maps available to Narwhal nodes, AutoFS can be installed and configured to use the PDL LDAP server. A helper script is available to install autofs and perform the minimal configuration:
    skitch@ubuntu1:~$ sudo /share/testbed/bin/linux-autofs-ldap
    skitch@ubuntu1:~$ ls /n
    admin  games      kickstart  oltpbench  scratch  t1disc  vquery
    atc    home       media      pana       sid      traces
    flow   homestore  music      refdbms    stoat    video 

Panasas ActiveScale

Datasets requiring large amounts of fast network-based storage should use the Panasas storage cluster. This cluster has the following characteristics:
  • distributed storage cluster with 5 director blades and 50 storage blades
  • two access protocols
    1. NFS to a director blade (which proxies requests to storage blades)
    2. DirectFlow - clients talk directly to the 50 storage nodes (experiment hosts must be running the OS image named UBUNTU14-64-PROBE or a derivative to use DirectFlow)
  • shared, not backed up. This storage resource may be used for large data sets; however, it must be treated as volatile scratch space whose contents may not persist over a long period of time.
  • high performance, particularly when using DirectFlow
  • self-service
  • This space can be used within an experiment either via NFS or DirectFlow:
    • via NFS: run /share/testbed/bin/linux-autofs-ldap to install and configure autofs. The panasas realm (pana) is available under /n/pana
    • via DirectFlow
      1. Install DirectFlow
        sudo dpkg -i /share/testbed/misc/panfs-3.13.0-33-generic-6.0.1-1133721.61.ul_1404_x86_64.deb
      2. Mount the filesystem:
        mount.panfs panfs://10.25.0.200:global /panfs
The Panasas realm (pana) will be available in /panfs/pana.
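
To confirm that the DirectFlow mount is active, something like the following should list a panfs mount and show the space available in the pana realm:

    mount | grep panfs
    df -h /panfs/pana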

Selecting and customizing an OS image

A number of OS images are ready for general use. To view the list of operating systems available on the cluster, click on the "List ImageIDs" menu item in the "Experimentation" pulldown menu. Use the table to determine the name of the operating system image you want to use. For example, UBUNTU14-64-PROBE is a 64-bit version of Ubuntu 14.04 Linux.

To customize an OS image:
  1. Create a single-node experiment with the base operating system image installed.
  2. Log in to the node and get a root shell.
  3. Use a package manager (apt-get, yum, etc.) to install and configure software.
  4. Manually add any local software to the image.
  5. Log out.
  6. Return to the web interface.
  7. Navigate to your experiment and select the physical node at the bottom, below "Reserved Nodes".
  8. Click "Create a Disk Image".
  9. Name and describe your intended new image.
  10. Shut down and delete the single-node experiment.
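
To boot a later experiment from the saved image, reference it in your NS script. The image name below is purely illustrative; use the name you gave the image in step 9:

    # MY-CUSTOM-IMAGE is a placeholder for the image name chosen above.
    tb-set-node-os $node1 MY-CUSTOM-IMAGE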

Interacting with your nodes

Most users will use ssh to interact with their nodes. With an SSH ProxyCommand configuration, the PDL ProxyServer can transparently relay ssh connections to your nodes.

Configure SSH on your workstation: Add the following to your ~/.ssh/config:

Host *.pdl.local.cmu.edu
        ProxyCommand ssh proxy.pdl.cmu.edu -W %h:%p

Now ssh (including scp, etc.) will work transparently through the proxy server as long as you use the fqdn of your nodes that includes the .local component, e.g. cdh5manager.cdh5demo.pdl.narwhal.pdl.local.cmu.edu.
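
For example, with that configuration in place (the host below is the sample fqdn from this page; substitute your own node's name):

    ssh cdh5manager.cdh5demo.pdl.narwhal.pdl.local.cmu.edu
    scp results.tar.gz cdh5manager.cdh5demo.pdl.narwhal.pdl.local.cmu.edu:/tmp/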

Hints, FAQs and common problems
  • Node names/fqdns are reported ending in .cmu.edu, but they also exist under .local.cmu.edu for DNS lookup from other clusters (e.g. Emulab reports node names as pnn.cdh5.disc.narwhal.pdl.cmu.edu, but they also exist as pnn.cdh5.disc.narwhal.pdl.local.cmu.edu).
  • If you ask for all the available nodes, your experiment is more likely to see a node failure. Node failures are, by default, fatal--the experiment swap-in will fail.
  • Where available, applications that require high performance networking in order to perform well should use the clusters' data network for IO rather than the control network.
  • Boss monitors all allocated nodes to ensure they are being used. If an experiment sits idle for too long, it will be forcibly swapped out so that others can use the nodes. Boss also limits overall experiment duration.
  • Local data is destroyed on swapout. Intermediate data can (and should) be stored on local disks, but anything you need to keep must be copied to persistent storage (e.g. /proj) before the nodes are released or the experiment is swapped out due to max duration or idle-swap (see the sketch below).
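
A minimal sketch of saving results before releasing the nodes (the project directory "myproject" and the /tmp/expt scratch directory are placeholders):

    # Archive local results into your project directory before swapout.
    tar czf /proj/myproject/results-$(hostname -s).tar.gz -C /tmp/expt .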

Further Reading
