Xem Cluster
The PDL Xem cluster is a "bare metal" computing facility used for systems research. Researchers are allocated physical nodes for their experiments and have complete remote control of those nodes while the experiment is running. Xem manages its nodes with a locally customized version of the Emulab testbed software from the University of Utah. Note that Xem contains more than one type of node, so users must specify a node type when allocating machines (see Xem Node Types below).
Creating a Xem account and joining a Project
Email account creation requests to pdl-gripe@ece.cmu.edu. Be sure to include your Andrew ID and, if you are a student, the name of your advisor. Once your account is active, you can log in to Xem's web interface at https://xem.pdl.cmu.edu/. Once you have an account, you need to join a project in order to allocate nodes. You can do this through the "Join Existing Project" section of the "Experimentation" menu (see UserGuide for details).
How to allocate nodes on Xem
You must have a Xem account and be a member of an Emulab project on Xem in order to allocate nodes. Nodes in Emulab experiments can be allocated and managed either with command-line tools or by writing Emulab NS script files and uploading them to the cluster (normally through the web interface). Documentation for either mechanism is linked below:
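For the NS-script route, a script usually follows the standard Emulab layout: create a simulator object, load `tb_compat.tcl`, declare nodes, pin each node to a hardware type with `tb-set-hardware`, and finish with `$ns run`. The Python sketch below only generates such a file as text; `ns_script` is a hypothetical helper, not a Xem tool. The type name `thm` is a placeholder assumption, so check the Xem node types for the exact type strings Xem accepts, and add any OS image or link settings your experiment needs.

```python
import re


def ns_script(node_count, hardware_type):
    """Return a minimal Emulab NS script requesting node_count nodes of one type.

    Hypothetical helper: it emits only node declarations and hardware types,
    with no OS image, links, or other Xem-specific settings.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Reject anything that is not a plain token, to avoid injecting Tcl.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", hardware_type):
        raise ValueError(f"unexpected hardware type: {hardware_type!r}")
    lines = ["set ns [new Simulator]", "source tb_compat.tcl", ""]
    for i in range(node_count):
        lines.append(f"set node{i} [$ns node]")
        # Xem allocates nodes by type, so every node gets an explicit type.
        lines.append(f"tb-set-hardware $node{i} {hardware_type}")
    lines += ["", "$ns run"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "thm" is a placeholder; substitute a real Xem node type.
    print(ns_script(2, "thm"), end="")
```

Save the output to a file and upload it when creating an experiment through the web interface, as described in the UserGuide.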
Xem Node Types
Nodes on Xem are allocated by type. The following Node Types are available in Xem:
Click on a node class for additional hardware specifications.
The node type is used with the `makebed` script to allocate nodes of a given class. For example, use the `/share/testbed/bin/thm-makebed` version of `makebed` to allocate THM nodes. Run `ls /share/testbed/bin/*makebed` to list the node-type scripts currently available (see UserGuide for more on `makebed`).
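If you want the list of available node types inside a script rather than from `ls`, the sketch below scans the same directory. It assumes the `<type>-makebed` naming pattern suggested by the `thm-makebed` example; that pattern, and the helper names `node_type` and `makebed_scripts`, are assumptions rather than documented Xem conventions.

```python
import glob
import os

# Directory named on this page; the "<type>-makebed" pattern is an assumption.
MAKEBED_DIR = "/share/testbed/bin"
SUFFIX = "-makebed"


def node_type(script_path):
    """Return the node type encoded in a script name, or None for the generic script."""
    name = os.path.basename(script_path)
    if not name.endswith(SUFFIX) or name == SUFFIX:
        return None
    return name[: -len(SUFFIX)]


def makebed_scripts(directory=MAKEBED_DIR):
    """Map node type -> script path for every <type>-makebed script in directory."""
    found = {}
    for path in sorted(glob.glob(os.path.join(directory, "*" + SUFFIX))):
        ntype = node_type(path)
        if ntype:
            found[ntype] = path
    return found


if __name__ == "__main__":
    # Prints nothing when run outside Xem, where the directory does not exist.
    for ntype, path in makebed_scripts().items():
        print(f"{ntype:12s} {path}")
```

Unlike `ls /share/testbed/bin/*makebed`, this skips the generic `makebed` script and reports only the type-specific variants.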
Logging into allocated nodes
Xem has a login node, `xem-ops.pdl.cmu.edu`, that is on the Xem control network. Users can ssh into `xem-ops` and access Xem nodes from there.
It is also possible to configure ssh on your local system to reach Xem nodes through the PDL proxy. To do this, add the following to your `~/.ssh/config`:
Host *.pdl.local.cmu.edu
    # Uncomment and customize the following option if your username on the system you run ssh from (e.g., your personal laptop) differs from your PDL username
    #User your_pdl_username
    ProxyCommand ssh proxy.pdl.cmu.edu -W %h:%p
Now `ssh` (including `scp`, `sftp`, `rsync`, etc.) will work transparently through the proxy server as long as you use the FQDN of your nodes (e.g., something like `h0.cdn.disc.xem.pdl.local.cmu.edu`).
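For scripts that should not depend on your `~/.ssh/config`, the same proxy setting can be passed on the command line with `ssh -o ProxyCommand=...`. The sketch below builds that argument list; `ssh_argv` is a hypothetical helper, and the proxy command string is copied from the config stanza above.

```python
import shlex

# Copied from the ~/.ssh/config stanza; ssh expands %h and %p itself.
PROXY_COMMAND = "ssh proxy.pdl.cmu.edu -W %h:%p"
# The stanza's Host pattern only matches campus-visible node names.
CAMPUS_SUFFIX = ".pdl.local.cmu.edu"


def ssh_argv(node_fqdn, remote_cmd=None, user=None):
    """Build an ssh argument list that tunnels through the PDL proxy."""
    if not node_fqdn.endswith(CAMPUS_SUFFIX):
        raise ValueError(f"expected a name ending in {CAMPUS_SUFFIX}: {node_fqdn}")
    argv = ["ssh", "-o", f"ProxyCommand={PROXY_COMMAND}"]
    if user:
        # Equivalent to the optional "User" line in the config stanza.
        argv += ["-l", user]
    argv.append(node_fqdn)
    if remote_cmd:
        argv.append(remote_cmd)
    return argv


if __name__ == "__main__":
    # Only prints the command; nothing is executed.
    print(shlex.join(ssh_argv("h0.cdn.disc.xem.pdl.local.cmu.edu", "hostname")))
```

To actually connect, pass the list to `subprocess.run(..., check=True)`. Passing a list avoids shell quoting problems with the spaces and `%` characters in the proxy command.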
Networking
- Xem experiment nodes are on a private PDL network. To access the Internet (e.g., the web, git, `apt-get` repositories) from a Xem node, you must go through the PDL Proxy at http://proxy.pdl.cmu.edu:3128/. This proxy setting is already included in the standard PDL disk images. Before March 2020, a proxy server was also available on the xem-ops.pdl.cmu.edu node at port 8888/tcp. That server has been removed, so any references to the ops:8888 or xem-ops.pdl.cmu.edu:8888 proxy in existing scripts or experiment configurations should be replaced with the PDL Proxy address.
- Nodes may have more than one network interface (e.g., THM nodes have a 10 GbE network for control/login and a 100 GbE network for experiments). Applications that need high-performance networking should make sure their I/O uses the cluster's high-speed data network rather than the control network.
- Internally, node names (FQDNs) end in `xem.pdl.cmu.edu`, but these names are only visible within Xem. On the CMU campus network, the same nodes also have names under `pdl.local.cmu.edu`. For example, a node with the Xem-internal name h0.cdn.disc.xem.pdl.cmu.edu is listed as h0.cdn.disc.xem.pdl.local.cmu.edu on the CMU campus network (note the added "local" in the domain name). With appropriate routing, this allows Xem nodes to be referenced by name from other PDL clusters.
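Two of the points above can be sketched in Python: mapping a Xem-internal name to its campus name, and sending HTTP(S) requests explicitly through the PDL Proxy. Both helpers are hypothetical. Note that `urllib` already honors `http_proxy`/`https_proxy` environment variables, so on the standard PDL images the explicit opener may be redundant; it is mainly useful on custom images.

```python
import urllib.request

# PDL Proxy address from the Networking notes.
PDL_PROXY = "http://proxy.pdl.cmu.edu:3128/"

INTERNAL_SUFFIX = ".xem.pdl.cmu.edu"
CAMPUS_SUFFIX = ".xem.pdl.local.cmu.edu"


def campus_fqdn(internal_name):
    """Map a Xem-internal FQDN to the name visible on the CMU campus network."""
    if not internal_name.endswith(INTERNAL_SUFFIX):
        raise ValueError(f"not a Xem-internal name: {internal_name}")
    return internal_name[: -len(INTERNAL_SUFFIX)] + CAMPUS_SUFFIX


def proxied_opener(proxy=PDL_PROXY):
    """Return a urllib opener that sends HTTP and HTTPS traffic via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # No network access happens here; the opener is only constructed.
    print(campus_fqdn("h0.cdn.disc.xem.pdl.cmu.edu"))
    opener = proxied_opener()
```

On a Xem node, `opener.open(url)` would then fetch `url` through the proxy.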
Xem storage options
In Xem, there are multiple types of storage:
- Your home directory
  - `/users/$USER`: an NFS filesystem private to Xem, not intended for storing large data. Emulab manages your SSH keys and SSL certificates through this directory, so do not directly modify your SSH keys in `$HOME/.ssh` or your SSL keys in `$HOME/.ssl`.
- Project directories
  - `/proj/`: a directory shared among all users in your project.
- Other external NFS mounts
  - Volumes for large datasets, parallel filesystems, etc.
- Node local storage
  - Different node classes have different local storage options; see Xem Node Types above for details on each class. Common uses for node local storage include scratch space and additional space for installing software. You can create new partitions on the disk, but we recommend leaving the existing partitions in the partition table unchanged.
Your home directory and `/proj` directories are provided via NFS from an external filer. This is a shared resource and should be treated as such. Writes to your home directory from hundreds of nodes can generate a tremendous amount of traffic and overhead on the filer. Be careful that output, such as a core dump, does not unintentionally get written to your home directory. Set your working directory to a local directory, such as `/tmp`, before starting your experiment, e.g.,
cd /tmp && ~/run_experiment
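When the experiment is launched from a Python driver script, the same idea can be applied by giving the child process a fresh working directory under `/tmp` and logging its output there. This is a minimal sketch; `run_in_scratch` is a hypothetical helper, and `~/run_experiment` stands for your own experiment script.

```python
import os
import subprocess
import tempfile


def run_in_scratch(argv, scratch_root="/tmp"):
    """Run argv in a new directory on local disk and log its output there.

    Relative-path output (and core dumps, when the system writes them to the
    current directory) lands in the scratch directory, not on NFS.
    """
    scratch = tempfile.mkdtemp(prefix="xem-run-", dir=scratch_root)
    # subprocess does not expand "~", so expand the program path here.
    argv = [os.path.expanduser(argv[0]), *argv[1:]]
    log_path = os.path.join(scratch, "output.log")
    with open(log_path, "w") as log:
        subprocess.run(argv, cwd=scratch, stdout=log,
                       stderr=subprocess.STDOUT, check=True)
    return scratch, log_path
```

For example, `run_in_scratch(["~/run_experiment"])` mirrors the shell line above. Anything left in the scratch directory still has to be copied to persistent storage before the experiment is swapped out.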
Additional notes
- PDL Emulab installations do not enable or use Emulab's built-in mailing list functionality. Some generic Emulab documentation refers to this feature, so please ignore it.
- If you attempt to allocate all the available nodes, your experiment is more likely to encounter a node failure. Node failures are fatal by default: the experiment swap-in will fail. If you need a large allocation and are having trouble, contact pdl-gripe@ece.cmu.edu.
- Xem monitors all allocated nodes to ensure they are being used. If an experiment sits idle for too long, it will be forcibly swapped out so that others can use the nodes. Xem also limits the overall duration of node allocations.
- Local data is destroyed on swapout. Intermediate data can (and should) be stored on local disks, but it must be copied to persistent storage (e.g., `/proj`) before the nodes are released or the experiment is swapped out because of the maximum duration limit or idle-swap.
- Additional documentation on the PDL lab systems can be found in the PDL GettingStarted Guide.
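Copying results off local disk before swapout can be scripted as the last step of an experiment. The sketch below uses `shutil.copytree` to merge a local results directory into a persistent destination; `save_results` is a hypothetical helper, and the destination path you pass (for example, a directory under `/proj`) is your choice, not a fixed Xem location.

```python
import shutil
import tempfile
from pathlib import Path


def save_results(local_dir, persistent_dir):
    """Copy local results into persistent storage and return the file count there.

    Files with the same relative path in persistent_dir are overwritten;
    nothing is deleted on either side.
    """
    src = Path(local_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no local results at {src}")
    dest = Path(persistent_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest, dirs_exist_ok=True)  # Python 3.8+
    return sum(1 for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Demo with throwaway directories; on Xem, pass your local scratch
    # directory and a destination under /proj instead.
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp, "scratch")
        local.mkdir()
        (local / "results.txt").write_text("example\n")
        print(save_results(local, Path(tmp, "proj", "run1")), "file(s) saved")
```

Since `/proj` is on the shared NFS filer, copying once at the end of a run, rather than streaming intermediate output there, keeps load on the filer low.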