
Xem Cluster

The PDL Xem cluster is a "bare metal" computing facility used for systems research. Researchers are allocated physical nodes for their experiments and have complete remote control of those nodes while the experiment is running. The Xem cluster uses a locally customized version of the Emulab testbed software from the University of Utah to manage its nodes. Note that Xem contains more than one type of node, so users must specify a node type when allocating machines (see table below).

Creating a Xem account and joining a Project

Email account creation requests to pdl-gripe@ece.cmu.edu. Be sure to include your Andrew ID and your advisor if you are a student. Once your account is active, you can log in to Xem's web interface here: https://xem.pdl.cmu.edu/. You must then join a project before you can allocate nodes. This is done through the "Join Existing Project" section of the "Experimentation" menu (see UserGuide for details).

How to allocate nodes on Xem

You must have a Xem account and be a member of an Emulab Project on Xem in order to allocate nodes. Nodes in Emulab experiments can be allocated and managed either with command-line tools or by writing Emulab NS script files and uploading them to the cluster (normally done through the web interface). Documentation for either mechanism is linked below:

Xem Node Types

Nodes on Xem are allocated by type. The following Node Types are available in Xem:

| Node Class | Quantity | Node Type | Description |
| --- | --- | --- | --- |
| THM | 64 | thm | Dell PowerEdge R640, 2x Xeon Gold 6244, 192 GiB DDR4, 100 GbE |
| ATH | 324 | ath | Dell PowerEdge R640, 2x Xeon Gold 6244, 192 GiB DDR4, 100 Gigabit EDR InfiniBand |

Click on a node class for additional hardware specifications.

The node type is used with the makebed script to allocate nodes of a given class. For example, use the /share/testbed/bin/thm-makebed version of the makebed script to allocate THM nodes. You can run ls /share/testbed/bin/*makebed to list the current set of node-type scripts available (see UserGuide for more on makebed).
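
As a rough illustration of the naming described above, the sketch below maps a node type to the matching makebed script path. The helper name makebed_for is hypothetical, and the ath-makebed path is only an assumption inferred from the thm example; the ls command above remains the authoritative list. The sketch does not invoke makebed itself, since its arguments are documented in the UserGuide.

```shell
# Directory holding the per-node-type makebed scripts (from the text above).
MAKEBED_DIR=/share/testbed/bin

# Print the makebed script path for a node type.
# Assumption: every type follows the <type>-makebed pattern shown for thm.
makebed_for() {
    case $1 in
        thm|ath)
            printf '%s/%s-makebed\n' "$MAKEBED_DIR" "$1"
            ;;
        *)
            echo "unknown node type: $1" >&2
            return 1
            ;;
    esac
}
```

On xem-ops, a check such as [ -x "$(makebed_for thm)" ] would confirm that the script is actually present before you rely on it.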

Logging into allocated nodes

Xem has a login node xem-ops.pdl.cmu.edu that is on the Xem control network. Users can ssh into xem-ops and from there access Xem nodes.

It is also possible to configure ssh on your local system to use the PDL proxy to access Xem nodes. To do this, add the following to your ~/.ssh/config:
Host *.pdl.local.cmu.edu
        # uncomment and customize the following configuration option if your username on the system you are invoking ssh from (e.g., your personal laptop) is different from your PDL username
        #User your_pdl_username
        ProxyCommand ssh proxy.pdl.cmu.edu -W %h:%p

Now ssh (including scp, sftp, rsync, etc.) will work transparently through the proxy server as long as you use the FQDN of your nodes (e.g., something like h0.cdn.disc.xem.pdl.local.cmu.edu).
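
Node names reported from inside Xem end in xem.pdl.cmu.edu, while the Host pattern above matches only the pdl.local.cmu.edu form. The following is a minimal sketch of a helper (the name xem_campus_name is hypothetical) that converts an internal name into the form the proxy configuration expects; names already in campus form pass through unchanged.

```shell
# Convert an internal Xem node name to its campus-visible name by
# inserting "local" into the domain. Other names are printed as given.
xem_campus_name() {
    case $1 in
        *.xem.pdl.cmu.edu)
            printf '%s\n' "${1%.xem.pdl.cmu.edu}.xem.pdl.local.cmu.edu"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

With the ~/.ssh/config entry in place, something like ssh "$(xem_campus_name h0.cdn.disc.xem.pdl.cmu.edu)" would then go through the proxy.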

Networking

  • Xem experiment nodes are on a private PDL network. To access the Internet (e.g., the web, git, `apt-get` repositories) from a Xem node you must go through the PDL Proxy located at http://proxy.pdl.cmu.edu:3128/. This proxy setting has been added to the standard PDL disk images. Prior to March 2020 a proxy server was also available on the xem-ops.pdl.cmu.edu node at port 8888/tcp. That server is no longer present, so all references to the ops:8888 or xem-ops.pdl.cmu.edu:8888 proxy in existing scripts or experiment configurations should be replaced with the address of the PDL Proxy.
  • Nodes may have more than one network interface (e.g., THM nodes have a 10 GbE network for control/login and a 100 GbE network for experiments). Applications that depend on high-performance networking should use the cluster's high-speed data network for I/O rather than the control network.
  • Internally, node FQDNs are reported as ending in xem.pdl.cmu.edu, but these names are only visible within Xem. On the CMU campus the Xem node names also exist in pdl.local.cmu.edu. For example, a node with the internal Xem-only name h0.cdn.disc.xem.pdl.cmu.edu will be listed as h0.cdn.disc.xem.pdl.local.cmu.edu on the CMU campus network (note the addition of "local" in the domain name). With appropriate routing, this allows Xem nodes to be referenced by name from other PDL clusters.
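
The proxy notes above can be sketched in shell as follows. This is only an illustration: the function names are hypothetical, and it assumes the tools you run honor the conventional http_proxy and https_proxy environment variables. The standard PDL disk images already carry the proxy setting, so the first helper matters mainly for custom images. The second helper is a stdin-to-stdout filter that rewrites references to the retired xem-ops proxy.

```shell
# PDL proxy address as given in the Networking notes.
PDL_PROXY="http://proxy.pdl.cmu.edu:3128/"

# Export the conventional proxy variables for the current shell.
# Assumption: the tools in use read http_proxy/https_proxy.
use_pdl_proxy() {
    http_proxy=$PDL_PROXY
    https_proxy=$PDL_PROXY
    export http_proxy https_proxy
}

# Rewrite references to the retired proxy (ops:8888 or
# xem-ops.pdl.cmu.edu:8888) so they point at the PDL proxy.
# The second expression only matches "ops" as a whole host name.
rewrite_proxy_refs() {
    sed -E \
        -e 's/xem-ops\.pdl\.cmu\.edu:8888/proxy.pdl.cmu.edu:3128/g' \
        -e 's/(^|[^A-Za-z0-9.-])ops:8888/\1proxy.pdl.cmu.edu:3128/g'
}
```

For example, rewrite_proxy_refs < old_setup.sh > new_setup.sh would produce an updated copy of a script (both file names are placeholders); review the result before replacing the original.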

Xem storage options

In Xem, there are multiple types of storage:

  • Your home directory: /users/$USER - NFS filesystem private to Xem. Not intended for storing large data. Emulab manages SSH keys and SSL certificates for you using this directory. Do not directly modify your SSH keys in $HOME/.ssh or your SSL keys in $HOME/.ssl.
  • Project directories: /proj/ - a directory shared among all users in your project.
  • Other external NFS mounts: volumes for large datasets, parallel filesystems, etc.
  • Node local storage: different node classes have different local storage options. Please see the Xem Node Types section for more information about different node classes. Common uses for node local storage include scratch space, additional space for installing software, etc. You can create partitions on the disk, but it is recommended that you not modify existing partitions in the partition table.

Your home directory and the /proj directories are provided via NFS from an external filer. This is a shared resource and should be treated as such. Writes to your home directory from hundreds of nodes at once can consume a large share of the filer's bandwidth and add significant overhead. Be cautious and make sure that output such as a core dump does not unintentionally get written to your home directory. Set your working directory to a node-local directory such as /tmp before launching your program, e.g., cd /tmp && ~/run_experiment
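
One way to make the advice above routine is a small wrapper that always runs a command from a fresh node-local directory. This is a sketch only: the function name run_in_scratch and the xem-run prefix are made up for illustration, and it assumes /tmp is on node-local storage as described above.

```shell
# Run a command with its working directory set to a fresh directory
# under /tmp, so stray output (logs, core dumps) stays off the NFS filer.
run_in_scratch() {
    scratch=$(mktemp -d /tmp/xem-run.XXXXXX) || return 1
    # The subshell keeps the caller's own working directory unchanged.
    ( cd "$scratch" && "$@" )
}
```

A call such as run_in_scratch ~/run_experiment behaves like the cd /tmp example above, except that each run gets its own directory.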

Additional notes

  • PDL Emulab installations do not enable or use Emulab's built in mailing list functionality. Some generic Emulab documentation refers to this feature, so please ignore it.
  • If you attempt to allocate all the available nodes, your experiment is more likely to see a node failure. Node failures are, by default, fatal: the experiment swap-in will fail. If you need a large allocation and are having trouble, contact pdl-gripe@ece.cmu.edu.
  • Xem monitors all allocated nodes to ensure they are being used. If an experiment sits idle for too long, it will be forcibly swapped out so that others can use the nodes. Xem also limits the overall duration of node allocations.
  • Local data is destroyed on swapout. Intermediate data can (and should) be stored on local disks, but should be copied to persistent storage (e.g., /proj) before the nodes are released or the experiment is swapped out due to max duration or idle-swap.
  • Additional documentation on the PDL lab systems can be found in the PDL GettingStarted Guide.
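
Because local data is destroyed on swapout, it can help to end every run with an explicit copy step. The sketch below shows one hedged way to do that with plain cp; the function name save_results is hypothetical, and the destination is whatever persistent directory you choose (for instance a subdirectory of your project directory under /proj).

```shell
# Copy the contents of a node-local results directory to persistent
# storage, creating the destination if needed.
# Usage: save_results <local_dir> <persistent_dir>
save_results() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # "src/." copies the directory's contents, including dotfiles,
    # and -p preserves timestamps and modes.
    cp -Rp "$src"/. "$dest"/
}
```

Running this once per experiment, rather than writing results to NFS continuously from every node, also keeps load on the shared filer low.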
Topic revision: r4 - 12 Dec 2025, ChadDougherty
