Overview:
We have built a data-intensive resource called OpenCloud, or just Cloud. It is used by both systems researchers and application researchers, so there are multiple web spaces for the different uses. This section is intended for application researchers; more systems-oriented information is available elsewhere.
OpenCloud is composed of compute nodes, (external) IO/storage nodes, and login/master nodes. In general, a user logs into a login node and submits jobs from the login node. Job control systems like Hadoop allocate compute node processes and manage these processes on your behalf. Users monitor status using services available on the login nodes, or via web services. Users don't need to directly access the compute nodes, IO/storage nodes, or service master nodes. All these other nodes are controlled and managed by the tools/systems invoked by users via login nodes.
The external IO/storage nodes provide over 250 TB of RAID-protected storage in addition to the storage resources in the compute and login nodes. There are 13 dedicated storage nodes, each with 10GE connections to the compute and login nodes.
Compute Node Configuration.
The OpenCloud compute cluster currently has 64 worker nodes, each with 8 cores, 16 GB DRAM, four 1 TB disks, and 10GbE connectivity between nodes. Seen as a single computer, the compute cluster has over 1 tera-operations per second, over 1 TB of memory, 256 1 TB disks, and over 40 Gbps of bisection bandwidth.
The compute node configuration is detailed below.
| Item | Specifications |
| Form factor | 1U single system |
| RAM | 16 GB |
| Disks | 4x 1 TB 3.5" SATA disks |
| 10 GE Network | 1x Qlogic QLE3142-CU-CK (NetXen) 10Gb Ethernet NIC |
| 1 GE Network | 2x 1GbE ports on the motherboard (unused) |
| CPUs | 53 compute nodes with 2x quad-core Intel E5440 (2.83 GHz, 12 MB L2 cache, 1333 MHz FSB); 11 compute nodes with 2x quad-core Intel E5450 (3 GHz, 12 MB L2 cache, 1333 MHz FSB) |
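The whole-cluster figures quoted above follow from the per-node numbers in the table. The short Python sketch below recomputes them. It assumes one operation per core per clock cycle, which is an illustrative simplification and not a figure from this page; the function and constant names are likewise just for this example.

```python
# Per-node figures taken from the configuration table above.
NODE_GROUPS = [
    # (node count, cores per node, clock in GHz)
    (53, 8, 2.83),  # 2x quad-core Intel E5440
    (11, 8, 3.00),  # 2x quad-core Intel E5450
]
RAM_GB_PER_NODE = 16
DISKS_PER_NODE = 4
DISK_TB = 1

# Assumption (not from this page): one operation per core per clock cycle.
OPS_PER_CYCLE = 1


def cluster_totals():
    """Aggregate the per-node figures over all compute nodes."""
    nodes = sum(count for count, _, _ in NODE_GROUPS)
    cores = sum(count * per_node for count, per_node, _ in NODE_GROUPS)
    giga_ops = sum(
        count * per_node * ghz * OPS_PER_CYCLE
        for count, per_node, ghz in NODE_GROUPS
    )
    return {
        "nodes": nodes,
        "cores": cores,
        "tera_ops_per_s": giga_ops / 1000,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "disks": nodes * DISKS_PER_NODE,
        "raw_disk_tb": nodes * DISKS_PER_NODE * DISK_TB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under that assumption the cluster comes out at 64 nodes, 512 cores, 1024 GB of DRAM, and 256 disks, with a little under 1.5 tera-operations per second.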
There are various master nodes, each with less disk than the compute nodes. In addition to providing login services for users, these nodes provide a range of master services, such as: ZooKeeper replicated system state, Hadoop and Maui/Torque job control queues, NFS service and storage, HDFS master, HBase master, Ganglia and Nagios monitoring, and others to come.
Network
The network for the OpenCloud cluster is made up of three 10GE switches as follows:
- Each compute node and storage node connects to one of the two 48-port Arista 7148S switches.
- These compute/storage node switches connect to each other and to the head nodes via a 24-port Arista 7124S head-end switch.
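As a quick sanity check on this layout, the sketch below counts how many edge-switch ports remain after attaching every compute and storage node. The number of ports reserved on each edge switch for uplinks is not given on this page, so `uplinks_per_switch` is a hypothetical parameter, and the value 4 used in the example is only an illustration.

```python
COMPUTE_NODES = 64
STORAGE_NODES = 13
EDGE_SWITCHES = 2            # the two 48-port Arista 7148S switches
PORTS_PER_EDGE_SWITCH = 48


def spare_edge_ports(uplinks_per_switch):
    """Edge ports left after attaching every node and reserving uplinks."""
    usable = EDGE_SWITCHES * (PORTS_PER_EDGE_SWITCH - uplinks_per_switch)
    return usable - (COMPUTE_NODES + STORAGE_NODES)


if __name__ == "__main__":
    # uplinks_per_switch=4 is a hypothetical value; this page does not state it.
    print(spare_edge_ports(uplinks_per_switch=4))
```

A positive result means the 77 compute and storage nodes fit on the two edge switches with ports to spare.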
Storage
There are many different storage facilities available to a user of OpenCloud.

Due to their scale, the storage facilities in the cloud cluster are not backed up, so data loss is possible. Back up your data or risk losing it.
- Scratch space on each compute node. This is not replicated. It is not really managed, and it can be lost or deleted by various events in the system. Users should never try to use this. Some services (MapReduce/Hadoop for example) use it temporarily.
- HDFS storage in the compute cluster. This is replicated, persistent storage, but embedded in the compute cluster. This is the primary storage users will compute from and into. There may be multiple different HDFS services, sometimes to isolate which nodes experience HDFS "interference" from other jobs, sometimes because experimental services are being put into use. There are TBs of HDFS storage, but not hundreds of TBs.
- Home directories on the login nodes. Inputs and code will often be installed into the login nodes and made available to the compute nodes via NFS. There will be GBs of space for home directories, but not TBs of space here.
Note: None of these storage facilities are backed up.
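Because nothing here is backed up, it can help to script regular copies of important results out of HDFS. The following is a minimal sketch, assuming the standard `hadoop fs -get` client command is on your path on a login node. The helper names and the paths in the example are hypothetical, and the destination should ultimately be storage outside the cluster with enough room for the data.

```python
import subprocess


def hdfs_get_command(hdfs_path, local_dir):
    """Build the `hadoop fs -get` command that copies hdfs_path into local_dir."""
    return ["hadoop", "fs", "-get", hdfs_path, local_dir]


def backup_from_hdfs(hdfs_path, local_dir, dry_run=True):
    """Copy hdfs_path out of HDFS; with dry_run=True only return the command."""
    command = hdfs_get_command(hdfs_path, local_dir)
    if not dry_run:
        # Raises CalledProcessError if the hadoop client exits with an error.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(backup_from_hdfs("/user/alice/results", "/backup/alice")))
```

The dry-run default prints the command without touching any data, so you can check the paths before running the real copy with `dry_run=False`.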