Cloud Cluster Users Meeting

Date: 2010-10-19

Cluster Characteristics

  • Data-Intensive Computing Cluster: Designed for large-scale data processing workloads that are not well served by other computational resources, such as those available at supercomputing centers (e.g., TeraGrid) or local group clusters.
  • Data-intensive computing software stack: HDFS, Hadoop MapReduce, HBase, Hive, Pig.
  • Managed by the PDL staff: Michael Stroucken and Mitch Franzos.

  • Cloud Cluster Team:
    • System administrators: Michael Stroucken and Mitch Franzos.
    • Volunteer students: Wittawat Tantisiriroj, Lin Xiao, Milo Polte, Kai Ren, Soila Pertet, Swapnil Patil.
    • Faculty: Julio López and Garth Gibson (technical guidance).

What is it being used for?

  • Comp-Bio: tissue image processing
  • LTI/ML: Read the Web, Worldly Knowledge.
  • Twitter data analysis
  • Security / Malware analysis
  • Seismology: seismic wavefield analysis and compression.
  • Astrophysics/cosmology: galaxy clustering, quasar identification, particle time histories.

Please tell us what you are doing


  • ClueWeb09: Targeted web crawl corpus
  • Twitter (Meeder/O'Connor).
  • Dark Matter simulation output / black holes.
  • Seismic wavefield (Quake group) (soon).

What interesting datasets do you have in the cluster?

Getting Help

Usage Monitoring


System Usage

Reducing Space Storage Usage

  • RAIDTool (Wittawat).
Topic revision: r1 - 19 Oct 2010, JulioLopez