Tag Archives: hpc

Ganglia

Taken from http://ganglia.sourceforge.net/.

“Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.”

Tools required to build Ganglia

The main tools that are required is as follows:-

  • freetype2
  • libart-2.0
  • libpng
  • rrdtool
  • ganglia

For web based monitoring, PHP is required.

Building Ganglia on OSX

Here is my build configuration for the various tools stated above. I have previously faced difficulty bulding some of these tools in the past using older versions and on thePowerPC? platform. However, when i tried again recently, things seems to be able to build without much problems. The versions and configuration options that i used is as follows:-

  • freetype2 (2.3.4)
    • ./configure –prefix=/usr/local
  • libart-2.0 (2.3.17)
    • ./configure –prefix=/usr/local
  • libpng (1.2.16)
    • ./configure –prefix=/usr/local
  • rrdtool (1.2.19)
    • export LDFLAGS=”-L/usr/local/lib”
    • export CPPFLAGS=”-I/usr/local/include/freetype2 -I/usr/local/include/libart-2.0″
    • ./configure –prefix=/usr/local –disable-ruby –disable-perl –disable-python –disable-tcl
  • ganglia (3.0.4)
    • ./configure –prefix=/usr/local –with-gmetad

Note that the accompanying configuration file for gmond can be obtained by running the following:-

gmond -t > gmond.conf

A default gmetad.conf file is found within the gmetad directory in the ganglia source.

On boot starting on OSX

Here are the example files for launchd starting of Ganglia.

Ganglia launchd startup files

Showing the number of SGE jobs on Ganglia

It is useful to try to push information to be charted into Ganglia so that you can have one simple interface to look at all the various information related to your cluster. Here is a simple script that allows one to be able to see the number of SGE jobs that are in the queue and the number of SGE jobs that are running.

Save the below as a script file and add it into your CRON job to get these information into Ganglia. Simple

#!/bin/sh

GMETRIC="/usr/bin/gmetric"
source /etc/bashrc

# Might have to change path to reflect your SGE install.. 
QSTAT=`qstat | grep -v "$(qstat | head -n2)" | wc -l`
$GMETRIC --name SGEJOBS --type uint16 --units jobs --value $QSTAT

QSTAT=`qstat | grep -v "$(qstat | head -n2)" | grep @ | wc -l`
$GMETRIC --name SGEJOBS_RUNNING --type uint16 --units jobs --value $QSTAT