Ganglia gmond Python Module For Monitoring NVIDIA GPU On Rocks 5.4.2
February 17th, 2012 @ 18:24:43 | BASH, Clusters, CUDA, Ganglia, GPU, Linux, MichiganTech, Python, Research, Rocks, Science, Technology

Disclaimer
The instructions/steps given below worked for me (and Michigan Technological University) running NPACI Rocks 5.4.2 (with CentOS 5.5); as has been common practice for several years now, a full version of the operating system was installed. These instructions may very well work for you (or your institution) on Rocks-like or other Linux clusters. Please note that if you decide to use these instructions on your machine, you are doing so entirely at your own discretion, and neither this site, sgowtham.net, nor its author (or Michigan Technological University) is responsible for any damage, intellectual or otherwise.


A Bit About Ganglia (gmond & gmetad)


Citing the Ganglia website: Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world, and can scale to handle clusters with 2000 nodes.

Further, citing Wikipedia: gmond (Ganglia Monitoring Daemon) is a multi-threaded daemon which runs on each cluster node that needs to be monitored. Installation does not require a common NFS file system, a database back-end, special accounts, or maintained configuration files. It has four main responsibilities: monitor changes in host state; announce relevant changes; listen to the state of all other Ganglia nodes via a unicast or multicast channel; and answer requests for an XML description of the cluster state. Each gmond transmits information in two different ways: unicasting or multicasting host state in external data representation (XDR) format using UDP messages, or sending XML over a TCP connection.

Federation in Ganglia is achieved using a tree of point-to-point connections amongst representative cluster nodes to aggregate the state of multiple clusters. At each node in the tree, a Ganglia Meta Daemon (gmetad) periodically polls a collection of child data sources, parses the collected XML, saves all numeric, volatile metrics to round-robin databases and exports the aggregated XML over a TCP socket to clients. Data sources may be either gmond daemons, representing specific clusters, or other gmetad daemons, representing sets of clusters. Data sources use source IP addresses for access control and can be specified using multiple IP addresses for fail-over. The latter capability is natural for aggregating data from clusters since each gmond daemon contains the entire state of its cluster.

The Ganglia web front-end provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. Although the web front-end to Ganglia started as a simple HTML view of the XML tree, it has evolved into a system that keeps a colourful history of all collected data. It caters to system administrators and users (e.g., one can view the CPU utilization over the past hour, day, week, month, or year), and shows similar graphs for memory usage, disk usage, network statistics, number of running processes, and all other Ganglia metrics. The web front-end depends on the existence of gmetad, which provides it with data from several Ganglia sources. Specifically, the web front-end will open the local port 8651 (by default) and expects to receive a Ganglia XML tree. The web pages themselves are highly dynamic; any change to the Ganglia data appears immediately on the site. This behaviour leads to a very responsive site, but requires that the full XML tree be parsed on every page access. Therefore, the Ganglia web front-end should run on a fairly powerful, dedicated machine if it presents a large amount of data. The web front-end is written in the PHP scripting language, and uses graphs generated by gmetad to display history information.
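To make this data flow concrete, the following is a minimal Python sketch that connects to gmetad on the default XML port mentioned above (8651) and walks the cluster/host hierarchy of the tree it returns; the host name and element names are assumptions based on the description above, so adjust as needed.

# Minimal sketch: read the aggregated XML tree exported by gmetad on its default
# port (8651, as noted above) and print the cluster/host hierarchy.
# Assumes gmetad is running on the local front end.
import socket
import xml.dom.minidom

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("localhost", 8651))

chunks = []
while True:
    data = sock.recv(8192)
    if not data:
        break
    chunks.append(data)
sock.close()

dom = xml.dom.minidom.parseString("".join(chunks))
for cluster in dom.getElementsByTagName("CLUSTER"):
    print "Cluster:", cluster.getAttribute("NAME")
    for host in cluster.getElementsByTagName("HOST"):
        print "  Host:", host.getAttribute("NAME")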


Installation & Configuration


Rocks 5.4.2 installation in itself takes care of almost everything pertaining to installing and configuring Ganglia: gmond, gmetad and the Ganglia web interface. However, by default & design, a Rocks cluster's web interface is not publicly accessible. To fix this, the following commands were run:
#!/bin/bash
#
# update_web_firewall.sh
# BASH script to run necessary 'rocks' commands to update the firewall rules on a
# Rocks 5.4.2 cluster's front end to make the web interface accessible from anywhere
# Must be root (or at least have sudo privilege) to run this script

# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  echo
  echo "  Step #0: display current firewall rules"
  /opt/rocks/bin/rocks report host firewall localhost

  echo "  Step #1: removing the current rule for www"
  /opt/rocks/bin/rocks remove host firewall localhost chain=INPUT \
    flags="-m state --state NEW --source &Kickstart_PublicNetwork;/&Kickstart_PublicNetmask;" \
    protocol=tcp service=www action=ACCEPT network=public
  /opt/rocks/bin/rocks sync host firewall localhost

  echo "  Step #2: removing the current rule for https"
  /opt/rocks/bin/rocks remove host firewall localhost chain=INPUT \
    flags="-m state --state NEW --source &Kickstart_PublicNetwork;/&Kickstart_PublicNetmask;" \
    protocol=tcp service=https action=ACCEPT network=public
  /opt/rocks/bin/rocks sync host firewall localhost

  echo "  Step #3: adding new rule for www"
  /opt/rocks/bin/rocks add host firewall localhost chain=INPUT \
    flags="-m state --state NEW --source 0.0.0.0/0.0.0.0" \
    protocol=tcp service=www action=ACCEPT network=public
  /opt/rocks/bin/rocks sync host firewall localhost

  echo "  Step #4: adding new rule for https"
  /opt/rocks/bin/rocks add host firewall localhost chain=INPUT \
    flags="-m state --state NEW --source 0.0.0.0/0.0.0.0" \
    protocol=tcp service=https action=ACCEPT network=public
  /opt/rocks/bin/rocks sync host firewall localhost

  echo "  Step #5: display current firewall rules"
  /opt/rocks/bin/rocks report host firewall localhost
  echo
fi
# End root-check IF

Upon pointing the browser to http://FQDN/ganglia/, the web page should display the relevant information.

Monitoring NVIDIA GPU


The aforementioned setup works fine and as expected, but it doesn't necessarily provide any information about GPU(s) that may be part of the hardware. For example, the cluster used in this case has two NVIDIA GeForce GTX 260 cards in each compute node. For testing purposes, only one compute node was installed; also, one of the GTX 260 cards was replaced with an NVIDIA Quadro 6000. With more and more scientific & engineering computations tending towards GPU-based computing, it'd be useful to include their status/usage information in Ganglia's web portal. To this effect, NVIDIA released a gmond Python module for GPUs (I was made aware of it by one of Michigan Tech ITSS' directors). The instructions given in the NVIDIA-linked pages do work as described; however, Rocks 5.4.2 uses Python 2.4 while one requires Python 2.5 (or higher) to get the GPU metrics to show up in Ganglia.

Rebuilding Rocks Distribution with Python ctypes Library


I downloaded python-ctypes-1.0.2-2.el5.x86_64.rpm from http://ftp.osuosl.org/pub/fedora-epel/5/x86_64/ and placed it in /export/rocks/install/contrib/5.4/x86_64/RPMS/. Rebuilding the distribution, with the following commands, was as uneventful as usual.
#!/bin/bash
#
# update_rocks_distribution.sh
# BASH script to download python-ctypes-1.0.2-2.el5.x86_64.rpm from
# http://ftp.osuosl.org/pub/fedora-epel/5/x86_64/ and rebuild the
# rocks distribution
# Must be root (or at least have sudo privilege) to run this script

# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  echo
  cd /export/rocks/install/contrib/5.4/x86_64/RPMS/
  wget http://ftp.osuosl.org/pub/fedora-epel/5/x86_64/python-ctypes-1.0.2-2.el5.x86_64.rpm

  cd /export/rocks/install/
  rocks create distro
fi
# End root-check IF

Re-install the compute node(s) [in this case, compute-0-0]. Without and with the ctypes library, gmond (when run in debug mode, i.e., gmond -d9 -f) results in the following messages.
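A quick sanity check on a re-installed node: with the stock Python 2.4 interpreter, ctypes is importable only if the python-ctypes RPM made it into the rebuilt distribution. A minimal sketch along those lines:

# Quick sanity check on a compute node: under Python 2.4, this import succeeds only
# after the python-ctypes RPM from the rebuilt Rocks distribution has been installed.
import sys
import ctypes

print "Python :", sys.version.split()[0]
print "ctypes :", ctypes.__version__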

NVIDIA Driver Installation


Once the compute node(s) were re-installed, the NVIDIA driver, NVIDIA-Linux-x86_64-285.05.33.run, was installed using the following script.


#!/bin/bash
#
# install_nvidia_driver.sh
# BASH script to install NVIDIA driver in compute node(s) - save this in /share/apps/bin/
# Assumes that NVIDIA-Linux-x86_64-285.05.33.run is located in /share/apps/src/nvidia_cuda/
# Also, assumes that CUDA SDK 4.1.28 has been installed on front end in /share/apps/cuda/
# Must be root to run this script and run this in all compute nodes from the front end via
# the command
#
#   rocks run host '/share/apps/bin/install_nvidia_driver.sh'
#

# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  echo

  # 1. Install NVIDIA driver
  /share/apps/src/nvidia_cuda/NVIDIA-Linux-x86_64-285.05.33.run --silent

  # 2. Updating /etc/ld.so.conf
  echo "/share/apps/cuda/lib64" >> /etc/ld.so.conf
  echo "/share/apps/cuda/lib"   >> /etc/ld.so.conf
  /sbin/ldconfig

  # 3. Creating missing symbolic links to necessary libraries
  cd /usr/lib64/
  ln -sf libXmu.so.6.2.0 libXmu.so
  ln -sf libXi.so.6.0.0  libXi.so
fi
# End root-check IF
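Before wiring anything into Ganglia, it is worth confirming on a compute node that the kernel module is loaded and that the NVML shared library shipped with the driver can be found. A minimal sketch, assuming the 285.05.33 driver installed above (the file and library names are the ones that driver provides):

# Minimal sketch: confirm the NVIDIA kernel module is loaded and NVML is loadable.
import ctypes

# This file exists only when the nvidia kernel module is loaded
print open("/proc/driver/nvidia/version").read()

# libnvidia-ml.so.1 is installed alongside the driver
nvml = ctypes.CDLL("libnvidia-ml.so.1")
print "Loaded NVML library:", nvml._name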

Python Bindings for the NVIDIA Management Library


This provides Python access to static information and monitoring data for NVIDIA GPUs, as well as management capabilities. It exposes the functionality of NVML, and the package (nvidia-ml-py-2.285.01) may be downloaded from here. As before, the necessary steps are included in a BASH script.
#!/bin/bash
#
# install_python_nvml_bindings.sh
# BASH script to install Python Bindings for the NVML in compute node(s) - save this in /share/apps/bin/
# Assumes that nvidia-ml-py-2.285.01.tar.gz is in /share/apps/src/nvidia_ganglia/
# Must be root to run this script and run this in all compute nodes from the front end via the command
#
#   rocks run host '/share/apps/bin/install_python_nvml_bindings.sh'
#

# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  #
  # Download and install
  cd /tmp/
  cp /share/apps/src/nvidia_ganglia/nvidia-ml-py-2.285.01.tar.gz .
  tar -zxvpf nvidia-ml-py-2.285.01.tar.gz
  cd nvidia-ml-py-2.285.01
  python setup.py install

  # Copy nvidia_smi.py & pynvml.py to /opt/ganglia/lib64/ganglia/python_modules/
  cp nvidia_smi.py /opt/ganglia/lib64/ganglia/python_modules/
  cp pynvml.py     /opt/ganglia/lib64/ganglia/python_modules/
fi
# End root-check IF
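With the bindings in place, they can be exercised directly on a compute node. The snippet below is a minimal sketch; the function names are those provided by the nvidia-ml-py package (pynvml), and it assumes the NVIDIA driver is already loaded:

# Minimal sketch: query each GPU through the freshly installed NVML Python bindings.
import pynvml

pynvml.nvmlInit()
print "Driver version:", pynvml.nvmlSystemGetDriverVersion()

for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print "GPU %d: %s, %d C" % (i, name, temp)

pynvml.nvmlShutdown()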

gmond Python Module For Monitoring NVIDIA GPUs using NVML


After downloading ganglia-gmond_python_modules-3dfa553.tar.gz from the GitHub repository ganglia/gmond_python_modules to /share/apps/src/nvidia_ganglia/, the following steps need to be performed, first on the compute node(s) and then on the front end:
#!/bin/bash
#
# copy_ganglia_gmond_python_computenodes.sh
# BASH script to copy relevant files from ganglia-gmond_python_modules to Ganglia,
# and restart gmond - save this in /share/apps/bin/
# Assumes that ganglia-gmond_python_modules-3dfa553.tar.gz is in /share/apps/src/nvidia_ganglia/
# Must be root to run this script and run this in all compute nodes from the front end via the command
#
#   rocks run host '/share/apps/bin/copy_ganglia_gmond_python_computenodes.sh'
#

# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  #
  # Copy relevant files to Ganglia
  cd /tmp/
  tar -zxvpf /share/apps/src/nvidia_ganglia/ganglia-gmond_python_modules-3dfa553.tar.gz
  cd ganglia-gmond_python_modules-3dfa553/gpu/nvidia/
  cp python_modules/nvidia.py /opt/ganglia/lib64/ganglia/python_modules/
  cp conf.d/nvidia.pyconf /etc/ganglia/conf.d/

  #
  # Restart gmond
  /etc/init.d/gmond restart
fi
# End root-check IF

#!/bin/bash
#
# copy_ganglia_gmond_python_frontend.sh
# BASH script to copy relevant files from ganglia-gmond_python_modules to Ganglia,
# apply patch for Ganglia web interface and restart necessary services
# Assumes that ganglia-gmond_python_modules-3dfa553.tar.gz is in /share/apps/src/nvidia_ganglia/
# Must be root (or at least have sudo privilege) to run this script and run this only on front end

# Begin root-check IF
if [ $UID != 0 ];
then
  clear
  echo
  echo "  You must be logged in as root!"
  echo "  Exiting..."
  echo
  exit
else
  #
  # Apply web patch for Ganglia to display custom graphs
  cd /tmp/
  tar -zxvpf /share/apps/src/nvidia_ganglia/ganglia-gmond_python_modules-3dfa553.tar.gz
  cd ganglia-gmond_python_modules-3dfa553/gpu/nvidia/
  cp graph.d/*.php /var/www/html/ganglia/graph.d/

  cd /var/www/html/ganglia/
  patch -p0 < /tmp/ganglia-gmond_python_modules-3dfa553/gpu/nvidia/ganglia_web.patch

  #
  # Restart necessary services
  /etc/init.d/gmetad restart
  /etc/init.d/gmond restart
fi
# End root-check IF

Upon pointing the browser to http://FQDN/ganglia/ (e.g., http://paracuda.math.mtu.edu/ganglia/; the link will probably die or be changed to something else in due course), the display should include information about the GPUs as well, as shown in the screenshots below:
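Besides the web interface, the new metrics can also be checked at the source by querying a compute node's gmond directly. A minimal sketch, assuming gmond answers on its default TCP port (8649) and that the module's metric names carry a gpu prefix; the host name is only an example:

# Minimal sketch: dump the GPU-related metrics from a compute node's gmond XML.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("compute-0-0", 8649))

chunks = []
while True:
    data = sock.recv(8192)
    if not data:
        break
    chunks.append(data)
sock.close()

for line in "".join(chunks).splitlines():
    if 'NAME="gpu' in line:
        print line.strip()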

With a little more work, the rather unaesthetic-looking Ganglia web interface can be made to match a given institution's theme:

Thanks be to
Dr. Allan Struthers for letting his paracuda.math be used for this purpose; my friendly neighbors for their kindness in letting me borrow an NVIDIA Quadro 6000 card; Robert Alexander of NVIDIA (http://developer.nvidia.com/ganglia-monitoringsystem/), Bernard Li of Lawrence Berkeley National Laboratory and Jeremy Enos of the National Center for Supercomputing Applications for developing this gmond Python module as well as making time to answer my questions.

Near Future Work


Work is currently underway to include all of the compute node related steps in the above-described procedure in the local Rocks distribution, so that the compute nodes get them as soon as they are installed.
