http://sgowtham.net/blog/2012/02/17/ganglia-gmond-py...
Ganglia gmond Python Module For Monitoring NVIDIA GPU On Rocks 5.4.2
February 17th, 2012 @ 18:24:43 | BASH, Clusters, CUDA, Ganglia, GPU, Linux, MichiganTech, Python, Research, Rocks, Science, Technology
Disclaimer
The instructions/steps given below worked for me (and Michigan Technological University) running NPACI Rocks 5.4.2 (with CentOS 5.5). As has been common practice for several years now, a full version of the operating system was installed. These instructions may very well work for you (or your institution) on Rocks-like or other Linux clusters. Please note that if you decide to use these instructions on your machine, you are doing so entirely at your own discretion, and that neither this site, sgowtham.net, nor its author (or Michigan Technological University) is responsible for any/all damage, intellectual and/or otherwise.
24/06/12 20:12
    # [...]
    else

      echo
      echo "  Step #0: display current firewall rules"

      echo "  Step #1: removing the current rule for www"
      /opt/rocks/bin/rocks remove host firewall localhost chain=INPUT \
        flags="-m state --state NEW --source &Kickstart_PublicNetwork;/&Kickstart_PublicNetmask;" \
        protocol=tcp service=www action=ACCEPT network=public
      /opt/rocks/bin/rocks sync host firewall localhost

      echo "  Step #2: removing the current rule for https"
      /opt/rocks/bin/rocks remove host firewall localhost chain=INPUT \
        flags="-m state --state NEW --source &Kickstart_PublicNetwork;/&Kickstart_PublicNetmask;" \
        protocol=tcp service=https action=ACCEPT network=public
      /opt/rocks/bin/rocks sync host firewall localhost

      echo "  Step #3: adding new rule for www"
      /opt/rocks/bin/rocks add host firewall localhost chain=INPUT \
        flags="-m state --state NEW --source 0.0.0.0/0.0.0.0" \
        protocol=tcp service=www action=ACCEPT network=public
      /opt/rocks/bin/rocks sync host firewall localhost

      echo "  Step #4: adding new rule for https"
      /opt/rocks/bin/rocks add host firewall localhost chain=INPUT \
        flags="-m state --state NEW --source 0.0.0.0/0.0.0.0" \
        protocol=tcp service=https action=ACCEPT network=public
      /opt/rocks/bin/rocks sync host firewall localhost

      echo "  Step #5: display current firewall rules"

    fi
    # End root-check IF
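The add-and-sync steps for www and https differ only in the service name, so they can be folded into a loop. What follows is a minimal sketch, not the original script: the names ROCKS, DRY_RUN, run and open_service are made up for illustration, and by default it only prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch only: ROCKS, DRY_RUN, run and open_service are illustrative names.
ROCKS=/opt/rocks/bin/rocks
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands (default), 0 = execute them

# run CMD ARGS...: print the command shell-quoted, or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

# open_service SERVICE: accept new TCP connections for SERVICE from any source,
# then sync the firewall, using the same arguments as Steps #3 and #4.
open_service() {
  local service=$1
  run "$ROCKS" add host firewall localhost chain=INPUT \
      flags="-m state --state NEW --source 0.0.0.0/0.0.0.0" \
      protocol=tcp service="$service" action=ACCEPT network=public
  run "$ROCKS" sync host firewall localhost
}

for service in www https; do
  open_service "$service"
done
```

Running it with DRY_RUN=0 as root on the front end would execute the commands rather than print them.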
Upon pointing the browser to http://FQDN/ganglia/, the web page should display the relevant information.
[...] may be part of the hardware. For example, the cluster used in this case has two NVIDIA GeForce GTX 260 cards in each compute node. For testing purposes, only one compute node was installed; also, one of the GTX 260 cards was replaced with an NVIDIA Quadro 6000. With more and more scientific & engineering computations tending towards GPU-based computing, it'd be useful to include their status/usage information in Ganglia's web portal. To this effect, NVIDIA released a gmond Python module for GPUs (I was made aware of it by one of Michigan Tech ITSS directors). The instructions given in the NVIDIA-linked pages do work as described; however, Rocks 5.4.2 uses Python 2.4, while one requires Python 2.5 (or higher) to get the GPU metrics to show up in Ganglia.
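A quick way to tell whether a node is affected is to look at its Python version, or simply to try importing ctypes. The sketch below assumes that the Python 2.5 requirement comes down to the ctypes module, which joined the standard library in 2.5; both function names are made up for illustration.

```shell
#!/bin/bash
# Illustrative helpers, not part of the original post.

# needs_ctypes_rpm VERSION: succeed if VERSION (e.g. 2.4 or 2.4.3) is older
# than 2.5, i.e. a Python whose standard library does not yet ship ctypes.
needs_ctypes_rpm() {
  local major minor rest
  IFS=. read -r major minor rest <<< "$1"
  [ "$major" -lt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -lt 5 ]; }
}

# has_ctypes [PYTHON]: succeed if the given interpreter can import ctypes.
has_ctypes() {
  "${1:-python}" -c 'import ctypes' > /dev/null 2>&1
}

for v in 2.4 2.5 2.7.3; do
  if needs_ctypes_rpm "$v"; then
    echo "Python $v: separate ctypes package needed"
  else
    echo "Python $v: ctypes is in the standard library"
  fi
done
```

On a compute node, `has_ctypes && echo ok` is the more direct test, since it also succeeds once a separate ctypes package has been installed for Python 2.4.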
    # [...]
    else

      echo
      cd /export/rocks/install/contrib/5.4/x86_64/RPMS/
      wget http://ftp.osuosl.org/pub/fedora-epel/5/x86_64/python-ctypes-1.0.2-2.el5.x86_64.rpm

      cd /export/rocks/install/
      rocks create distro

    fi
    # End root-check IF
Re-install the compute node(s) [in this case, compute-0-0]. Without and with the ctypes library, gmond (when run in debug mode, i.e., gmond -d9 -f) results in the following message.
    #! /bin/bash
    #
    # install_nvidia_driver.sh
    # BASH script to install NVIDIA driver in compute node(s) - save this in /share/apps/bin/
    # Assumes that NVIDIA-Linux-x86_64-285.05.33.run is located in /share/apps/src/nvidia_cuda/
    # Also, assumes that CUDA SDK 4.1.28 has been installed on front end in /share/apps/cuda/
    # Must be root to run this script and run this in all compute nodes from the front end via
    # the command
    #
    #   rocks run host '/share/apps/bin/install_nvidia_driver.sh'
    #

    # Begin root-check IF
    if [ $UID != 0 ];
    then
      clear
      echo
      echo "  You must be logged in as root!"
      echo "  Exiting..."
      echo
      exit 1
    else

      # 1. Install NVIDIA driver
      /share/apps/src/nvidia_cuda/NVIDIA-Linux-x86_64-285.05.33.run --silent

      echo
      # 2: Updating /etc/ld.so.conf
      echo "/share/apps/cuda/lib64" >> /etc/ld.so.conf
      echo "/share/apps/cuda/lib" >> /etc/ld.so.conf
      /sbin/ldconfig

      # 3: Creating missing symbolic links to necessary libraries
      cd /usr/lib64/
      ln -sf libXmu.so.6.2.0 libXmu.so
      ln -sf libXi.so.6.0.0 libXi.so

    fi
    # End root-check IF
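Because the script appends to /etc/ld.so.conf with `>>`, running it a second time on the same node adds the two CUDA paths again. One way around that, sketched here with a made-up helper name (append_once) and a scratch file standing in for the real /etc/ld.so.conf, is to append a line only when it is not already present.

```shell
#!/bin/bash
# append_once FILE LINE: add LINE to FILE only if no identical line exists.
# Illustrative helper, not part of the original script.
append_once() {
  local file=$1 line=$2
  # -q quiet, -x match the whole line, -F treat LINE as a fixed string
  grep -qxF -- "$line" "$file" 2> /dev/null || echo "$line" >> "$file"
}

# Demonstration on a scratch file rather than the real /etc/ld.so.conf
conf=$(mktemp)
append_once "$conf" "/share/apps/cuda/lib64"
append_once "$conf" "/share/apps/cuda/lib"
append_once "$conf" "/share/apps/cuda/lib64"   # already present, so nothing is added
cat "$conf"
```

In the driver script, the two `echo ... >> /etc/ld.so.conf` lines could then become `append_once /etc/ld.so.conf ...` calls, followed by /sbin/ldconfig as before.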
    # [...]
      cd /tmp/
      cp /share/apps/src/nvidia_ganglia/nvidia-ml-py-2.285.01.tar.gz .
      # [...]

    fi
    # End root-check IF
    # [...]
    else

      # Copy relevant files to Ganglia
      cd /tmp/
      tar -zxvpf /share/apps/src/nvidia_ganglia/ganglia-gmond_python_modules-3dfa553.tar.gz
      cd ganglia-gmond_python_modules-3dfa553/gpu/nvidia/
      cp python_modules/nvidia.py /opt/ganglia/lib64/ganglia/python_modules/
      cp conf.d/nvidia.pyconf /etc/ganglia/conf.d/

      #
      # Restart gmond
      /etc/init.d/gmond restart

    fi
    # End root-check IF
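After gmond restarts, one way to confirm that the module is reporting is to look for GPU metrics in the XML that gmond serves. The sketch below is only a filter: it assumes the GPU metric names start with "gpu", and the function name and the sample XML (including its metric names) are made up for illustration.

```shell
#!/bin/bash
# gpu_metric_names: read gmond XML on stdin and print the unique names of
# metrics whose NAME attribute starts with "gpu". Illustrative helper.
gpu_metric_names() {
  grep -o '<METRIC NAME="gpu[^"]*"' \
    | sed -e 's/^<METRIC NAME="//' -e 's/"$//' \
    | sort -u
}

# Made-up sample input standing in for real gmond output
sample='<HOST NAME="compute-0-0">
<METRIC NAME="load_one" VAL="0.10" TYPE="float"/>
<METRIC NAME="gpu_num" VAL="2" TYPE="uint32"/>
<METRIC NAME="gpu0_temp" VAL="55" TYPE="uint32"/>
</HOST>'

printf '%s\n' "$sample" | gpu_metric_names
```

On a node where gmond listens on its default TCP port and nc is installed, something like `nc localhost 8649 | gpu_metric_names` would feed it live data; the port and the use of nc are assumptions about the local setup.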
    #! /bin/bash
    #
    # copy_ganglia_gmond_python_frontend.sh
    # BASH script to copy relevant files from ganglia-gmond_python_modules to Ganglia,
    # apply patch for Ganglia web interface and restart necessary services
    # Assumes that ganglia-gmond_python_modules-3dfa553.tar.gz is in /share/apps/src/nvidia_ganglia/
    # Must be root (or at least have sudo privilege) to run this script and run this only on front end

    # Begin root-check IF
    if [ $UID != 0 ];
    then
      clear
      echo
      echo "  You must be logged in as root!"
      echo "  Exiting..."
      echo
      exit 1
    else

      cd /tmp/
      tar -zxvpf /share/apps/src/nvidia_ganglia/ganglia-gmond_python_modules-3dfa553.tar.gz
      cd ganglia-gmond_python_modules-3dfa553/gpu/nvidia/
      cp graph.d/*.php /var/www/html/ganglia/graph.d/

      #
      # Apply web patch for Ganglia to display custom graphs
      cd /var/www/html/ganglia/
      patch -p0 < /tmp/ganglia-gmond_python_modules-3dfa553/gpu/nvidia/ganglia_web.patch

      #
      # Restart necessary services
      /etc/init.d/gmetad restart
      /etc/init.d/gmond restart

    fi
    # End root-check IF
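Since the patch modifies files under the live Ganglia web directory, it can be worth checking that it applies cleanly before touching anything. The following is a sketch, assuming GNU patch (which provides --dry-run); the helper name apply_patch_checked is made up for illustration.

```shell
#!/bin/bash
# apply_patch_checked DIR PATCHFILE: illustrative helper, not from the original post.
# PATCHFILE should be an absolute path because the function changes directory.
apply_patch_checked() {
  local dir=$1 patchfile=$2
  # First pass: --dry-run reports problems without modifying any file.
  if ! ( cd "$dir" && patch -p0 --dry-run < "$patchfile" > /dev/null ); then
    echo "patch does not apply cleanly in $dir; nothing was changed" >&2
    return 1
  fi
  # Second pass: apply for real.
  ( cd "$dir" && patch -p0 < "$patchfile" )
}
```

For the front end script, the call would be along the lines of `apply_patch_checked /var/www/html/ganglia /tmp/ganglia-gmond_python_modules-3dfa553/gpu/nvidia/ganglia_web.patch`.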
Upon pointing the browser to http://FQDN/ganglia/ (e.g., http://paracuda.math.mtu.edu/ganglia/; the link will probably die or be changed to something else in due course), the display should include information about the GPUs as well, as shown in the screenshots below:
With a little more work, the rather unaesthetic-looking Ganglia web interface can be made to look like a given institution's theme:
Thanks be to
Dr. Allan Struthers for letting his paracuda.math be used for this purpose; my friendly neighbors for their kindness in letting me borrow an NVIDIA Quadro 6000 card; Robert Alexander of NVIDIA (http://developer.nvidia.com/ganglia-monitoring-system/), Bernard Li of Lawrence Berkeley National Laboratory and Jeremy Enos of National Center for Supercomputing Applications for developing this gmond Python module as well as making time to answer my questions.
Opinions expressed in these pages are purely personal and do not reflect those of any other institution and/or individual 2002-2018 Gowtham All Rights Reserved