EDG fmon tutorial

  1. Introduction
  2. Install package
  3. Configure metrics
  4. Write sensors
  5. Access repository


Introduction
The EDG WP4 Fabric Monitoring framework provides a facility to collect data from distributed nodes and to store them centrally. The following picture gives an overview of the different components involved:
[Figure: fmon components]
The package provides an agent (edg-fmon-agent, usually referred to as the Monitoring Sensor Agent, or MSA) that runs sensors on each monitored node, and a central server (edg-fmon-server) that collects the data. The server receives samples as they are measured by the MSA and stores them in a database. Many types of consumers can access the repository, including correlation engines, plotting tools, alarm displays, etc. Programming interfaces are provided to add new sensors and to access the database.

We will discuss in this tutorial how to extend the set of sensors, how to deploy them, and how to access data in the repository.



Install package
The first action is to install the tutorial package on your computer. This package is basically the EDG release with custom config files for this tutorial.

The normal EDG distribution is available as an RPM. To avoid the need for root access, however, this tutorial is provided as a tarball that can be installed anywhere. The file organization is the same as for the RPM, under a root directory of your choice. The commands given below assume you put the package in your home directory.


  • You can either build it (if you have the edg-fabricMonitoring source package) with:
    	make tutorial
    	cp edg-fmon-tutorial.tar.gz ~
    	
  • Or you can download the tarball.
    	wget -P ~ http://wwwinfo.cern.ch/pdp/monitoring/tutorial/edg-fmon-tutorial.tar.gz
    	


To unpack the files, execute the commands:
	cd
	tar -xzvf edg-fmon-tutorial.tar.gz
	
This creates the edg-fmon-tutorial directory, which includes all the files for the following exercises. At the end of the tutorial, if you need to clean up the system, just type:
	rm ~/edg-fmon-tutorial.tar.gz
	rm -rf ~/edg-fmon-tutorial
	
Now that the package is installed, we need to set some environment variables (such as the path to the edg-fmon package, as it is not in the standard place, and the server URL). A script with the default variables can be sourced to initialize the environment properly. This must be done in every shell window in which you plan to use the fmon software. Under bash, use:
	cd ~/edg-fmon-tutorial/sbin
	. ./edg-fmon-tutorial-setup.sh
	
Finally we can launch the MSA:
	~/edg-fmon-tutorial/etc/init.d/edg-fmon-agent start
	
All the run-time files (log files, repository cache, etc.) are stored in ~/edg-fmon-tutorial/var/fmon. We can check that everything went fine by having a quick look at the log file:
	more ~/edg-fmon-tutorial/var/fmon/edg-fmon-agent.log
	
Of course, if you want to query status or stop the agent, you can use one of:
	~/edg-fmon-tutorial/etc/init.d/edg-fmon-agent status
	~/edg-fmon-tutorial/etc/init.d/edg-fmon-agent stop
	



Configure metrics
The configuration file of the MSA is written in ~/edg-fmon-tutorial/etc/edg-fmon-agent.conf.

This file is an ASCII representation of the configuration tree, consisting of key/value pairs organized hierarchically. To keep it easily readable (for us and for the MSA), it follows a fairly strict rule: the number of leading tabs on each line corresponds to the level of the tree node. Then follows the configuration node name (the key), and possibly a value. Take care that only tabs (and no spaces) are present before the first word on each line. Lines beginning with a hash (#) are skipped (this can be used to add comments). Blank lines are allowed.
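
The indentation rule can be illustrated with a small C++ sketch. This is not the MSA's own parser: the names ConfigLine and parseConfigLines are hypothetical, and the sketch assumes that comment lines may also start after leading tabs and that lines with spaces before the first word are simply ignored.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type for this illustration: one parsed configuration line.
struct ConfigLine {
    int level;          // number of leading tabs, i.e. depth in the tree
    std::string key;    // configuration node name
    std::string value;  // optional value (empty if the node has none)
};

// Split a configuration text into (level, key, value) entries.
std::vector<ConfigLine> parseConfigLines(const std::string& text) {
    std::vector<ConfigLine> entries;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // The tree level is the number of leading tabs.
        std::size_t level = 0;
        while (level < line.size() && line[level] == '\t') ++level;

        // Drop the leading tabs and any trailing whitespace.
        std::string rest = line.substr(level);
        std::size_t last = rest.find_last_not_of(" \t\r");
        if (last == std::string::npos) continue;  // blank line
        rest.erase(last + 1);

        if (rest[0] == '#') continue;  // comment (assumed allowed after tabs too)
        if (rest[0] == ' ') continue;  // spaces before the first word: ignored here

        ConfigLine entry;
        entry.level = static_cast<int>(level);
        std::size_t keyEnd = rest.find_first_of(" \t");
        entry.key = rest.substr(0, keyEnd);
        if (keyEnd != std::string::npos) {
            // Trailing whitespace was trimmed, so a value always follows here.
            entry.value = rest.substr(rest.find_first_not_of(" \t", keyEnd));
        }
        entries.push_back(entry);
    }
    return entries;
}
```

For a line made of three tabs followed by "Timing", a tab and "5 0", the sketch yields level 3, key "Timing" and value "5 0". Rebuilding the tree from the levels is left out to keep the example short.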

Now that this tedious but necessary description has been done, we can have a look at the content of the file.

The one distributed with the package contains only information related to the general MSA setup (repository server name, etc.), while the 'Sensors', 'Metrics', and 'Samples' subtrees are empty. We are going to fill in the gaps now. The following example configures a metric that checks whether the program 'xeyes' is running on the machine.
  • The 'sensors' section holds information about the sensor processes that we want to use. This includes the path to the executable, and the list of metric classes that it can measure.

    Let's add the 'systemCheck' sensor, which measures the metric we need:
    		sensor1
    			CommandLine     $(EDG_LOCATION)/libexec/edg-fmon-sensor-systemCheck
    			MetricClasses
    				daemonCheck
    				spaceUsed
    				executeScript
    	
    The name 'sensor1' is the reference used in the log file to identify the sensor. This part of the configuration needs to be filled in only once per sensor, because the list of metric classes a sensor measures is more or less static (this sensor defines 3 metric classes).

  • Then, the 'Metrics' section lists the metrics we want to instantiate. For each metric, we need an arbitrary number to be used as the metric id, the metric class it corresponds to (which must be declared in one of the sensors listed previously), and some initialization parameters. The MSA launches the sensors needed for the given metric instances and initializes them according to this configuration. To continue with our example, we define metric '1', based on metric class 'daemonCheck' and configured for 'xeyes':
    		1
    			MetricClass daemonCheck
    			Parameters
    				name	xeyes
    	
    The metric id is '1', and it is used in the repository to reference the samples. It is of type 'daemonCheck', which has been described before, and it runs with parameter 'name' set to 'xeyes'.

  • Finally, we need to trigger the measurements of the wanted metrics at the given frequencies (sensors can also send data on their own initiative, but this is not the common behavior). Metrics are grouped in samples, so that measurements can be synchronized if necessary. Now we add to the 'Samples' section:
    		sample1
    			Timing	5 0
    			Metrics
    				1
    	
    This means we want to trigger a sampling of metric '1' every 5 seconds, with a zero-second offset (the offset is the time the agent waits before the first sample). This does not necessarily imply that a measurement is stored in the database every 5 seconds, because a sensor can stay quiet after a sampling request (although this is uncommon, and it is not the case for our metric).
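
To make the 'Timing' semantics concrete, here is a minimal C++ sketch that expands a value such as "5 0" into the first few trigger times. The helper name triggerTimes is hypothetical, and the exact scheduling used by the MSA is not described in this tutorial: the sketch only assumes that the first request is issued after the offset and that later requests follow at the given period.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expand a 'Timing' value ("<period> <offset>", both in
// seconds) into the first `count` trigger times, relative to the agent start.
std::vector<long> triggerTimes(const std::string& timing, long agentStart, int count) {
    std::istringstream in(timing);
    long period = 0;
    long offset = 0;
    in >> period >> offset;

    std::vector<long> times;
    if (!in || period <= 0) return times;  // malformed value: no schedule

    for (int k = 0; k < count; ++k) {
        // Assumption: first trigger after `offset`, then one every `period`.
        times.push_back(agentStart + offset + k * period);
    }
    return times;
}
```

With the value "5 0" and an agent started at time 100, the first three requests fall at 100, 105 and 110. Whether a sample is actually stored for each request still depends on the sensor.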


After we made these modifications, the configuration file should look like:
MSA

	General
		LocalCache
			Path	$(EDG_LOCATION_VAR)/agent_db
			
	Transport
		UDP
			Server     $EDG_FMON_SERVER_NAME
			Port       $EDG_FMON_SERVER_PORT
			
	Sensors
		sensor1
			CommandLine     $(EDG_LOCATION)/libexec/edg-fmon-sensor-systemCheck
			MetricClasses
				daemonCheck
				spaceUsed
				executeScript
	Metrics
		1
			MetricClass daemonCheck
			Parameters
				name    xeyes

	Samples
		sample1
			Timing  5 0
			Metrics
				1
	
When the file is saved, the MSA re-reads the configuration automatically, as you can see in the log file, and the measurements should start to appear in the database. You can easily verify this in the local cache located in ~/edg-fmon-tutorial/var/fmon/agent_db/. A flat file layout is used: one subdirectory per node, containing one file per metric and per day.

Let's launch a subscriber, to have a 'live' feeling of what's going on.
	~/edg-fmon-tutorial/sbin/edg-fmon-subscribe node=$HOSTNAME metric=1
	
This dumps the latest values of the metric, and then a message is displayed each time a new sample is available:
	------------------------------------------------
	callback: pcitpdp47:1 --> 1037096424 : 0
	------------------------------------------------
	
It contains the MSA identifier (usually the node name, here 'pcitpdp47'), the metric identifier ('1'), the timestamp ('1037096424'), and the metric value ('0'). Once you launch xeyes, the value changes:
	------------------------------------------------
	callback: pcitpdp47:1 --> 1037096800 : 1
	------------------------------------------------
	
Note that the value is actually the number of running processes with that name, counted across all users. If you launch other instances of xeyes the value increases (2, 3, 4, ...). It is also possible to count only the instances owned by a given user: to do so, set the metric parameter 'user' to the wanted user name.
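
If you want to post-process the subscriber output, the callback lines can be split into their four fields. The following C++ sketch is only an illustration: CallbackSample and parseCallback are hypothetical names, and the line layout is inferred solely from the two example lines above, not from a documented format.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical type for this illustration: the fields of one callback line.
struct CallbackSample {
    std::string node;   // MSA identifier, usually the node name
    int metricId;       // metric identifier
    long timestamp;     // sample time (seconds since the epoch)
    std::string value;  // metric value, kept as text
};

// Parse a line such as "callback: pcitpdp47:1 --> 1037096424 : 0".
// Returns false for anything else (e.g. the dashed separator lines).
bool parseCallback(const std::string& line, CallbackSample& out) {
    std::istringstream in(line);
    std::string tag, source, arrow, colon;
    if (!(in >> tag >> source >> arrow >> out.timestamp >> colon)) return false;
    if (tag != "callback:" || arrow != "-->" || colon != ":") return false;

    // "node:metric" is split at the last ':' of the token.
    std::size_t sep = source.rfind(':');
    if (sep == std::string::npos || sep == 0 || sep + 1 == source.size()) return false;
    out.node = source.substr(0, sep);
    std::istringstream idStream(source.substr(sep + 1));
    if (!(idStream >> out.metricId)) return false;

    // Everything after the second ':' is the value.
    out.value.clear();
    std::getline(in >> std::ws, out.value);
    return !out.value.empty();
}
```

Applied to the second example line, the sketch gives node "pcitpdp47", metric 1, timestamp 1037096800 and value "1".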


Exercise:
  • Change the sampling frequency of the metric.
  • Add a new metric counting the number of instances of mingetty process owned by root.



Write sensors
Each sensor is a separate process launched by the MSA when it starts. The MSA then uses an ASCII protocol over the standard input/output of the sensor to initialize metrics, trigger measurements and collect samples. This makes it fairly easy to add new sensors and to extend the capabilities of the monitoring system.

Several ways exist to add a new sensor, and we will now go through them.
  • Simple script
    In the case of a simple metric that is not measured frequently, we can write a script that returns a value on standard output and plug it in as is through the 'systemCheck' sensor (edg-fmon-sensor-systemCheck). This sensor implements the metric class 'executeScript', which has exactly this behaviour. For example, if we want to count the number of console logins of user 'root' on the machine, the simplest way is:
    	who | grep -c root
    	
    Here is the corresponding declaration in the configuration file 'Metrics' section:
    	Metrics
    		2
    			MetricClass executeScript
    			Parameters
    				command		who | grep -c root
    	

  • Implement the protocol in your favorite scripting language
    This is the second way to write a sensor. You have to implement the fairly simple sensor protocol in your script, so that it understands MSA requests. For this you need the protocol documentation.

    Implementing the protocol is out of the scope of this tutorial, but we can have a quick look at how it works by launching a sensor from the command line:
    	~/edg-fmon-tutorial/libexec/edg-fmon-sensor-systemCheck
    	
    If we now type:
    	NFO [ENTER]
    	
    The sensor outputs some description of the metric classes it knows (formatted in a peculiar way):
    	NFO
    	NFO spaceUsed   	Estimate file space usage recursively in a given directory (param. 'path').
    	NFO daemonCheck 	Count the number of processes with a given name (param. 'name') running in
    				the system under given user (opt. param. 'user').
    	NFO executeScript	Execute an external script (param. 'command') and returns the output of the
    				script as value.
    	NFO
    	
    It is possible to instantiate metrics:
    	INI 1 spaceUsed path="/bin"             [ENTER]
    	INI 2 daemonCheck name="xeyes"  	[ENTER]
    	
    And then we can sample them:
    	GET 1 2                         	[ENTER]
    	
    Which outputs:
    	PUT 01 1 1037117543 6388
    	PUT 01 2 1037117543 0
    	
    This is the syntax the sensor uses to store samples in the database:
    	PUT 01 metric_id timestamp value
    	

  • Use the C++ api
    Although not convenient for every task, this method is very easy to implement. You only need to write a 'sample' method for every metric class you want to implement, and optionally init functions.

    As an example, we are going to write a metric that returns an index of the available CPU power. A possible measurement is simply to time a big loop in a low-priority process:
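    	#include <stdlib.h>
    	#include <sys/time.h>
    	#include <sys/resource.h>
    
    	// Time a fixed amount of floating-point work; returns elapsed seconds.
    	double timeLoop(){
    		int i;
    		// volatile keeps the compiler from optimizing the loop away
    		volatile double a=0;
    
    		struct timeval t1,t2;
    
    		// set process priority very low (19 is the lowest priority on Linux);
    		// note that this applies to the whole sensor process
    		setpriority(PRIO_PROCESS,0,19);
    
    		// now time a little loop (the timezone argument is obsolete, pass NULL)
    		gettimeofday(&t1,NULL);
    		for(i=0;i<10000000;i++) {
    			// +1.0 avoids a division by zero when rand() returns 0
    			a+=1/(rand()+1.0);
    		}
    		gettimeofday(&t2,NULL);
    
    		// elapsed time in seconds
    		return t2.tv_sec-t1.tv_sec+(t2.tv_usec-t1.tv_usec)/1000000.0;
    	}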
    	
    We now need to interface the sampling code to the MSA. The API provided implements a C++ class named metricBase. All we have to do is create a new class that inherits from it, implement the virtual method sample(), and register the class in the main program. Use the example provided as a starting point:
    	~/edg-fmon-tutorial/src/sensorExample.cpp
    	
    And add the timeLoop() function to it, as well as:
    	// our timing metric
    	class timeLoopMetric: public metricBase{
    		public:
            	int sample(){
                    	char value[10];
                    	snprintf(value,10,"%.3f",timeLoop());
                    	storeSample01(value);
                    	return 0;
            	}
    	};
    
    	// factory function to create instances of the new class
    	metricBase *timeLoopMetricFactory(metricParams *p){
    		return new timeLoopMetric;
    	}
    	
    The MSA interface currently accepts strings only, which is why we need to format the floating-point number as an ASCII string. Then we send the sample to the MSA with the storeSample01() function, which takes a null-terminated string as argument. This is not enough, however: we also have to register the metric in the main() function (after the existing registerMetric() call), with the following piece of code:
    	registerMetric("timeLoop","This metric times the execution of a low priority loop",timeLoopMetricFactory);
    	
    This tells the interface that we have a new metric class named timeLoop. The second argument is the description of the class, as shown upon an NFO request from the MSA, and the last is a reference to the factory function for the new metricBase-derived C++ class.

    To build the new sensor:
    	cd ~/edg-fmon-tutorial/src
    	g++ sensorExample.cpp -lsensorAPI -L. -o ../bin/sensorExample
    	
    This creates a new executable sensor, ~/edg-fmon-tutorial/bin/sensorExample, which you can use as shown in the previous section of the tutorial. The quickest test is to launch it from the command line. An example session is shown:
    	> ~/edg-fmon-tutorial/bin/sensorExample 	[ENTER]
    	
    	LOG 0 INFO 2002-11-13 13:43:05 	Sensor started
    	LOG 0 INFO 2002-11-13 13:43:08 	Sensor implemented with API version 1.01
    	
    	> NFO            	[ENTER]
    		
    	NFO myMetric This is an example metric
    	NFO timeLoop This metric times the execution of a low priority loop
    	NFO
    	
    	> INI 1 timeLoop 	[ENTER]
    	> GET 1           	[ENTER]
    
    	PUT 01 1 1037191395 4.123
    	
    As you may have noticed, the example file includes a definition of myMetric class. It is more complete than the one we implemented in the sense that it redefines more methods of metricBase class. The comments in the file should help you to understand what these methods do.

    Another interesting part of the code is the main loop: so that the API can process the I/O messages exchanged with the MSA, the function checkAndProcessMessages should be called periodically. Its timeout parameter guarantees that the function does not block indefinitely, so the user can add code to the main loop.
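
Both sensor sessions shown in this section end with PUT lines of the form "PUT 01 metric_id timestamp value". As a rough illustration of that syntax, the following C++ sketch formats and parses such lines. PutSample, formatPut and parsePut are hypothetical names and are not part of the sensor API. The field "01" is treated as an opaque token copied from the examples, since its meaning is not explained in this tutorial.

```cpp
#include <sstream>
#include <string>

// Hypothetical type for this illustration: the fields of one PUT line.
struct PutSample {
    std::string type;   // second field, "01" in all examples of this tutorial
    int metricId;       // metric id given in the INI request
    long timestamp;     // sample time (seconds since the epoch)
    std::string value;  // sample value, always transmitted as text
};

// Build a line following the syntax "PUT 01 metric_id timestamp value".
std::string formatPut(int metricId, long timestamp, const std::string& value) {
    std::ostringstream out;
    out << "PUT 01 " << metricId << ' ' << timestamp << ' ' << value;
    return out.str();
}

// Parse a PUT line; returns false if the line does not match the syntax.
bool parsePut(const std::string& line, PutSample& out) {
    std::istringstream in(line);
    std::string keyword;
    if (!(in >> keyword >> out.type >> out.metricId >> out.timestamp)) return false;
    if (keyword != "PUT") return false;

    // The value is the rest of the line, without leading whitespace.
    out.value.clear();
    std::getline(in >> std::ws, out.value);
    return !out.value.empty();
}
```

For example, formatPut(2, 1037117543, "0") returns "PUT 01 2 1037117543 0", and parsing "PUT 01 1 1037117543 6388" gives metric 1 with value "6388". A real sensor should follow the protocol documentation rather than this sketch.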


Exercise:
  • Pass the number of loops to time as a metric parameter.
  • Configure the metric in the MSA.



Access repository
Currently, the only API available for the repository is a C API.

We have seen previously how to use edg-fmon-subscribe, a repository subscriber that dumps the values of subscribed metrics on the screen. The simplest way to write a repository subscriber is to look at repositoryAPI.h and at the source code of edg-fmon-subscribe, which is the file MR_Client.c. To select the repository, set the environment variable MR_SERVER_URL correctly (this is done when you source edg-fmon-tutorial-setup.sh):
	export MR_SERVER_URL=http://my_server:12409
By default, a local connection to port 12409 is made. edg-fmon-agent and edg-fmon-server use this port by default for queries and subscriptions. The port number is set via the variable MR_SOAP_PORT before launching the MSA or the repository. Note that for the tutorial we use port 12411 instead (to avoid conflicts with systems that may already be running). To build the executable:
	cd ~/edg-fmon-tutorial/src
	gcc MR_Client.c -lrepository -I. -L.
The current interface uses some structures defined in edg_monitoring_types.h, but a simpler interface (C and some scripting languages) will be delivered.




Contact: Sylvain Chapeland
22nd May 2003