Package installation
- With source tarball
On Linux, the following commands can be used:
tar -xzf edg-fabricMonitoring.x.x.x.src.tar.gz
cd edg-fabricMonitoring.x.x.x
make
make install
If no C++ compiler is available on your system, compilation of the C++ files can
be skipped (note, however, that sensorLinuxProc requires a C++ compiler to build):
make CPP=0
By default, files are installed in /opt/edg/fabricMonitoring. You can change
the root installation path. The following command installs files in
my/install/root/directory/fabricMonitoring:
make install prefix=my/install/root/directory
- With RPM
Use the rpm -i command, as in the following example:
rpm -i edg-fabricMonitoring.x.x.x-1.i386.rpm
By default, files are installed in /opt/edg/fabricMonitoring. You can change
the root installation path. The following command installs files in
my/install/root/directory/fabricMonitoring:
rpm -i edg-fabricMonitoring.x.x.x-1.i386.rpm --prefix=my/install/root/directory
Use rpm -e edg-fabricMonitoring.x.x.x to remove the package.
Repository and log files are not removed by this command. Delete them manually if needed.
Configuration and running
- For the monitored nodes:
MSA is installed with default sensors and configuration.
You should edit install_dir/etc/MSA.cfg to set up the server, the samples, and other settings.
Then launch:
install_dir/etc/init.d/edg-fmon-agent start
Options are detailed in the edg-fmon-agent script.
- For the server node:
Launch:
install_dir/etc/init.d/edg-fmon-server start
Options are detailed in the edg-fmon-server script.
By default, files containing the samples are stored in /var/fmonServer.
- Automatic startup at boot time:
To start MSA and fmonServer automatically when the machine
boots, run chkconfig once to register the startup
scripts.
If you don't want fmonServer to be launched at boot, remove its link:
rm /etc/rc.d/init.d/edg-fmon-server
If you don't want MSA to be launched at boot, remove its link:
rm /etc/rc.d/init.d/edg-fmon-agent
These files are symbolic links to the control scripts in install_dir/etc/init.d.
Remove them before running chkconfig.
Notes
- If you want to distribute the software with your own configuration, get the source tarball, extract
it and modify etc/MSA.cfg. Then rebuild a distribution with
make rpm.
The new RPM file now includes your configuration, and the agent can run without reconfiguration.
You may also include your sensors in the rpm distribution. Use
make rpm "sensors=/path/mysensor1 /path/mysensor2".
Sensors are launched by MSA and communicate with it using the interface specified in
sensorAPI.pdf.
Available sensors are listed below. Configuration parameters are indicated where applicable, and are mandatory unless marked optional.
The default configuration file (../etc/MSA.cfg) may help you understand how to configure metrics.
MSA internal sensor
This sensor is special. It is embedded in the agent to implement self-monitoring capabilities.
The following metric classes are available:
MSA.Alive | The first number returned is always "1" (it can be used as a heartbeat). It is followed by the number of running sensors out of the number that should be running (x/y). |
MSA.Footprint |
The sample output gives:
agent uptime (in seconds),
total CPU time used (in 1/100ths of a second),
resources used by the agent:
CPU over the last interval,
vsize (kB),
rss (kB),
%mem used,
resources used by the sensors (total):
CPU over the last interval,
vsize (kB),
rss (kB),
%mem used.
|
MSA.HeartBeatTimeout | This metric requires a configuration parameter named timeout. The value of this parameter is returned each time the metric is sampled. This can be used as a timeout limit to implement a "contact lost" alarm. |
MSA.SensorCheck | This metric requires a configuration parameter named timeout. The value of this parameter is
the maximum response time allowed for a sensor to answer the 'CHK' command issued when this metric is sampled.
The metric returns the number of sensors that have not replied before the timeout, and the number of sensors checked.
|
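On the consumer side, MSA.Alive and MSA.HeartBeatTimeout samples can drive a simple health check. The following is a minimal Python sketch, not part of the package: it assumes an MSA.Alive sample is the text "1 x/y" (heartbeat value, then running/expected sensors, separated by whitespace) and that the caller records when the last sample arrived. The function names are hypothetical.

```python
import time
from typing import Optional, Tuple


def parse_alive(sample: str) -> Tuple[int, int, int]:
    """Split a hypothetical MSA.Alive sample "1 x/y" into (heartbeat, x, y).

    Assumption: whitespace separates the heartbeat from the x/y pair.
    """
    heartbeat, ratio = sample.split()
    running, expected = ratio.split("/")
    return int(heartbeat), int(running), int(expected)


def sensors_missing(sample: str) -> int:
    """Number of sensors that should be running but are not."""
    _, running, expected = parse_alive(sample)
    return max(expected - running, 0)


def contact_lost(last_sample_time: float, timeout: float,
                 now: Optional[float] = None) -> bool:
    """True if no sample was received within `timeout` seconds.

    `timeout` is the value reported by MSA.HeartBeatTimeout; times are
    seconds since the epoch.
    """
    if now is None:
        now = time.time()
    return now - last_sample_time > timeout
```

A monitoring front end would call contact_lost() periodically for each node and raise the alarm when it returns True.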
edg-fmon-sensor-linuxProc
This sensor gathers information from /proc. It is available on Linux only.
The following metric classes are available:
system.uptime | the number of seconds elapsed since boot time |
system.bootTime | the time of the last machine boot, in standard timestamp format |
system.existingProcesses | the number of existing processes |
system.createdProcesses | the number of processes created during the last interval (average per second and total) |
system.numberOfSockets | the number of sockets in use (total, TCP, UDP, RAW) |
system.CPUutil | CPU utilisation in percent over the last interval (User, Nice, System, Idle), time interval (seconds), counter discrepancies |
system.contextSwitches | number of context switches during the last interval (average per second and total) |
system.interrupts | number of interrupts during the last interval (average per second and total) |
system.swapIO | number of swap pages read and written (average per second and total) |
system.pagingIO | number of pages read and written (average per second and total) |
system.networkIO | number of kilobytes read and written (total since boot time and average per second over the last interval) on the given interface (parameter interface, e.g. lo, eth0) |
system.memoryUsed | RSS memory in use (kilobytes) and the corresponding percentage of total RAM |
system.DiskIO | name, type, size (MB), used (%), read (kB/s), write (kB/s), use (%) for each partition over the last interval (one sample per partition) |
system.DiskStat | name, read (kB/s), write (kB/s), use (%) for each disk over the last interval (one disk per sample if parameter 'multiline' is set to 1, a single list for all disks otherwise) |
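The percentages reported by system.CPUutil come from counters that only ever increase, so a utilisation figure needs two snapshots taken one interval apart. The Python sketch below shows the usual technique on the aggregate cpu line of /proc/stat. It is an illustration, not the sensor's code, and it assumes only the first four columns (user, nice, system, idle) matter; newer kernels append further columns (iowait, irq, ...) that this sketch ignores.

```python
from typing import Dict

FIELDS = ("user", "nice", "system", "idle")


def read_cpu_counters(stat_text: str) -> Dict[str, int]:
    """Return the first four counters of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return {name: int(v) for name, v in zip(FIELDS, parts[1:5])}
    raise ValueError("no aggregate 'cpu' line found")


def snapshot(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the counters from /proc/stat (Linux only)."""
    with open(path) as f:
        return read_cpu_counters(f.read())


def cpu_util(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Percentage of time spent in each state between two snapshots."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        # No elapsed ticks, or counters went backwards: nothing to report.
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}
```

Typical use: call snapshot(), wait for the sampling interval (for example time.sleep(60)), call snapshot() again and pass both results to cpu_util().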
edg-fmon-sensor-systemCheck
This sensor measures various quantities using common command line utilities.
The following metric classes are available:
spaceUsed |
Estimates file space usage recursively in a given directory.
This metric is measured using the command 'du -s path'.
Symbolic links are not dereferenced. If some subdirectories
are not accessible, the sampling fails.
The returned value is the size used in kilobytes.
Parameters:
          path     |     Path of root directory to scan (e.g. '/tmp'). |
|
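A minimal Python sketch of the spaceUsed measurement, for readers who want to reproduce it by hand. It runs du as described above; the -k flag (POSIX, forces 1 KB units) is an addition of this sketch, and the helper names are hypothetical.

```python
import subprocess


def parse_du_output(output: str) -> int:
    """'du -s' prints '<size>\t<path>'; return the size field."""
    return int(output.split()[0])


def space_used_kb(path: str) -> int:
    """Run 'du -s -k path' and return the size in kilobytes.

    check=True raises CalledProcessError when du exits non-zero, e.g. when
    a subdirectory is not readable, matching the failure described above.
    """
    result = subprocess.run(["du", "-s", "-k", path],
                            capture_output=True, text=True, check=True)
    return parse_du_output(result.stdout)
```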
daemonCheck |
Counts the number of instances of a given process.
It is based on the output of the command 'ps -fC name'.
The returned value is the number of matching processes.
Parameters:
          name     |     The command name of the process (e.g. 'inetd'). |
          user     |     Only count the processes owned by the given user name (e.g. 'root'). Optional. By default, all matching processes are counted. |
|
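The counting rule can be sketched in Python by running the same ps command and counting the lines after the header, optionally keeping only rows whose first (UID) column matches the requested user. This is an illustration assuming the procps 'ps -f' column layout, not the sensor's implementation; note that ps may print a numeric UID when a user name does not fit its column. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def count_matching(ps_output: str, user: Optional[str] = None) -> int:
    """Count process lines in 'ps -f' output, skipping the header.

    If `user` is given, only rows whose first column (UID) equals it count.
    """
    rows = [line for line in ps_output.splitlines()[1:] if line.strip()]
    if user is not None:
        rows = [line for line in rows if line.split()[0] == user]
    return len(rows)


def daemon_check(name: str, user: Optional[str] = None) -> int:
    """Run 'ps -fC name' and count the matching processes."""
    # No check=True: ps may exit non-zero when no process matches,
    # which simply means a count of 0 here.
    result = subprocess.run(["ps", "-fC", name], capture_output=True, text=True)
    return count_matching(result.stdout, user)
```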
executeScript |
Executes a command in the shell. The returned value is the output of the command,
reformatted into a single line of at most 2000 characters.
Parameters:
          command     |     The shell command to execute (e.g. 'ls /bin', or 'ls -l /tmp | grep -c root'). |
|
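A Python sketch of the output normalisation. The exact joining rule is not specified above; this version assumes lines are joined with single spaces before the result is cut to 2000 characters. The command runs through the shell (shell=True), which is what makes pipelines such as 'ls -l /tmp | grep -c root' work. The function names are hypothetical.

```python
import subprocess

MAX_LENGTH = 2000  # limit stated in the metric description


def to_single_line(output: str, max_length: int = MAX_LENGTH) -> str:
    """Join output lines with single spaces and cut to max_length chars.

    Assumption: the real sensor's joining rule may differ.
    """
    return " ".join(output.splitlines())[:max_length]


def execute_script(command: str) -> str:
    """Run `command` through /bin/sh and return its normalised stdout."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return to_single_line(result.stdout)
```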
serviceStatus |
Executes '/sbin/service <service> status' for the given service.
Returns 1 if the service is running, 0 otherwise.
Parameters:
          service     |     The service name to be checked (as known by chkconfig). |
|
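A minimal Python sketch of the check. It assumes the LSB init-script convention that 'status' exits with 0 when the service is running; any other exit code, or a missing service command, is reported as 0. The service_cmd parameter is a hypothetical hook added so the sketch can be exercised without real services.

```python
import subprocess


def service_status(service: str, service_cmd: str = "/sbin/service") -> int:
    """Return 1 if '<service_cmd> <service> status' exits with 0, else 0."""
    try:
        result = subprocess.run([service_cmd, service, "status"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # The service command itself is missing on this host.
        return 0
    return 1 if result.returncode == 0 else 0
```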
edg-fmon-sensor-fileUtil
This sensor provides various file-related utilities.
file.dump |
Once initialized, this metric stores in the monitoring repository whatever is read from
the file, line by line. This metric does not need to be sampled; it sends
samples as the file is filled. This also works well with named pipes.
Beware that the whole content of the file is dumped at the beginning.
If you do not want this behavior, use the file.tail metric instead.
Parameters:
          file     |     Path to the file to read from. |
Output format:
          (string)     |     What is written to the file, one sample per line. |
An example use is to get the LCFG log messages into the monitoring DB. To do so,
redirect the LCFG log messages to a named pipe. The following entry is required in the
syslog.conf file to route LCFG events to the sensor:
          local3.* |/var/obj/tmp/monitor.fifo
You might need to create the pipe with mkfifo.
|
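The line-by-line dump can be sketched in Python as a generator over the file. Opening a named pipe for reading blocks until a writer connects, and iteration ends when the last writer closes it; a long-running sensor would presumably reopen the pipe at that point, which this sketch does not do. dump_lines is a hypothetical name.

```python
from typing import Iterator


def dump_lines(path: str) -> Iterator[str]:
    """Yield every line of `path`, from the beginning, as one sample each."""
    # On a named pipe, open() blocks until a writer connects, and the loop
    # ends when the last writer closes its end.
    with open(path, "r", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")
```

For the LCFG example, the pipe would be created first (mkfifo /var/obj/tmp/monitor.fifo, or os.mkfifo from Python) before iterating over dump_lines() on it.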
file.redump |
This metric re-dumps the content of a given file each time it is sampled.
There is one sample per line.
Parameters:
          file     |     Path to the file to read from. |
Output format:
          (string)     |     The content of the file, one sample per line. |
|
file.tail |
This metric returns strings appended to a given file, in
the same way the 'tail' command works. This metric only outputs
what is appended after it is initialized. If you want the full content
of the file, use the file.dump metric.
This metric is useful to trace log messages, etc.
You don't need to sample this metric. Samples are sent as the file grows.
Parameters:
          file     |     Path to the file to read from. |
Output format:
          (string)     |     What is written to the file, one sample per line. |
|
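The tail behaviour can be sketched in Python with a reader that seeks to the end of the file when it is created and then, on each poll, returns only complete new lines, keeping an unfinished last line for the next poll. Tailer is a hypothetical helper, not the sensor's code, and it does not handle log rotation or truncation.

```python
import os
from typing import List


class Tailer:
    """Return lines appended to a file after this object is created."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "r", errors="replace")
        self._f.seek(0, os.SEEK_END)  # ignore the existing content
        self._partial = ""

    def poll(self) -> List[str]:
        """Return new complete lines; keep an unfinished line for later."""
        lines = []
        while True:
            chunk = self._f.readline()
            if not chunk:
                break
            self._partial += chunk
            if self._partial.endswith("\n"):
                lines.append(self._partial[:-1])
                self._partial = ""
        return lines

    def close(self) -> None:
        self._f.close()
```

A caller would loop, calling poll() and sleeping between calls (for example time.sleep(1)), and send each returned line as one sample.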
file.processAccounting |
This metric returns process accounting information appended to a
given file. This metric only outputs what is appended after it is
initialized. The binary information is converted to readable ASCII data.
You don't need to sample this metric. Samples are sent as the file grows.
Parameters:
          file     |     Path to the file to read from (typically /var/log/pacct). |
          filter_uid     |     Reports accounting information only for the user with the given uid. Optional. |
          filter_gid     |     Reports accounting information only for users belonging to the group with the given gid. Optional. |
Output format:
          (string)     |     Accounting command name. |
          (long)     |     Accounting process exit code. |
          (int)     |     Accounting user ID. |
          (int)     |     Accounting group ID. |
          (int)     |     Controlling tty. |
          (long)     |     Beginning time (seconds since 1970). |
          (int)     |     Accounting user time. |
          (int)     |     Accounting system time. |
          (int)     |     Accounting elapsed time. |
          (int)     |     Accounting average memory usage. |
          (int)     |     Accounting chars transferred. |
          (int)     |     Accounting blocks read or written. |
          (int)     |     Accounting minor page faults. |
          (int)     |     Accounting major page faults. |
          (int)     |     Accounting number of swaps. |
|
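Converting a record to readable data involves decoding its compressed counters. Assuming the classic Linux struct acct layout from <sys/acct.h>, the time and counter fields are stored as comp_t: a 16-bit value with a 3-bit base-8 exponent and a 13-bit mantissa. The Python sketch below decodes that encoding and applies the filter_uid/filter_gid parameters. It assumes filter_gid is compared with the record's group ID only (not with supplementary groups), and the function names are hypothetical.

```python
from typing import Optional


def decode_comp_t(value: int) -> int:
    """Decode a comp_t: 3-bit base-8 exponent (high bits), 13-bit mantissa."""
    mantissa = value & 0x1FFF
    exponent = (value >> 13) & 0x7
    return mantissa << (3 * exponent)


def keep_record(uid: int, gid: int,
                filter_uid: Optional[int] = None,
                filter_gid: Optional[int] = None) -> bool:
    """Apply the optional filter_uid / filter_gid parameters to one record.

    Assumption: when both filters are set, both must match.
    """
    if filter_uid is not None and uid != filter_uid:
        return False
    if filter_gid is not None and gid != filter_gid:
        return False
    return True
```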
file.size |
This metric measures the size of a given regular file (not a directory, symbolic link, etc.).
Parameters:
          file     |     Path to the file to measure. |
Output format:
          (int)     |     The file size in bytes. -1 if file does not exist (or is not a regular file). |
|
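A minimal Python sketch of the size rule. It uses os.lstat so that a symbolic link is itself inspected (and rejected) rather than followed; that reading of "no symbolic link" is an assumption. file_size is a hypothetical name.

```python
import os
import stat


def file_size(path: str) -> int:
    """Size in bytes of a regular file, or -1 if missing or not regular."""
    try:
        st = os.lstat(path)  # do not follow symbolic links
    except OSError:
        return -1
    return st.st_size if stat.S_ISREG(st.st_mode) else -1
```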
Contact