NAME

ce-all - CE Information Provider


COPYRIGHT

(c) by Florian Schintke <schintke@zib.de>,

Thomas Röblitz <roeblitz@zib.de>,

Jörg Meltzer <meltzer@zib.de>

Konrad-Zuse-Zentrum für Informationstechnik Berlin, 2001, 2002


AUTHOR

Author(s):

Thomas Röblitz <roeblitz@zib.de>

Florian Schintke <schintke@zib.de>

Jörg Meltzer <meltzer@zib.de>


Assumptions & Tools

Assumptions
We assume that Globus has been installed.

Used Tools
OS: host, which

PBS: qstat

LSF: lsid, bqueues, lshosts, bhosts, bjobs

Condor: condor_q, condor_status, condor_version, condor_config_val

Globus: globus-hostname


Parameters

ARGS
Cluster
 '-cluster <cluster>' specifies the name of the server. The server name supplied
 is only honored for the PBS batch system. If the PBS server named in <cluster>
 doesn't exist, an attempt is made to use the default PBS server for this client.

Cluster batch system bin path

 '-cluster-batch-system-bin-path <cluster-batch-system-bin-path>' specifies
 the path where the commands of the cluster batch system can be found.

Globus hostname script

 '-globus-hostname-script <globus-hostname-script>' specifies the complete
 path to a script that prints out the hostname of the gatekeeper
 (e.g. 'globus-hostname').

Globus config file

 '-globus-config-file <globus-config-file>' specifies the path and filename of
 the Globus configuration file (e.g. globus-jobmanager.conf).

Grid mapfile

 '-auth-users-from-grid-mapfile' reads authorized users from the grid-mapfile
 rather than from the static configuration file.

Host info bin path

 '-host-info-bin' specifies the location of the host info script.

Local resource management system

 '-lrms <lrmsstring>' specifies the local Resource Management System (default: PBS).

Maximum cputime

 '-maxcputime <hh:mm:ss>' defines the maximum cpu time for a job
 submitted to the CE (Condor only).

Maximum wallclocktime

 '-maxwalltime <hh:mm:ss>' defines the maximum wall clock time allowed for
 jobs submitted to the CE (Condor only).

Queue

 '-queue <queue1 queue2...>' specifies non-rms submission queues
 or a set of condor central managers (CONDOR_HOST).

Static config file

 '-static <static>' specifies the name of the file that contains static
 information.

VO file

 '-vo <VO>' specifies the name of the file which contains the VO tag info.

Resource management system execution queue

 '-rms-execution-queue <queue>' specifies the rms execution queue
 (stats are only shown if set in the -queue switch) (PBS & LSF only).

Resource management system submission queues

 '-rms-submission-queues <queue1> <queue2> ...'
 specifies rms submission queues (PBS & LSF only).

CESE Bindings

 '-cesebind <configfile> | queueregex se | queueregex se directory'
  specifies the cesebind configuration.

Ttl

 '-ttl <ttl>' specifies the value for entryTtl.

Dynamic subcluster

 '-sshworkernode <node>' specifies the node from which to take subcluster information.
 This feature is currently disabled.
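
 A hypothetical invocation for a PBS cluster, combining several of the
 options above (all paths and names are illustrative):

  ce-all -lrms pbs \
         -cluster pbsserver.example.org \
         -cluster-batch-system-bin-path /usr/local/pbs/bin \
         -static /etc/ce-static.conf \
         -queue short long \
         -ttl 600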


Notes

Notes-LSF

General:

 LSF allows various ways to configure a queue. We try to determine the CE's
 values directly from the output of the bqueues command. Since we usually
 have a node-by-node configuration, we view a queue as a set of nodes and
 fill in missing bqueues values from the node values instead (bhosts,
 lshosts commands).

MaxRunningJobs:

 Unless a queue specifies MaxRunningJobs, we take the value from the nodes'
 MaxRunningJobs. If the admin employed a user/group based policy and
 absolutely no information for this attribute can be found, we use the
 value of the total cpus, as sketched below.
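
 A sketch of this fallback chain (hash layouts and names are assumptions):

  # MaxRunningJobs for one queue: queue value, else node sum, else CPUs
  sub lsfMaxRunningJobs {
      my ($queue, $nodes, $total_cpus) = @_;
      return $queue->{MaxRunningJobs}              # 1. bqueues value
          if defined $queue->{MaxRunningJobs};
      my ($sum, $found) = (0, 0);
      for my $node (@$nodes) {                     # 2. node values (bhosts)
          next unless defined $node->{MaxRunningJobs};
          $sum   += $node->{MaxRunningJobs};
          $found  = 1;
      }
      return $found ? $sum : $total_cpus;          # 3. total cpus fallback
  }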

TotalCPUs:

 TotalCPUs differs from queue to queue; the value is an accumulation over
 each host taken from the bqueues -l field 'HOSTS: {hosts}'. Each host's
 cpus are determined first from the lshosts command, or failing that from
 the bhosts MaxRunningJobs value, as sketched below.
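
 A sketch of this accumulation (hash layouts and names are assumptions):

  # TotalCPUs for one queue from its 'HOSTS: {hosts}' list
  sub lsfTotalCPUs {
      my ($hosts, $lshosts, $bhosts) = @_;   # hashes keyed by hostname
      my $total = 0;
      for my $h (@$hosts) {
          if (defined $lshosts->{$h}{ncpus}) {
              $total += $lshosts->{$h}{ncpus};             # 1. lshosts
          } elsif (defined $bhosts->{$h}{MaxRunningJobs}) {
              $total += $bhosts->{$h}{MaxRunningJobs};     # 2. bhosts
          }
      }
      return $total;
  }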

Notes-PBS

Notes-CONDOR

 See README.Condor


Program Sequence

(when script is configured for all record types)

set defaults & initialize variables

parse cmdline

obtain static attributes values

create ce records

create cluster record

create subcluster records

create filesystem records

print ce records

print cluster record

print subcluster records

print remote filesystem records

call host info script
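
 Rendered as a hedged Perl sketch (the procedure names are the ones
 documented in the sections below; the exact call sequence is an
 assumption):

  setGlobalDefaults();              # set defaults & initialize variables
  determineLrms();
  setLrmsDefaults();
  processCommandLineParameters();   # parse cmdline
  getStaticData();                  # obtain static attribute values
  readStaticInformation();
  glueCEToQueue();                  # create ce records (with the other
                                    #   *ToQueue procedures)
  unifyClusterData();               # create cluster record
  unifySubClusterData();            # create subcluster records
  unifyFileSystemData();            # create filesystem records
  printCeInformation();             # print ce records
  printClusters();                  # print cluster record
  printSubClusters();               # print subcluster records
  printFileSystems();               # print remote filesystem records
  printHostInfo();                  # call host info script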


Configuration & initialisation functions

see subprocedures: setGlobalDefaults, determineLrms, setLrmsDefaults, processCommandLineParameters

setGlobalDefaults

 Here we set some global defaults.

determineLrms

 try to get lrms information from the program name
 and check for the '-lrms' commandline option
 aborts: an unknown lrms is found in the commandline,
         scriptname and commandline option are mutually exclusive
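
 A minimal sketch of this logic, assuming the script may be installed
 under lrms-specific names such as 'ce-pbs' (the option scanning is
 illustrative):

  use strict;
  use warnings;
  use File::Basename qw(basename);

  my %known = (pbs => 1, lsf => 1, condor => 1);
  # 1. lrms from the program name, e.g. 'ce-pbs' -> 'pbs'
  my ($lrms) = basename($0) =~ /^ce-(pbs|lsf|condor)$/;
  # 2. lrms from the '-lrms' commandline option
  for my $i (0 .. $#ARGV) {
      next unless $ARGV[$i] eq '-lrms';
      die "scriptname and commandline option are mutually exclusive\n"
          if defined $lrms;
      $lrms = lc($ARGV[$i + 1] || '');
      die "unknown lrms '$lrms' in commandline\n" unless $known{$lrms};
  }
  $lrms = 'pbs' unless defined $lrms;   # documented default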

setLrmsDefaults

PBS & LSF:

 set wp4 rms variables

CONDOR:

 set defaults for config files and policy variables

processCommandLineParameters

 parse command line parameters
 aborts: the cluster batch system bin path could not be found,
         wp4 rms managed queues are misconfigured.


Accumulation of static data

The procedures

getStaticData, readStaticInformation, getCeid

obtain all relevant static data, which is integrated into CE records and cluster records.

getStaticData

PBS:

 post:
 The Cluster name is defined to be the gatekeeper host. ClusterArg,
 if specified, is used as the hostname of the PBS server to query.
 The server name as returned by the server will be used to set
 Server for subsequent PBS commands. The LrmsVersion is either the
 pbs_version info taken from the
   qstat -B -f
 shell command, or "-" if the info could not be found.
 The ServerParam is '@' followed by the name of the Server:
   $ServerParam = "@$Server"
 [Max|Default][CPU|Wall]TimeServer is the value in seconds of
   resources_[max|default].[cput|walltime] from the
   "qstat -B -f $Server" command, or "-" if the value is missing.
 The @AllQueues array contains the queue names from qstat -Q -f $ServerParam.
 aborts:
         no queues could be found on the cluster,
         no node information found
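
 A minimal sketch of the version and limit extraction, assuming the qstat
 output format quoted above (time values are kept as hh:mm:ss here; the
 conversion to seconds is omitted):

  my $LrmsVersion = '-';
  my %ServerLimit;                    # e.g. $ServerLimit{'max.cput'}
  open my $qstat, '-|', 'qstat', '-B', '-f' or die "qstat: $!";
  while (my $line = <$qstat>) {
      $LrmsVersion = $1 if $line =~ /pbs_version\s*=\s*(\S+)/;
      $ServerLimit{"$1.$2"} = $3
          if $line =~ /resources_(max|default)\.(cput|walltime)\s*=\s*(\S+)/;
  }
  close $qstat;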

LSF:

 post:
 The Cluster name is defined to be the gatekeeper host.
 The LrmsVersion value is taken from the lsid shell command.
 Node and running-job information is gathered here in the hashes %Jobs and
 %Nodes, since the bjobs and bhosts/lshosts shell calls are expensive at
 run time. Nodes and jobs hold references to each other
 (see Readme.Implementationdetails), which enables us to view the queue as
 a set of nodes.
 @AllQueues contains all queues found by bqueues.
 aborts:
         no queues could be found on the cluster,
         no node information found

CONDOR:

 post:
 The Cluster name is defined to be the gatekeeper host.
 The LrmsVersion value is taken from the condor_version shell command.
 @AllQueues contains the collector names (CEIdNames) of all condor hosts
   specified with condor_config_val -pool CONDOR_HOST -name CONDOR_HOST COLLECTOR_NAME
 The %CondorHosts hash maps collector name (CE name) -> CONDOR_HOST.
 aborts:
 no pools could be found

readStaticInformation

Here we perform all configfile reading.

getCeid
 We return a provisional CEID, which stays the same for each queue in the
 following course of the program.
 The string is gatekeeper_host:gatekeeper_port/jobmanager-lrms.
 If we have to determine the gatekeeper host and port and a globus config
 file is specified on the commandline, we read out the globus variables;
 otherwise we try to read the values from the static config file's first
 CE entry that uses the current lrms.
 aborts: GlobusGatekeeperHost or GlobusGatekeeperPort is undefined.
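
 A minimal sketch of the CEID construction (the helper and example values
 are illustrative; 2119 is the standard gatekeeper port):

  sub getCeid {
      my ($host, $port, $lrms) = @_;
      die "GlobusGatekeeperHost or GlobusGatekeeperPort is undefined\n"
          unless defined $host && defined $port;
      return "$host:$port/jobmanager-$lrms";
  }
  # getCeid('gatekeeper.example.org', 2119, 'pbs')
  #   -> 'gatekeeper.example.org:2119/jobmanager-pbs'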
readAuthUsersFromGridMapfile
 Read a list of user records, which will be added to the
 GlueCEAccessControlBaseRules of all found CE records.
readVOInputTags
 Read the VO tag information from the file specified with the '-vo' option.
readStaticConfigFile
 We can redefine or add new CE/subcluster/cluster/filesystem information by
 merging a static config file's information into the dynamically created CE
 records.
 Read out the static config file by fetching either Computing Element,
 cluster, subCluster or remoteFileSystems records.
 First, check the line for an object identifier.
 Second, scan the dn string for sub keys and store their values.
 Third, check the found keys' values and figure out what kind of record
 the dn points to.
 Fourth, now that we are context aware, read the attribute/value pairs and
 store them in the hash identified above, keyed by the full dn string and
 the attribute name.
 aborts: Static config file could not be opened.
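
 A sketch of the parsing scheme above, assuming an LDIF-like file layout
 (the record-kind detection from the dn sub keys is reduced to a stub):

  sub readStaticConfigFile {
      my ($file, $records) = @_;    # $records: one hash ref per record kind
      open my $fh, '<', $file
          or die "Static config file could not be opened: $!\n";
      my $dn;
      while (my $line = <$fh>) {
          chomp $line;
          if ($line =~ /^dn:\s*(.+)/i) {            # object identifier
              $dn = $1;                             # kind detection omitted
          } elsif (defined $dn && $line =~ /^([\w-]+):\s*(.*)/) {
              $records->{$dn}{$1} = $2;             # attribute/value pair
          }
      }
      close $fh;
  }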
getSubClusterInfo(file | dn)
 This procedure enables us to include subcluster and filesystem information
 from a remote host config file that is periodically updated by the admin.
 DN-STRING construction for subcluster and filesystem records:
 Read out the remote host config file specified by the first parameter.
 There are two modes. If we allow only one subcluster (WP1 people can
 operate with only one), then we create a subcluster record and set the
 subcluster unique id rdn to the cluster's unique id.
 Otherwise, if we allow more than one subcluster, we process an additional
 hostname parameter from the file, which represents the subcluster unique id.
 We create the filesystem dn by adding the filesystem name rdn to the
 subcluster's dn.
 Alternatively, a dynamic mechanism is provided: given the dn string of a
 subcluster node, the procedure will use ssh to hop to the node specified
 by the GlueSubClusterUniqueID. Information will be read from stdout of
 the subcluster info script.
 NOTE: The effective subcluster id depends on the use of WP1 mode.
 If in WP1 mode, the workernode must be set with the `-sshworkernode` option.

 ASSIGNING attributes:
 The attributes GlueSubClusterUniqueID and GlueSubClusterName are set once
 we have determined the subcluster's unique id rdn.
 Then we fetch cpu, mem and filesystem info (see README) and set the
 attributes.


Creation of CE Records

We add dynamic information either from cluster batch system commands or from static information we gathered before.

see subprocedures: staticCEDataToQueue, glueCEToQueue, glueCEInfoToQueue, glueCEPolicyToQueue, glueCEStateToQueue

staticCEDataToQueue

Resolve the static config file's CE record dn string regular expressions and integrate their content into the matching actual CE records.

cesebindsToQueue

Append the ce-se bindings specified by commandline tuples/triples, which are stored in the CESEBinds array.

jobsToQueue

For each CE, add all job records whose queue equals the CEName.

nodesToQueue

For each CE, add all node records that have a QUEUE flag for the CEName.

glueCEToQueue

For each CE add the GlueCE attributes to the CE.

glueKeysToQueue

For all CEs add the hosting cluster's id to the queue's GlueForeignKeys.

glueCEInfoToQueue

For each CE add the GlueCEInfo attributes to the CE.

LSF:

 Count the CPUs of each of the queue's nodes.

glueCEPolicyToQueue

For each CE add the GlueCEPolicy attributes to the CE.

PBS:

Process 'qstat -Q -f queue@server' for policy information.

 Get GlueCEPolicyPriority from line
   Priority = aPriority
 Get GlueCEPolicyMaxTotalJobs from line
   max_queuable = aMaxQueuable
 Get GlueCEPolicyMaxRunningJobs from line
   max_running = aMaxRunning
 Get GlueCEMaxCPUTime and GlueCEMaxWallClockTime from line
   resources_max.cput = aCPUTime
   resources_max.walltime = aWallClockTime
   (for full description and fallbacks see README)

LSF:

Process 'bqueues ClusterParam -l' for policy information.

 Get GlueCEMaxCPUTime and GlueCEMaxWallClockTime from line following
 CPULIMIT                 RUNLIMIT
 and GlueCEPolicyPriority and GlueCEPolicyMaxRunningJobs from line following
 PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV
 We also set GlueCEPolicyMaxTotalJobs to GlueCEPolicyMaxRunningJobs.
 (for full description and fallbacks see README)

CONDOR:

Process condor_status and condor_config_val for policy information.

 Get GlueCEPolicyMax[CPU|Wall]Time from the commandline option or the default value.
 Get GlueCEPolicyMaxRunningJobs by counting machines in the condor_status command.
 Get GlueCEPolicyMaxTotalJobs as the product of CondorMaxTotalJobsFactor and GlueCEPolicyMaxRunningJobs
   (if CondorMaxTotalJobsFactor is defined; the dynamic value is `condor_config_val MAX_JOBS_RUNNING`).
 Get GlueCEPolicyPriority from the $PriorityUndefined variable.

glueCEStateToQueue

For each CE add the GlueCEState attributes to the CE.

see subprocedures: checkImmediateJobStart, checkIllegalValues

PBS:

 Process the qstat -Q -f queue@server command.
 GlueCEStateFreeCPUs is the sum of all nodes' free cpus.
 Get GlueCEStateRunningJobs from line
   state_count= ...Running:num_running Exiting:num_exiting;
 Get GlueCEStateTotalJobs from line
   total_jobs=num_total_jobs
 GlueCEStateWaitingJobs is the difference between total and running jobs.
 Get GlueCEStateWorstResponseTime & GlueCEStateEstimatedResponseTime:
   see README
 Get GlueCEStateStatus by processing the lines
    enabled = is_enabled
    started = is_started
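
 A sketch of the state extraction, assuming the qstat output format quoted
 above (queue and server names are illustrative):

  my ($running, $total) = (0, 0);
  open my $qstat, '-|', 'qstat', '-Q', '-f', 'short@pbsserver'
      or die "qstat: $!";
  while (my $line = <$qstat>) {
      $running = $1 + $2   # counting Exiting as running, per the lines above
          if $line =~ /state_count\s*=.*Running:(\d+).*Exiting:(\d+)/;
      $total   = $1 if $line =~ /total_jobs\s*=\s*(\d+)/;
  }
  close $qstat;
  my $waiting = $total - $running;   # GlueCEStateWaitingJobs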

LSF:

 Process bqueues -l -m cluster.
 GlueCEStateFreeCPUs is the sum of all nodes' free cpus.
 Get GlueCEStateRunningJobs, GlueCEStateStatus, GlueCEStateTotalJobs from the line following
   PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV
 GlueCEStateWaitingJobs is the difference between total and running jobs.
 Get GlueCEStateWorstResponseTime & GlueCEStateEstimatedResponseTime:
   see README
 Get GlueCEStateStatus by processing the lines
    enabled = is_enabled
    started = is_started

see: setLsfCPUPower, setLsfRemainingResponse

CONDOR:

 Process condor_q -global -pool $Server.
 Get GlueCEStateFreeCPUs from the command `condor_status -avail -long` by counting Cpus.
 Get GlueCEStateRunningJobs, GlueCEStateTotalJobs, GlueCEStateWaitingJobs
   from the command `condor_q -global`, looking for the regexp
   numJobs jobs; numIdle idle, numRunning running.
 GlueCEStateStatus is defined as "Production".
 GlueCEStateWorstResponseTime and GlueCEStateEstimatedResponseTime are
   calculated in the procedure computeResponseTimes with a static approximation.

see: computeResponseTimes
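
 A sketch of the job counting, assuming the condor_q summary-line format
 quoted above (one summary line per schedd is assumed):

  my ($total, $idle, $running) = (0, 0, 0);
  open my $cq, '-|', 'condor_q', '-global' or die "condor_q: $!";
  while (my $line = <$cq>) {
      if ($line =~ /(\d+)\s+jobs;\s+(\d+)\s+idle,\s+(\d+)\s+running/) {
          $total += $1; $idle += $2; $running += $3;
      }
  }
  close $cq;
  # GlueCEStateTotalJobs   = $total
  # GlueCEStateWaitingJobs = $idle
  # GlueCEStateRunningJobs = $running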

checkImmediateJobStart(dn)
  All Lrms check whether they can immediately run a job.
  If we have free cpus and the queue policy allows more jobs to run, then
  we can consider an immediate job start (i.e. we set the response times to 0).
  Before allowing an immediate job start we also require that the fraction
  of waiting jobs in the system is less than or equal to IdleJobThreshold.

see: glueCEStateToQueue
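
 A sketch of this test; the field names and the definition of the waiting
 fraction (waiting/total) are assumptions:

  sub checkImmediateJobStart {
      my ($ce, $idle_job_threshold) = @_;
      my $fraction = $ce->{TotalJobs}
                   ? $ce->{WaitingJobs} / $ce->{TotalJobs}
                   : 0;
      if (   $ce->{FreeCPUs} > 0
          && $ce->{RunningJobs} < $ce->{MaxRunningJobs}
          && $fraction <= $idle_job_threshold) {
          $ce->{EstimatedResponseTime} = 0;   # immediate job start
          $ce->{WorstResponseTime}     = 0;
      }
  }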

checkIllegalValues(dn)
 All Lrms check whether the number of running jobs is greater than
 max_running jobs and the number of total jobs is greater than max_total
 jobs. The max value is increased to match the actual value of the CE
 specified by param dn.
setLsfCPUPower
 We compute a value for the CPUPOWER attribute for every queue.
 We traverse the queue's nodes and accumulate the cpu power of the nodes,
 as sketched below.
 Each queue can use a 1/number_hosted_queues part of a cpu.
 If all nodes are used exclusively by a queue then CPUPOWER = TotalCpus.
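
 A sketch of this accumulation (field names are assumptions): each node
 contributes its cpus divided by the number of queues hosted on it.

  sub setLsfCPUPower {
      my ($queue) = @_;
      my $power = 0;
      for my $node (@{ $queue->{NODES} }) {
          $power += $node->{Cpus} / $node->{HostedQueues};
      }
      $queue->{CPUPOWER} = $power;   # equals TotalCpus for exclusive nodes
  }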
setLsfRemainingResponse(dn)
 We compute the remaining wallClockTime for each queue.
computeResponseTimes(dn)
 This general formula is used if no job info is available to compute the
 estimated and worst response times.
 We add the GlueCEStateWorstResponseTime and GlueCEStateEstimatedResponseTime
 attributes to the CE specified by param dn.

computeExecutionQueueSizes

 This procedure is used by rms and determines the suggested values for
 execution queue attributes.

rmsStatetoQueue

 Add the WP4 rms state information (submission/execution/normal) of a queue
 using the RMSSTATE flag.
 Shared attributes of rms submission queues are recalculated,
 since they depend on the execution queues' values.
calculateSubmissionQueue(dn)
 Set the running jobs of the CE specified by parameter dn
 to the execution queue's running jobs.
 Level the maximums for running and total jobs.
 Decrease the wall clock time of a submission queue to the execution
 queue's value.

activateQueues

Activate all queues set in '-queue' for printing, using the PRINTABLEQUEUE flag.


Cluster Attributes Integration

 We allow multiple sources of record definition; the admin may
 create multiple cluster / subCluster / remoteFileSystem records, each with
 different regular expressions. Therefore we have to unify these records.

see: unifyClusterData, addCEs, addSubClusters, unifySubClusterData, unifyFileSystemData

unifyClusterData

This procedure resolves the regular expressions in the %Clusters hash and copies the entries to %UnifiedCluster.

addCEs

 Add the hosted CEs to the cluster.

addSubClusters

 For all CEs, get subcluster information if a subcluster node specified
 in the static configfile matches one of the CE's nodes.
 For each CE determine at least one SubCluster.

unifySubClusterData

This procedure resolves the regular expressions in the %SubCluster hash and copies the entries to %UnifiedSubCluster.

unifyFileSystemData

This procedure resolves the regular expressions in the %FileSystems hash and copies the entries to %UnifiedFileSystem.


Print Procedures

 Each record type has its own print procedure.

see: printCeInformation, printClusters, printSubClusters, printFileSystems, printHostInfo, printCesebindings

printCeInformation

 This procedure traverses the hash of all CEs and calls the printCE
 procedure if the lrms, the rms state information and the mds-vo-name of
 the CE apply to the current environment.
 If we use WP4 rms queues, an execution queue is only printed if set in '-queue'.

printCE

 This procedure prints the CE information.
 We print all data for the CE specified by parameter $ce.
 1. print the dn
 2. print the objectclasses
   - cetop
   - ce
   - schemaversion
   - all other classes using the tags $AllCE{$ce}{OBJECTCLASS_class}
 3. print the attributes
   - schemaversion
   - remaining attributes in sorted order
 4. print timestamps
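
 A sketch of this print order; only the $AllCE{$ce}{OBJECTCLASS_class}
 tags are taken from the description above, the rest is illustrative:

  our %AllCE;
  sub printCE {
      my ($ce) = @_;
      my $rec = $AllCE{$ce};
      print "dn: $ce\n";                               # 1. the dn
      print "objectclass: $_\n" for qw(cetop ce schemaversion);
      for my $key (sort keys %$rec) {                  # 2. other classes
          print "objectclass: $1\n" if $key =~ /^OBJECTCLASS_(\w+)$/;
      }
      for my $attr (sort grep { !/^OBJECTCLASS_/ } keys %$rec) {
          print "$attr: $rec->{$attr}\n";              # 3. attributes
      }
      # 4. timestamps would be printed here
  }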

printClusters

 If the information provider is configured to print cluster records, we
 append the records matching the CE's GlueCEHostingClusters.
 Note: only the first cluster is printed if $Wp1SubClusterMode is set.
 1. print the dn
 2. print the objectclasses
   - clustertop
   - cluster
   - schemaversion
   - all other classes using the tags $UnifiedCluster{$cl}{OBJECTCLASS_class}
 3. print the attributes
   - schemaversion
   - remaining attributes in sorted order
 4. print timestamps

printSubClusters

 If the information provider is configured to print cluster records,
 we append the subCluster records matching the CE's GlueCEHostingClusters
 from the hash %UnifiedSubCluster.
 Note: only the first cluster is printed if $Wp1SubClusterMode is set.
 1. print the dn
 2. print the objectclasses
   - clustertop
   - subcluster
   - schemaversion
   - all other classes using the tags $UnifiedSubCluster{$scl}{OBJECTCLASS_class}
 3. print the attributes
   - schemaversion
   - remaining attributes in sorted order
 4. print timestamps

printFileSystems

 Print all fileSystems from the hash %UnifiedFileSystem.
 1. print the dn
 2. print the objectclasses
    - clustertop
    - hostremotefilesystem
    - schemaversion
    - all other classes using the tags $UnifiedFileSystem{$fileSys}{OBJECTCLASS_class}
 3. print the attributes
   - schemaversion
   - remaining attributes in sorted order
 4. print timestamps

printHostInfo

 Print host information if HostInfoBinArg is specified on the commandline.

printCesebindings

 Print CE SE binding record.