Pool Setup and Administration

Overview

In the dCache context, a pool is a contiguous disk area within a local filesystem, controlled by a PoolCell. Multiple pools can coexist on a single host; for each pool, a separate PoolCell must be present in the PoolDomain (JVM). The disk areas of different pools must not overlap, and a pool area cannot span multiple disk partitions.

For simplicity, we currently support only one PoolDomain per host. This is not a real restriction, because this PoolDomain may contain any number of PoolCells.
The constraint is introduced by the temporary startup procedure, not by the dCache software itself, which would allow multiple PoolDomains on a single host.

Preparing the pool area

Before a directory is accepted as <poolRootDirectory> for a pool disk area, a certain subdirectory structure has to be created within it. Both subdirectories and the setup file must be empty when the PoolCell is started for the first time.
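The preparation can be sketched as a short shell snippet. The subdirectory names 'data' and 'control' and the file name 'setup' used here are assumptions for illustration (they follow common dCache pool layouts) and may differ between releases; check your installation notes.

```shell
# Hedged sketch: prepare a pool root directory before the PoolCell
# first starts. All names below are assumptions, not taken from this text.
pool=$(mktemp -d)       # stands in for <poolRootDirectory>
mkdir "$pool/data"      # assumed payload area for the pool's files
mkdir "$pool/control"   # assumed per-file control/metadata area
: > "$pool/setup"       # empty setup file; 'update' fills it later
```

Both subdirectories and the setup file are created empty, as the text above requires.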

Defining/Starting a pool

All pools on a single host must be defined in one single <PoolListFile>. This file needs to be in the 'jobs' directory. It contains all local pools, one per line; no empty lines or comment lines whatsoever are allowed. Each line starts with the name of the pool, followed by one or more blank characters, followed by the full path of the <poolRootDirectory>. The file is scanned only on startup of the Domain, so changes to this file do not take effect as long as the Domain is not restarted. Nevertheless, there is a way to make arbitrary changes while the Domain is active.
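A minimal example may make the format concrete. The pool names and paths below are invented for illustration; the shell snippet writes such a file and checks that every line follows the "<poolName> <poolRootDirectory>" rule with no empty or comment lines.

```shell
# Hedged sketch: an example <PoolListFile> with two pools (names and
# paths invented), plus a simple format check.
plf=$(mktemp)
cat > "$plf" <<'EOF'
pool1 /pools/pool1
pool2 /pools/pool2
EOF
# Count lines that do NOT look like "<poolName> <fullPath>"; empty and
# comment lines would be counted here as well.
bad=$(grep -Ev '^[A-Za-z0-9_-]+ +/[^ ]+$' "$plf" | wc -l)
echo "malformed lines: $bad"
```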
When coming up, the Domain creates a PoolCell for each entry in the <PoolListFile>. Each cell assumes the pool structure described above. If inconsistencies are detected, the PoolCell exits and the pool name is not registered centrally, which means the pool is not available.

Defining pools before the Domain is started

As long as the Domain is not running, the <PoolListFile> may be adjusted to the local needs. It will be scanned as soon as the Domain comes up.

Defining pools while the Domain is running

If changes need to be made on an active Domain, pools can be created on the fly. Such pools need to be added to the <PoolListFile> as well to make them permanent.

Customizing a pool

Each individual PoolCell has to be configured separately. The following pool parameters need to be set for a PoolCell before it can perform reasonable work.

This chapter describes the commands used to configure a PoolCell. All changes become active immediately after the command has been issued, but they are not yet permanent: a restart of the Cell would restore the old configuration. The update command makes changes permanent.

update -force -perm

Maximum disk space

set max diskspace <areaInBytes>
Defines the maximum space the pool is allowed to occupy.
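For example, to limit a pool to roughly 100 GB (the value is invented for illustration):

   set max diskspace 100000000000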

StorageClass attraction

In order to accept storage requests from clients or from the connected HSMs, the pool needs to know which <HSM Instances>, and which <storageClasses> within the particular HSM, it is responsible for. Together with this (HSM, storageClass) pair, a preference needs to be defined to allow a weighting of requests. The preference may differ for reads and writes.
The <storageClass> is a term defined by the related HSM and is used by the dCache in a transient manner. The <storageClass> is the string representation of an HSM-dependent organizational unit which is of no interest to the dCache except for its name.
  define class <HSM> <storageClass>
        [-readpref=<preferenceForReads>]
        [-writepref=<preferenceForWrite>]
        [-pending=<maxNumberOfPendingFile>]
        [-expire=<maxNumberOfSeconds>]
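For example, a class definition for an OSM store that attracts writes more strongly than reads might look as follows (the storage class string and all numbers are invented for illustration; the OSM form <Store>:<StorageGroup> is described below):

   define class osm main:raw -readpref=10 -writepref=20 -pending=200 -expire=7200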

<HSM>

Is the name of the HSM this <storageClass> belongs to. The name is treated case-insensitively. If not specified otherwise, <HSM> is the brand name of the HSM, which currently can be enstore, OSM, or Eurostore. The brand name can be overridden in the related Pnfs subdirectory by specifying a different name in the <hsmInstance> tag. This makes it possible to run different HSMs of the same type concurrently.

<storageClass>

Is the name of an HSM-dependent organizational unit. The way the storageClass string has to be constructed is defined in the StorageInfoExtractor plugin of the corresponding HSM. For OSM and Enstore the structure is:

   Storage System   Storage Class
   OSM              <Store>:<StorageGroup>
   Enstore          <storageGroup>.<fileFamily>

-readpref/-writepref=<preference>

Defines the attraction for the specified <HSM>-<storageClass> pair. The number must be a positive, non-zero integer for the attraction of the particular pair to take effect. The larger the number, the stronger the attraction. The I/O direction is seen from the client's perspective.

-pending=<maxPendingFiles>

Defines the maximum number of files waiting to be flushed to the HSM. If this number is reached, or if the age of the oldest file in the set exceeds the maximum expiration time, all files within the HSM/storageClass pair are flushed to the HSM.

-expire=<maxNumberOfSeconds>

Defines the maximum age for the oldest file within the set of files related to this HSM/storageClass pair. If this age exceeds the -expire time, or if the number of these files exceeds the -pending parameter, all files of the set are flushed to the HSM.
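The interplay of -pending and -expire can be sketched as a simple decision: a flush is triggered as soon as either threshold is crossed. All numbers below are invented for illustration.

```shell
# Hedged sketch of the flush decision implied by -pending and -expire.
pending_count=120   # files currently waiting for this HSM/storageClass pair
max_pending=100     # the -pending threshold
oldest_age=3000     # age in seconds of the oldest waiting file
max_expire=7200     # the -expire threshold
if [ "$pending_count" -ge "$max_pending" ] || [ "$oldest_age" -ge "$max_expire" ]; then
    decision=flush  # flush all files of this pair to the HSM
else
    decision=wait
fi
echo "$decision"
```

Here the pending count (120) already exceeds the threshold (100), so the set is flushed even though the oldest file has not yet reached the expiration age.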

Attaching HSMs to the Pool

Currently, HSMs are attached to pools by scripts or executables which are called by the PoolCell as soon as data has to be written to or read from an HSM. The path of the executable and possible parameters must be defined.
   hsm set <hsmBrand> -command=<fullCommandPath>
   hsm set <hsmBrand> -pnfs=<fullPnfsMountpoint>
   hsm set <hsmBrand> -<key>=<value>
The first variant of the hsm command is mandatory; it defines the HSM and allows access to it with the help of the script or executable specified.
The second variant is the agreed standard way to let the command learn where to find the pnfsMountpoint. The last variant allows adding arbitrary key-value pairs which are forwarded to the actual command call, which will look like
<fullCommandPath> put/get <pnfsId> <localFileName> \
                        -si=<storageInformation> \
                        -pnfs=<fullPnfsMountpoint> \
                        -<key>=<value> \
                        more key values ... 
The <storageInformation> is private data passed from the HSM StorageInfo Extractor plugin to the HSM. It is not used by the dCache at all. Enstore and OSM agreed on a list of key-value pairs separated by semicolons.
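A copy executable following the call convention shown above can be sketched as follows. It is written as a shell function so it can be exercised inline; a real installation would put the body into the executable registered with 'hsm set <hsmBrand> -command=...'. The pnfsId, the -si string, and the mount point are invented, and the HSM side is faked with a local directory.

```shell
# Hedged sketch of an HSM copy "script": <command> put|get <pnfsId>
# <localFileName> -si=... -pnfs=... -<key>=<value> ...
hsmcp() {
    direction=$1; pnfsid=$2; localfile=$3; shift 3
    for arg in "$@"; do
        case "$arg" in
            -si=*)   si=${arg#-si=} ;;        # opaque storage information
            -pnfs=*) pnfsmnt=${arg#-pnfs=} ;; # pnfs mount point
        esac
    done
    store=/tmp/dummy-hsm                      # stands in for the real HSM
    mkdir -p "$store"
    case "$direction" in
        put) cp "$localfile" "$store/$pnfsid" ;;  # flush pool file to "HSM"
        get) cp "$store/$pnfsid" "$localfile" ;;  # restore it into the pool
        *)   return 1 ;;
    esac
}

# Demonstration: flush a file, then restore it under a new name.
echo "hello" > /tmp/poolfile
hsmcp put 000100000000000000001060 /tmp/poolfile \
      -si='size=6;store=main' -pnfs=/pnfs/site.example
hsmcp get 000100000000000000001060 /tmp/poolfile.restored \
      -si='size=6;store=main' -pnfs=/pnfs/site.example
```

The extra -<key>=<value> pairs defined via 'hsm set' simply arrive as further arguments and can be parsed in the same case statement.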