This document describes the tasks and the assumed behavior of the dCache PoolManager. It starts with the abstract behavior of a PoolManager and ends with the most recent implementation, the PoolManager (Version III), together with its PoolSelectionUnit.
The PoolManager's main duty is to find appropriate locations for files to be stored intermediately, either when coming from a backend HSM to be delivered to a client, or when coming from a client to be delivered to the backend HSM.
To be more specific, the PoolManager (PM) may receive two types of requests:
Select Read Pool
The Select Read Pool request assumes that the PM makes a particular file available for being moved to the client. If the PM determines that the requested file is already stored in a pool, it returns the name of this pool after double-checking that the file really exists there. If the requested file does not seem to be available in any pool, the PM tries to find a pool which matches some configurable constraints and then instructs this pool to fetch the file from the HSM. If the pool succeeds in doing so, the PM returns the name of this pool.
Select Write Pool
The Select Write Pool request is much simpler, because the PM only has to find a pool which is willing and able to take the data from the client and to subsequently move the file to the HSM.
In order to select a reasonable pool for storing the required data, the PoolManager may collect data from various places.
The incoming request provides the following information, which may influence the decision of the PM:
- The Data Direction determines whether the file already exists and needs to be read, or whether it is going to be created. To be precise, the expression Data Direction is no longer correct, insofar as a newly created file can be read as well as written.
- The HSM Type and HSM Name, which allow multiple different types and/or instances of HSMs to be supported.
- The StorageClass is the organizational unit of the underlying HSM. For DESY it is the combination of the OSM store and the OSM storage group (<store>:<storageGroup>); at FERMI the ENSTORE file family is used.
- The Client's Host Name, or rather the name of the interface the client wants to be connected to for the data transfer.
- The dCache Class is an arbitrary string stored in a tag of a Pnfs subdirectory.
The Pool
The dCache pool provides three types of information:
- The Pool Up message is sent frequently and unrequested from the pool to the PM as a kind of keep-alive message. The PM is assumed not to choose this pool for any operation if this message hasn't been received for several minutes. Like most keep-alive protocols, this one is not reliable in the sense that it would tell us exactly whether a pool is up or not. It can only be regarded as a hint. The Pool Down message, in contrast, indicates for sure that the particular pool is down.
- The Cost-Check message is a request to a pool. The reply contains the cost for the pool, in terms of space and load, of performing the requested operation. The request may or may not contain the size of the file to store (life is funny). The pool is free not to reply at all.
- The Check-File message is a request to a pool which checks for the existence of the specified file. In addition, the pool is assumed to prevent its garbage collector from removing the requested file for a couple of minutes.
The PnfsManager
The Pnfs Filesystem holds information about the assumed location of each file. This information is distributed on request by the PnfsManager. The PM is assumed to query this information to determine whether a requested file is already staged somewhere.
Internal (private) PM Database
The abstract definition of the PM doesn't specify whether, and if so which, additional configuration the PM may use to make its decisions. Nor does it specify HOW the PM can be configured or how this configuration may be made available to web applications or command line interpreters. So the PM may choose whatever method appears suitable for the size and complexity of the related dCache installation. Later in this document the PoolManagerV3 implementation is discussed, which should cover most environments.
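The keep-alive handling described for the Pool Up and Pool Down messages can be sketched as follows. This is a minimal illustration in Python, not dCache code: the class `PoolLiveness`, its method names and the five-minute timeout are assumptions made for the example (the text only says "several minutes").

```python
import time


class PoolLiveness:
    """Tracks which pools the PM may currently consider (hypothetical helper)."""

    def __init__(self, timeout_seconds=300.0, clock=time.monotonic):
        # timeout_seconds is an assumed value; the text only says "several minutes".
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = {}  # pool name -> time of the last Pool Up message

    def pool_up(self, pool):
        # A Pool Up message is only a hint that the pool was alive at this moment.
        self.last_seen[pool] = self.clock()

    def pool_down(self, pool):
        # A Pool Down message is definite: forget the pool immediately.
        self.last_seen.pop(pool, None)

    def is_usable(self, pool):
        seen = self.last_seen.get(pool)
        return seen is not None and self.clock() - seen <= self.timeout_seconds

    def usable_pools(self):
        return sorted(pool for pool in self.last_seen if self.is_usable(pool))
```

Passing the clock in as a parameter keeps the sketch testable; a pool silently drops out of `usable_pools()` once its last Pool Up message is older than the timeout.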
The PoolManager Version III is an implementation of the PoolManager concept which is intended to cover environments of medium complexity. Although it provides a full-fledged PM implementation, it defines an API for the static configuration part, the PoolSelectionUnit, so that this part can be exchanged if needed. The default PoolSelectionUnit implementation is the PoolSelectionUnitV1 described below.
PoolManagerV3 implements PoolManager concept
  PoolSelection Interface (PSI)
    PoolSelectionV1 implements PSI
The PoolSelection Interface
The PoolSelection Interface encapsulates the initial (static) selection of pools out of the full set of available pools. The selection process takes the five information units provided by the incoming request. In addition, it selects the pools according to the selection class, which can have the following values:
- write : select pools which are allowed to store files which are intended to go into the HSM.
- read : select pools which are allowed to deliver files which are already staged.
- cache : select pools which are allowed to stage files if not yet done.
As a result, the Selection Unit returns a list of pool lists in descending priority order.
    Priority   Pool     Pool     ...
    200        pool-a   pool-b   ...
    10         pool-c   pool-d   ...
    3          pool-e   pool-f   ...
For the given parameters, pool-a and pool-b have the highest priority, followed by pool-c and pool-d, and so forth. The absolute value of a priority has no meaning; only the relative order counts.
The actual selection mechanism of the PoolSelectionUnitV1 is described elsewhere.
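The "list of pool lists" returned by the Selection Unit can be pictured with a small Python sketch. The function name `to_pool_lists` and the representation as (priority, pool) pairs are assumptions made for illustration only; the real interface is not shown in this document.

```python
from collections import defaultdict


def to_pool_lists(entries):
    """Turn (priority, pool) pairs into a list of pool lists, highest priority first."""
    rows = defaultdict(list)
    for priority, pool in entries:
        rows[priority].append(pool)
    # Only the relative order of the priorities matters, so the numbers are dropped.
    return [rows[priority] for priority in sorted(rows, reverse=True)]


entries = [(10, "pool-c"), (200, "pool-a"), (3, "pool-e"),
           (200, "pool-b"), (10, "pool-d"), (3, "pool-f")]
pool_lists = to_pool_lists(entries)
print(pool_lists)
# [['pool-a', 'pool-b'], ['pool-c', 'pool-d'], ['pool-e', 'pool-f']]
```

Each inner list corresponds to one row of the priority table; the caller walks the outer list from front to back.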
PMV3 concept for FETCH
The following steps are performed to fulfill the FETCH request. Neither the sequence of these steps nor the way they are carried out is configurable. Only the implementation of the PoolSelectionUnit may be chosen.
Step 1 : Query to the PnfsManager
In order to find out if the requested file is already stored in one of the pools, the PMV3 sends a 'getCacheLocation' request to the PnfsManager, which replies with a list of pools which are assumed to hold the file. In case the reply doesn't contain any active pool, the PM proceeds with Step 4.
Step 2 : Static information queries (PoolSelectionUnit)
The PMV3 calls its PoolSelectionUnit interface, providing the required information from the request. The selection class is set to Read because the PM has to check if the pool list resulting from Step 1 contains a pool which is allowed to deliver the file. The implementation of the PoolSelectionUnit does whatever it needs to do and returns a list of pool lists in descending priority order, as described in The PoolSelection Interface.
Step 3 : Merging Step 1 & 2
The PM steps through the PoolSelection output (Step 2) from highest to lowest priority and intersects each row with the list resulting from Step 1. As soon as such an intersection is not empty, this set of pools is checked for the file. If none of the pools returns a positive answer, the loop over the pool selection output continues. If multiple pools appear to host the file, one of them is chosen at random and its name is sent back to the source of the request. If the loop ends without a result, the PM proceeds with Step 4.
Step 4 : Instructing a pool to fetch the file
We end up here in case the PM didn't find the file in any of the active pools.
The PM reissues the request to the Pool Selection Unit, this time using the Selection Class Cache because it needs to get pools which are allowed to stage files. Again the PM loops over the newly created priority list from highest to lowest. For each row (horizontal) it sends "cost check" requests to the listed pools. The pool with the lowest cost is chosen. In case none of the pools replies at all or none of them is willing to fetch the file, the loop continues.
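Steps 3 and 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the PMV3 code: the function names are hypothetical, `has_file` stands in for the Check-File request, `cost_of` stands in for the Cost-Check request, and the cost is assumed to be a single number where `None` means "no reply or not willing".

```python
import random


def find_cached_copy(read_rows, cache_locations, has_file, rng=random):
    """Step 3: intersect each priority row with the pools reported by the PnfsManager."""
    for row in read_rows:  # highest priority first
        candidates = [pool for pool in row if pool in cache_locations]
        confirmed = [pool for pool in candidates if has_file(pool)]
        if confirmed:
            return rng.choice(confirmed)
    return None  # caller proceeds with Step 4


def choose_stage_pool(cache_rows, cost_of):
    """Step 4: the first row with at least one willing pool wins."""
    for row in cache_rows:  # highest priority first
        costs = {}
        for pool in row:
            cost = cost_of(pool)
            if cost is not None:
                costs[pool] = cost
        if costs:
            # Lowest cost within this row; lower-priority rows are never asked.
            return min(costs, key=costs.get)
    return None  # caller reports an error (Step 5)


rows = [["pool-a", "pool-b"], ["pool-c", "pool-d"]]
print(find_cached_copy(rows, {"pool-c", "pool-x"}, lambda pool: True))  # pool-c
costs = {"pool-a": None, "pool-b": 2.5, "pool-c": 0.1}
print(choose_stage_pool(rows, costs.get))  # pool-b
```

In the second call pool-b is returned although pool-c would be cheaper, because pool-c sits in a lower-priority row that is never consulted once a higher row yields a willing pool.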
Remark : The loop stops as soon as a row contains a pool which is willing to stage the file. This means that the PM doesn't check whether the next row contains a pool which would present an even lower cost. So : static configuration overrides dynamic behavior.
Step 5 : No Pool found
If none of the steps above could come to a positive result, the PM replies with either
    error 19 "No read pools available for <class>@<hsm>"
or
    error 20 "No reply from cost-check for <class>@<hsm>"
PMV3 concept for STORE
The PMV3 calls its PoolSelectionUnit interface, providing the required information from the request. The selection class is set to Write because the PM needs to find a pool which is allowed to store precious files.
The PM steps through the PoolSelection output from highest to lowest priority, sending 'check cost' requests to all pools in a row (horizontal). As soon as at least one of the pools replies positively, the loop stops and one of the resulting pools is chosen at random. If no pool results from the loop, the PM replies with either
    error 19 "No read pools available for <class>@<hsm>"
or
    error 20 "No reply from cost-check for <class>@<hsm>"
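The STORE loop differs from the staging case in that, as described above, the pool is picked at random among those that reply positively. A minimal Python sketch, in which the function name `select_write_pool` and the `accepts` callback (standing in for the cost check) are assumptions made for illustration:

```python
import random


def select_write_pool(write_rows, accepts, rng=random):
    """Return a randomly chosen willing pool from the highest-priority row that has one."""
    for row in write_rows:  # highest priority first
        willing = [pool for pool in row if accepts(pool)]
        if willing:
            return rng.choice(willing)
    return None  # caller replies with one of the errors listed above


rows = [["pool-1", "pool-2"], ["pool-3"]]
print(select_write_pool(rows, lambda pool: pool != "pool-1"))  # pool-2
```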
The PoolManager V3 allows an arbitrary class to be specified for the static pool selection, as long as this class implements the PoolSelectionUnit interface. The default implementation is the PoolSelectionV1, which should be sufficiently sophisticated to support a medium size installation. The PSV1 has a notion of the following objects:
Selection Object
- The Storage Unit reflects the Storage Class of the underlying HSM together with the name of this HSM or HSM instance. The name is constructed as follows:
    <StorageClass>@<hsmName>
  The following wildcards are recognized:
    *@<hsmName>   Any Storage Unit of the specified HSM
    *@*           Any Storage Unit
- The Net Unit represents a host name (IP number) or a whole subnet with an arbitrary subnet mask.
- The Unit Group is a set of Storage Units or Net Units. Units may be members of more than one Unit Group, but Unit Groups cannot themselves be members of a Unit Group. A Unit which is not a member of any Unit Group is of no use. Although a Unit Group may mix Storage Units and Net Units, this would result in unnecessary complexity.
- The Pool is the representation of a storage pool.
- The Pool Group can hold one or more pools. A pool may be a member of several Pool Groups, but a Pool Group cannot itself be a member of a Pool Group.
- The Link assigns Storage/Net Unit Groups to Pools or Pool Groups. In addition it gets a priority (positive integer or zero) for each of the three Selection Classes. Though a link may be used in different ways, it's recommended to let it point from exactly one Storage Unit Group and one Net Unit Group to one Pool Group.
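How a request might be matched against Storage Units and Net Units can be sketched in Python. The function names are hypothetical, the storage class `exp:raw` and the addresses are made-up example values, and the standard `ipaddress` module is used here as a stand-in for whatever matching the PSV1 actually performs.

```python
import ipaddress


def storage_unit_matches(unit, storage_class, hsm):
    """True if a Storage Unit name covers the given Storage Class and HSM.

    Only the two wildcard forms listed above are recognized.
    """
    return unit in (f"{storage_class}@{hsm}", f"*@{hsm}", "*@*")


def net_unit_matches(unit, client_ip):
    """True if the client address lies inside the Net Unit (host or subnet/mask)."""
    network = ipaddress.ip_network(unit, strict=False)
    return ipaddress.ip_address(client_ip) in network


print(storage_unit_matches("*@osm", "exp:raw", "osm"))            # True
print(storage_unit_matches("exp:raw@osm", "exp:dst", "osm"))      # False
print(net_unit_matches("192.0.2.0/255.255.255.0", "192.0.2.17"))  # True
print(net_unit_matches("0.0.0.0/0.0.0.0", "198.51.100.4"))        # True
```

A Net Unit written as a single host address is treated as a subnet containing exactly that address.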
    Storage Unit Group  <-  Link            ->  Pool Group
    Net Unit Group      <-  readpref=..
                            writepref=..
                            cachepref=..
Selection Process
Incoming Parameters
- Storage Class and hsm
- IP number of client
- Selection Class : cache, read, write
Resolving units
Get all Storage Unit Groups which have the Storage Class as a member, and get all Net Unit Groups which match the IP number of the client.
Resolving Links
Get all links pointing to exactly these pairs of Storage Unit Groups and Net Unit Groups.
Sorting Links
Sort the links according to their priority for the Selection Class, highest first.
Resolve Pool Groups
Group links with identical priorities concerning the Storage Class. Now resolve the Pool Groups the links are pointing to into the actual Pools. This procedure results in a list of lists of pools, sorted according to the priority of the specified Selection Class.
Configuration Example : Minimal
The pool, group, unit and link names as well as the preference values below are arbitrary.
    #
    # define all known pools first
    # (not defined pools will not be recognized)
    #
    psu create pool pool-1
    ...
    psu create pool pool-x
    #
    # create read and write pool groups
    #
    psu create pgroup write-pools
    psu create pgroup read-pools
    #
    # add the pools to the pgroups
    #
    psu addto pgroup write-pools pool-1
    psu addto pgroup write-pools pool-2
    ...
    psu addto pgroup read-pools pool-a
    psu addto pgroup read-pools pool-b
    ...
    #
    # define the wildcard storage and net unit
    #
    psu create unit -store *@*
    psu create unit -net 0.0.0.0/0.0.0.0
    #
    # define the wildcard unit groups
    #
    psu create ugroup world-net
    psu create ugroup all-stores
    #
    psu addto ugroup world-net 0.0.0.0/0.0.0.0
    psu addto ugroup all-stores *@*
    #
    # create the read and write links
    #
    psu create link write-link world-net all-stores
    psu create link read-link world-net all-stores
    #
    # let them point to the read/write pgroups
    #
    psu add link write-link write-pools
    psu add link read-link read-pools
    #
    # set the preferences
    #
    psu set link write-link -writepref=10 -readpref=1 -cachepref=0
    psu set link read-link -writepref=0 -readpref=10 -cachepref=10
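To see how the selection process plays out on this minimal configuration, the following Python sketch models the links and pool groups as plain dictionaries and runs the "Resolving Links", "Sorting Links" and "Resolve Pool Groups" steps. It is an illustration only: the data layout and the function `select_pools` are hypothetical, only the pools named explicitly in the example are included, the "Resolving units" step is assumed to have been done already (its result is passed in as `matched_ugroups`), and a preference of 0 is assumed to exclude a link for that Selection Class.

```python
from itertools import groupby

# The minimal configuration expressed as plain data (hypothetical representation).
POOL_GROUPS = {
    "write-pools": ["pool-1", "pool-2"],
    "read-pools": ["pool-a", "pool-b"],
}
LINKS = {
    "write-link": {"ugroups": {"world-net", "all-stores"}, "pgroups": ["write-pools"],
                   "write": 10, "read": 1, "cache": 0},
    "read-link": {"ugroups": {"world-net", "all-stores"}, "pgroups": ["read-pools"],
                  "write": 0, "read": 10, "cache": 10},
}


def select_pools(matched_ugroups, selection_class, links=LINKS, pool_groups=POOL_GROUPS):
    """Return a list of pool lists, highest preference first."""
    # Resolving links: keep links whose unit groups were all matched by the request.
    # Assumption: a preference of 0 excludes the link for this Selection Class.
    usable = [link for link in links.values()
              if link["ugroups"] <= matched_ugroups and link[selection_class] > 0]
    # Sorting links: highest preference for the Selection Class first.
    usable.sort(key=lambda link: link[selection_class], reverse=True)
    result = []
    # Resolve pool groups: links with identical preference share one row.
    for _, same_pref in groupby(usable, key=lambda link: link[selection_class]):
        row = []
        for link in same_pref:
            for pgroup in link["pgroups"]:
                for pool in pool_groups[pgroup]:
                    if pool not in row:
                        row.append(pool)
        result.append(row)
    return result


matched = {"world-net", "all-stores"}
print(select_pools(matched, "read"))   # [['pool-a', 'pool-b'], ['pool-1', 'pool-2']]
print(select_pools(matched, "write"))  # [['pool-1', 'pool-2']]
print(select_pools(matched, "cache"))  # [['pool-a', 'pool-b']]
```

Under these assumptions, read requests prefer the read pools but may fall back to the write pools (readpref=1 on write-link), while writing and staging are each confined to a single pool group.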