The dCache PoolManager

Scope of this document

This document describes the tasks and the assumed behavior of the dCache PoolManager. It starts with the abstract behavior of a PoolManager and ends with the most recent implementation, the PoolManager (Version III), together with its PoolSelectionUnit.

Task Description

The PoolManager's main duty is to find appropriate locations where files can be stored temporarily, either on their way from a backend HSM to a client or on their way from a client to the backend HSM.
More specifically, the PoolManager may receive two types of requests :

Select Read Pool

The Select Read Pool request asks the PM to make a particular file available for transfer to the client. If the PM determines that the requested file is already stored in a pool, it returns the name of this pool after double-checking that the file really exists there. If the requested file does not seem to be available in any pool, the PM tries to find a pool which matches some configurable constraints and then instructs this pool to fetch the file from the HSM. If the pool succeeds in doing so, the PM returns the name of this pool.

Select Write Pool

The Select Write Pool request is much simpler, because the PM only has to find a pool which is willing and able to take the data from the client and subsequently move the file to the HSM.

The PoolManager's sources of information

In order to select a reasonable pool for storing the required data, the PoolManager may collect data from various places.

The Request

The incoming request provides the following information which may influence the decision of the PM.

The Pool

The dCache pool provides three types of information :

The PnfsManager

The Pnfs filesystem holds information about the assumed location of each file. The PnfsManager provides this information on request. The PM is expected to query it to determine whether a requested file is already staged somewhere.

Internal (private) PM Database

The abstract definition of the PM specifies neither whether the PM uses additional configuration to make its decisions nor, if it does, which. It also leaves open how the PM is configured and how this configuration is made available to web applications or command line interpreters. The PM may therefore choose whatever method suits the size and complexity of the dCache installation in question. The PoolManagerV3 implementation, discussed later in this document, should cover most environments.

The PoolManager Version III implementation

The PoolManager Version III is an implementation of the PoolManager concept intended to cover environments of medium complexity. Although it provides a full-fledged PM implementation, it defines an API for the static configuration part, the PoolSelectionUnit, so that this part can be replaced if needed. The default PoolSelectionUnit implementation is the PoolSelectionUnitV1 described below.

PoolManagerV3 (implements the PoolManager concept)
    PoolSelection Interface (PSI)
        PoolSelectionV1 (implements PSI)

The PoolSelection Interface

The PoolSelection Interface encapsulates the initial (static) selection of pools out of the full set of available pools. The selection process takes the five information units provided by the incoming request. In addition, it selects the pools according to the selection class, which can have the values Read, Write or Cache. As a result, the Selection Unit returns a list of pool lists in descending priority order.
Priority   Pool     Pool     ...
200        pool-a   pool-b   ...
10         pool-c   pool-d   ...
3          pool-e   pool-f   ...
For the given parameters, pool-a and pool-b have the highest priority, followed by pool-c and pool-d, and so forth. The absolute value of a priority has no meaning; only the relative order counts.
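
The shape of this result can be pictured with a small Python sketch. The names PoolRow and sort_rows are hypothetical and chosen for illustration only; the actual interface is not shown in this document.

```python
from typing import List, NamedTuple


class PoolRow(NamedTuple):
    """One row of the selection result (hypothetical type, illustration only)."""
    priority: int
    pools: List[str]


def sort_rows(rows: List[PoolRow]) -> List[List[str]]:
    """Return the pool lists in descending priority order.

    Only the ordering matters, so the priority numbers are dropped.
    """
    ordered = sorted(rows, key=lambda row: row.priority, reverse=True)
    return [row.pools for row in ordered]


rows = [
    PoolRow(10, ["pool-c", "pool-d"]),
    PoolRow(200, ["pool-a", "pool-b"]),
    PoolRow(3, ["pool-e", "pool-f"]),
]
result = sort_rows(rows)
```

Replacing the priorities 200, 10 and 3 by 3, 2 and 1 would give the same result, which is what "only the relative order counts" means in practice.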

The actual selection mechanism of the PoolSelectionUnitV1 is described elsewhere.

PMV3 concept for FETCH

The following steps are performed to fulfill a FETCH request. Neither the sequence of these steps nor the way they are carried out is configurable; only the implementation of the PoolSelectionUnit may be chosen.

Step 1 : Query to the PnfsManager

In order to find out if the requested file is already stored in one of the pools, the PMV3 sends a 'getCacheLocation' request to the PnfsManager which replies with a list of pools which are assumed to hold the file. In case the reply doesn't contain any active pool, the PM proceeds with Step 4.

Step 2 : Static information queries (PoolSelectionUnit)

The PMV3 calls its PoolSelectionUnit interface, providing the required information from the request. The selection class is set to Read because the PM has to check whether the pool list resulting from Step 1 contains a pool which is allowed to deliver the file. The implementation of the PoolSelectionUnit does whatever it needs to do and returns a list of pool lists in descending priority order, as described in the section The PoolSelection Interface.

Step 3 : Merging Step 1 & 2

The PM steps through the PoolSelection output (Step 2) from highest to lowest priority and intersects each row with the list resulting from Step 1. As soon as an intersection is not empty, its pools are checked for the file. If none of them returns a positive answer, the loop over the pool selection output continues. If several pools turn out to host the file, one of them is chosen at random and returned to the source of the request. If the loop ends without a result, the PM proceeds with Step 4.
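
The merge can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PMV3 code: the function name and the pool_has_file callback (standing in for the check sent to a pool) are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional


def select_pool_holding_file(
    priority_rows: List[List[str]],
    cache_locations: Iterable[str],
    pool_has_file: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a pool that holds the file, or None to signal 'go to Step 4'.

    priority_rows   -- Step 2 output, highest priority row first
    cache_locations -- Step 1 output, pools assumed to hold the file
    pool_has_file   -- hypothetical callback asking a pool to confirm the file
    """
    rng = rng or random.Random()
    assumed = set(cache_locations)
    for row in priority_rows:
        # Intersection of this priority row with the assumed locations.
        candidates = [pool for pool in row if pool in assumed]
        if not candidates:
            continue
        confirmed = [pool for pool in candidates if pool_has_file(pool)]
        if confirmed:
            # Several pools may host the file: pick one at random.
            return rng.choice(confirmed)
    return None
```

A return value of None corresponds to the loop ending without a result.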

Step 4 : Instructing a pool to fetch the file

We end up here in case the PM didn't find the file in any of the active pools.
The PM reissues the request to the Pool Selection Unit, this time using the Selection Class Cache because it needs to get pools which are allowed to stage files. Again the PM loops over the newly created priority list from highest to lowest. For each row (horizontal) it sends "cost check" requests to the listed pools. The pool with the lowest cost is chosen. In case none of the pools replies at all or none of them is willing to fetch the file, the loop continues.
Remark : The loop stops as soon as a row contains a pool which is willing to stage the file. This means that the PM doesn't check whether the next row contains a pool which would present an even lower cost. So : static configuration overrides dynamic behavior.
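
The following Python sketch illustrates this loop and the remark. It is an illustration only: select_stage_pool and the get_cost callback are hypothetical names, and a cost of None stands for "no reply" or "not willing to fetch the file".

```python
from typing import Callable, Dict, List, Optional


def select_stage_pool(
    priority_rows: List[List[str]],
    get_cost: Callable[[str], Optional[float]],
) -> Optional[str]:
    """Return the cheapest pool of the first row in which any pool answers.

    get_cost is a hypothetical stand-in for the "cost check" request; it
    returns None if the pool does not reply or refuses to stage the file.
    """
    for row in priority_rows:
        costs: Dict[str, float] = {}
        for pool in row:
            cost = get_cost(pool)
            if cost is not None:
                costs[pool] = cost
        if costs:
            # Stop here: rows of lower priority are never consulted,
            # even if they contain a cheaper pool.
            return min(costs, key=costs.get)
    return None


# Assumed example costs; pool-c is cheapest but sits in a lower row.
example_costs = {"pool-a": 5.0, "pool-b": 2.0, "pool-c": 0.1}
chosen = select_stage_pool([["pool-a", "pool-b"], ["pool-c"]], example_costs.get)
```

In this example pool-b is chosen although pool-c reports a lower cost, because the first row already contains a willing pool.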

Step 5 : No Pool found

If none of the steps above yields a positive result, the PM replies with either :
error 19 "No read pools available for <class>@<hsm>"
   or
error 20 "No reply from cost-check for <class>@<hsm>"

PMV3 concept for STORE

The PMV3 calls its PoolSelectionUnit interface, providing the required information from the request. The selection class is set to Write because the PM needs to find a pool which is allowed to store precious files.
The PM steps over the PoolSelection output from highest to lowest priority, sending 'check cost' requests to all pools in a row (horizontal). As soon as at least one of the pools replies positively, the loop stops and one of the resulting pools is chosen at random. If no pool results from the loop, the PM replies with
error 19 "No read pools available for <class>@<hsm>"
   or
error 20 "No reply from cost-check for <class>@<hsm>"
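
The STORE loop can be sketched in Python in the same spirit. This is an illustration under assumptions: select_write_pool and the accepts_write callback (standing in for a positive reply to the 'check cost' request) are hypothetical names.

```python
import random
from typing import Callable, List, Optional


def select_write_pool(
    priority_rows: List[List[str]],
    accepts_write: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return a write pool from the first row with a positive reply.

    accepts_write is a hypothetical callback that is True when the pool
    replied positively to the 'check cost' request.
    """
    rng = rng or random.Random()
    for row in priority_rows:
        willing = [pool for pool in row if accepts_write(pool)]
        if willing:
            # One of the positively replying pools is chosen at random.
            return rng.choice(willing)
    # No pool found: the caller would reply with one of the errors above.
    return None
```

Unlike the FETCH case, the choice within the winning row is random rather than based on the lowest cost.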

The PoolSelection V1 implementation

The PoolManager V3 allows an arbitrary class to be specified for the static pool selection, as long as this class implements the PoolSelectionUnit interface. The default implementation is the PoolSelectionV1, which should be sufficiently sophisticated to support a medium-sized installation. The PSV1 has a notion of the following objects :

Selection Object

Storage Unit Group  <-+
                      +-  Link (readpref=.., writepref=.., cachepref=..)  ->  Pool Group
Net Unit Group      <-+

Selection Process

Incoming Parameters

Resolving units

Get all Storage Unit Groups which have the Storage Class as a member, and get all Net Unit Groups which match the IP address of the client.
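
Matching a client against net units can be sketched in Python as below. The sketch assumes that a net unit is written as address/netmask, as in the configuration example of this document, and that a client matches when its masked address equals the masked unit address. The function names and the lab-net group are hypothetical.

```python
import ipaddress
from typing import Dict, List, Set


def net_unit_matches(unit: str, client_ip: str) -> bool:
    """True if client_ip lies in 'address/netmask', e.g. 0.0.0.0/0.0.0.0."""
    address, mask = unit.split("/")
    net = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    client = int(ipaddress.IPv4Address(client_ip))
    return (client & mask_bits) == (net & mask_bits)


def matching_net_groups(net_ugroups: Dict[str, List[str]], client_ip: str) -> Set[str]:
    """Names of all net unit groups with at least one matching unit."""
    return {
        name
        for name, units in net_ugroups.items()
        if any(net_unit_matches(unit, client_ip) for unit in units)
    }


net_ugroups = {
    "world-net": ["0.0.0.0/0.0.0.0"],        # wildcard: matches every client
    "lab-net": ["192.168.1.0/255.255.255.0"],  # hypothetical extra group
}
```

With these groups, a client at 192.168.1.7 matches both world-net and lab-net, while a client at 10.0.0.1 matches world-net only.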

Resolving Links

Get all links pointing exactly to these pairs of Storage Unit Groups and Net Unit Groups.

Sorting Links

Sort the links according to their preference for the Selection Class, highest first.

Resolve Pool Groups

Group links with identical priorities for the Selection Class. Then resolve the Pool Groups the links point to into the actual Pools. This procedure results in a list of lists of pools, sorted according to the priority for the specified Selection Class.
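
The link resolution, sorting and grouping can be sketched in Python as follows. This is an illustration under assumptions and not the PSV1 code: the Link class and select_pools are hypothetical, the matched unit groups are passed in as ready-made sets, and because this document does not say whether a preference of 0 excludes a link, the sketch keeps such links.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Link:
    """Hypothetical model of a link between unit groups and pool groups."""
    name: str
    net_ugroup: str
    store_ugroup: str
    pgroups: List[str]
    prefs: Dict[str, int]  # keys: "read", "write", "cache"


def select_pools(
    links: List[Link],
    pgroups: Dict[str, List[str]],
    store_groups: Set[str],
    net_groups: Set[str],
    selection_class: str,
) -> List[List[str]]:
    """Return a list of pool lists, highest preference first."""
    rows: Dict[int, List[str]] = defaultdict(list)
    for link in links:
        # Resolve links: both unit groups of the link must have matched.
        if link.store_ugroup in store_groups and link.net_ugroup in net_groups:
            pref = link.prefs[selection_class]
            # Links with identical preference share one row.
            for pgroup in link.pgroups:
                for pool in pgroups[pgroup]:
                    if pool not in rows[pref]:
                        rows[pref].append(pool)
    return [rows[pref] for pref in sorted(rows, reverse=True)]


links = [
    Link("write-link", "world-net", "all-stores", ["write-pools"],
         {"write": 10, "read": 1, "cache": 0}),
    Link("read-link", "world-net", "all-stores", ["read-pools"],
         {"write": 0, "read": 10, "cache": 10}),
]
pools = {"write-pools": ["pool-1", "pool-2"], "read-pools": ["pool-a", "pool-b"]}
read_rows = select_pools(links, pools, {"all-stores"}, {"world-net"}, "read")
```

For the Read class the pools of read-pools come first and those of write-pools second, reflecting the preferences 10 and 1.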

Configuration Example : Minimal

The names and values used in this example (pools, groups, links, preferences) are arbitrary.
#
#  define all known pools first
#  (not defined pools will not be recognized)
#
psu create pool pool-1
...
psu create pool pool-x
#
#  create read and write pool groups
#
psu create pgroup write-pools
psu create pgroup read-pools
#
# add the pools to the pgroups
#
psu addto pgroup write-pools pool-1
psu addto pgroup write-pools pool-2
...
psu addto pgroup read-pools pool-a
psu addto pgroup read-pools pool-b
...
#
#  define the wildcard storage and net unit
#
psu create unit -store *@*
psu create unit -net   0.0.0.0/0.0.0.0
#
#  define the wildcard unit groups
#
psu create ugroup world-net
psu create ugroup all-stores
#
psu addto ugroup world-net   0.0.0.0/0.0.0.0
psu addto ugroup all-stores  *@*
#
#  create the read and write links
#
psu create link write-link world-net all-stores
psu create link read-link  world-net all-stores
#
# let them point to the read/write pgroups
#
psu add link write-link write-pools
psu add link read-link  read-pools
#
#   set the preferences
#
psu set link write-link -writepref=10 -readpref=1  -cachepref=0
psu set link read-link  -writepref=0  -readpref=10 -cachepref=10

Patrick Fuhrmann (patrick.fuhrmann@desy.de) $Date: 2005/04/29 12:38:57 $