Release Notes : dCache 1.4.8 - 1.5.1

New dCap server mover (higher bandwidth, lower CPU cost)

Using the Java 1.4 NIO feature, we were able to significantly increase the transfer rate and/or reduce the CPU load for transfers using the dCap protocol.

In addition, transfers can now be stopped in case the client appears dead.

The nio mover allows an existing file to be opened in r/w mode. (The rest of the system doesn't allow this yet)
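
For illustration only, the following is a minimal sketch (not the actual mover code) of the NIO-style transfer loop such a mover is built on: data moves from a FileChannel to a SocketChannel through a direct buffer, avoiding the extra copy through a Java heap array.

   import java.io.FileInputStream;
   import java.io.IOException;
   import java.nio.ByteBuffer;
   import java.nio.channels.FileChannel;
   import java.nio.channels.SocketChannel;

   public class NioTransferSketch {

       //  Copy a file to an already connected socket channel using a
       //  direct buffer. The direct buffer lets the JVM hand data to
       //  the kernel without an intermediate heap copy, which is where
       //  the CPU saving comes from.
       public static void sendFile(String path, SocketChannel socket)
               throws IOException {
           FileChannel file = new FileInputStream(path).getChannel();
           ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 1024);
           try {
               while (file.read(buffer) >= 0 || buffer.position() != 0) {
                   buffer.flip();          // switch to draining mode
                   socket.write(buffer);   // may write only part of it
                   buffer.compact();       // keep the unwritten rest
               }
           } finally {
               file.close();
           }
       }
   }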

The following changes have to be applied to the pool.batch file in order to activate the new mover code:

#
#    T h e   P o o l s
#
define context MoverMap endDefine
   movermap define DCap-3  diskCacheV111.movers.DCapProtocol_3_nio
endDefine
#
define context startPools endDefine
  create diskCacheV111.pools.MultiProtocolPool2 ${0} \
       "!MoverMap ${1} \
        -version=4 \
        -${2} -${3} -${4} -${5} -${6} -${7}"
endDefine


dCap Protocol

send and receive buffer sizes

With dCache 1.4.8 and dCap library version 1.2.24, both client and server allow the send/recv buffer sizes to be specified.

Mechanism

The default and maximum send and receive buffer sizes are configured at startup of the MultiProtocolPool2 via defaultSend/RecvBufferSize and maxSend/RecvBufferSize. The dCap API allows these values to be specified for both the client and the server. While the client library tries to set the chosen values as specified, the server doesn't allow the defined maxSend/RecvBufferSize to be exceeded.
Server Configuration

     Option                 |  Cell Context                 |  Description
   -----------------------------------------------------------------------------------
     defaultSendBufferSize  |  dCap3-defaultSendBufferSize  |  Server default send buffer size
     defaultRecvBufferSize  |  dCap3-defaultRecvBufferSize  |  Server default recv buffer size
     maxSendBufferSize      |  dCap3-maxSendBufferSize      |  Server maximum send buffer size
     maxRecvBufferSize      |  dCap3-maxRecvBufferSize      |  Server maximum recv buffer size

The context values are checked for each connection.
API calls

     API call                               |  Default Value
   ---------------------------------------------------------
     dc_setTCPSendBuffer( int newSize )     |  256K
     dc_setTCPReceiveBuffer( int newSize )  |  256K

dccp Application

     Buffer               |  Option  |  Default Value
   --------------------------------------------------
     Send Buffer Size     |  -s      |  256K
     Receive Buffer Size  |  -r      |  256K
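
The interplay of these settings can be pictured as follows. A minimal sketch (hypothetical method and parameter names, not the actual MultiProtocolPool2 code): the client-requested size wins, but is clamped to the configured maximum, and a request of zero falls back to the default.

   import java.io.IOException;
   import java.net.Socket;

   public class BufferSizeSketch {

       //  Apply negotiated send/recv buffer sizes to a data socket.
       //  'requested*' comes from the client (dc_setTCP*Buffer),
       //  'default*' and 'max*' from the pool configuration above.
       public static void applyBufferSizes(Socket socket,
                                           int requestedSend, int requestedRecv,
                                           int defaultSend, int defaultRecv,
                                           int maxSend, int maxRecv)
               throws IOException {
           int send = requestedSend > 0 ? requestedSend : defaultSend;
           int recv = requestedRecv > 0 ? requestedRecv : defaultRecv;
           socket.setSendBufferSize(Math.min(send, maxSend));
           socket.setReceiveBufferSize(Math.min(recv, maxRecv));
       }
   }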

Preload support of open64

Starting with dCap version-1-2-25, we support preload for the open64 call as well. As a consequence, at least on Linux, most (non-static) system tools work with the preload library, e.g. cp, tar, less... To improve performance in conjunction with the preload library, the DCACHE_RAHEAD and DCACHE_RA_BUFFER environment variables may be used to tune read-ahead behaviour.
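
For example (a sketch; the library file name libpdcap.so and the exact value semantics of the two variables are assumptions and may differ per installation):

   LD_PRELOAD=libpdcap.so DCACHE_RAHEAD=true DCACHE_RA_BUFFER=1048576 \
       cp dcap:///pnfs/desy.de/zeus/users/patrick/... /tmp/...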

Simplified dcap: syntax

In addition to supporting the generic URL syntax:
       dcap://<serverHost>[:<portNumber>]/pnfs/<Domain>/<FileName>
      
we now allow the server host and port number part to be omitted. In this case 'dcache.' is prepended to the domain part (the second entry in the path), and the result is used as the door host name. The default port number is 22125.

Example

        dcap:///pnfs/desy.de/zeus/users/patrick ... 
              is identical to
        dcap://dcache.desy.de:22125/pnfs/desy.de/zeus/users/patrick/...
      
At DESY we have assigned a virtual IP address to dcache.desy.de, which is mapped by address translation to a set of hosts and ports in a round-robin manner. (F5)

Support of stat (fstat) by dcap:/// filename syntax

We now support the dc_stat and dc_fstat library calls (plus PRELOAD) with the dcap:// syntax as well.

Checksum support for write

If a file is written without intervening seeks or reads, the dCap library calculates an adler32 checksum and sends it to the server together with the close call. The server stores this checksum in pnfs. It is delivered to the HSM flush and restore operations.
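
The checksum is the standard Adler-32, which in Java is available from java.util.zip. The following sketch (not dCap library code) shows how the value stored in pnfs can be recomputed for verification:

   import java.io.FileInputStream;
   import java.io.IOException;
   import java.util.zip.Adler32;

   public class WriteChecksumSketch {

       //  Compute the Adler-32 checksum of a file, updated buffer by
       //  buffer, just as a client can update it per write() call.
       public static long adler32Of(String path) throws IOException {
           Adler32 adler = new Adler32();
           FileInputStream in = new FileInputStream(path);
           try {
               byte[] buffer = new byte[64 * 1024];
               int n;
               while ((n = in.read(buffer)) >= 0) {
                   adler.update(buffer, 0, n);
               }
           } finally {
               in.close();
           }
           return adler.getValue(); // compare against the hex value in level2
       }
   }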

Improved request distribution (anti clumping)

With 1.4.8, dCache changes the way requests are distributed among server pools. These changes should enable dCache to properly react to bursts of requests arriving within seconds. To stay backward compatible, the software keeps the old methods as long as the options below are not explicitly given to the specified cells.
     Cell                |  Option      |  Value
   ---------------------------------------------------------------------------
     DCapDoor            |  poolProxy   |  PoolManager
     PoolManager         |  costModule  |  diskCacheV111.poolManager.CostModuleV1
     MultiProtocolPool2  |  version     |  4
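
In batch file terms this amounts to passing the options at cell creation, roughly as follows (a sketch; only the option fragments are shown, and the -option=value form for poolProxy and costModule is an assumption, analogous to the -version=4 pool option above):

   #
   #  PoolManager : activate the new cost module
   #
   ...  -costModule=diskCacheV111.poolManager.CostModuleV1  ...
   #
   #  DCapDoor : route pool requests through the PoolManager proxy
   #
   ...  -poolProxy=PoolManager  ...
   #
   #  Pools : new mover / cost reporting code (see pool.batch above)
   #
   ...  -version=4  ...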

PoolManager : improved 'retry' command

NAME

rc retry - let the system retry a restore request
SYNOPSIS

rc retry <pnfsId> [-update-si]
rc retry * [-force-all] [-update-si]
OPTIONS


Large File Store support (no HSM backend)

dCache version 1.4.8 provides an experimental version of the so-called Large File Store (LFS) capability, which allows storing files that are not scheduled to be stored on the backend HSM. The dCache system may be operated in hybrid mode, supporting HSM-connected and non-HSM-connected pools, while a single pool can only be one of the two. If configured as LFS, a pool may choose to declare newly incoming files as precious or as volatile. If precious, files are only removed in case the file entry disappears from pnfs. If volatile, files are removed if space is running short. To support LFS properly, the LazyCleaner must be replaced by the CleanerV2 cell (see below).
     Option  |  Value     |  Description
   ------------------------------------------------------------------------------------------
     lfs     |  none      |  The pool assumes to be connected to an HSM.
     lfs     |  precious  |  The pool is not connected to an HSM and declares incoming files as precious.
     lfs     |  volatile  |  The pool is not connected to an HSM and declares incoming files as volatile. (Supported with 1.5.2)
Pool behaviour on LargeFileStore files

Concerning the LargeFileStore, a pool may be configured in either of the modes listed in the table above (precious or volatile).

Cleaner

The Cleaner module is responsible for removing deleted files from pools. For an HSM-backed dCache this task is less critical, as deleted files can't be accessed by clients and consequently will disappear from pools anyway through the regular aging process. This is different for LFS pools, where files stay precious for their lifetime and must therefore be explicitly removed after they have been deleted from pnfs. An additional complexity arises from the fact that pools might be down at the time a file is deleted from pnfs, so that the Cleaner module is not able to perform the actual remove process. The CleanerV2 cell is able to store those remove requests until they can actually be performed.
   create diskCacheV111.cells.CleanerV2 cleaner \
       "default -trash=${trash}        \
                -db=${config}/cleaner  \
                -refresh=300"
Make sure to create the cleaner database directory (${config}/cleaner in the example above) before the cleaner is started.
R/W allowed
With an HSM as storage backend, we can't allow a file to be modified after it has been written, mainly because the file could already be on tape. Without an HSM backend this limitation would no longer be necessary. Nevertheless, we will stay with it for a while.
Moving LFS files into HSM-backed dCache pools
We are planning to allow a file to be moved from a directory assigned to a LargeFileStore pool to a directory backed by an HSM, which would then store the file to the HSM. This as well is a mid-term plan, mainly because pnfs has to be modified to catch the move operation.
Volatile LFS
In case a volatile pool removes a file from its repository, the file has to be removed from the filesystem as well. Because the pool only knows the i-node (pnfsid) of the file, the filesystem has to remove the file entry by i-node. For regular filesystems this is a non-standard operation. Possibly the pnfs code has to be changed to support this remove operation.

Improved Pool Inventory behaviour (CacheRepository2)

When a pool is starting up, it may find its repository in nearly any condition: control files could be missing, the expected data filesize might not match the filesize found on disk, and so on. With 1-4-8 the behaviour of the inventory scanner has been improved as described below.

Pool Repository Disk Layout

     Location                        |  Name         |  Description                                           |  Remarks
   ------------------------------------------------------------------------------------------------------------------------------------------
     <poolBase>/data/<pnfsId>        |  Datafile     |  Original raw datafile                                 |  FS lastModified == dCache lastAccessed
     <poolBase>/control/<pnfsId>     |  Status File  |  cached, precious, receiving.client, receiving.store   |  FS lastModified == dCache cached
     <poolBase>/control/SI-<pnfsId>  |  SI File      |  Storage Information (HSM specific)                    |

Inventory Procedure

The inventory procedure loops over all files in the <poolBase>/data directory which match the pnfsId format. Non-matching files are reported in the logfile but are ignored. The pnfsId is added to the repository of the pool if the following conditions are true.
  • for each <poolBase>/data/<pnfsId> file, a <poolBase>/control/<pnfsId> must exist, must be readable and must contain one of the following keywords : cached,precious,receiving.client,receiving.store
  • for each <poolBase>/data/<pnfsId> file, a <poolBase>/control/SI-<pnfsId> must exist, must be readable and must be deserializable.
  • the pnfs filesize stored in the SI file must match the filesize of the datafile.
In case one of the above conditions is false and the pool option -recover-control is not set, the inventory process is aborted and the pool is disabled. If -recover-control is set and one of the requirements above is not satisfied, a recovery procedure for the particular pnfs file is initiated.
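
The three conditions can be summarized in a few lines of Java (a sketch with hypothetical parameter names; reading the control file and deserializing the SI file are omitted):

   import java.io.File;

   public class InventoryCheckSketch {

       //  Returns true if the repository entry for 'pnfsId' is
       //  consistent: data file, control file and SI file must agree.
       public static boolean entryIsConsistent(File poolBase, String pnfsId,
                                               long sizeFromSiFile,
                                               String stateFromControlFile) {
           File data    = new File(poolBase, "data/" + pnfsId);
           File control = new File(poolBase, "control/" + pnfsId);
           File si      = new File(poolBase, "control/SI-" + pnfsId);

           if (!control.canRead() || !si.canRead()) return false;

           //  the control file must carry one of the known states
           if (!stateFromControlFile.equals("cached") &&
               !stateFromControlFile.equals("precious") &&
               !stateFromControlFile.equals("receiving.client") &&
               !stateFromControlFile.equals("receiving.store")) return false;

           //  the pnfs filesize recorded in the SI file must match
           return sizeFromSiFile == data.length();
       }
   }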

Result of the recovery procedure and its actions

     Result                            |  Action
   --------------------------------------------------------------
     No reply from PnfsManager         |  Retry infinitely
     Pnfs file not found               |  Remove file from repository
     SI filesize != datafile filesize  |  Error Condition

The behaviour on an Error Condition is determined by the -recover-anyway pool option. If set, the pool starts up and the related pnfs file is marked BAD. If the option is not set, the pool is disabled.

BAD files are reported by the rep ls -l=e pool command.


dCapDoor : 'getPool' retry mechanism

The dcache-1-4-8-beta-2 DCapDoorInterpreter is able to automatically retry a 'getStorageInfo' and a 'getPool' request. While the 'getStorageInfo' retry period is fixed at 60 seconds, the 'getPool' request timeout is configurable by a DCapDoor option and by the poolRetry and dCapDoor-poolRetry context variables. The context variable dCapDoor-poolRetry overwrites the context variable poolRetry, which in turn overwrites the option poolRetry. BTW: check out m-DCapDoorInterpreterV3 for a list of all DCapDoor options.

     Cell          |  Option / Context                 |  Value
   ----------------------------------------------------------------------------
     LoginManager  |  keepAlive                        |  keep alive trigger in seconds
     LoginManager  |  poolRetry / dCapDoor-poolRetry   |  retry period for pool requests
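
For example, the retry period could be raised in the door batch file before the door is created (a sketch; the 900 second value is just an illustration):

   #
   #  retry pool requests every 900 seconds
   #
   set context dCapDoor-poolRetry 900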

Remark : This new feature is only useful in case the PnfsManager or the PoolManager had been out of service for a while. It won't help if a file can't be fetched from the HSM, because the fetch request is handled by the PoolManager. So additional retries will be added to the PoolManager queue for that particular file, and if the request is suspended due to permanent HSM errors, only the client count for the file will be increasing.


New cost calculation scheme.

Generic Considerations

Starting with 1.5.1, all costs are calculated solely within the PoolManager and no longer in the pools themselves. This is essential to support request anti-clumping. Pools frequently send all necessary information about space usage and queue length to the PoolManager's CostModule. The cost module can be regarded as a cache for all this information, so it is no longer necessary to send 'get cost' requests to the pools for each client request. The CostModule interpolates the expected costs until new precise information arrives from the pools.

In addition, the PoolCellInfo is now just the CellInfo plus the PoolCostInfo. The WebCollectorV3 has been modified accordingly. The WebCollectorV0 has been removed.

Pool 2 Pool transfer client

Pool 2 pool client transfers are now added to the total cost of a pool, and they are reported on the 'pool request' web page as well.

Although client pool 2 pool transfers seem to be handled within regular queues, they are not. Queuing both p2p server and p2p client requests has an (even though small) probability of deadlocks. So p2p client requests are never actually queued; they start immediately after they have been requested. The p2p client 'max number of transfers' is only used to calculate the costs for those transfers.

Customizable Cost Calculation

Calculating the cost for a data transfer is still done in two steps. First, the CostModule merges all information about space and transfer queues of the pools to calculate the performance and space costs separately. Second, depending on the type of client request, these two numbers are merged by linear combination to build the actual cost for each pool. The first step has been isolated within a separate loadable class. The second step is still done somewhere in the PoolManager code. The next development step will be to add the second calculation to the customizable (loadable) class as well.
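
The second step amounts to a weighted sum. A minimal sketch (hypothetical weight names; the actual weights depend on the request type):

   public class CombinedCostSketch {

       //  Merge the two partial costs from the CostModule into the
       //  single number used to rank pools.
       public static double combinedCost(double performanceCost,
                                         double spaceCost,
                                         double performanceWeight,
                                         double spaceWeight) {
           return performanceWeight * performanceCost
                + spaceWeight * spaceCost;
       }
   }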

The default cost calculation can be overridden by the PoolManager 'create option' :

           -costCalculator=<newCostCalculator>
           The default is CostCalculationV5
      
The goal is to compare different pool costs without intermediately calculating scalar values for performance and space.

LRU time used for calculating space cost.

In order to get some kind of globally fair-share aging of cached files, the cost of removing a file from the cache now depends on the 'last accessed' timestamp of the LRU file. The new space cost calculation is done as follows :
     Definition  |  Value
   -----------------------------------------------------------
     fs          |  free pool space (not yet occupied by data)
     GAP         |  configured per pool
     LRU         |  age of the L(east) R(ecently) U(sed) file (in seconds)
     m           |  60
     c           |  breakeven * 7 * 24 * 3600

     Condition             |  Cost
   ------------------------------------------
     fs <  GAP             |  Filesize / fs
     fs >= GAP, LRU <  m   |  1 + c / m
     fs >= GAP, LRU >= m   |  1 + c / LRU

The value c ensures that cost values stay within a reasonable interval for very young (pathological) LRU files. The breakeven value is equivalent to (cost - 1) for an LRU file of the age of one week. In other words, if breakeven is set to 0.1, the cost will be 1.1 if the LRU file has not been used for one week.
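
In code, the piecewise definition above reads as follows (a sketch of the formula, not the CostCalculationV5 source):

   public class SpaceCostSketch {

       private static final double M = 60.0; // lower clamp for the LRU age (s)

       //  fs = free space, gap = configured per pool, lruSeconds = age
       //  of the least recently used file, breakeven as set on the pool.
       public static double spaceCost(long filesize, long fs, long gap,
                                      long lruSeconds, double breakeven) {
           if (fs < gap) {
               return (double) filesize / (double) fs;
           }
           double c = breakeven * 7.0 * 24.0 * 3600.0;
           return 1.0 + c / Math.max((double) lruSeconds, M);
       }
   }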

REMARK

To be backward compatible, this feature is only enabled if the breakeven (pool) value is below 1.0.

      pool (admin) > set breakeven 0.1
      
This has to be done on all connected pools. Values may differ from pool to pool. The higher the value, the later files on the pool will be declared 'removable'.


Pool 2 Pool to same host suppression.

Under some conditions it might not be desirable to allow a pool 2 pool transfer within the same host. We now allow suppressing those transfers. Two changes are needed in order to enable this feature.

A) The pool has to learn on which host it is located, so the corresponding xxx.poollist file needs the following additional tag per pool:

   <poolName> <AbsolutePathToPool>  tag.hostname=<hostname>  <rest>
   

B) The pool manager cell has to be configured accordingly:

   #
   #  the pool manager shouldn't care whether or not p2p
   #  source and destination pool are on the same host.
   #
   rc set sameHostCopy notchecked 
   #
   #  the pool manager should, under no circumstances,
   #  initiate a p2p transfer between two pools on the 
   #  same host.
   #
   rc set sameHostCopy never
   #
   #  the pool manager prefers pools on different hosts
   #  but may as well choose pools on the same host if 
   #  necessary. (DEFAULT)
   # 
   rc set sameHostCopy besteffort
   #
   

New Pool 2 Pool costcut parameters and behaviour

Pre-1-5-2 dCache versions tend to run into pool 2 pool transfer orgies if p2p on cost was enabled and the majority of pools had exceeded the costcut. This has been fixed with 1.5.2. Moreover, additional customizable parameters have been added to allow enhanced steering of pool transfer behaviour on high costs. For details see Pool2PoolSteering.
The PoolManager.conf is backward compatible in the sense that pre-1.5.2 PoolManager.conf files are understood by the new PoolManager, but as soon as the new PoolManager rewrites the content of PoolManager.conf (save command), it can no longer be used by pre-1.5.2 PoolManagers.

Related Command Summary

rc set p2p on|off|oncost
rc set fallback oncost on|off
rc set stage oncost on|off
rc set slope <p2p source/destination slope>
rc set max copies <maxNumberOfP2pCopies>
rc set sameHostCopy never|besteffort|notchecked

set costcuts [-<option>=<value> ... ]

   Options  |  Default  |  Description
 -------------------------------------------------------------------
     idle   |   0.0     |  below 'idle' : 'reduce duplicate' mode
     p2p    |   0.0     |  above : start pool to pool mode
     alert  |   0.0     |  stop pool 2 pool mode, start stage only mode
     halt   |   0.0     |  suspend system
   fallback |   0.0     |  Allow fallback in Permission matrix on high load

     A value of zero disables the corresponding value

#
#   DEPRECATED
#
set costcut <minCut>|* [<maxCut>] # DEPRECATED
#
#  use 'set costcuts -idle=... -p2p=...' instead.
#

Limited checksum support.

Whenever the dCap library sends a checksum together with the close operation of a newly created file, the server stores this information in the pnfs database (level2):
   <format version>,<statistics>
   :<options>,c=1:<adler32(hex)>;
   <Pool Locations>
   
This information is part of the StorageInfo and is delivered to the HSM flush and restore operations to allow checking the consistency of the dataset.

Large file support.

In order to be independent of the pnfs (nfs2) implementation, we store the size of the file in the pnfs database (level2):
   <format version>,<statistics>
   :<options>,l=<filesize>;
   <Pool Locations>
   
If present, the storage info takes the filesize from this value and no longer from the filesize in pnfs.

Pool Modification Event from PoolManager

The PoolManager can be enabled to send a PoolStatusChangedMessage to an arbitrary destination cell. Those events are triggered whenever a pool is started, stopped, enabled or disabled, and if it appears dead for whatever reason. The following options need to be specified in order to enable this feature.
   -poolStatusRelay=<destinationCellName> -watchdog
The PoolStatusChangedMessage holds the following information. getPoolMode(), getDetailCode() and getDetailMessage() are only valid if getMode() == DOWN.
     Method              |  Type        |  Value               |  Description
   --------------------------------------------------------------------------------------------
     getMode()           |  int         |  DOWN or RESTART     |  Current pool state
     getPoolMode()       |  PoolV2Mode  |  pool mode bits      |  disabled, strict, ...
     getDetailCode()     |  int         |  any number > 0      |  reason for pool being disabled
     getDetailMessage()  |  String      |  any string or null  |  reason for pool being disabled
The following messages are sent on the following occasions :
    #
    # Pool starts up :
    #
    dcache0-0;status=DOWN;mode=disabled(fetch,store,stage,p2p-client,p2p-server,);code=(1,Initiallizing)
    #
    # pool finished 'repository scan'
    #
    dcache0-0;status=RESTART;code=(0)
    #
    # pool is disabled by commandline interface or PoolModifyModeMessage
    #
    #   pool disable -rdonly 111 "Testing rdonly"
    #
    dcache0-0;status=DOWN;mode=disabled(store,stage,p2p-client,);code=(111,Testing rdonly)
    #
    # pool is enabled by commandline interface or PoolModifyModeMessage
    #
    #   pool enable
    #
    dcache0-0;status=RESTART;code=(0)
    #
    # pool is shutdown
    #
    dcache0-0;status=DOWN;mode=disabled(dead,);code=(666,Shutdown)
    dcache0-0;status=DOWN;mode=disabled(dead,);code=(666,PingThread terminated)
    #
    # pool is dead (network disconnect, kill -9, host hangs ... )
    # (takes about 10 minutes)
    #
    dcache0-0;status=DOWN;code=(666,DEAD)
    #    
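
A receiving cell would typically evaluate such a message as sketched below (method names as in the table above; where the DOWN/RESTART constants live is an assumption):

   //  Sketch of a handler for PoolStatusChangedMessage; assumes the
   //  diskCacheV111 classes on the classpath and that DOWN/RESTART
   //  are int constants of the message class.
   public class PoolStatusSketch {

       public static void poolStatusChanged(PoolStatusChangedMessage msg) {
           if (msg.getMode() == PoolStatusChangedMessage.DOWN) {
               //  detail code and message are only valid for DOWN
               System.out.println("pool down, code=" + msg.getDetailCode()
                                  + ", reason=" + msg.getDetailMessage());
           } else {
               System.out.println("pool restarted");
           }
       }
   }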

Fine grained 'pool disable'.

Pools may now be disabled in various modes by the pool command set as well as by the PoolModifyModeMessage. The basic disable, and with that all its variants, will prevent the pool from sending Up messages to the PoolManager and from replying to PoolCheckable messages. Consequently this pool no longer exists for any automatic pool selection done by the PoolManager. This does, of course, not prevent other applications, e.g. the ReplicaManager, from sending P2P messages to disabled pools. Depending on the type of 'disable', the pool will do as requested or will send an error to the requestor.

Disable Modes

     Mode                |  Command Set  |  PoolV2Mode option    |  Purpose
   ------------------------------------------------------------------------------------------------------
     plain               |               |  DISABLED             |  All operations are allowed (except PoolCheckable)
     strict              |  -strict      |  DISABLED_STRICT      |  No operations are allowed
     read only           |  -rdonly      |  DISABLED_RDONLY      |  All write operations are disallowed
     stage               |  -stage       |  DISABLED_STAGE       |  Staging (hsm -> pool) is disallowed
     store               |  -store       |  DISABLED_STORE       |  Storing (client -> pool) is disallowed
     fetch               |  -fetch       |  DISABLED_FETCH       |  Fetching (pool -> client) is disallowed
     pool 2 pool client  |  -p2p-client  |  DISABLED_P2P_CLIENT  |  Pool can't be used as client for p2p
Options may be combined.

New Command Set

     pool disable [options] [<errorCode> [<errorMessage>]]
      OPTIONS :
          -fetch    #  disallows fetch (transfer to client)
          -stage    #  disallows staging (from HSM)
          -store    #  disallows store (transfer from client)
          -p2p-client
          -rdonly   #  := store,stage,p2p-client
          -strict   #  := disallows everything
          
     DEPRECATED :
     
        The pool command 
        
          "pool disablemode strict|fuzzy"
          
        is no longer valid and is silently ignored.
        
   

Message passing interface

     EXAMPLE :
     
      int  modeValue  = PoolV2Mode.DISABLED_STAGE |
                        PoolV2Mode.DISABLED_STORE |
                        PoolV2Mode.DISABLED_P2P_CLIENT ;
      
      PoolV2Mode mode = new PoolV2Mode( modeValue ) ;
      
      PoolModifyModeMessage msg = new PoolModifyModeMessage( "myPool" , mode ) ;
      
      
   

This and That


Patrick Fuhrmann patrick.fuhrmann@desy.de (Last Updated $Date: 2005/04/29 12:38:57 $)