Using the Java 1.4 nio feature we could significantly increase the transfer rate and/or reduce the CPU load for transfers using the dCap protocol. In addition, transfers can be stopped in case the client appears dead.
The nio mover allows an existing file to be opened in r/w mode. (The rest of the system doesn't allow this yet)
The following changes have to be applied to the pool.batch file in order to activate the new mover code:
#
#    T h e   P o o l s
#
define context MoverMap endDefine
   movermap define DCap-3 diskCacheV111.movers.DCapProtocol_3_nio
endDefine
#
define context startPools endDefine
   create diskCacheV111.pools.MultiProtocolPool2 ${0} \
          "!MoverMap ${1} \
           -version=4 \
           -${2} -${3} -${4} -${5} -${6} -${7}"
endDefine
send and receive buffer sizes
With dCache 1.4.8 and dCap library version 1.2.24, client and server allow specifying the send/recv buffer sizes.

Mechanism
The default and the maximum send and receive buffer sizes are configured at startup of the MultiProtocolPool2 (defaultSend/RecvBufferSize, maxSend/RecvBufferSize). The dCap API allows these values to be specified for both the client and the server. While the client library tries to set the chosen values as specified, the server doesn't allow the defined maxSend/RecvBufferSize to be exceeded.

Server Configuration

The context values are checked for each connection.
Option                | Cell Context                | Description
----------------------|-----------------------------|--------------------------------
defaultSendBufferSize | dCap3-defaultSendBufferSize | Server default send buffer size
defaultRecvBufferSize | dCap3-defaultRecvBufferSize | Server default recv buffer size
maxSendBufferSize     | dCap3-maxSendBufferSize     | Server maximum send buffer size
maxRecvBufferSize     | dCap3-maxRecvBufferSize     | Server maximum recv buffer size

API calls
API call                              | Default Value
--------------------------------------|--------------
dc_setTCPSendBuffer( int newSize )    | 256K
dc_setTCPReceiveBuffer( int newSize ) | 256K
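As an illustration (not taken verbatim from the dCap documentation), a minimal C sketch of how a client could request larger TCP buffers before opening a file; the host and file name are hypothetical examples, and the door will still cap the requested values at maxSend/RecvBufferSize:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <dcap.h>   /* dCap client library: dc_open, dc_read, dc_close, ... */

int main(void)
{
    char buffer[65536];

    /* ask for 1 MB TCP buffers; the server will not exceed its configured maximum */
    dc_setTCPSendBuffer(1024 * 1024);
    dc_setTCPReceiveBuffer(1024 * 1024);

    /* hypothetical file name */
    int fd = dc_open("dcap://dcache.desy.de:22125/pnfs/desy.de/zeus/users/patrick/example.dat",
                     O_RDONLY);
    if (fd < 0) {
        dc_perror("dc_open failed");
        return 1;
    }

    ssize_t n = dc_read(fd, buffer, sizeof(buffer));
    printf("read %zd bytes\n", n);
    dc_close(fd);
    return 0;
}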
dccp Application

Buffer           | Option | Default Value
-----------------|--------|--------------
Send Buffer Size | -s     | 256K
Receive Buffer   | -r     | 256K

Preload support of open64
Starting with dCap version 1-2-25 we support preload for the open64 call as well. As a consequence, at least for Linux, most (non-static) system tools work together with the preload library, e.g. cp, tar, less. To improve performance in conjunction with the preload library, one may use the DCACHE_RAHEAD and DCACHE_RA_BUFFER environment variables to tune the read-ahead behaviour.

Simplified dcap: syntax
In addition to supporting the generic URL syntax

dcap://<serverHost>[:<portNumber>]/pnfs/<Domain>/<FileName>

we now allow the server host and port number part to be omitted. In this case the domain part (second entry in the path) is prefixed with 'dcache' and used as the door host name. The default port number is 22125.

Example
dcap:///pnfs/desy.de/zeus/users/patrick/...

is identical to

dcap://dcache.desy.de:22125/pnfs/desy.de/zeus/users/patrick/...

At DESY we have assigned a virtual IP number to dcache.desy.de which is mapped by address translation to a set of hosts and ports in a round robin manner (F5).

Support of stat (fstat) by dcap:/// filename syntax
We now support the dc_stat and dc_fstat library calls (plus PRELOAD) with the dcap:// syntax as well.
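A minimal sketch of calling dc_stat on a file addressed with the simplified dcap:/// syntax; the file name is a hypothetical example and error handling is reduced to a message:

#include <stdio.h>
#include <sys/stat.h>
#include <dcap.h>          /* dCap client library header */

int main(void)
{
    struct stat st;
    /* hypothetical file name, simplified dcap:/// syntax (no door host/port) */
    const char *url = "dcap:///pnfs/desy.de/zeus/users/patrick/example.dat";

    if (dc_stat(url, &st) != 0) {
        dc_perror("dc_stat failed");
        return 1;
    }
    printf("size: %lld bytes\n", (long long) st.st_size);
    return 0;
}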
Checksum support for write

In case a file is written without intervening seeks or reads, the dCap library calculates an adler32 checksum and sends it to the server together with the close call. The server stores this checksum in pnfs. It is delivered to the HSM flush and restore operations.
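For illustration only (this is not the dCap library's actual code), an adler32 over sequentially written data can be accumulated block by block, e.g. with zlib; an intervening seek or read would invalidate exactly this kind of running calculation:

#include <stdio.h>
#include <string.h>
#include <zlib.h>   /* provides adler32() */

int main(void)
{
    /* accumulate an adler32 over data written sequentially, block by block */
    uLong sum = adler32(0L, Z_NULL, 0);          /* initial value */
    const char *blocks[] = { "first block ", "second block" };

    for (int i = 0; i < 2; i++)
        sum = adler32(sum, (const Bytef *) blocks[i], (uInt) strlen(blocks[i]));

    printf("adler32 = %08lx\n", sum);            /* hex value, as stored in pnfs */
    return 0;
}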
With 1.4.8, dCache changes the way requests are distributed among server pools. These changes should enable dCache to react properly to bursts of requests arriving within seconds. To remain backward compatible, the software keeps the old methods as long as the options below are not explicitly given to the specified cells.
Cell               | Option     | Value
-------------------|------------|----------------------------------------
DCapDoor           | poolProxy  | PoolManager
PoolManager        | costModule | diskCacheV111.poolManager.CostModuleV1
MultiProtocolPool2 | version    | 4
NAME
rc retry - let the system retry a restore request

SYNOPSIS
rc retry <pnfsId> [-update-si]
rc retry * [-force-all] [-update-si]

OPTIONS
- -update-si
The storage information used within a restore request is taken from the request itself. It may happen, for whatever reason, that the request contains incomplete or wrong storage information. The -update-si option re-fetches the storage information from the PnfsManager.

- -force-all
Without this option, only failed restore entries are retried. The -force-all option forces a retry of all pending requests in the Restore Controller list.
dCache version 1.4.8 provides an experimental version of the so-called Large File Store (LFS) capability, which allows files to be stored that are not scheduled to be stored on the backend HSM. The dCache system may be operated in hybrid mode, supporting HSM-connected and non-HSM-connected pools, while an individual pool can only be one or the other. If configured as LFS, a pool may choose to declare newly incoming files as precious or as volatile. If precious, files are only removed in case the file entry disappears from pnfs. If volatile, files are removed if space is running short. To support LFS properly, the LazyCleaner must be replaced by the CleanerV2 cell (see below).

Pool behaviour on LargeFileStore files

Concerning the LargeFileStore, a pool may be configured in one of the following modes:
Option | Value    | Description
-------|----------|------------------------------------------------------------------------------
lfs    | none     | The pool assumes it is connected to an HSM.
lfs    | precious | The pool is not connected to an HSM and declares incoming files as precious.
lfs    | volatile | The pool is not connected to an HSM and declares incoming files as volatile. (Supported with 1.5.2)
- none : the LargeFileStore capability is switched off. All newly written files are regarded as precious and sent to the HSM backend following the configured rules.
- precious : newly created files are regarded as precious but are not scheduled for the HSM store procedure. Consequently, these files are not part of the aging procedure and will only disappear from the pool when deleted in pnfs. As a result, pools holding LFS precious files may run out of space.
- volatile : newly created files are regarded as cached and are not scheduled for the HSM store procedure. Though they will never be stored on tape, these files are part of the aging procedure and will be removed as soon as new space is needed.
Persistent Cleaner

The Cleaner module is responsible for removing deleted files from pools. For an HSM-backed dCache this task is not so important, because deleted files can't be accessed by clients and will consequently disappear from the pools anyway through the regular aging process. This is different for LFS pools, where files stay precious for their lifetime and therefore must be explicitly removed after they have been deleted from pnfs. An additional complexity arises through the fact that pools might be down at the time a file is deleted from pnfs, so that the Cleaner module is not able to perform the actual remove operation. The cells/CleanerV2 is able to store those remove requests until they can actually be performed.

create diskCacheV111.cells.CleanerV2 cleaner \
       "default -trash=${trash} \
        -db=${config}/cleaner \
        -refresh=300"

Make sure to create the ???/config/cleaner directory before the cleaner is started.

R/W allowed

Having an HSM as storage backend, we can't allow a file to be modified after it has been written, mainly because the file could already be on tape. Without an HSM backend this limitation would no longer be necessary. Nevertheless we will stay with it for a while.

Moving LFS files into HSM backended dCache pools

We are planning to allow a file to be moved from a directory assigned to a LargeFileStore pool to a directory backed by an HSM, which would then store the file to the HSM. This as well is a mid-term plan, mainly because pnfs has to be modified to catch the move operation.

Volatile LFS

In case a volatile pool removes a file from its repository, the file has to be removed from the filesystem as well. Because the pool only knows the i-node (pnfsId) of the file, the filesystem has to remove the file entry by i-node. For regular filesystems this is a non-standard operation. Possibly the pnfs code has to be changed to support this remove operation.
When a pool is starting up, it may find its repository in nearly any condition: control files could be missing, the expected data file size might not match the file size found on disk, and so on. With 1-4-8 the behaviour of the inventory scanner has been improved as described here.

Pool Repository Disk Layout
Location                       | Name        | Description                                          | Remarks
-------------------------------|-------------|------------------------------------------------------|----------------------------------------
<poolBase>/data/<pnfsId>       | Datafile    | Original raw datafile                                | FS lastModified == dCache lastAccessed
<poolBase>/control/<pnfsId>    | Status File | cached, precious, receiving.client, receiving.store  | FS lastModified == dCache cached
<poolBase>/control/SI-<pnfsId> | SI File     | Storage Information (HSM specific)                   |

Inventory Procedure
The inventory procedure loops over all files in the <poolBase>/data directory which match the pnfsId format. Non-matching files are reported in the logfile but are ignored. The pnfsId is added to the repository of the pool if the following conditions are true:

- for each <poolBase>/data/<pnfsId> file, a <poolBase>/control/<pnfsId> must exist, must be readable and must contain one of the following keywords : cached, precious, receiving.client, receiving.store

- for each <poolBase>/data/<pnfsId> file, a <poolBase>/control/SI-<pnfsId> must exist, must be readable and must be deserializable.

- the pnfs filesize stored in the SI file must match the filesize of the datafile.

In case one of the above conditions is false and the pool option -recover-control is not set, the inventory process is aborted and the pool is disabled. If -recover-control is set and one of the requirements above is not satisfied, a recovery procedure for the particular pnfs file is initiated.
Result of the recovery procedure and its actions
Result                           | Action
---------------------------------|-----------------------------
No reply from PnfsManager        | Retry indefinitely
Pnfs file not found              | Remove file from repository
SI filesize != datafile filesize | Error Condition

The behaviour on an Error Condition is determined by the -recover-anyway pool option. If set, the pool starts up and the related pnfs file is marked BAD. If the option is not set, the pool is disabled.
BAD files are reported by the rep ls -l=e pool command.
The dcache-1-4-8-beta-2 DCapDoorInterpreter is able to automatically retry 'getStorageInfo' and 'getPool' requests. While the 'getStorageInfo' retry period is set to 60 seconds, the 'getPool' request timeout is configurable by a DCapDoor option and by the poolRetry and dCapDoor-poolRetry context variables. The context variable dCapDoor-poolRetry overrides the context variable poolRetry, which in turn overrides the option poolRetry. BTW: check out m-DCapDoorInterpreterV3 for a list of all DCapDoor options.
Cell         | Option    | Context            | Value
-------------|-----------|--------------------|-------------------------------
LoginManager | keepAlive |                    | keep alive trigger in seconds
LoginManager | poolRetry | dCapDoor-poolRetry | retry period for pool requests

Remark: This new feature is only useful in case the PnfsManager or the PoolManager had been out of service for a while. It won't help if a file can't be fetched from the HSM, because the fetch request is handled by the PoolManager. Additional retries will simply be added to the PoolManager queue for that particular file, and if the request is suspended due to permanent HSM errors, only the client count for the file will increase.
Generic Considerations
Starting with 1.5.1, all costs are calculated solely within the PoolManager and no longer in the pools themselves. This is essential to support request anti-clumping. Pools frequently send all necessary information about space usage and queue lengths to the PoolManager's CostModule. The cost module can be regarded as a cache for this information, so it is no longer necessary to send 'get cost' requests to the pools for each client request. The CostModule interpolates the expected costs until a new precise information package arrives from the pools. In addition, the PoolCellInfo is now just the CellInfo plus the PoolCostInfo. The WebCollectorV3 has been modified accordingly. The WebCollectorV0 has been removed.
Pool 2 Pool transfer client
Pool 2 pool client transfers are now added to the total cost of a pool and are reported on the 'pool request' web page as well. Although client pool 2 pool transfers seem to be handled within regular queues, they are not. Queuing both p2p server and p2p client requests has a (however small) probability of deadlocks. So, p2p client requests are never actually queued; they start immediately after they have been requested. The p2p client 'max number of transfers' is only used to calculate the costs for those transfers.
Customizable Cost Calculation
Calculating the cost for a data transfer is still done in two steps. First, the CostModule merges all information about space and transfer queues of the pools to calculate the performance and space costs separately. Second, depending on the type of client request, these two numbers are merged by linear combination to build the actual cost for each pool. The first step has been isolated within a separate loadable class. The second step is still done somewhere in the PoolManager code. The next development step will be to move the second calculation into the customizable (loadable) class as well. The default cost calculation can be overridden by the PoolManager 'create option':
-costCalculator=<newCostCalculator>

The default is CostCalculationV5. The goal is to compare different pool costs without intermediately calculating scalar values for performance and space.

LRU time used for calculating space cost
In order to get some kind of global fair-share aging of cached files, the cost of removing a file from the cache now depends on the 'last accessed' timestamp of the LRU file. So the new space cost calculation is done as follows:
Definition | Value
-----------|-------------------------------------------------------
fs         | free pool space (not yet occupied by data)
GAP        | configured per pool
LRU        | age of the L(east) R(ecently) U(sed) file (in seconds)
m          | 60
c          | breakeven * 7 * 24 * 3600
Condition           | Cost
--------------------|---------------
fs < GAP            | Filesize / fs
fs >= GAP, LRU < m  | 1 + c / m
fs >= GAP, LRU >= m | 1 + c / LRU

The value c ensures that the cost stays within a reasonable interval for (pathologically) young LRU files. The breakeven value is equivalent to (cost - 1) for an LRU file of the age of one week. In other words, if breakeven is set to 0.1, the cost will be 1.1 if the LRU file has not been used for one week.
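For illustration, a small C sketch of the rule in the table above; the function and parameter names are invented here (this is not dCache code), and 'filesize' is assumed to be the size of the incoming file:

#include <stdio.h>

/* Space cost as described in the table above. All sizes in bytes,
 * 'lru' is the age of the least recently used file in seconds,
 * 'breakeven' is the per-pool value set with 'set breakeven'. */
static double space_cost(double filesize, double freeSpace, double gap,
                         double lru, double breakeven)
{
    const double m = 60.0;                             /* minimum LRU age   */
    const double c = breakeven * 7.0 * 24.0 * 3600.0;  /* one-week breakeven */

    if (freeSpace < gap)  return filesize / freeSpace; /* pool nearly full     */
    else if (lru < m)     return 1.0 + c / m;          /* very young LRU file  */
    else                  return 1.0 + c / lru;        /* normal case          */
}

int main(void)
{
    /* breakeven 0.1, LRU file one week old -> cost 1.1 */
    printf("%f\n", space_cost(1e9, 50e9, 4e9, 7 * 24 * 3600.0, 0.1));
    return 0;
}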
REMARK
To be backward compatible, this feature is only enabled if the breakeven (pool) value is below 1.0.
pool (admin) > set breakeven 0.1

This has to be done on all connected pools. Values may differ from pool to pool. The higher the value, the later files on this pool will be declared 'removable'.
Under some conditions, it might not be desirable to allow a pool 2 pool transfer within the same host. Those transfers can now be suppressed. Two changes are needed in order to enable this feature.

A. The pool has to learn on which host it is located, so the corresponding xxx.poollist file needs an additional tag per pool:
<poolName> <AbsolutePathToPool> tag.hostname=<hostname> <rest>

B. The pool manager cell has to be configured accordingly:
#
#  the pool manager shouldn't care whether or not p2p
#  source and destination pool are on the same host.
#
rc set sameHostCopy notchecked
#
#  the pool manager should, under no circumstances,
#  initiate a p2p transfer between two pools on the
#  same host.
#
rc set sameHostCopy never
#
#  the pool manager prefers pools on different hosts
#  but may as well choose pools on the same host if
#  necessary. (DEFAULT)
#
rc set sameHostCopy besteffort
#
Pre-1-5-2 dCache versions tended to run into pool 2 pool transfer orgies if 'p2p on cost' was enabled and the majority of pools had exceeded the costcut. This has been fixed with 1.5.2. Moreover, additional customizable parameters have been added to allow enhanced steering of the pool transfer behaviour on high costs. For details see Pool2PoolSteering.
PoolManager.conf is backward compatible in the sense that pre-1.5.2 PoolManager.conf files are understood by the new PoolManager, but as soon as the new PoolManager rewrites the content of PoolManager.conf (save command), it can no longer be used by pre-1.5.2 PoolManagers.

Related Command Summary
rc set p2p on|off|oncost
rc set fallback oncost on|off
rc set stage oncost on|off
rc set slope <p2p source/destination slope>
rc set max copies <maxNumberOfP2pCopies>
rc set sameHostCopy never|besteffort|notchecked

set costcuts [-<option>=<value> ... ]

   Options  | Default | Description
   -------------------------------------------------------------------
   idle     |   0.0   | below 'idle' : 'reduce duplicate' mode
   p2p      |   0.0   | above : start pool to pool mode
   alert    |   0.0   | stop pool 2 pool mode, start stage only mode
   halt     |   0.0   | suspend system
   fallback |   0.0   | allow fallback in permission matrix on high load

   A value of zero disables the corresponding value.

#
# DEPRECATED
#
set costcut <minCut>|* [<maxCut>]
#
# DEPRECATED
#
# use 'set costcuts -idle=... -p2p=...' instead.
#
Whenever the dCap library sends a checksum together with the close operation of a newly created file, the server stores this information in the pnfs database (level 2):

<format version>,<statistics>:<options>,c=1:<adler32(hex)>; <Pool Locations>

This information is part of the StorageInfo and is delivered to the HSM flush and restore operations to allow checking the consistency of the dataset.
In order to be independent of the pnfs (nfs2) implementation, we store the size of the file in the pnfs database (level 2):

<format version>,<statistics>:<options>,l=<filesize>; <Pool Locations>

If present, the storage info takes the file size from this value and no longer from the file size in pnfs.
The PoolManager can be enabled to send a PoolStatusChangedMessage to an arbitrary pool. Those events are triggered whenever a pool is started, stopped, enabled or disabled, and if it appears dead for whatever reason. The following options need to be specified in order to enable this feature:

-poolStatusRelay=<destinationCellName> -watchdog

The PoolStatusChangedMessage holds the following information. The getPoolMode(), getDetailCode() and getDetailMessage() values are only valid if getMode() == DOWN.
Method             | Type       | Value               | Description
-------------------|------------|---------------------|-------------------------------
getMode()          | int        | DOWN or RESTART     | Current pool state
getPoolMode()      | PoolV2Mode | Pool Mode Bits      | disable, strict ...
getDetailCode()    | int        | any number > 0      | reason for pool being disabled
getDetailMessage() | String     | any string or null  | reason for pool being disabled

The following messages are sent on the following occasions:

#
# Pool starts up :
#
dcache0-0;status=DOWN;mode=disabled(fetch,store,stage,p2p-client,p2p-server,);code=(1,Initiallizing)
#
# pool finished 'repository scan'
#
dcache0-0;status=RESTART;code=(0)
#
# pool is disabled by command line interface or PoolModifyModeMessage
#
#    pool disable -rdonly 111 "Testing rdonly"
#
dcache0-0;status=DOWN;mode=disabled(store,stage,p2p-client,);code=(111,Testing rdonly)
#
# pool is enabled by command line interface or PoolModifyModeMessage
#
#    pool enable
#
dcache0-0;status=RESTART;code=(0)
#
# pool is shut down
#
dcache0-0;status=DOWN;mode=disabled(dead,);code=(666,Shutdown)
dcache0-0;status=DOWN;mode=disabled(dead,);code=(666,PingThread terminated)
#
# pool is dead (network disconnect, kill -9, host stuck ... )
# (takes about 10 minutes)
#
dcache0-0;status=DOWN;code=(666,DEAD)
#
Pools may now be disabled in various modes by the pool command set as well as by the PoolModifyModeMessage. The basic disable, and with it all its variants, will prevent the pool from sending Up messages to the PoolManager and from replying to PoolCheckable messages. Consequently, the pool no longer exists for any automatic pool selection done by the PoolManager. This does of course not prevent other applications, e.g. the ReplicaManager, from sending P2P messages to disabled pools. Depending on the type of 'disable', the pool will do as requested or will send an error back to the requestor.

Disable Modes
Options may be combined.
Mode               | Command Set | PoolV2Mode options  | Purpose
-------------------|-------------|---------------------|---------------------------------------------------
PLAIN              | -           | DISABLED            | All operations are allowed (except PoolCheckable)
strict             | -strict     | DISABLED_STRICT     | No operations are allowed
read only          | -rdonly     | DISABLED_RDONLY     | All write operations are disallowed
stage              | -stage      | DISABLED_STAGE      | Staging (hsm -> pool) is disallowed
store              | -store      | DISABLED_STORE      | Storing (client -> pool) is disallowed
fetch              | -fetch      | DISABLED_FETCH      | Fetching (pool -> client) is disallowed
Pool 2 pool client | -p2p-client | DISABLED_P2P_CLIENT | Pool can't be used as client for p2p

New Command Set
pool disable [options] [ <errorCode> [<errorMessage>] ]

   OPTIONS :
      -fetch       #  disallows fetch (transfer to client)
      -stage       #  disallows staging (from HSM)
      -store       #  disallows store (transfer from client)
      -p2p-client
      -rdonly      #  := store,stage,p2p-client
      -strict      #  := disallows everything

DEPRECATED : The pool command "pool disablemode strict|fuzzy" is no longer valid and is silently ignored.

Message passing interface
EXAMPLE :

   int modeValue = PoolV2Mode.DISABLE_STAGE |
                   PoolV2Mode.DISABLE_STORE |
                   PoolV2Mode.DISABLE_P2P_CLIENT ;

   PoolV2Mode mode = new PoolV2Mode( modeValue ) ;

   PoolModifyModeMessage msg = new PoolModifyModeMessage( "myPool" , mode ) ;
- PoolSelectionUnitV1 accepts undefined pools. In case the pool group 'default' is defined, all pools not otherwise defined are accepted and added to that pool group. (Mainly for the demo version.)
- HSM specific info on the LazyRestoreQueue web page. HSM specific information can now be displayed on the (delayed) HSM Restore Queue web page. For Fermi, the additional line will show the 'toString()' of the EnstoreStorageInfo (DONE), and at DESY we will try to get the info from some kind of HSM proxy server. (NOT YET DONE)
- Sticky bit following p2p. If the sticky bit is set in pnfs, all newly restaged files will get the bit set in the pools as well. This now applies to p2p copies too.
- -strict-size : For performance reasons we return from the close after a write operation without waiting until the file size has been set in pnfs. The -strict-size option in the dCap door will delay the close until the size has been set, so that an fstat immediately after a close will return the correct value. -strict-size may also be used as an extra option to the dCap library.
- AdvisoryDelete (SRM) is prepared: pools accept the PoolModifyPersistencyMessage to set all copies of a file in the cache to 'cached'. The PnfsManager removes the file from the filesystem if the 'd=t' flag is set for this file.
- rc ls [regularExpression] now accepts a regular expression.
- rc suspend [on|off] allows all incoming requests to be suspended.
- *.crcval : The pool inventory run removes dangling *.crcval files.