StashCache: Difference between revisions

From Mu2eWiki

Revision as of 20:05, 4 May 2017

Introduction

Different disk systems are optimized for different demands. CVMFS is designed to distribute code releases. dCache is designed to deliver large datasets to the grid. The OSG found there was demand for an intermediate case: a few large files that must be delivered to all nodes during a grid job. (Note that this is substantially different from the dCache case, where each node gets a different data file.) Examples of this use case are GENIE files or analysis template libraries for NOvA; in the case of Mu2e, stopped muon ntuples, magnetic field maps, or sample data files would qualify. (Currently the first two are on CVMFS.)

The OSG solution for this use case is called StashCache, which merges the CVMFS interface with dCache storage. The CVMFS interface makes the files look like they are on an NFS-mounted disk, so they can be copied to the working area or opened in place. The files are kept in dCache, so the space available is much larger than for pure CVMFS, which is limited to a cache of a few GB on the local node.

Like CVMFS, StashCache is filled by copying data to a central location. The data migrates out to other cache locations every 30 minutes. After this latency, the files are available to all grid nodes on OSG.

Most grid nodes have a 1 GB local disk cache for StashCache. If a small file is opened in StashCache, it might be copied to the node's cache by the CVMFS software. From there, it can be accessed repeatedly and efficiently by the original job and by other jobs that run on this node. If a larger StashCache file is opened, it stays in the site's main cache and the data is transferred by xrootd.

Usage

Define read and write areas:

export MU2E_STASH_WRITE=/pnfs/mu2e/persistent/stash
export MU2E_STASH_READ=/cvmfs/mu2e.osgstorage.org/pnfs/fnal.gov/usr/mu2e/persistent

Files to be distributed are copied interactively to the appropriate area under the stash cache directory.

mkdir -p $MU2E_STASH_WRITE/users/$USER
cp foo $MU2E_STASH_WRITE/users/$USER/foo
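
Because of the propagation latency described in the introduction, a freshly copied file may not be visible under the read area right away. The following is a minimal sketch of one way to poll for it, assuming MU2E_STASH_READ is set as above and that the file appears under the same relative path in the read area. The function name stash_wait and its default timeout and interval are hypothetical and are not part of any Mu2e or OSG tool.

```shell
# Hypothetical helper: wait until a file copied to the write area becomes
# visible under the read area.
# Usage: stash_wait <path relative to $MU2E_STASH_READ> [timeout s] [interval s]
stash_wait() {
    local rel="$1"
    local timeout="${2:-3600}"   # assumed default: give up after one hour
    local interval="${3:-60}"    # assumed default: check once per minute
    local waited=0

    # Loop until the file exists under the read area or the timeout expires.
    while [ ! -e "$MU2E_STASH_READ/$rel" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "stash_wait: $rel not visible after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, stash_wait users/$USER/foo returns 0 once the file can be seen and returns 1 if it has not appeared before the timeout.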

Interactively, or on any grid node, the user can copy the file locally if a fast disk response is needed:

cp $MU2E_STASH_READ/users/$USER/foo .

or simply open it as a disk file.

The cache area is not executable. The StashCache developers state that the overall performance will be similar if you copy the file locally, or open it in the cache, so either approach may be used.
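
In a grid job script, the local-copy approach can be wrapped so that a missing file produces a clear error instead of a failure later in the job. This is only a sketch under stated assumptions: it relies on MU2E_STASH_READ being set as above, and the function name stash_fetch is hypothetical.

```shell
# Hypothetical helper: copy one file from the StashCache read area into a
# local directory (default: the current directory).
# Usage: stash_fetch <path relative to $MU2E_STASH_READ> [destination dir]
stash_fetch() {
    local rel="$1"
    local dest="${2:-.}"

    if [ -z "${MU2E_STASH_READ:-}" ]; then
        echo "stash_fetch: MU2E_STASH_READ is not set" >&2
        return 2
    fi

    local src="$MU2E_STASH_READ/$rel"
    # A recently written file may not have propagated to the read area yet.
    if [ ! -r "$src" ]; then
        echo "stash_fetch: $src is not readable" >&2
        return 1
    fi

    cp "$src" "$dest/"
}
```

A job would then call, for example, stash_fetch users/$USER/foo . and stop if the return code is nonzero.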