StashCache
Introduction
Different disk systems are optimized for different demands. CVMFS is designed to distribute code releases. dCache is designed to deliver large datasets to the grid. The lab found an intermediate case: a few large files that must be delivered to every node during a grid job. (Note that this is substantially different from the dCache case, where each node gets a different data file.) Examples of this use case are Genie files or analysis template libraries for Nova. For mu2e, stopped muon ntuples, magnetic field maps, or sample data files would qualify. (Currently the first two are on CVMFS.)
The OSG solution for this use case is called StashCache, a merge of the CVMFS interface with dCache storage. The CVMFS interface makes the files look as if they are on an NFS-mounted disk, so they can be copied to the working area or opened in place. Because the files are kept in dCache, the available space is much larger than with pure CVMFS, which is limited to a cache of a few GB on the local node.
Like CVMFS, StashCache is filled by copying data to a central location. The data then migrates out to other cache locations at grid sites on a time scale of about an hour. After this latency, the files are available to all grid nodes on OSG.
Usage
Define a read area and a write area:
export MU2E_STASH_WRITE=/pnfs/mu2e/persistent/stash
export MU2E_STASH_READ=/cvmfs/mu2e.osgstorage.org/pnfs/fnal.gov/usr/mu2e/persistent
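Before publishing or reading files, it can help to confirm that these areas are actually reachable from the current node. The following is a minimal bash sketch under stated assumptions: `check_stash_areas` is a hypothetical helper (not a Mu2e or OSG tool), and the write area is assumed to be reachable only interactively, where `/pnfs` is mounted, so it is checked only on request.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that the StashCache areas defined above are usable.
# check_stash_areas is a hypothetical helper; it is not a Mu2e or OSG tool.
# Usage: check_stash_areas [write]
#   With no argument, only the read area is checked (e.g. on a grid node).
#   With "write", the write area must also exist and be writable
#   (assumed to hold only interactively, where /pnfs is mounted).
check_stash_areas() {
  local status=0
  # The read area must be set and must be a directory.
  if [ -z "${MU2E_STASH_READ:-}" ] || [ ! -d "$MU2E_STASH_READ" ]; then
    echo "StashCache read area not available: ${MU2E_STASH_READ:-<unset>}" >&2
    status=1
  fi
  # Optionally require a writable write area as well.
  if [ "${1:-}" = "write" ]; then
    if [ -z "${MU2E_STASH_WRITE:-}" ] || [ ! -d "$MU2E_STASH_WRITE" ] || [ ! -w "$MU2E_STASH_WRITE" ]; then
      echo "StashCache write area not writable: ${MU2E_STASH_WRITE:-<unset>}" >&2
      status=1
    fi
  fi
  return "$status"
}
```

For example, run `check_stash_areas write` before copying files in interactively, or plain `check_stash_areas` at the start of a grid job.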
Files to be distributed are copied interactively to the appropriate area under the StashCache write directory:
mkdir -p $MU2E_STASH_WRITE/users/$USER
cp foo $MU2E_STASH_WRITE/users/$USER/foo
Interactively, or on any grid node, the user can copy the file locally if fast disk access is needed:
cp $MU2E_STASH_READ/users/$USER/foo .
or simply open it as a disk file.
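Because new files take roughly an hour to propagate to the grid-site caches, it may be worth confirming that a newly published file is visible through the read area before submitting jobs that depend on it. The sketch below polls for the file with a timeout. `wait_for_stash_file` and its default timeout and polling interval are illustrative assumptions, not values from any StashCache documentation.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll the read area until a newly published file appears.
# wait_for_stash_file is a hypothetical helper, not part of any Mu2e or OSG tool.
# Usage: wait_for_stash_file PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Defaults are assumptions: give up after 2 hours, check every 5 minutes.
wait_for_stash_file() {
  local path=$1
  local timeout=${2:-7200}
  local interval=${3:-300}
  local waited=0
  # Keep checking until the file is readable through the read area.
  until [ -r "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up waiting for $path after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

For example, `wait_for_stash_file "$MU2E_STASH_READ/users/$USER/foo" && echo "foo is visible"`. Seeing the file on one node does not by itself guarantee that every site's cache has been refreshed, so this is best treated as a first check run interactively rather than as a step inside each grid job.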
The cache area is not executable.
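Since the cache area is not executable, a script or program distributed this way cannot be run in place. One workaround, sketched below, is to copy the file to the job's working directory and restore the execute bit there. `run_from_stash` is a hypothetical helper, and the sketch assumes the current directory is writable and allows execution.

```shell
#!/usr/bin/env bash
# Minimal sketch: run a file distributed through StashCache by copying it
# out of the (non-executable) cache area first.
# run_from_stash is a hypothetical helper; it assumes the current directory
# is writable and allows execution.
# Usage: run_from_stash PATH [ARGS...]
run_from_stash() {
  local src=$1
  shift
  local dest
  dest="./$(basename "$src")"
  # Copy out of the cache, then set the execute bit on the local copy only.
  cp "$src" "$dest" || return 1
  chmod u+x "$dest" || return 1
  # Run the local copy, passing through any remaining arguments.
  "$dest" "$@"
}
```

A script can alternatively be passed straight to its interpreter, for example `bash "$MU2E_STASH_READ/users/$USER/script.sh"` (`script.sh` is a placeholder name), since the interpreter only needs to read the file. Compiled programs, however, need the copy step.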