Enstore: Difference between revisions

From Mu2eWiki

Revision as of 14:58, 6 April 2018

The Scientific Computing Division maintains a system of data tapes called enstore (manual, project), which allows us to "write data to tape". The tapes are StorageTek T10000KC and T10000KD which hold 5TB and 8TB each, respectively. The tapes are held in a library and retrieved and inserted into a tape drive by a robot arm. We currently (2017) share about 20 tape drives with all of the Intensity Frontier.

mu2e tapes are divided into several file families depending on the type of data: production or user data, raw or reco, beam data or sim. The files assigned to a file family will go to one set of tapes for that file family. It can be useful to segregate data like this so it can be treated specially. For example, raw data might be stored with two copies in different buildings, while this is unnecessary for sim data.

A tape drive can read at 250MB/s, so a 5TB tape can be read in about 5.5h. In reality, tapes are rarely read all the way through; we typically access single files at a time. Typical access times are:

  • 1m to find and mount the tape
  • 1m to seek to the file
  • 10s to read the file
  • 1m to dismount and replace the tape
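
The figures above can be combined into a rough estimate of the per-file latency and of the time needed to read a whole tape. The following is a minimal sketch that uses only the numbers quoted on this page; the constant and function names are illustrative, and decimal units (1TB = 1,000,000MB) are an assumption.

```python
# Rough tape-access arithmetic using the figures quoted on this page.
MOUNT_S = 60            # find and mount the tape
SEEK_S = 60             # seek to the file
READ_S = 10             # read the file
DISMOUNT_S = 60         # dismount and replace the tape
DRIVE_RATE_MB_S = 250   # quoted drive read rate


def single_file_access_seconds():
    """Total time to fetch one file from a tape that is not yet mounted."""
    return MOUNT_S + SEEK_S + READ_S + DISMOUNT_S


def full_tape_read_hours(capacity_tb):
    """Hours to stream a whole tape at the quoted drive rate."""
    capacity_mb = capacity_tb * 1_000_000  # assumption: decimal units
    return capacity_mb / DRIVE_RATE_MB_S / 3600


if __name__ == "__main__":
    print(single_file_access_seconds())        # seconds per single-file access
    print(round(full_tape_read_hours(5), 1))   # 5TB tape
    print(round(full_tape_read_hours(8), 1))   # 8TB tape
```

With these inputs a single-file access takes a little over three minutes, almost all of it spent on mounting, seeking and dismounting rather than on reading.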

Once the file is off the tape, it has to be copied to tape-backed dCache. If the file is over 300MB, it was written to tape as a lone file and the copy to dCache is immediate. If it is smaller, then it was rolled into a tarball with other small files by a system called Small File Aggregation (SFA). In this case, it has to be extracted from the tarball before being written to dCache, and this can add up to 15s of latency.
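
The size rule described above can be written as a small helper, for example to estimate how a set of files will behave when staged. This is only a sketch of the rule as stated on this page, not of how enstore implements it; the names are illustrative, and treating a file of exactly 300MB as aggregated is an assumption, since the text only says "over 300MB" and "smaller".

```python
SFA_THRESHOLD_MB = 300   # files over this size are written as lone files
SFA_MAX_EXTRA_S = 15     # worst-case extra latency to unpack an SFA tarball


def staging_overhead(size_mb):
    """Return (storage mode, worst-case extra latency in seconds)."""
    if size_mb > SFA_THRESHOLD_MB:
        return ("lone file", 0)
    # Assumption: a file of exactly 300MB is handled like a smaller one.
    return ("SFA tarball", SFA_MAX_EXTRA_S)


if __name__ == "__main__":
    for size_mb in (50, 300, 2500):
        print(size_mb, staging_overhead(size_mb))
```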

We can mix large and small files in our data. The largest file that can be written is something like 1TB, but as a practical matter, all mu2e files should be less than about 20GB, and when there is an option, they should be 2-5 GB.
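
As an illustration, the size guidance above could be checked before writing files. This is a hedged sketch: the category labels and the function name are invented for this example, and the limits are the approximate values quoted above rather than hard limits enforced by the system.

```python
MAX_WRITABLE_GB = 1000       # "something like 1TB"
PRACTICAL_LIMIT_GB = 20      # mu2e files should be less than about 20GB
PREFERRED_RANGE_GB = (2, 5)  # preferred size when there is a choice


def classify_size(size_gb):
    """Classify a file size against the guidance on this page."""
    if size_gb > MAX_WRITABLE_GB:
        return "too large to write"
    if size_gb >= PRACTICAL_LIMIT_GB:
        return "discouraged"
    low, high = PREFERRED_RANGE_GB
    if low <= size_gb <= high:
        return "preferred"
    return "acceptable"


if __name__ == "__main__":
    for size_gb in (0.1, 3, 12, 40, 1500):
        print(size_gb, classify_size(size_gb))
```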

We write to tape through the tape-backed dCache. We only write files that are properly named, organized and documented by the mu2egrid scripts.

To read files from tape, we would usually access them by the mu2egrid scripts or the /pnfs mount of dCache. See also SAM to help with the file names and locations. Large numbers of files, such as those required by grid jobs, require prestaging to make sure they are off tape and on disk before reading them.
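
Before reading a file through /pnfs, it can be useful to check whether it is already on disk. The sketch below assumes that the /pnfs mount exposes dCache's "locality" dot-command, which reports values such as ONLINE, NEARLINE or ONLINE_AND_NEARLINE; whether this is enabled on a given node is an assumption, and the function names are illustrative. It only checks the state of a file and does not prestage anything.

```python
import os


def locality_path(file_path):
    """Build the dCache dot-command path that reports a file's locality."""
    directory, name = os.path.split(file_path)
    return os.path.join(directory, ".(get)(%s)(locality)" % name)


def is_on_disk(locality):
    """True if the reported locality includes a disk (ONLINE) copy."""
    return locality.strip() in ("ONLINE", "ONLINE_AND_NEARLINE")


def read_locality(file_path):
    """Read the locality string; only works on a dCache NFS mount."""
    with open(locality_path(file_path)) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(locality_path("/pnfs/example/dir/file.art"))
    print(is_on_disk("NEARLINE"))
```

A file reported as NEARLINE is on tape only, so reading it will incur the tape access times described above.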

Complete file listings are available and are updated each day (use wget; they are too big for a browser).