Enstore
Revision as of 20:42, 27 October 2020
The Scientific Computing Division maintains a system of data tapes called enstore (manual, project), which allows us to "write data to tape". The tapes are held in libraries in the Feynman Computing Center and the Grid Computing Center buildings. When files are needed, a robot arm retrieves the tape that holds them and inserts it in a tape drive.
There are currently (9/2018) two types of tapes and we are transitioning from the first to the second.
T10K (StorageTek): The T10000KC and T10000KD tapes hold 5 TB and 8 TB, respectively. A tape drive can read at 250 MB/s, so one 5 TB tape can be read end to end in about 5.5 h. We currently (2017) share about 20 tape drives with all of the Intensity Frontier.
LTO (Linear Tape-Open): The LTO8 drives have a quoted maximum speed between 300 MB/s and 350 MB/s, depending on media type. LTO8 tapes have a capacity of 12 TB, but those tapes are unavailable worldwide, so we are using "M8" tapes (LTO7 media formatted as LTO8), which have a capacity of 9 TB. There are 56 drives in total for the Intensity Frontier, but a large fraction are used for converting T10K data to LTO8.
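The full-tape read times implied by these figures can be checked with a short calculation. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000,000 MB), a sustained read at the quoted drive rate, the same 250 MB/s rate for both T10K tape types, and the lower quoted LTO8 rate of 300 MB/s. The function and dictionary names are made up for illustration.

```python
# Back-of-the-envelope time to stream one full tape end to end.
# Capacities and rates are the figures quoted in the text; decimal units
# and sustained full-rate reading are assumptions of this sketch.

def full_tape_read_hours(capacity_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to read capacity_tb terabytes at rate_mb_per_s."""
    if capacity_tb < 0 or rate_mb_per_s <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    seconds = capacity_tb * 1e6 / rate_mb_per_s  # 1 TB = 1e6 MB (decimal)
    return seconds / 3600.0

# (capacity in TB, drive rate in MB/s); labels are illustrative only.
TAPES = {
    "T10K, 5 TB at 250 MB/s": (5.0, 250.0),
    "T10K, 8 TB at 250 MB/s": (8.0, 250.0),
    "M8, 9 TB at 300 MB/s": (9.0, 300.0),
    "LTO8, 12 TB at 300 MB/s": (12.0, 300.0),
}

if __name__ == "__main__":
    for label, (capacity_tb, rate) in TAPES.items():
        print(f"{label}: {full_tape_read_hours(capacity_tb, rate):.1f} h")
```

In practice whole tapes are rarely streamed like this, so these numbers are best read as lower bounds on the time to drain a tape.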
Mu2e tapes are divided into several file families depending on the type of data: production or user data, raw or reco, beam data or sim. The files assigned to a file family will go to one set of tapes for that file family. It can be useful to segregate data like this so it can be treated specially. For example, raw data might be stored with two copies in different buildings, while this is unnecessary for sim data.
The maximum read rates of tapes are hundreds of MB/s, but in reality tapes are rarely read all the way through or efficiently; we typically access single files at a time. Typical access times are:
- 1m to find and mount the tape
- 1m to seek to the file
- 10s to read the file
- 1m to dismount and replace the tape
If requests for multiple files from a single tape are queued, then those requests will be grouped and ordered to improve the drive efficiency and reduce wear.
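To see why grouping requests matters, the typical access times above can be turned into a rough cost model. The sketch below assumes the mount and dismount are paid once per tape, while every file still pays the full 1 m seek and 10 s read; that is pessimistic, since ordered requests should shorten the seeks. The constants come from the list above; the function names are hypothetical.

```python
# Rough cost model for reading n files from one tape in a single mount.
# Assumption: mount/dismount are paid once, seek and read once per file.

MOUNT_S = 60.0     # find and mount the tape
SEEK_S = 60.0      # seek to the file
READ_S = 10.0      # read the file
DISMOUNT_S = 60.0  # dismount and replace the tape

def total_time_s(n_files: int) -> float:
    """Wall-clock seconds to serve n_files queued for the same tape."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    return MOUNT_S + n_files * (SEEK_S + READ_S) + DISMOUNT_S

def time_per_file_s(n_files: int) -> float:
    """Average seconds per file when n_files share one mount."""
    return total_time_s(n_files) / n_files

def drive_efficiency(n_files: int) -> float:
    """Fraction of the drive time spent actually reading data."""
    return n_files * READ_S / total_time_s(n_files)

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:4d} files: {time_per_file_s(n):6.1f} s/file, "
              f"efficiency {drive_efficiency(n):.1%}")
```

Under these assumptions a lone request costs 190 s for 10 s of reading, while ten requests on the same tape average 82 s each.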
Once the file is off the tape, it has to be copied to tape-backed dCache. If the file is over 300MB, it was written to tape as a standalone file and the copy to dCache is immediate. If it is smaller, then it was rolled into a tarball with other small files by a system called Small File Aggregation (SFA). In this case, it has to be extracted from the tarball before being written to dCache, and this can add up to 15s of latency.
We can mix large and small files in our data. The largest file that can be written is roughly 1TB, but as a practical matter, all Mu2e files should be less than about 20GB, and when there is an option, they should be 2-5 GB.
We write to tape through the tape-backed dCache. We only write files that are properly named, organized and documented by the mu2egrid scripts.
To read files from tape, we would usually access them by the mu2egrid scripts or the /pnfs mount of dCache. See also SAM to help with the file names and locations. Large numbers of files, such as those required by grid jobs, require prestaging to make sure they are off tape and on disk before reading them.
Complete file listings are available, updated each day (use wget; they are too big for a browser).
Status on 9/23/2020
- TDR and CD3 era tapes were written on T10K media
- Since Sept 28, 2018 at ~11:50 AM, writing to LTO8 media
  - 12 TB per volume
  - Recommended minimum file size 1.2 GB
- Selected files on T10K media are being migrated to LTO8
  - Other files have been
- SCD has asked us to identify which files on T10K media need to be migrated to LTO8 and which we can allow to expire.
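In 10/2020, as part of the migration from T10Kc to LTO8 tape formats, Rob deleted about 700TB of datasets, including most of the largest from CD3 (google spreadsheet list: https://docs.google.com/spreadsheets/d/1oih26rLtdJUBVhiBPWEUjpymrriXzbH0bgpfNCLlu6k/edit?ts=5f82a612#gid=1393936171).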
As part of the migration, we changed the SFA parameters from max file size of 300MB to 600MB and the target tarball size from 5GB to 8GB.
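The effect of the parameter change can be illustrated with a small model of the SFA decision. This is not the enstore implementation, only a sketch: it assumes decimal units (1 GB = 1000 MB), that a file exactly at the limit is aggregated, and that a tarball is filled with equal-size files up to the target size. `SfaParams`, `is_aggregated` and `files_per_tarball` are hypothetical names.

```python
# Illustrative model of the SFA size parameters before and after the
# migration. Not the enstore code; boundary handling is an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class SfaParams:
    max_file_size_mb: float   # files up to this size go into a tarball
    target_tarball_mb: float  # target size of one aggregated tarball

OLD = SfaParams(max_file_size_mb=300.0, target_tarball_mb=5000.0)
NEW = SfaParams(max_file_size_mb=600.0, target_tarball_mb=8000.0)

def is_aggregated(size_mb: float, params: SfaParams) -> bool:
    """True if a file of this size would be rolled into a tarball."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return size_mb <= params.max_file_size_mb

def files_per_tarball(size_mb: float, params: SfaParams) -> int:
    """Rough number of equal-size files per tarball; 0 if not aggregated."""
    if not is_aggregated(size_mb, params):
        return 0
    return max(1, int(params.target_tarball_mb // size_mb))

if __name__ == "__main__":
    for size in (100.0, 450.0, 700.0):
        print(f"{size:.0f} MB: old aggregated={is_aggregated(size, OLD)}, "
              f"new aggregated={is_aggregated(size, NEW)}, "
              f"new files/tarball={files_per_tarball(size, NEW)}")
```

In this model a 450 MB file, which used to be written as a standalone file, is now aggregated and so picks up the tarball extraction latency on read.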