FileFamilies

From Mu2eWiki
Revision as of 15:57, 12 November 2020 by Rlc

Introduction

A file family is a set of files that are grouped exclusively on the same set of tapes. File families mark files that may be treated differently during data-handling operations. This can include tape library location; grouping for migration, deletion, duplication, or offsite copies; and grouping for access priority, dCache location, or lifetime on disk. For example, we expect to keep raw data, reconstructed data, and simulations on different sets of tapes for data security and operational efficiency.

List

Because all the files of a file family are written exclusively to the same set of tapes, the file families partition the physical tapes into separate groups.

Here are the mu2e file families. Knowledge of these file families is built into the production and upload procedures, so the typical user does not need to understand them in any detail.

  • phy-raw (planned) Raw data from the central or subdetector DAQ system. Data written here is copied onto two different tapes.
  • phy-rec (planned) Reconstructed raw data
  • phy-ntd (planned) Production ntuples of reconstructed raw data
  • phy-sim Monte Carlo simulated or reconstructed art files. These are official collaboration samples only: originated, produced, validated, and documented by physics groups, and intended for long-term use by many collaborators. Examples are the TDR and CD3 samples. The username associated with these files will be the production username "mu2e".
  • phy-nts Non-art-format ntuples derived from phy-sim files
  • phy-etc Configuration files, tarballs of log files, backups, and other files
  • usr-dat User reconstruction or ntuples derived from detector data
  • usr-sim Monte Carlo simulated or reconstructed art files. These samples are produced by one or a few individuals for use in their personal studies. They are probably for short-term use, not documented publicly, and not used by many collaborators. The username associated with these files will be that of the person most likely to understand how they were created and how they should be used if questions come up a year or two later: the intellectual owner of the data.
  • usr-nts Non-art-format ntuples derived from usr-sim files
  • usr-etc User-created tarballs of log files, backups, and other files
  • tst-cos Testbeam and cosmic data created before detector commissioning. This would include raw data formats as well as various possible derived formats and tarballs. Data written here are stored on tape with two automatic copies of each file.

For real data taking, additional file families will be created to hold raw data, reconstructed data, ntuples, and other products.

When uploading files, you will need to specify the file family. Collaboration production will use the "phy-*" families; users will use usr-sim for Monte Carlo art files, usr-nts for ntuples, and usr-etc for tarballs and anything else.
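The rule above can be expressed as a simple pre-upload check. This is a minimal sketch, not part of the Mu2e upload tools: the helper name check_family is hypothetical, the family list is copied from the list above, and because this page does not say who writes to tst-cos, the sketch accepts it for any user (an assumption).

```python
# Hypothetical pre-upload check for the rule "collaboration production
# (user "mu2e") uses phy-*, everyone else uses usr-*".
# Family names come from this page; the helper itself is an illustrative
# sketch, not an actual Mu2e tool.
KNOWN_FAMILIES = {
    "phy-raw", "phy-rec", "phy-ntd", "phy-sim", "phy-nts", "phy-etc",
    "usr-dat", "usr-sim", "usr-nts", "usr-etc", "tst-cos",
}


def check_family(family: str, user: str) -> None:
    """Raise ValueError if family is unknown or wrong for this user."""
    if family not in KNOWN_FAMILIES:
        raise ValueError(f"unknown file family {family!r}")
    if family == "tst-cos":
        # Assumption: this page assigns tst-cos to neither group.
        return
    expected_prefix = "phy-" if user == "mu2e" else "usr-"
    if not family.startswith(expected_prefix):
        raise ValueError(
            f"user {user!r} should use a {expected_prefix}* family, not {family!r}"
        )
```

For example, check_family("usr-nts", "jdoe") returns quietly, while check_family("phy-sim", "jdoe") raises ValueError ("jdoe" is a made-up username).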

Determining file family

In the file name convention, the first two fields are the data_tier (a logical grouping based on physics content) and the user name (either "mu2e" for collaboration files, or a person's username for everything else). By convention, these two fields determine the file family to which a file is steered. This allows the file family to be determined automatically. The table below reflects the coded convention. The case of real detector data is likely to be more complex and will be decided when needed.

data_tier   user "mu2e"         other users
raw         phy-raw (planned)   usr-dat (planned)
rec         phy-rec (planned)   usr-dat (planned)
ntd         phy-ntd (planned)   usr-dat (planned)
ext         ?                   usr-dat (planned)
rex         ?                   usr-dat (planned)
xnt         ?                   usr-dat (planned)
cnf         phy-etc             usr-etc
daq         phy-sim             usr-sim
sim         phy-sim             usr-sim
mix         phy-sim             usr-sim
dig         phy-sim             usr-sim
mcs         phy-sim             usr-sim
nts         phy-nts             usr-nts
log         phy-etc             usr-etc
bck         phy-etc             usr-etc
etc         phy-etc             usr-etc
job         N/A                 N/A
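The lookup in this table can be sketched as a small function. This is an illustration only, under stated assumptions: the file name fields are assumed to be separated by "." (this page only says the data_tier and user are the first two fields), the function name file_family is hypothetical, and entries shown as "?", "N/A", or "(planned)" are treated as not yet available.

```python
# Sketch: choose a file family from the first two fields of a file name.
# Assumptions (not stated on this page): fields are separated by ".";
# "?", "N/A", and "(planned)" entries are rejected rather than guessed.

# data_tier -> (family for user "mu2e", family for any other user);
# None marks a "?" or "N/A" entry in the table.
FAMILY_TABLE = {
    "raw": ("phy-raw", "usr-dat"),
    "rec": ("phy-rec", "usr-dat"),
    "ntd": ("phy-ntd", "usr-dat"),
    "ext": (None, "usr-dat"),
    "rex": (None, "usr-dat"),
    "xnt": (None, "usr-dat"),
    "cnf": ("phy-etc", "usr-etc"),
    "daq": ("phy-sim", "usr-sim"),
    "sim": ("phy-sim", "usr-sim"),
    "mix": ("phy-sim", "usr-sim"),
    "dig": ("phy-sim", "usr-sim"),
    "mcs": ("phy-sim", "usr-sim"),
    "nts": ("phy-nts", "usr-nts"),
    "log": ("phy-etc", "usr-etc"),
    "bck": ("phy-etc", "usr-etc"),
    "etc": ("phy-etc", "usr-etc"),
    "job": (None, None),
}

# Families the table marks as "(planned)".
PLANNED = {"phy-raw", "phy-rec", "phy-ntd", "usr-dat"}


def file_family(filename: str) -> str:
    """Return the file family for filename, or raise ValueError."""
    fields = filename.split(".")
    if len(fields) < 2:
        raise ValueError(f"too few fields in {filename!r}")
    data_tier, user = fields[0], fields[1]
    if data_tier not in FAMILY_TABLE:
        raise ValueError(f"unknown data_tier {data_tier!r}")
    mu2e_family, other_family = FAMILY_TABLE[data_tier]
    family = mu2e_family if user == "mu2e" else other_family
    if family is None:
        raise ValueError(f"no file family for {data_tier!r} / {user!r}")
    if family in PLANNED:
        raise ValueError(f"file family {family!r} is only planned")
    return family
```

With a hypothetical name such as "sim.mu2e.example.v1.001000_00000000.art", the sketch returns "phy-sim"; the same data_tier under a personal username such as "jdoe" gives "usr-sim". Keeping the table as data rather than branching logic mirrors the page's point that the two fields alone decide the family.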