FileFamilies
Latest revision as of 22:01, 8 September 2021
Introduction
A file family is a set of files which are grouped exclusively on the same set of tapes. File families are used to indicate groups of files that may be treated differently during data-handling operations. This might include tape library location, groupings for migration, deletion, duplication, copy offsite, groupings for access priority or dCache location or lifetime on disk. For example, we expect to group raw data, reconstructed data, and simulations on different sets of tapes for data security and operational efficiency.
List
Here are the standard Mu2e file families. Knowledge about these file families is built into the production and upload procedures, so the typical user does not need to understand this in any detail.
- phy-raw Raw data from the central or subdetector DAQ system. Data written here is copied onto two different tapes.
- phy-rec Reconstructed beam/cosmic data
- phy-ntd Production non-art format ntuples of reconstructed beam/cosmic data
- phy-sim Monte Carlo simulated or reconstructed art files. These are official collaboration samples only, originated, produced, validated, and documented by physics groups intended for long-term use by many collaborators. Examples are the TDR and CD3 samples. The username associated with the files will be the production username "mu2e".
- phy-nts non-art format ntuples of phy-sim
- phy-etc configuration files, tarballs of log files, backups, and other files
- usr-dat User data-based reconstruction or ntuples
- usr-sim Monte Carlo simulated or reconstructed art files. These samples are produced by one or a few individuals for use in their personal studies. They are probably for short-term use, not documented publicly, and not used by many collaborators. The username associated with these files will be the person most likely to understand how they were created and how they should be used if questions come up a year or two later: the intellectual owner of the data.
- usr-nts Non-art format ntuples of usr-sim
- usr-etc Other user-created tarballs of log files, backups
- tst-cos (to be deprecated, use phy-raw) Testbeam and cosmic data created before detector commissioning. This would include raw data formats as well as various possible derived formats and tarballs. Data written here are stored on tape with two automatic copies of each file.
When uploading files, you must follow the upload recipes. The scripts will steer your data to the correct file family based on the file name; see the next section.
Currently (9/2021) all families have a width of 2, except usr-nts, which has a width of 3. The width is the number of tape drives that can be used in parallel to write files to this family.
Determining file family
In the file name convention the first two fields are the data_tier (a logical grouping based on physics contents) and the user name (either "mu2e" for collaboration files, or a person's username for everything else). We have decided by convention that these two fields will determine which file family a file will be steered to. This is a useful feature which allows the file family to be determined automatically. The table below reflects the coded convention. The case of real detector data is likely to be more complex, and will be decided when needed.
data_tier | user mu2e | other users |
raw | phy-raw | phy-raw |
rec | phy-rec | usr-dat |
ntd | phy-ntd | usr-dat |
ext | ? | usr-dat |
rex | ? | usr-dat |
xnt | ? | usr-dat |
cnf | phy-etc | usr-etc |
sim | phy-sim | usr-sim |
dts | phy-sim | usr-sim |
mix | phy-sim | usr-sim |
dig | phy-sim | usr-sim |
mcs | phy-sim | usr-sim |
nts | phy-nts | usr-nts |
log | phy-etc | usr-etc |
bck | phy-etc | usr-etc |
etc | phy-etc | usr-etc |
job | N/A | N/A |
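The table above can be read as a simple lookup keyed on the data_tier and the user name. The following is a minimal sketch of that lookup, not the actual upload code. It assumes that the fields of a file name are separated by dots (the text above only says that data_tier and user name are the first two fields); the names `FAMILY_TABLE` and `file_family` and the example file names are hypothetical. Entries shown as "?" or "N/A" in the table are represented as `None`.

```python
# Sketch of the data_tier/user -> file family convention in the table above.
# Assumption: file name fields are dot-separated, with data_tier first and
# user name second. This is an illustration, not the real upload scripts.

PRODUCTION_USER = "mu2e"

# data_tier -> (family for user "mu2e", family for other users)
# None stands for the "?" and "N/A" entries of the table.
FAMILY_TABLE = {
    "raw": ("phy-raw", "phy-raw"),
    "rec": ("phy-rec", "usr-dat"),
    "ntd": ("phy-ntd", "usr-dat"),
    "ext": (None, "usr-dat"),
    "rex": (None, "usr-dat"),
    "xnt": (None, "usr-dat"),
    "cnf": ("phy-etc", "usr-etc"),
    "sim": ("phy-sim", "usr-sim"),
    "dts": ("phy-sim", "usr-sim"),
    "mix": ("phy-sim", "usr-sim"),
    "dig": ("phy-sim", "usr-sim"),
    "mcs": ("phy-sim", "usr-sim"),
    "nts": ("phy-nts", "usr-nts"),
    "log": ("phy-etc", "usr-etc"),
    "bck": ("phy-etc", "usr-etc"),
    "etc": ("phy-etc", "usr-etc"),
    "job": (None, None),
}


def file_family(filename):
    """Return the file family for a file name, or None if the table has none."""
    fields = filename.split(".")
    if len(fields) < 2:
        raise ValueError(f"cannot find data_tier and user in {filename!r}")
    data_tier, user = fields[0], fields[1]
    if data_tier not in FAMILY_TABLE:
        raise ValueError(f"unknown data_tier {data_tier!r}")
    production_family, user_family = FAMILY_TABLE[data_tier]
    return production_family if user == PRODUCTION_USER else user_family


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the lookup.
    for name in ("sim.mu2e.example.v0.001.art", "rec.jdoe.example.v0.001.art"):
        print(name, "->", file_family(name))
```

Keeping the convention in one table makes it easy to compare against the wiki table row by row whenever a data_tier is added.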
A default family
The computing division has created a default file family for Mu2e called sfu_archive. This would contain files uploaded using the fife_utils tools. These tools are intended for the convenience of users who just want to back up an area to tape quickly. The tools do not understand the Mu2e location convention, filename convention, or SAM metadata standards, so in general we do not recommend using them.