Difference between revisions of "Upload"

From Mu2eWiki
(Created page with "== Introduction == Keeping all Intensity Frontier data on disks is not practical, so large datasets must be written to tape. At the same time, the data must always be avail...")
 
 
(57 intermediate revisions by 4 users not shown)
Latest revision as of 05:44, 5 May 2021

Introduction

Mu2e has several forms of disk space available, and large aggregated data disk systems are available in dCache. We also write a large part of the files we produce to tape, which is less expensive and can hold much more data. We usually write data to tape for one or more of the following reasons:

  • to make room for new activity
  • to keep it safe for longer than a few months
  • to make a permanent record

The tape system is called enstore and consists of several tape libraries and many tape drives with good connections to dCache. We write to tape by copying files into tape-backed dCache, from which they are copied to tape automatically. Files that go unused for a period on the scale of weeks are deleted from disk, so that they exist only on tape. We can have them copied from tape back to disk by prestaging them.

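All data written to tape must follow conventions and must be written through production scripts. You may want to familiarize yourself with the links in this list. The recipes below will guide you through the steps. If you are using production scripts for simulation, most of these conventions are provided for you.

  • all files are named by mu2e name conventions
  • all files will have a SAM record with SAM metadata, including file location in dCache
  • all files are uploaded using standard tools, see especially printJson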

Upload Concepts

Generally there are the following conceptual steps to upload a file. The practical recipes using the production scripts are in the following sections.

  1. rename the files by the standard convention
  2. use printJson to generate a json file containing the SAM metadata for each data file
  3. declare the files to the SAM database using mu2eFileDeclare
  4. copy the files to tape-backed dCache using mu2eFileUpload
  5. record the final tape location in the SAM record, using mu2eDatasetLocation

Please also see the comments about file sizes in the job planning page.

In the standard MC workflow, the first 2 steps are done for you by the grid job script. Please skip down to that specific workflow for the remainder of that recipe.

For uploading the log files in the standard MC workflow, the log files need to be tarred up first. Please skip down to that specific workflow for the remainder of that recipe.

For uploading random files not associated with the standard MC workflow, please see the Random files workflow below.

If you have never uploaded files of a particular type before, you may get a permission denied error during a mkdir command. In this case, please contact mu2eDataAdmin.

MC workflow, art files

In the standard MC workflow, there are three times you might upload files:

  • after generating the fcl, uploading the fcl files is part of the generate fcl procedure
  • after producing art files (including concatenation if needed), which is described in this section
  • upload log files as an archive, which is handled in the following section

After the jobs have completed and you have checked the output by running the mu2eCheckAndMove script, the output datasets will be below a "good" directory like the following, where you will be working:

cd /pnfs/mu2e/persistent/users/mu2epro/workflow/project_name/good

Below this directory, there are directories for each cluster, and below those, directories for each job. Each output art file named "a.b.c.d.e.f" should have an associated json file called "a.b.c.d.e.f.json", produced as part of the grid job and containing the SAM record metadata.
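
Before declaring files, it can help to confirm that every art file really has its json companion. The following is a minimal sketch, not part of mu2efiletools: the function name list_missing_json is hypothetical, and it assumes the art files end in ".art". Running find on /pnfs can be slow, so point it at a single cluster directory rather than a whole workflow area.

```shell
# Hypothetical helper (not a Mu2e tool): print every *.art file under a
# directory that has no "<name>.json" companion next to it.
list_missing_json() {
    dir="$1"
    find "$dir" -type f -name '*.art' | sort | while IFS= read -r art; do
        # The companion is expected to be the art file name plus ".json"
        [ -f "$art.json" ] || printf '%s\n' "$art"
    done
}

# Example (path is a placeholder):
#   list_missing_json /path/to/good/cluster_directory
```

An empty output means every art file found has a json file beside it.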

There are three steps. First, declare the files to the SAM database:

 mu2eClusterFileList --dsname <dataset> --json <cluster_directory>  | mu2eFileDeclare

where dataset is the dataset name of the files to find and upload and the cluster_directory is one of the cluster subdirectories. If you see errors while declaring files, check that you have a valid certificate.

The second step is to move the files to the final location in tape-backed dCache:

mu2eClusterFileList --dsname <dataset> <cluster_directory>  | mu2eFileUpload --ifdh --tape

If you see a permission denied error during a mkdir command, please contact mu2eDataAdmin.

Currently (5/2018) we are seeing an increasing number of problems reading and writing files using the NFS interface to dCache. If you see extreme slowness, you can put in a ticket and ask to have the dCache NFS server restarted. Using the "--ifdh" switch will cause the data to be transferred by more reliable protocols.

If you want a list of the files in their final location, please use a file tool instead of an expensive ls with wildcards:

mu2eDatasetFileList --tape <dataset>

Don't forget that --tape is a binary option, so it doesn't take an "=".


The third step is to tell SAM where the files are in the tape system, to add their "location" to the SAM record.

mu2eDatasetLocation --add=tape <dataset>

Since it takes about a day, or sometimes more, for a file to migrate to tape and establish its tape location after being copied to tape-backed dCache, it makes sense to wait a day before running this command.

This command should be run as many times as needed to get the "Nothing to do" message, which means all the files in the dataset now have their location recorded:

> mu2eDatasetLocation --add=tape sim.mu2e.cd3-pions-cs1.v563.art
  No virtual files in dataset sim.mu2e.cd3-pions-cs1.v563.art. Nothing to do on Mon Nov 21 18:11:29 2016.
  SAMWeb times: query metadata = 0.00 s, update location = 0.00 s
  Summary1: out of 0 virtual dataset files 0 were not found on tape.
  Summary2: successfully verified 0 files, added locations for 0 files.
  Summary3: found 0 corrupted files and 0 files without tape labels.
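
Re-running the command by hand until "Nothing to do" appears can be wrapped in a small loop. This is only a sketch: retry_until_done is a hypothetical helper, not a Mu2e tool, and it assumes that the text "Nothing to do" in the command output is a reliable sign of completion, as in the sample output above.

```shell
# Hypothetical helper: run a command repeatedly until its output contains
# "Nothing to do", sleeping between attempts.
# usage: retry_until_done <max_tries> <sleep_seconds> <command> [args...]
retry_until_done() {
    max_tries="$1"
    pause="$2"
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        out=$("$@" 2>&1)
        printf '%s\n' "$out"          # show what the command reported
        case "$out" in
            *"Nothing to do"*)
                echo "finished after $try tries"
                return 0
                ;;
        esac
        try=$((try + 1))
        if [ "$try" -le "$max_tries" ]; then
            sleep "$pause"
        fi
    done
    echo "still pending after $max_tries tries" >&2
    return 1
}

# Example: try once a day for up to a week
#   retry_until_done 7 86400 mu2eDatasetLocation --add=tape sim.mu2e.cd3-pions-cs1.v563.art
```

Because the loop can run for days, it is best started inside a detachable session such as tmux or screen.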

MC workflow, log files

After the desired datasets have been extracted from job outputs in a good area, the mu2eClusterArchive script can be used to save the rest of the files, usually logs and histograms, to tape.

By default, the mu2eClusterArchive script archives job logs. "Non-interesting" files, such as TFileService files with names like "nts.*.root", can either be deleted with, e.g.,

mu2eClusterFileList --dsname <nts dataset name>  <cluster directory> | xargs rm -f
mu2eClusterFileList --dsname <nts dataset name> --json <cluster directory> | xargs rm -f

or archived together with the logs (the recommended production procedure):

> mu2eClusterArchive   --allow nts.gandr.cd3-pions-g4s1.v567.root  <cluster directory>
1       Mon Nov 21 17:59:05 2016  Working on /pnfs/mu2e/scratch/users/gandr/workflow/pion-test/archiving/20161121-1759-bwOu/11986465
Mon Nov 21 17:59:06 2016  Try 1: archiving /pnfs/mu2e/scratch/users/gandr/workflow/pion-test/archiving/20161121-1759-bwOu/11986465
Mon Nov 21 17:59:06 2016  Archiving /pnfs/mu2e/scratch/users/gandr/workflow/pion-test/archiving/20161121-1759-bwOu/11986465
Mon Nov 21 17:59:06 2016  Registering /pnfs/mu2e/tape/usr-etc/bck/gandr/my-test-s1/v567/tbz/f4/9e/bck.gandr.my-test-s1.v567.002700_00000001.tbz in SAM
Creating a dataset definition for bck.gandr.my-test-s1.v567.tbz
Mon Nov 21 17:59:07 2016  Removing  /pnfs/mu2e/scratch/users/gandr/workflow/pion-test/archiving/20161121-1759-bwOu/11986465
Done archiving 1 directories. Encountered 0 tar errors.

If you are archiving a cluster whose art output was later concatenated before uploading, you should also remove these redundant art files:

mu2eClusterFileList --dsname <art dataset name>  <cluster directory> | xargs rm -f

Note that the directory to be archived is moved from good into a parallel subdirectory of archiving before any processing is done. This prevents race conditions with other scripts that may be working on the same files. If you get an error from mu2eClusterArchive, you can recover by moving the directory back into "good" before trying to archive it again. You may also need to delete the output tar file that was being written to the tape-backed dCache area (usually if the error occurred during the creation of this file). The file name should appear in the command output.
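
The recovery step of moving the directory back into "good" could look like the sketch below. The helper restore_cluster is hypothetical, and it assumes the layout seen in the sample output above, namely <workflow>/archiving/<timestamp>/<cluster>, with the destination <workflow>/good/<cluster>. Check the actual paths in your own command output before relying on it.

```shell
# Hypothetical helper: move a cluster directory from the "archiving" area
# back into "good" so that it can be archived again.
# usage: restore_cluster <workflow_dir> <cluster_id>
restore_cluster() {
    workflow="$1"
    cluster="$2"
    # Assumed layout: <workflow>/archiving/<timestamp>/<cluster>
    src=$(find "$workflow/archiving" -mindepth 2 -maxdepth 2 -type d \
               -name "$cluster" 2>/dev/null | head -n 1)
    if [ -z "$src" ]; then
        echo "cluster $cluster not found under $workflow/archiving" >&2
        return 1
    fi
    # Refuse to overwrite a directory that is already in "good"
    if [ -e "$workflow/good/$cluster" ]; then
        echo "$workflow/good/$cluster already exists" >&2
        return 1
    fi
    mv "$src" "$workflow/good/$cluster"
}
```

The function only moves the directory; removing a partially written tar file from tape-backed dCache remains a separate manual step.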

If you see a permission denied error during a mkdir command, please contact mu2eDataAdmin.


To record tape label information for a recently archived dataset:

mu2eDatasetLocation --add=tape bck.gandr.my-test-s1.v567.tbz

If there is no tape label, re-run the command later. You may need to wait a day before a new file acquires a tape label.

If you want a list of the files in their final location, please use a file tool instead of an expensive ls with wildcards:

mu2eDatasetFileList --tape bck.gandr.my-test-s1.v567.tbz

Don't forget that --tape is a binary option, so it doesn't take an "=".


Random files

Random files are files that are not created in the process of simulation in the mu2egrid package. For simulation, which is almost all cases, please follow the workflows above.

There are two categories of files:

  • data files which you expect to read back as ready-to-use data files. These are usually art files or ntuple files. This also includes fcl files that will be used to feed grid jobs.
  • all other data, such as txt, logs, scripts, analysis areas, etc., which should be put into tarballs (tar, extension tar; tar and gzip, extension tgz; or tar and bzip2, extension tbz). As an important guideline, tarballs should be between 0.5 and 2 GB, but there are no strict limits.
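
The tarball size guideline can be checked before uploading. The sketch below uses hypothetical helper names and assumes that 1 GB means 10^9 bytes, which the text does not specify.

```shell
# Hypothetical helpers: check a tarball against the 0.5 to 2 GB guideline.
# Assumption: 1 GB is taken as 10^9 bytes.
size_in_guideline() {
    bytes="$1"
    [ "$bytes" -ge 500000000 ] && [ "$bytes" -le 2000000000 ]
}

check_tarball() {
    # wc -c reads the size in bytes; tr strips padding some systems add
    bytes=$(wc -c < "$1" | tr -d ' ')
    if size_in_guideline "$bytes"; then
        echo "$1: ok ($bytes bytes)"
    else
        echo "$1: outside guideline ($bytes bytes)"
    fi
}

# Example:
#   check_tarball bck.batman.node123.2014-06-04.0000.tgz
```

Since the limits are a guideline rather than a rule, the check only reports and never blocks an upload.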

Test beam data is a gray area, where we have used both tarred and untarred files. The choice depends on whether the data will be read back as input data in grid jobs (upload as individual files) or kept as archives (upload as tarballs).

Here are the steps to upload files.

  1. Please see the file names documentation. You should end up with a five-field, dot-separated dataset name like these examples:

     sim.batman.beam-mytarget.v0.art
     bck.batman.node123.2014-06-04.tgz

     Here "batman" represents your username.
  2. Set up the tools:

     (setup an appropriate Offline version)
     setup mu2efiletools
     setup dhtools

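
Judging only from the examples on this page, a file name has six dot-separated fields, and its dataset name is the same name without the fifth field. The sketch below encodes that reading; dataset_of is a hypothetical helper, and the file names documentation remains the authority on the convention.

```shell
# Hypothetical helper: derive a dataset name from a six-field file name
# by dropping the fifth dot-separated field (assumption based on the
# examples on this page).
dataset_of() {
    name=$(basename "$1")
    nfields=$(printf '%s\n' "$name" | awk -F. '{ print NF }')
    if [ "$nfields" -ne 6 ]; then
        echo "not a six-field file name: $name" >&2
        return 1
    fi
    printf '%s\n' "$name" | awk -F. '{ print $1 "." $2 "." $3 "." $4 "." $6 }'
}

# Example:
#   dataset_of sim.batman.beam-mytarget.v0.00001002_000005.art
```

The resulting name is what commands such as mu2eDatasetLocation expect as their dataset argument.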
At this point, you have several more steps to do. You have a choice of two methods:

  • use mu2eFileMoveToTape, which does them in one command, but the command can take a day or longer to finish
  • perform the steps individually, in which case you have to perform the last one after waiting a day.

Both methods involve a delay of about a day because the system writes files to tape as needed, or once a day. We have to wait for the file to go to tape before we can write its tape location in the SAM database, completing its record. The mu2eFileMoveToTape method will only write the files to tape, but with the multi-step method you can also write the files to a location (determined by the file name) on persistent or scratch dCache.

Option 1, one command, block for a day

  3. Let this one command do all steps:

     mu2eFileMoveToTape <files>

where each of the files complies with the Mu2e naming convention. The command will not exit until the files are on tape and a location has been determined, finishing the record. This can take a day or longer, so it is strongly recommended to run this under VNC or a terminal multiplexer such as tmux or screen. You will then be able to start the command, detach the session if needed, and later reconnect to the same session to check the progress.

The mu2eFileMoveToTape command is robust against interruptions. If you started to move your files and the operation was interrupted, you can re-run this script on the same files. Once everything is done, the original files are removed. A file not removed means something was not done. In that case, first, make sure that no other mu2eFileMoveToTape process is running on the same files anywhere (not just on the current compute node), then try re-running the command. Do not remove the file by hand, because you will be left with incomplete SAM info, and possibly a damaged copy on tape.

For an example use case, see the page "An example of how to archive files to tape".

Option 2, perform individual steps

  3. Generate a json metadata file for each data file:

     printJson <datafile> > <json file>

     The command for an art file will be like:

     printJson --no-parents sim.batman.beam-mytarget.v0.00001002_000005.art > sim.batman.beam-mytarget.v0.00001002_000005.art.json

     The command for art files takes a few seconds since it has to open the file and extract run and subrun numbers.

     The command for a different type of file, like a backup tarball, might look like:

     printJson --no-parents bck.batman.node123.2014-06-04.0000.tgz > bck.batman.node123.2014-06-04.0000.tgz.json

  4. Declare the files to SAM:

     ls *.json | mu2eFileDeclare

     If you see errors while declaring files, check that you have a valid certificate.

  5. Move the files to tape-backed dCache:

     ls *.art | mu2eFileUpload --tape

     If you see a permission denied error during a mkdir command, please contact mu2eDataAdmin.

  6. After a day or two, come back to the project. By this time, the files will have migrated to tape, and you can record the final tape location:

     mu2eDatasetLocation --add=tape <dataset>

     Since it is hard to predict exactly when all files will go to tape, you may need to re-run this command occasionally until you get the message "Nothing to do".