MCProdWorkflow
Introduction
This workflow is for production-style simulation jobs. It can be used for stage-1 jobs, which start with a generator, or for later simulation stages, which start with the output files of previous stages. The output files will be concatenated, uploaded to tape, and properly documented in the SAM database. It is intended for cases where the output needs to be saved for more than a month or so, might be used by many collaborators, or needs to be carefully documented. If your work doesn't need to be uploaded, or is more personal or temporary, you can follow the scratch workflow, which does not concatenate or upload. Most commonly this procedure would be part of a collaboration simulation effort and would be run out of the mu2epro account, but it can be run in a personal account.
You will need to plan the project in some detail before starting this production workflow:
- the physics
- the basic fcl to perform the job
- the output dataset names
- the job plan in terms of number and length of jobs, etc.
This page assumes that the user is familiar with the basic infrastructure and its references:
- Simulation, fcl
- file names, file tools, SAM
- grids, dCache, data transfer, enstore
- grid, job planning, monitoring
- prestaging, concatenation, mu2egrid
The basic steps, expanded below, are:
- prestage input files, if needed
- generate a set of fcl files
- register the fcl dataset with SAM, and copy the fcl files to dCache
- submit jobs
- check output and recover failed jobs
- concatenate output files, if needed, for each dataset:
  - generate a set of fcl files
  - register the fcl dataset with SAM, and copy the fcl files to dCache
  - submit jobs
  - check output and recover failed jobs
- upload output files
- tar and upload log files
The mu2egrid and related packages provide the Mu2e-specific code required for submitting jobs and manipulating files. Most scripts support the --help option. Look for the --dry-run and --verbose options to show what would be done without performing the action.
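For example (a sketch only; which options a given script supports varies, so check its --help output first, and the fcl list name here is a placeholder):

 mu2eprodsys --help
 mu2eprodsys --dry-run --verbose --fcllist=fcllist.txt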
Directories
It is useful to have a working area:
/mu2e/data/users/$USER/projects/my_project
and an area for the job's main fcl
/mu2e/data/users/$USER/projects/my_project/fcl/job
and for each concatenation
/mu2e/data/users/$USER/projects/my_project/fcl/output1
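A minimal sketch of creating these areas (my_project is a placeholder name):

 mkdir -p /mu2e/data/users/$USER/projects/my_project/fcl/job
 mkdir -p /mu2e/data/users/$USER/projects/my_project/fcl/output1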
For official collaboration work, the output will go to
/pnfs/mu2e/persistent/users/mu2epro/workflow/project_name/STATUS
For an individual's work, the output will go to
/pnfs/mu2e/persistent/users/$USER/workflow/project_name/STATUS
where STATUS is
- outstage for output from grid jobs
- good for jobs that have been checked and passed
- failed for jobs that have been checked and failed
Prestage input dataset
Prestaging makes sure the input dataset has been copied off tape to disk, so it is ready to use. If there is no input dataset, or it is known to be on disk (in scratch dCache, for example), skip this step.
We recommend starting the prestaging as soon as possible since it can take several days. Please follow the prestage instructions for the input dataset.
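As one possibility, prestaging can be triggered with the samweb command-line tool (a sketch; the dataset definition name sim.mu2e.example.v0.art is hypothetical, and the prestage instructions page is the authoritative reference):

 setup sam_web_client
 samweb prestage-dataset --defname=sim.mu2e.example.v0.art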
Generate fcl
Please follow the instructions for generating fcl.
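As an illustration, fcl generation uses the generate_fcl script. This sketch assumes a stage-1 generator job, and every option value shown (description, configuration, run number, event counts, template file) is a placeholder; the fcl-generation instructions define the real interface:

 setup mu2etools
 generate_fcl --description=my-project --dsconf=v0 --dsowner=$USER \
   --run-number=1001 --events-per-job=1000 --njobs=100 \
   --embed template.fcl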
Submit Jobs
Please follow the instructions for submitting jobs.
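For reference, submission goes through the mu2eprodsys script from the mu2egrid package. This is only a sketch (the Offline setup script path, fcl list, and project name are placeholders); the submission instructions define the actual options:

 setup mu2egrid
 mu2eprodsys --setup=/mu2e/app/users/$USER/my_project/Offline/setup.sh \
   --fcllist=fcllist.txt --dsconf=v0 --dsowner=$USER \
   --wfproject=my_project

After the jobs complete, the mu2efiletools script mu2eClusterCheckAndMove can be pointed at the outstage area to validate each job and move it to the good or failed directory, after which failed jobs can be recovered and resubmitted.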
Concatenate output dataset
If you need to concatenate the output datasets, please follow the instructions for each dataset. This procedure is essentially the same as the main job: generate fcl, submit and recover jobs, and you will end up with the new datasets as output, and those will be uploaded in the next step.
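As a rough sketch of the pattern (all names are placeholders, and the --inputs and --merge-factor options are assumptions about the generate_fcl interface; the concatenation instructions are authoritative), the fcl for a concatenation pass is generated from the file list of the dataset to be merged:

 setup mu2etools
 setup mu2efiletools
 mu2eDatasetFileList sim.$USER.my-project.v0.art > inputs.txt
 generate_fcl --description=my-project-cat --dsconf=v0 \
   --dsowner=$USER --inputs=inputs.txt --merge-factor=10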