MCProdWorkflow: Difference between revisions

From Mu2eWiki
No edit summary
Line 3: Line 3:
This workflow is for production-style simulation jobs. It can be used for stage-1 jobs, which start with a generator, or for later simulation stages, which start with the output files of previous stages.  It will result in the output files being concatenated, uploaded to tape, and properly documented through the SAM database.  It is intended for cases where the output needs to be saved for more than a month or so, might be used by many collaborators, or needs to be carefully documented.  If your work doesn't need to be uploaded, or is more personal or temporary, you can follow the [[MCScrWorkflow|scratch workflow]], which does not concatenate or upload.  Most commonly this procedure would be part of a collaboration simulation effort and would be run out of the [[Mu2epro|mu2epro account]], but it can be run in a personal account.


The user should prepare the physics and the relevant fcl in detail, and perform basic tests, before starting this procedure.
You will need to plan the project in some detail before starting this production workflow, including:
* the physics
* the basic fcl to perform the job
* the output [[FileNames|dataset names]]
* the [[JobPlan|job plan]] in terms of number and length of jobs, etc.


This page assumes that the user is familiar with the basic infrastructure and its references:
* [[Simulation]], [[FclIntro|fcl]]
*[[Grids|grid]], [[JobPlan|job planning]], [[OfflineOps|monitoring]]
* [[FileNames|file names]], [[FileTools|file tools]], [[SAM]]
* [[Grids|grids]], [[Dcache|dCache]], [[Enstore|enstore]]
* [[Grids|grids]], [[Dcache|dCache]], [[DataTransfer|data transfer]], [[Enstore|enstore]]
* [[Grids|grid]], [[JobPlan|job planning]], [[OfflineOps|monitoring]]
* [[Prestage|prestaging]], [[Concatenate|concatenation]], [https://cdcvs.fnal.gov/redmine/projects/mu2egrid/wiki mu2egrid]  


The basic steps, expanded below, are:
* if needed, prestage input files
* prestage input files, if needed
* generate a set of fcl files
* register the fcl dataset with SAM, and copy fcl files to dCache
* submit jobs
* check output and recover failed jobs
* if needed, concatenate output files
* concatenate output files, if needed
* upload output files
* tar and upload log files
Line 28: Line 32:
show what will be done without performing the action.


==Collecting Inputs==
<pre>
# input dataset, if needed
export INPUTDS=sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art
#
# project, similar to cd3-beam or cd3-cosmic
export PROJECT=abc-phys
# stage, like g4s1
export STAGE=g4s1
export WORKDIR=/mu2e/data/users/$USER/projects/$PROJECT
export FCLDIR=$WORKDIR/fcl
# user, or mu2e (for mu2epro) for collaboration work
DSOWNER=mu2e
# a unique string for versions in case project has restarts or variations
DSCONF=v0
#
# tags, one for each output stream, like dsregion, mubeam, truncated, crv
# output datas
export OUTDS1=out1
export CATSTAGE=cs1
export CATDIR1=/mu2e/data/users/$USER/projects/$PROJECT/fcl/out1
</pre>

<pre>
mkdir -p $FCLDIR
mkdir -p $CATDIR1
# repeated ..
</pre>

==1) Prestage inputs==

We recommend starting the prestaging as soon as possible since it can take several days.  Please follow the [[Prestage|prestage]] instructions for $INPUTDS.

It is useful to have a working area:
 /mu2e/data/users/$USER/projects/my_project
and areas for the jobs main fcl
 /mu2e/data/users/$USER/projects/my_project/fcl/job
and for each concatenation
 /mu2e/data/users/$USER/projects/my_project/fcl/output1

==Prestage input dataset==

Prestaging makes sure the input dataset has been copied off tape to disk, so it is ready to use.  If there is no input dataset, or it is known to be on disk (in scratch dCache, for example), skip this step.

We recommend starting the prestaging as soon as possible since it can take several days.  Please follow the [[Prestage|prestage]] instructions for the input dataset.


==2) Generate fcl ==


Please follow the
Please follow the [[GenerateFcl|instructions]] for generating production fcl.

Revision as of 16:57, 5 April 2017

Introduction

This workflow is for production-style simulation jobs. It can be used for stage-1 jobs, which start with a generator, or for later simulation stages, which start with the output files of previous stages. It will result in the output files being concatenated, uploaded to tape, and properly documented through the SAM database. It is intended for cases where the output needs to be saved for more than a month or so, might be used by many collaborators, or needs to be carefully documented. If your work doesn't need to be uploaded, or is more personal or temporary, you can follow the scratch workflow, which does not concatenate or upload. Most commonly this procedure would be part of a collaboration simulation effort and would be run out of the mu2epro account, but it can be run in a personal account.

You will need to plan the project in some detail before starting this production workflow, including:

  • the physics
  • the basic fcl to perform the job
  • the output dataset names
  • the job plan in terms of number and length of jobs, etc.
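
As a rough illustration of planning the output dataset names, the sketch below composes a name from its parts. It assumes the dot-separated pattern seen in an existing dataset name such as sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art, read here as tier, owner, description (project, stage, and output tag joined by hyphens), configuration, and format. The helper make_dataset_name is hypothetical and is not part of mu2egrid; check the file names documentation for the authoritative rules.

```shell
# Hypothetical helper (not part of mu2egrid): build a dataset name from its parts.
# Assumed field order: tier.owner.project-stage-tag.configuration.format
make_dataset_name() {
    local tier="$1" owner="$2" project="$3" stage="$4" tag="$5" conf="$6" format="$7"
    echo "${tier}.${owner}.${project}-${stage}-${tag}.${conf}.${format}"
}

# Reproduces the example name used as the pattern above.
make_dataset_name sim mu2e cd3-beam g4s1 dsregion 0506a art
```

Writing the names down this way before submitting jobs makes it easier to keep the owner and configuration fields consistent across all output streams of a project.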

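This page assumes that the user is familiar with the basic infrastructure and its references:

  • Simulation, fcl
  • file names, file tools, SAM
  • grids, dCache, data transfer, enstore
  • grid, job planning, monitoring
  • prestaging, concatenation, mu2egrid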

The basic steps, expanded below, are:

  • prestage input files, if needed
  • generate a set of fcl files
  • register the fcl dataset with SAM, and copy fcl files to dCache
  • submit jobs
  • check output and recover failed jobs
  • concatenate output files, if needed
  • upload output files
  • tar and upload log files

The mu2egrid and related packages provide the Mu2e-specific code required for submitting jobs and manipulating files. Most scripts support the --help option. Look for the --dry-run and --verbose options to show what will be done without performing the action.

It is useful to have a working area:

/mu2e/data/users/$USER/projects/my_project

and areas for the jobs main fcl

/mu2e/data/users/$USER/projects/my_project/fcl/job

and for each concatenation

/mu2e/data/users/$USER/projects/my_project/fcl/output1
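
The directory layout above could be created in one step with a small shell function. This is only a sketch: make_workarea is a hypothetical helper, not part of mu2egrid, and it assumes a single concatenation area named output1, as in the paths above.

```shell
# Hypothetical helper: create the working area and its fcl subdirectories.
# $1 = base directory, e.g. /mu2e/data/users/$USER/projects
# $2 = project name,   e.g. my_project
make_workarea() {
    local base="$1" project="$2"
    local workdir="$base/$project"
    # mkdir -p creates missing parent directories and does not fail if they already exist
    mkdir -p "$workdir/fcl/job" "$workdir/fcl/output1" || return 1
    # print the working area path so the caller can store it
    echo "$workdir"
}
```

For example, WORKDIR=$(make_workarea /mu2e/data/users/$USER/projects my_project) would create the three areas listed above and record the project directory. Add one more fcl subdirectory for each additional concatenation.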


Prestage input dataset

Prestaging makes sure the input dataset has been copied off tape to disk, so it is ready to use. If there is no input dataset, or it is known to be on disk (in scratch dCache, for example), skip this step.

We recommend starting the prestaging as soon as possible since it can take several days. Please follow the prestage instructions for the input dataset.

2) Generate fcl

Please follow the instructions for generating production fcl.