Difference between revisions of "GenerateFcl"

From Mu2eWiki
 
(47 intermediate revisions by 3 users not shown)
Line 4: Line 4:
 
The fcl might drive a simulation project that starts with a generator, or a later stage of simulation starting
 
The fcl might drive a simulation project that starts with a generator, or a later stage of simulation starting
 
with the art file output of an earlier stage, or a file concatenation project, or even an analysis project.
 
with the art file output of an earlier stage, or a file concatenation project, or even an analysis project.
 +
See [[SimulationFCL]] for the fcl patterns used in MDC2020.
  
 
You would typically come to this page through the [[MCProdWorkflow]] procedure.
 
You would typically come to this page through the [[MCProdWorkflow]] procedure.
 +
 +
To learn how to generate and reconstruct physics processes see [[SimulationFCL]].
  
 
==Preparation==
 
==Preparation==
Line 12: Line 15:
 
with interactive jobs before it is prepared for a grid job.   
 
with interactive jobs before it is prepared for a grid job.   
 
Several examples of simulation fcl files are under the <code>JobConfig</code>  
 
Several examples of simulation fcl files are under the <code>JobConfig</code>  
subdirectory of Offline.
+
subdirectory of Production repo.  See also [[SimulationFCL]] for the fcl patterns used in MDC2020.
  
 
If a fcl file is working interactively, it will also work on the grid,
 
If a fcl file is working interactively, it will also work on the grid,
Line 40: Line 43:
 
<pre>
 
<pre>
 
setup mu2e
 
setup mu2e
source the appropriate Offline setup.sh file for your project
+
muse setup  the appropriate build for your project
 
setup mu2etools
 
setup mu2etools
 
setup mu2efiletools
 
setup mu2efiletools
 
</pre>
 
</pre>
  
The setup of <code>mu2etools</code> above makes the <code>generate_fcl</code> command available in your path.
+
The setup of <code>mu2etools</code> above makes the <code>generate_fcl</code> command available in your path.
 +
 
 +
When you write to the SAM database, usually by mu2eFileDeclare, you will need to have a [[Authentication|valid certificate]].
  
 
==Generating fcl==
 
==Generating fcl==
Line 73: Line 78:
 
<ul>
 
<ul>
 
   <li><code>--description</code> the "description" field of fcl file names</li>
 
   <li><code>--description</code> the "description" field of fcl file names</li>
   <li><code>--dsconf</code> the "configuration" field</li>
+
   <li><code>--dsconf</code> the "configuration" field of fcl file names</li>
   <li><code>--dsowner</code> the "owner" field.  This parameter
+
   <li><code>--dsowner</code> the "owner" field of fcl file names.  This parameter
 
defaults to the user who executes <code>generate_fcl</code>, but
 
defaults to the user who executes <code>generate_fcl</code>, but
 
can  be overridden. For example, <code>--dsowner=mu2e</code>
 
can  be overridden. For example, <code>--dsowner=mu2e</code>
Line 80: Line 85:
 
   </li>
 
   </li>
 
</ul>
 
</ul>
 +
 +
One set of fcl files can be used to produce a series of different output datasets by changing the version of Offline used with the fcl.  Conceptually then, the description and dsconf should be relevant to the fcl files themselves, and not the output dataset you have in mind today.  For example, the same fcl file generating protons from stopped muons can be run with different version of Offline whcih contain different versions of geant, producing two different datasets.  So the fcl name should reflect "protons from stopped muons" not "geant study".
 +
  
 
The <code>--old-seeds</code> parameter can be used
 
The <code>--old-seeds</code> parameter can be used
Line 99: Line 107:
 
specify <code>--old-seeds=/dev/null</code>.
 
specify <code>--old-seeds=/dev/null</code>.
  
 +
Please take a look at [[RunNumbers]] before choosing what run number to use. 1000 is the recommended default for simulation with no run dependence.
 +
 +
 +
'''Note''': If you use the auxinput switch of generate_fcl in order to provide auxiliary input files (such as mixing files or stopped muon files) to your job and declare your fcl files to SAM, then the files used in the auxinput switch should already be declared to SAM.  The auxinput files will be considered as parents (or precursors) to your fcl file, so the fcl file SAM record should record that fact.  For standard mixing files, the upload and declare is already done for you.  If you use personal files, then they should be uploaded and declared before using them in the fcl.  For a personal, temporary job, you also can chose to not declare your fcl files, which will avoid this issue.
  
 
Run <code>generate_fcl --help</code> to see all the options.
 
Run <code>generate_fcl --help</code> to see all the options.
Line 110: Line 122:
 
cd /mu2e/data/users/$USER/projects/my_project/fcl/job
 
cd /mu2e/data/users/$USER/projects/my_project/fcl/job
 
</pre>
 
</pre>
 +
 +
Please take a look at [[RunNumbers]] before choosing what run number to use. 1000 is the recommended default for simulation with no run dependence.
  
  
Line 119: Line 133:
 
line file with just an include directive, in our case
 
line file with just an include directive, in our case
 
<pre>
 
<pre>
#include "JobConfig/cd3/pions/pions_g4s1.fcl"
+
#include "Production/JobConfig/primary/CeEndpoint.fcl"
 
</pre>
 
</pre>
 
but one can also add e.g. geometry file overrides, or even write a
 
but one can also add e.g. geometry file overrides, or even write a
Line 126: Line 140:
 
we use <code>template.fcl</code> as the template file name.
 
we use <code>template.fcl</code> as the template file name.
 
Note that the include directive should specify include file pathname
 
Note that the include directive should specify include file pathname
relative to the Offline directory that you setup earlier.  (More precisely, relative to
+
relative to the Muse build directory that you setup earlier.  (More precisely, relative to
 
a directory listed in the FHICL_FILE_PATH.) <em>Absolute filenames
 
a directory listed in the FHICL_FILE_PATH.) <em>Absolute filenames
 
do not work in fhicl #include.</em>
 
do not work in fhicl #include.</em>
Line 135: Line 149:
 
generate_fcl --description=my-test-s1 \
 
generate_fcl --description=my-test-s1 \
 
             --dsconf=v0 \
 
             --dsconf=v0 \
             --run=2700 \
+
             --run=1000 \
 
             --events=1000 \
 
             --events=1000 \
 
             --njobs=5 \
 
             --njobs=5 \
    template.fcl
+
            --embed template.fcl
 
</pre>
 
</pre>
 
After the command completes, we will see something like
 
After the command completes, we will see something like
Line 145: Line 159:
 
000  template.fcl  seeds.gandr.my-test-s1.v0.Td6j.txt
 
000  template.fcl  seeds.gandr.my-test-s1.v0.Td6j.txt
 
> ls 000
 
> ls 000
cnf.gandr.my-test-s1.v0.002700_00000000.fcl      cnf.gandr.my-test-s1.v0.002700_00000002.fcl.json
+
cnf.gandr.my-test-s1.v0.001000_00000000.fcl      cnf.gandr.my-test-s1.v0.001000_00000002.fcl.json
cnf.gandr.my-test-s1.v0.002700_00000000.fcl.json  cnf.gandr.my-test-s1.v0.002700_00000003.fcl
+
cnf.gandr.my-test-s1.v0.001000_00000000.fcl.json  cnf.gandr.my-test-s1.v0.001000_00000003.fcl
cnf.gandr.my-test-s1.v0.002700_00000001.fcl      cnf.gandr.my-test-s1.v0.002700_00000003.fcl.json
+
cnf.gandr.my-test-s1.v0.001000_00000001.fcl      cnf.gandr.my-test-s1.v0.001000_00000003.fcl.json
cnf.gandr.my-test-s1.v0.002700_00000001.fcl.json  cnf.gandr.my-test-s1.v0.002700_00000004.fcl
+
cnf.gandr.my-test-s1.v0.001000_00000001.fcl.json  cnf.gandr.my-test-s1.v0.001000_00000004.fcl
cnf.gandr.my-test-s1.v0.002700_00000002.fcl      cnf.gandr.my-test-s1.v0.002700_00000004.fcl.json
+
cnf.gandr.my-test-s1.v0.001000_00000002.fcl      cnf.gandr.my-test-s1.v0.001000_00000004.fcl.json
 
</pre>
 
</pre>
 
The generated fcl files and their corresponding json files are
 
The generated fcl files and their corresponding json files are
Line 155: Line 169:
 
subdirectory.  Random number seeds used for all the fcl files have
 
subdirectory.  Random number seeds used for all the fcl files have
 
been dumped into the "seeds" file.
 
been dumped into the "seeds" file.
 
  
 
===Example 2 - secondary stage  or concatenation===
 
===Example 2 - secondary stage  or concatenation===
Line 166: Line 179:
 
line file with just an include directive, in our case
 
line file with just an include directive, in our case
 
<pre>
 
<pre>
#include "JobConfig/cd3/beam/beam_g4s2.fcl"
+
#include "JobConfig/beam/beam_g4s2.fcl"
 
</pre>
 
</pre>
 
but one can also add e.g. geometry file overrides, or even write a
 
but one can also add e.g. geometry file overrides, or even write a
Line 178: Line 191:
  
 
If the jobs is simply concatenation, the template file will be, with a choice of output file name:
 
If the jobs is simply concatenation, the template file will be, with a choice of output file name:
  #include "JobConfig/cd3/common/artcat.fcl"
+
  #include "JobConfig/common/artcat.fcl"
 
  outputs.out.fileName: "sim.DSOWNER.cd3-beam-cs3-mothers.DSCONF.SEQ.art"
 
  outputs.out.fileName: "sim.DSOWNER.cd3-beam-cs3-mothers.DSCONF.SEQ.art"
 
Please look inside this fcl for directions on how to deal with the output file name. (Since the description is not known when the fcl file is written, you have to specify it in the fcl file.)
 
Please look inside this fcl for directions on how to deal with the output file name. (Since the description is not known when the fcl file is written, you have to specify it in the fcl file.)
Line 187: Line 200:
 
</pre>
 
</pre>
  
If the input dataset is on disk in dCache because it is the output of a previous stage that just completed, you can make the input file list from that area.  If you ran mu2eCheckAndMove, it still be under the "good" subdirectory.
+
If the input dataset is on disk in dCache because it is the output of a previous stage that just completed, you can make the input file list from that area.  If you ran mu2eCheckAndMove, it should still be under the "good" subdirectory.
 +
<pre>
 +
mu2eClusterFileList --dsname <clusterDirFullPath>  >  <myPath>/inputs.txt
 +
</pre>
 +
For example:
 
<pre>
 
<pre>
 
cd /pnfs/mu2e/persistent/users/mu2epro/workflow/beam_g4s1_g4v10_p03_validation_rc1/good
 
cd /pnfs/mu2e/persistent/users/mu2epro/workflow/beam_g4s1_g4v10_p03_validation_rc1/good
mu2eDatasetFileList sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art > \
+
mu2eClusterFileList --dsname sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art $PWD/1123456 > \
 
   /mu2e/data/users/mu2epro/projects/geant_val_p03_s2/fcl/job/inputs.txt
 
   /mu2e/data/users/mu2epro/projects/geant_val_p03_s2/fcl/job/inputs.txt
 
</pre>
 
</pre>
 +
The full path to the cluster directory is necessary so that the full path appears in the inputs.txt file.  If the input does not contain the full path, the system can't find the files.
  
 
Now go to the fcl working area and generate the files. It should be part of your [[JobPlan|job plan]]
 
Now go to the fcl working area and generate the files. It should be part of your [[JobPlan|job plan]]
Line 202: Line 220:
 
             --inputs=inputs.txt \
 
             --inputs=inputs.txt \
 
             --merge=30 \
 
             --merge=30 \
template.fcl
+
            --embed template.fcl
 
</pre>
 
</pre>
  
Line 211: Line 229:
 
000  inputs.txt  seeds.mu2e.my-project-s2.v1.ZqcU.txt  template.fcl
 
000  inputs.txt  seeds.mu2e.my-project-s2.v1.ZqcU.txt  template.fcl
 
  > ls 000
 
  > ls 000
cnf.mu2e.my-project-s2.v1.001600_00089044.fcl
+
cnf.mu2e.my-project-s2.v1.001000_00089044.fcl
cnf.mu2e.my-project-s2.v1.001600_00089044.fcl.json
+
cnf.mu2e.my-project-s2.v1.001000_00089044.fcl.json
cnf.mu2e.my-project-s2.v1.001600_00089046.fcl
+
cnf.mu2e.my-project-s2.v1.001000_00089046.fcl
cnf.mu2e.my-project-s2.v1.001600_00089046.fcl.json
+
cnf.mu2e.my-project-s2.v1.001000_00089046.fcl.json
 
...
 
...
 
</pre>
 
</pre>
Line 260: Line 278:
 
We are ready to generate the fcl dataset.  Note the '@' symbol in the --aux parameter - it says
 
We are ready to generate the fcl dataset.  Note the '@' symbol in the --aux parameter - it says
 
that bgHitFiles should be devined in a PROLOG, as the included file expects instead of being
 
that bgHitFiles should be devined in a PROLOG, as the included file expects instead of being
appended at the end.
+
appended at the end.  The leading number is how many background files each job will need.
  
 
<pre>
 
<pre>
Line 268: Line 286:
 
             --merge=1 \
 
             --merge=1 \
 
             --aux=1:@bgHitFiles:backgrounds.txt \
 
             --aux=1:@bgHitFiles:backgrounds.txt \
template.fcl
+
            --embed template.fcl
 
</pre>
 
</pre>
  
Line 278: Line 296:
 
000  backgrounds.txt  template.fcl  inputs.txt  seeds.gandr.my-reco-test.v0.CRdP.txt
 
000  backgrounds.txt  template.fcl  inputs.txt  seeds.gandr.my-reco-test.v0.CRdP.txt
 
> ls 000
 
> ls 000
cnf.gandr.my-reco-test.v0.004001_00000000.fcl      cnf.gandr.my-reco-test.v0.004001_00000002.fcl.json
+
cnf.gandr.my-reco-test.v0.001000_00000000.fcl      cnf.gandr.my-reco-test.v0.001000_00000002.fcl.json
cnf.gandr.my-reco-test.v0.004001_00000000.fcl.json  cnf.gandr.my-reco-test.v0.004001_00000003.fcl
+
cnf.gandr.my-reco-test.v0.001000_00000000.fcl.json  cnf.gandr.my-reco-test.v0.001000_00000003.fcl
cnf.gandr.my-reco-test.v0.004001_00000001.fcl      cnf.gandr.my-reco-test.v0.004001_00000003.fcl.json
+
cnf.gandr.my-reco-test.v0.001000_00000001.fcl      cnf.gandr.my-reco-test.v0.001000_00000003.fcl.json
cnf.gandr.my-reco-test.v0.004001_00000001.fcl.json  cnf.gandr.my-reco-test.v0.004001_00000004.fcl
+
cnf.gandr.my-reco-test.v0.001000_00000001.fcl.json  cnf.gandr.my-reco-test.v0.001000_00000004.fcl
cnf.gandr.my-reco-test.v0.004001_00000002.fcl      cnf.gandr.my-reco-test.v0.004001_00000004.fcl.json
+
cnf.gandr.my-reco-test.v0.001000_00000002.fcl      cnf.gandr.my-reco-test.v0.001000_00000004.fcl.json
 
</pre>
 
</pre>
  
Line 300: Line 318:
 
==Save fcl==
 
==Save fcl==
 
In this step you will put the fcl in its final position, ready to use.  You will have a choice of where to keep the fcl.   
 
In this step you will put the fcl in its final position, ready to use.  You will have a choice of where to keep the fcl.   
* upload to persistent dCache and declare to SAM.  Use this for collaboration sponsored jobs, that will be uploaded to tape and will be used for a long time in the future
+
* '''Option 1''' - upload to persistent dCache and declare to SAM.  Use this for collaboration sponsored jobs, that will be uploaded to tape and will be used for a long time in the future.  If any of the descendants of the fcl will be declared to SAM, or uploaded to tape, then you will need to use this option.  If you want to use the mu2eMissingJobs tool for job recovery, it also requires this option.  If you choose this option and this is the first time you are writing your personal fcl files to persistent dCache, you will need to ask for your personal directory to be created, please send mail to [mailto:kutschke@fnal.gov,gandr@fnal.gov,rlc@fnal.gov mu2eDataAdmin].
* move fcl to scratch dCache.  Use this for jobs which are large, but temporary or personal.
+
* '''Option 2''' - move fcl files to scratch dCache.  Use this for jobs which are temporary or personal, not to be uploaded to tape.
* leave it on the data disk where you generated it. Only for projects with small (<10K) jobs.
+
* '''Option 3''' - make a tarball of fcl files, move that to scratch dCache, also for jobs which are not to be uploaded. Keeping the files as one tarball reduces your exposure to dCache rate and reliability issues.
 +
You will also make a list of the fcl files that will be used to submit jobs.
 +
The result is several files named fcllist.??, each containing the names of a subset of fcl files.  Each fclist file is used to submit a cluster of jobs.  For production, we typically use 10K jobs in each submission, but for smaller personal projects, you might this to be smaller. You can also split off a smaller set for tests.
  
You will also make a list of the fcl files that will be used to submit jobs.
+
Moving files to [[Dcache|dCache]] and writing to the [[SAM]] database requires [[Authentication]], see especially the [[Authentication#Grid_Workflows|grid notes]]
The result is several files named fcllist.??, each containing a set of fcl files.  Each fclit files is used submit a cluster of jobs.  For production we typically use 10K jobs in each submission, but for smaller personal projects, you might this to be smaller. You can also split off a smaller set for tests.
 
  
===Move and SAM declare===
+
===Option 1 -Move and SAM declare===
 
First create the list of fcl to submit:
 
First create the list of fcl to submit:
 
<pre>
 
<pre>
Line 314: Line 333:
 
This should take a few minutes per 100K files.   
 
This should take a few minutes per 100K files.   
  
If for some reason the files were already moved declared the files before you made the list, you can make the list with <code>mu2eDatasetList</code>.  If you moved then, but not declared them, you can make the list from the json files:
+
<span style="color:grey">
<pre>
+
A common recovery situation... If for some reason the files were already moved and declared before you made the list, you can make the list with <code style="color:grey">mu2eDatasetFileList</code> which reads the SAM records.  If you moved them, but not declared them, you can make the list from the json files:
 +
<pre style="color:grey">
 
(for dir in ???; do cd $dir; ls *.fcl.json | sed 's/\.json//'; cd ..; done) | mu2eabsname_disk  | split -l 10000 -d - fcllist.
 
(for dir in ???; do cd $dir; ls *.fcl.json | sed 's/\.json//'; cd ..; done) | mu2eabsname_disk  | split -l 10000 -d - fcllist.
 
</pre>
 
</pre>
 +
</span>
  
In the normal flow, move them to the persistent dCache area:
+
In the normal flow, move them to the persistent dCache area: (remember to setup ifdhc)
 
<pre>
 
<pre>
(for dir in ???; do ls $dir/*.fcl; done) | mu2eFileUpload --disk >& upload.log &
+
(for dir in ???; do ls $dir/*.fcl; done) | mu2eFileUpload --disk --ifdh >& upload.log &
 
</pre>
 
</pre>
 
This should run at about 100K to a few 100K files per day.
 
This should run at about 100K to a few 100K files per day.
 +
If you can't write to the output area in persistent dCache, see the note above about creating your personal directory.
 +
 +
Moving files to[[Dcache|dCache]] and writing to the [[SAM]] database requires [[Authentication]], see  especially the [[Authentication#Grid_Workflows|grid notes]]
 +
  
 
<pre>
 
<pre>
Line 330: Line 355:
 
This should run at about 100K files per day.
 
This should run at about 100K files per day.
  
===Move to scratch===
+
Moving files to [[Dcache|dCache]] and writing to the [[SAM]] database requires [[Authentication]], see especially the [[Authentication#Grid_Workflows|grid notes]]
 +
 
 +
===Option 2 - Move fcl files to scratch===
 
First create the list of fcl to submit:
 
First create the list of fcl to submit:
 
<pre>
 
<pre>
Line 337: Line 364:
 
This should take a few minutes per 100K files.   
 
This should take a few minutes per 100K files.   
  
Then move them to the scratch dCache disk, in the default area:
+
Then move them to the scratch dCache disk, in the default area: (remember to setup ifdhc)
 
<pre>
 
<pre>
(for dir in ???; do ls $dir/*.fcl; done) | mu2eFileUpload --scratch >& upload.log &
+
(for dir in ???; do ls $dir/*.fcl; done) | mu2eFileUpload --scratch --ifdh >& upload.log &
 
</pre>
 
</pre>
  
===Leave in place===
+
===Option 3 - Tar fcl files, move tarballs to scratch===
In this case you only need to create the list of fcl to submit:
+
Create a tarball for each set of files to be used in a submission.  Typically a submission is 10K fcl files or fewer.  This command will put each set of 10K fcl files in a tarball.
 
<pre>
 
<pre>
(for dir in ???; do ls $dir/*.fcl; done) | while read FF; do echo $PWD/$FF; done | split -l 1000 -d - fcllist.
+
ls -1 -d ??? | cut -c 1-2 | sort | uniq | while read NN; do echo $NN; tar -cjf fcllist_${NN}.bz2 ${NN}?; done
 
</pre>
 
</pre>
  
 +
Then move them to the scratch dCache disk, in your area:
 +
<pre>
 +
cp fcllist* /pnfs/mu2e/scratch/users/$USER
 +
</pre>
 +
When submitting each cluster of jobs, you can point to one of these fcl file tarballs.
  
 
==Return to workflow==
 
==Return to workflow==
Line 358: Line 390:
  
 
[[Category:Computing]]
 
[[Category:Computing]]
[[Category:Computing/Workflow]]
+
[[Category:Workflows]]

Latest revision as of 21:23, 15 October 2021

Introduction

This procedure is used to generate a set of fcl files, one for each job to be run in a project. The fcl might drive a simulation project that starts with a generator, or a later stage of simulation starting with the art file output of an earlier stage, or a file concatenation project, or even an analysis project. See SimulationFCL for the fcl patterns used in MDC2020.

You would typically come to this page through the MCProdWorkflow procedure.

To learn how to generate and reconstruct physics processes see SimulationFCL.

Preparation

An fcl file should be developed and verified with interactive jobs before it is prepared for a grid job. Several examples of simulation fcl files are under the JobConfig subdirectory of the Production repo. See also SimulationFCL for the fcl patterns used in MDC2020.

An fcl file that works interactively will also work on the grid, with one caveat: the production system requires that all output files of the job satisfy the Mu2e naming conventions. You must be familiar with the naming convention and its fields or this documentation will not make sense.

For example, files under the JobConfig directory that are intended for production write output files like:

TFileService   : { fileName : "nts.owner.cd3-beam-g4s1.configuration.sequencer.root" }
...
fileName    : "sim.owner.cd3-beam-g4s1-mubeam.configuration.sequencer.art"

The values of the owner, configuration, and sequencer fields used in the prepared fcl file are not important, because they will be overridden later when the fcl is generated. The values of data_tier, description, and file_format will be used as-is and must be set correctly; please see file names for guidelines on how to use these fields. Use the "art" extension ("file format") for framework outputs (written by RootOutput modules) and "root" for TFileService (ntuple and histogram) output files.
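
The six fields can be pulled apart mechanically, which gives a quick sanity check on an output file name before it goes into a prepared fcl file. The following is a minimal sketch in POSIX shell. The function name parse_mu2e_name is hypothetical and not part of any Mu2e tool, and it only checks the number of fields, not whether their values are allowed.

```shell
# Hypothetical helper (not a Mu2e tool): split a file name of the form
#   data_tier.owner.description.configuration.sequencer.file_format
# into its six fields. The body runs in a subshell so IFS stays local.
parse_mu2e_name() (
    name="$1"
    IFS=.
    set -f            # no filename globbing while the name is split
    set -- $name      # split on dots into $1 .. $6
    if [ "$#" -ne 6 ]; then
        echo "expected 6 dot-separated fields, got $#: $name" >&2
        return 1
    fi
    printf 'data_tier=%s\n'     "$1"
    printf 'owner=%s\n'         "$2"
    printf 'description=%s\n'   "$3"
    printf 'configuration=%s\n' "$4"
    printf 'sequencer=%s\n'     "$5"
    printf 'file_format=%s\n'   "$6"
)
```

Called on sim.owner.cd3-beam-g4s1-mubeam.configuration.sequencer.art, this reports the description as cd3-beam-g4s1-mubeam and the file format as art.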

Setup

Set up the utilities:

setup mu2e
muse setup <the appropriate build for your project>
setup mu2etools
setup mu2efiletools

The setup of mu2etools above makes the generate_fcl command available in your path.

When you write to the SAM database, usually by mu2eFileDeclare, you will need to have a valid certificate.

Generating fcl

There are two invocation modes that require mutually exclusive sets of parameters:

  • For jobs with EmptyEvent input source (events created by a generator) specify
    • --run-number that will be used for all the generated fcl files
    • --events-per-job
    • --njobs
  • Jobs with RootInput source (art files) require
    • --inputs a file containing a list of all input data full filespecs
    • --merge-factor how many input files should be analyzed by a single job

The number of generated fcl files will be determined from the above inputs.
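
For the EmptyEvent mode the number of fcl files is simply the value of --njobs. For the RootInput mode it follows from the length of the input list and the merge factor. The sketch below estimates that number ahead of time; count_fcl_jobs is a hypothetical helper, and it assumes that a final, smaller job picks up any leftover input files, which is not confirmed by this page.

```shell
# Hypothetical helper: estimate how many fcl files a RootInput-mode run
# would produce from an input list and a merge factor.
# ASSUMPTION: leftover inputs go into one last, smaller job, so the
# count is ceil(n_inputs / merge_factor).
count_fcl_jobs() (
    inputs_file="$1"
    merge_factor="$2"
    n_inputs=$(grep -c . "$inputs_file")    # number of non-empty lines
    echo $(( (n_inputs + merge_factor - 1) / merge_factor ))
)
```

For example, count_fcl_jobs inputs.txt 30 gives a quick estimate to compare with the number of jobs in your job plan before running generate_fcl.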

The following parameters are used to construct the names of fcl files produced by the generate_fcl invocation. Please see file names for guidelines on how to use these fields.

  • --description the "description" field of fcl file names
  • --dsconf the "configuration" field of fcl file names
  • --dsowner the "owner" field of fcl file names. This parameter defaults to the user who executes generate_fcl, but can be overridden. For example, --dsowner=mu2e should be used to generate an official production dataset of fcl files.

One set of fcl files can be used to produce a series of different output datasets by changing the version of Offline used with the fcl. Conceptually, then, the description and dsconf should describe the fcl files themselves, not the output dataset you have in mind today. For example, the same fcl file generating protons from stopped muons can be run with different versions of Offline which contain different versions of geant, producing two different datasets. So the fcl name should reflect "protons from stopped muons", not "geant study".


The --old-seeds parameter can be used for incremental generation of fcl datasets. For example, one can generate a test batch of 1000 fcl files and run them through the grid. If the result is satisfactory, and one wants to increase the statistics to 10,000 jobs, care should be taken to guarantee the uniqueness of random seeds across all the 10,000 jobs. Each run of generate_fcl produces a text file that contains the values of all random seeds used so far for the current set of jobs. So when generate_fcl is used the second time to add 9,000 jobs to the dataset, one should use the file with 1000 seeds from the first run for the --old-seeds parameter to make sure those seeds are not re-used. (Also, --first-subrun should be adjusted so that subrun numbers do not repeat.) The second run will dump a list of 10,000 seeds, which can be used in a subsequent generation if a further increase in statistics is desired. For the initial run you can specify --old-seeds=/dev/null.
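
When extending a dataset, the value for --first-subrun can be read off the fcl files that already exist. The sketch below assumes that the sequencer field has the form run_subrun, as in the example listings on this page (for instance 001000_00000004), and prints one more than the largest subrun found. The function name next_first_subrun is hypothetical.

```shell
# Hypothetical helper: read fcl file names on stdin and print the next
# unused subrun number.
# ASSUMPTION: names end in <run>_<subrun>.fcl, as in the listings here.
next_first_subrun() {
    awk -F. '
        $NF == "fcl" {
            n = split($(NF - 1), parts, "_")   # sequencer, e.g. 001000_00000004
            if (n == 2) {
                subrun = parts[2] + 0          # numeric value, leading zeros dropped
                if (!seen || subrun > max) { max = subrun; seen = 1 }
            }
        }
        END { print (seen ? max + 1 : 0) }
    '
}
```

A typical use would be to run ls ???/*.fcl | next_first_subrun in the fcl working area and pass the result to --first-subrun, together with the seeds file from the earlier run as --old-seeds. The .fcl.json files are skipped because their last field is json.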

Please take a look at RunNumbers before choosing what run number to use. 1000 is the recommended default for simulation with no run dependence.


Note: If you use the auxinput switch of generate_fcl in order to provide auxiliary input files (such as mixing files or stopped muon files) to your job and declare your fcl files to SAM, then the files used in the auxinput switch should already be declared to SAM. The auxinput files will be considered as parents (or precursors) to your fcl file, so the fcl file SAM record should record that fact. For standard mixing files, the upload and declare is already done for you. If you use personal files, then they should be uploaded and declared before using them in the fcl. For a personal, temporary job, you can also choose not to declare your fcl files, which will avoid this issue.

Run generate_fcl --help to see all the options.

Examples

Create a working dir. The data disk is a good place to work since we want fast response and some moderate space.

mkdir -p /mu2e/data/users/$USER/projects/my_project/fcl/job
cd /mu2e/data/users/$USER/projects/my_project/fcl/job


Example 1 - generator

A first stage simulation job with no input files.

Prepare a template file. Usually it can be a single line file with just an include directive, in our case

#include "Production/JobConfig/primary/CeEndpoint.fcl"

but one can also add e.g. geometry file overrides, or even write a completely new fcl configuration and use it as a template. In this example we use template.fcl as the template file name. Note that the include directive should specify the include file pathname relative to the Muse build directory that you set up earlier. (More precisely, relative to a directory listed in the FHICL_FILE_PATH.) Absolute filenames do not work in fhicl #include.
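
To see which file an include directive would pick up, the search can be imitated by hand. This is a minimal sketch, assuming FHICL_FILE_PATH is a colon-separated list of directories searched in order; find_fcl_include is a hypothetical helper, not a Mu2e or art command.

```shell
# Hypothetical helper: print the first file that matches a relative
# include path in the directories listed in FHICL_FILE_PATH.
# ASSUMPTION: the list is colon-separated and the first match wins.
find_fcl_include() (
    rel="$1"
    case "$rel" in
        /*) echo "absolute paths do not work in #include: $rel" >&2
            return 1 ;;
    esac
    IFS=:
    set -f                                  # no globbing while splitting
    for dir in $FHICL_FILE_PATH; do
        [ -n "$dir" ] || continue           # skip empty entries
        if [ -f "$dir/$rel" ]; then
            echo "$dir/$rel"
            return 0
        fi
    done
    echo "not found in FHICL_FILE_PATH: $rel" >&2
    return 1
)
```

Running find_fcl_include Production/JobConfig/primary/CeEndpoint.fcl after the setup step shows whether the template's include can be resolved before any fcl files are generated.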


Now generate the files:

generate_fcl --description=my-test-s1 \
             --dsconf=v0 \
             --run=1000 \
             --events=1000 \
             --njobs=5 \
             --embed template.fcl

After the command completes, we will see something like

> ls
000  template.fcl  seeds.gandr.my-test-s1.v0.Td6j.txt
> ls 000
cnf.gandr.my-test-s1.v0.001000_00000000.fcl       cnf.gandr.my-test-s1.v0.001000_00000002.fcl.json
cnf.gandr.my-test-s1.v0.001000_00000000.fcl.json  cnf.gandr.my-test-s1.v0.001000_00000003.fcl
cnf.gandr.my-test-s1.v0.001000_00000001.fcl       cnf.gandr.my-test-s1.v0.001000_00000003.fcl.json
cnf.gandr.my-test-s1.v0.001000_00000001.fcl.json  cnf.gandr.my-test-s1.v0.001000_00000004.fcl
cnf.gandr.my-test-s1.v0.001000_00000002.fcl       cnf.gandr.my-test-s1.v0.001000_00000004.fcl.json

The generated fcl files and their corresponding json files are written into subdirectories 000, 001, etc, with up to 1000 fcl files per subdirectory. Random number seeds used for all the fcl files have been dumped into the "seeds" file.
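
A quick consistency check on the result is to count the fcl files across the numbered subdirectories and confirm that each one has its json companion. The sketch below does that; check_fcl_pairs is a hypothetical helper, and the three-digit directory pattern is taken from the listing above.

```shell
# Hypothetical helper: count fcl files under the numbered subdirectories
# (000, 001, ...) and report any that lack a matching .fcl.json file.
check_fcl_pairs() (
    top="${1:-.}"
    total=0
    missing=0
    for f in "$top"/[0-9][0-9][0-9]/*.fcl; do
        [ -e "$f" ] || continue             # pattern matched nothing
        total=$((total + 1))
        if [ ! -e "$f.json" ]; then
            echo "no json for $f" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$total fcl files, $missing without json"
    [ "$missing" -eq 0 ]                    # exit status reflects the check
)
```

In Example 1 the total should equal the --njobs value of 5.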

Example 2 - secondary stage or concatenation

This is an example of running a later stage of a simulation job, where the input file comes from an earlier stage. Concatenation can be thought of as a stage of simulation, just a particularly simple one.

Prepare a template file. Usually it can be a single line file with just an include directive, in our case

#include "JobConfig/beam/beam_g4s2.fcl"

but one can also add e.g. geometry file overrides, or even write a completely new fcl configuration and use it as a template. In this example we use template.fcl as the template file name. Note that the include directive should specify the include file pathname relative to the Offline directory that you set up earlier. (More precisely, relative to a directory listed in the FHICL_FILE_PATH.) Absolute filenames do not work in fhicl #include.

If the job is simply a concatenation, the template file will be as follows, with a choice of output file name:

#include "JobConfig/common/artcat.fcl"
outputs.out.fileName: "sim.DSOWNER.cd3-beam-cs3-mothers.DSCONF.SEQ.art"

Please look inside this fcl for directions on how to deal with the output file name. (Since the description is not known when the fcl file is written, you have to specify it in the fcl file.)

In all cases, you will also need a list of input files. If the list comes from a previously-uploaded dataset, it will be in SAM and in tape-backed dCache, so we can use SAM to generate the file list:

mu2eDatasetFileList sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art > inputs.txt

If the input dataset is on disk in dCache because it is the output of a previous stage that just completed, you can make the input file list from that area. If you ran mu2eCheckAndMove, it should still be under the "good" subdirectory.

mu2eClusterFileList --dsname <clusterDirFullPath>  >  <myPath>/inputs.txt

For example:

cd /pnfs/mu2e/persistent/users/mu2epro/workflow/beam_g4s1_g4v10_p03_validation_rc1/good
mu2eClusterFileList --dsname sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art $PWD/1123456 > \
  /mu2e/data/users/mu2epro/projects/geant_val_p03_s2/fcl/job/inputs.txt

The full path to the cluster directory is necessary so that the full path appears in the inputs.txt file. If the input does not contain the full path, the system can't find the files.
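
Because a list with relative names only fails later, when jobs cannot find their inputs, it can be worth checking inputs.txt right after creating it. A minimal sketch, with check_full_paths as a hypothetical helper name:

```shell
# Hypothetical helper: verify that every line of an input list is a
# full path, i.e. starts with "/". Empty lines count as bad entries.
check_full_paths() (
    list="$1"
    bad=$(grep -c -v '^/' "$list")          # lines that do not start with /
    if [ "$bad" -ne 0 ]; then
        echo "$bad entries in $list are not full paths, for example:" >&2
        grep -v '^/' "$list" | head -n 5 >&2
        return 1
    fi
    echo "all entries in $list are full paths"
)
```

If the check fails, regenerate the list with the full path to the cluster directory as described above.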

Now go to the fcl working area and generate the files. It should be part of your job plan to figure out how many of the files from the earlier stage should be used in this stage, which is the --merge factor.

generate_fcl --desc=my-project-s2 \
             --dsconf=v1 \
             --inputs=inputs.txt \
             --merge=30 \
             --embed template.fcl


After the command completes, we will see something like

 > ls
000  inputs.txt  seeds.mu2e.my-project-s2.v1.ZqcU.txt  template.fcl
 > ls 000
cnf.mu2e.my-project-s2.v1.001000_00089044.fcl
cnf.mu2e.my-project-s2.v1.001000_00089044.fcl.json
cnf.mu2e.my-project-s2.v1.001000_00089046.fcl
cnf.mu2e.my-project-s2.v1.001000_00089046.fcl.json
...

The generated fcl files and their corresponding json files are written into subdirectories 000, 001, etc, with up to 1000 fcl files per subdirectory. Random number seeds used for all the fcl files have been dumped into the "seeds" file.

Example 3 - mixing

A digitization+reconstruction job on a conversion electron file, with background mixing.

Prepare a template file. We want to use JobConfig/cd3/beam/dra_mix_baseline.fcl, but in many Offline releases this file is, strictly speaking, not a valid fcl because it references a variable bgHitFiles that is not defined. (A legacy of the mu2eart way of running grid jobs.) To fix that, we define the variable before including the "baseline" file. This example template.fcl also shows how to set names of output histogram files.

BEGIN_PROLOG
bgHitFiles: @nil
END_PROLOG
#include "JobConfig/cd3/beam/dra_mix_baseline.fcl"
services.TFileService.fileName: "nts.owner.my-ce-reco.ver.seq.root"

We also need a list of input files, and a list of background overlay files. We want to run 5 jobs with one input file per job, so we need to shorten the conversion list to have only 5 input files. We only need one background hits file per job, but there is no harm in listing more background overlay files than necessary, so we will use a complete detmix-cut dataset:

setup mu2efiletools
mu2eDatasetFileList sim.mu2e.cd3-beam-g4s4-detconversion.v566.art | head -n 5 > inputs.txt
mu2eDatasetFileList sim.mu2e.cd3-detmix-cut.v566b.art > backgrounds.txt

We are ready to generate the fcl dataset. Note the '@' symbol in the --aux parameter: it says that bgHitFiles should be defined in a PROLOG, as the included file expects, instead of being appended at the end. The leading number is how many background files each job will need.

generate_fcl --desc=my-reco-test \
             --dsconf=v0 \
             --inputs=inputs.txt \
             --merge=1 \
             --aux=1:@bgHitFiles:backgrounds.txt \
             --embed template.fcl


Take a look:

> ls
000  backgrounds.txt  template.fcl  inputs.txt  seeds.gandr.my-reco-test.v0.CRdP.txt
> ls 000
cnf.gandr.my-reco-test.v0.001000_00000000.fcl       cnf.gandr.my-reco-test.v0.001000_00000002.fcl.json
cnf.gandr.my-reco-test.v0.001000_00000000.fcl.json  cnf.gandr.my-reco-test.v0.001000_00000003.fcl
cnf.gandr.my-reco-test.v0.001000_00000001.fcl       cnf.gandr.my-reco-test.v0.001000_00000003.fcl.json
cnf.gandr.my-reco-test.v0.001000_00000001.fcl.json  cnf.gandr.my-reco-test.v0.001000_00000004.fcl
cnf.gandr.my-reco-test.v0.001000_00000002.fcl       cnf.gandr.my-reco-test.v0.001000_00000004.fcl.json

Test fcl

It is highly recommended to test a newly generated fcl file by running a small interactive job. Following up on Example 1 above, one can do

mkdir test
cd test
/usr/bin/time mu2e -c `ls ../000/*.fcl | head -1`

to run a full size job, or add a -n 10 option to the mu2e command line to quickly make sure that there are no obvious problems with the configuration.
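
The short test can be wrapped in a few lines so that the same fcl file is always picked. The sketch below only builds and prints the command rather than running it, so that it works without the Mu2e environment; first_fcl is a hypothetical helper and the demonstration files are made up.

```shell
# Sketch: build the short-test command for the first generated fcl file.
# The command is only printed; run it yourself in a session where mu2e is set up.
# "first_fcl" is a hypothetical helper.
first_fcl() {
    # The *.fcl glob does not match the companion *.fcl.json files.
    ls "$1"/*.fcl | head -1
}

# Demonstration in a throwaway directory (file names are made up).
demo=$(mktemp -d)
mkdir "$demo/000"
touch "$demo/000/b.fcl" "$demo/000/a.fcl" "$demo/000/a.fcl.json"
picked=$(first_fcl "$demo/000")
picked_name=$(basename "$picked")
test_cmd="mu2e -n 10 -c $picked"
echo "$test_cmd"
rm -rf "$demo"
```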


Save fcl

In this step you will put the fcl in its final position, ready to use. You will have a choice of where to keep the fcl.

  • Option 1 - upload to persistent dCache and declare to SAM. Use this for collaboration-sponsored jobs that will be uploaded to tape and used for a long time in the future. If any of the descendants of the fcl will be declared to SAM or uploaded to tape, you will need to use this option. The mu2eMissingJobs tool for job recovery also requires this option. If you choose this option and this is the first time you are writing your personal fcl files to persistent dCache, you will need to ask for your personal directory to be created; please send mail to mu2eDataAdmin.
  • Option 2 - move fcl files to scratch dCache. Use this for jobs which are temporary or personal, not to be uploaded to tape.
  • Option 3 - make a tarball of fcl files, move that to scratch dCache, also for jobs which are not to be uploaded. Keeping the files as one tarball reduces your exposure to dCache rate and reliability issues.

You will also make a list of the fcl files that will be used to submit jobs. The result is several files named fcllist.??, each containing the names of a subset of fcl files. Each fcllist file is used to submit a cluster of jobs. For production, we typically use 10K jobs in each submission, but for smaller personal projects, you might want this to be smaller. You can also split off a smaller set for tests.

Moving files to dCache and writing to the SAM database requires Authentication; see especially the grid notes.

Option 1 - Move and SAM declare

First create the list of fcl to submit:

(for dir in ???; do cd $dir; ls *.fcl; cd ..; done) | mu2eabsname_disk  | split -l 10000 -d - fcllist.

This should take a few minutes per 100K files.
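
To see what the split step produces, here is a small-scale sketch that uses 25 made-up file names and 10 names per list instead of 10000. It uses only standard tools (the -d option, as in the command above, gives numeric suffixes) and leaves out the mu2eabsname_disk step, which needs the Mu2e environment.

```shell
# Sketch: split a list of names into numbered fcllist.NN files.
# Small-scale stand-in for the real command: 25 made-up names, 10 per list.
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    for i in $(seq 1 25); do
        echo "cnf.demo.test.v0.$i.fcl"
    done | split -l 10 -d - fcllist.
)
n_lists=$(ls "$demo"/fcllist.* | wc -l)          # fcllist.00, .01, .02
last_size=$(wc -l < "$demo/fcllist.02")          # the last list holds the remainder
echo "lists: $n_lists, entries in the last list: $last_size"
rm -rf "$demo"
```

The same check on a real project, wc -l fcllist.*, shows how many jobs each submission will contain.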

A common recovery situation: if for some reason the files were already moved and declared before you made the list, you can make the list with mu2eDatasetFileList, which reads the SAM records. If you moved them but did not declare them, you can make the list from the json files:

(for dir in ???; do cd $dir; ls *.fcl.json | sed 's/\.json//'; cd ..; done) | mu2eabsname_disk  | split -l 10000 -d - fcllist.

In the normal flow, move them to the persistent dCache area (remember to setup ifdhc):

(for dir in ???; do ls $dir/*.fcl; done) | mu2eFileUpload --disk --ifdh >& upload.log &

This should run at about 100K to a few 100K files per day. If you can't write to the output area in persistent dCache, see the note above about creating your personal directory.

Then declare the files to SAM using their json files. Moving files to dCache and writing to the SAM database requires Authentication; see especially the grid notes.


(for dir in ???; do ls $dir/*.fcl.json; done) | mu2eFileDeclare >& declare.log &

This should run at about 100K files per day.

Option 2 - Move fcl files to scratch

First create the list of fcl to submit:

(for dir in ???; do cd $dir; ls *.fcl; cd ..; done) | mu2eabsname_scratch  | split -l 10000 -d - fcllist.

This should take a few minutes per 100K files.

Then move them to the scratch dCache disk, in the default area (remember to setup ifdhc):

(for dir in ???; do ls $dir/*.fcl; done) | mu2eFileUpload --scratch --ifdh >& upload.log &

Option 3 - Tar fcl files, move tarballs to scratch

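Create a tarball for each set of files to be used in a submission. Typically a submission is 10K fcl files or fewer. This command groups the subdirectories by the first two digits of their names, so each tarball holds up to ten subdirectories of up to 1000 fcl files each, that is, up to 10K fcl files.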

ls -1 -d ??? | cut -c 1-2 | sort | uniq | while read NN; do echo $NN; tar -cjf fcllist_${NN}.bz2 ${NN}?; done

Then move them to the scratch dCache disk, in your area:

cp fcllist* /pnfs/mu2e/scratch/users/$USER

When submitting each cluster of jobs, you can point to one of these fcl file tarballs.
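
Before creating the tarballs, it may be useful to preview which subdirectories will end up in which tarball. The sketch below reuses the grouping pipeline from the tar command above but prints the plan instead of running tar; the subdirectories in the demonstration are made up.

```shell
# Sketch: preview the grouping used by the tar command, without running tar.
# Subdirectories that share their first two digits go into the same tarball.
demo=$(mktemp -d)
mkdir "$demo/000" "$demo/001" "$demo/009" "$demo/010" "$demo/011"
plan=$(
    cd "$demo" || exit 1
    ls -1 -d ??? | cut -c 1-2 | sort | uniq | while read -r NN; do
        # ${NN}? expands to every subdirectory whose name starts with NN.
        echo "fcllist_${NN}.bz2:" ${NN}?
    done
)
echo "$plan"
rm -rf "$demo"
```

In this demonstration, 000, 001 and 009 are assigned to fcllist_00.bz2, while 010 and 011 are assigned to fcllist_01.bz2.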

Return to workflow

At the end of this procedure, you should have a set of files, each containing a list of fcl files.