GenerateFcl: Difference between revisions
Revision as of 21:25, 3 April 2017
Introduction
This procedure is used to generate a set of fcl files, one for each job to be run in a project. The fcl might drive a simulation project that starts with a generator, or a later stage of simulation starting with the art file output of an earlier stage, or a file concatenation project, or even an analysis project.
Preparation
A fcl file should be developed and verified with interactive jobs before it is prepared for a grid job. Several examples of simulation fcl files are under the JobConfig subdirectory of Offline.
Using the fcl with the production system imposes one extra requirement: all output files of the job should satisfy the Mu2e naming conventions. The values of the owner, configuration, and sequencer fields used in the prepared fcl file are not important, because they will be overridden later. The values of data_tier, description, and file_format will be used as is and must be set correctly; please see file names for guidelines on how to use these fields. Use the "art" extension ("file format") for framework outputs (written by RootOutput modules) and "root" for TFileService (ntuple and histogram) output files.
Setup
Set up the utilities:
    setup mu2e
    source the appropriate Offline setup.sh file for your project
    setup mu2etools
The above makes the generate_fcl command available in your path.
Generating fcl
There are two invocation modes that require mutually exclusive sets of parameters:
- For jobs with EmptyEvent input source (events created by a generator) specify
  - --run-number: the run number that will be used for all the generated fcl files
  - --events-per-job
  - --njobs
- Jobs with RootInput source (art files) require
  - --inputs: a file containing a list of all input data full filespecs
  - --merge-factor: how many input files should be analyzed by a single job
The number of generated fcl files will be determined from the above inputs.
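In EmptyEvent mode the number of fcl files is simply the value of --njobs. For RootInput mode, the sketch below estimates the count before running generate_fcl. It assumes one job per group of --merge-factor input files, with a final job taking any remainder; that rounding rule is an assumption, not something stated on this page. The helper name count_fcl_jobs is made up for illustration.

```shell
# Estimate how many fcl files generate_fcl will produce in --inputs mode.
# Assumption: jobs = ceil(number of input files / merge factor).
# count_fcl_jobs is a hypothetical helper, not part of mu2etools.
count_fcl_jobs() {
    inputs_file=$1     # file with one input filespec per line
    merge_factor=$2    # value intended for --merge-factor
    # Count non-empty lines; "|| true" keeps an empty file from being an error.
    n_inputs=$(grep -c . "$inputs_file" || true)
    # Integer ceiling division.
    echo $(( (n_inputs + merge_factor - 1) / merge_factor ))
}
```

For example, `count_fcl_jobs inputs.txt 1` on a five-line inputs.txt should report 5, which matches the five fcl files seen in Example 2.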
The following parameters are used to construct the names of fcl files produced by the generate_fcl invocation. Please see file names for guidelines on how to use these fields.
- --description: the "description" field of fcl file names
- --dsconf: the "configuration" field. In production, this field is often reduced by convention to a date (mmdd format) and a version letter, for example "1231a". What these tags mean has to be recorded elsewhere.
- --dsowner: the "owner" field. This parameter defaults to the user who executes generate_fcl, but can be overridden. For example, --dsowner=mu2e should be used to generate an official production dataset of fcl files.
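To see how these fields end up in a file name, here is a small sketch that assembles a name in the pattern visible in the example listings on this page (cnf.owner.description.dsconf.run_subrun.fcl, with a 6-digit run and an 8-digit subrun). The helper fcl_file_name is hypothetical and only mimics the pattern; generate_fcl itself decides the real names.

```shell
# Build the name of one generated fcl file from its fields.
# Pattern inferred from the listings on this page:
#   cnf.<owner>.<description>.<dsconf>.<run, 6 digits>_<subrun, 8 digits>.fcl
# fcl_file_name is a hypothetical helper, not part of mu2etools.
fcl_file_name() {
    owner=$1; description=$2; dsconf=$3; run=$4; subrun=$5
    # Pass run and subrun without leading zeros: printf reads 0-prefixed
    # numbers as octal.
    printf 'cnf.%s.%s.%s.%06d_%08d.fcl\n' "$owner" "$description" "$dsconf" "$run" "$subrun"
}

fcl_file_name gandr my-test-s1 v0 2700 3
# prints cnf.gandr.my-test-s1.v0.002700_00000003.fcl
```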
The --old-seeds parameter can be used for incremental generation of fcl datasets. For example, one can generate a test batch of 1000 fcl files and run them through the grid. If the result is satisfactory, and one wants to increase the statistics to 10,000 jobs, care should be taken to guarantee the uniqueness of random seeds across all the 10,000 jobs. Each run of generate_fcl produces a text file that contains the values of all random seeds used so far for the current set of jobs. So when generate_fcl is used the second time to add 9,000 jobs to the dataset, one should use the file with 1000 seeds from the first run for the --old-seeds parameter to make sure those seeds are not re-used. (Also, --first-subrun should be adjusted so that subrun numbers do not repeat.) The second run will dump a list of 10,000 seeds, which can be used in a subsequent generation if a further increase in statistics is desired. For the initial run you can specify --old-seeds=/dev/null.
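The two-step procedure can be sketched as a pair of command lines. The helper below only prints the commands, so it can be tried without mu2etools. It assumes that the first batch occupies subruns 0 through 999, so the second batch starts at subrun 1000; the description, configuration, run number and events per job are borrowed from Example 1 on this page and are placeholders. The helper name incremental_generate_cmds is made up.

```shell
# Print the two generate_fcl commands for growing a dataset from
# first_batch jobs to total jobs. Nothing is executed.
# Assumption: the first batch uses subruns 0 .. first_batch-1, so the
# second batch starts at subrun first_batch.
# incremental_generate_cmds is a hypothetical helper, not part of mu2etools.
incremental_generate_cmds() {
    first_batch=$1    # jobs in the test batch, e.g. 1000
    total=$2          # total number of jobs wanted, e.g. 10000
    seeds_file=$3     # seeds file written by the first run
    template=$4       # template fcl file
    # Placeholder options taken from Example 1.
    common="--description=my-test-s1 --dsconf=v0 --run-number=2700 --events-per-job=1000"
    echo "generate_fcl $common --njobs=$first_batch --old-seeds=/dev/null $template"
    echo "generate_fcl $common --njobs=$((total - first_batch)) --first-subrun=$first_batch --old-seeds=$seeds_file $template"
}

incremental_generate_cmds 1000 10000 seeds.gandr.my-test-s1.v0.Td6j.txt template.fcl
```

The seeds file written by the second run is the one to pass to --old-seeds if the dataset is extended again.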
Run generate_fcl --help to see all the options.
Examples
Create a working dir. The data disk is a good place to work since we want fast response and some moderate space.
    mkdir -p /mu2e/data/users/`whoami`/fclds
    cd /mu2e/data/users/`whoami`/fclds
    mkdir 20161121-my_project_name
    cd 20161121-my_project_name
Example 1 - generator
A first stage simulation job with no input files.
Prepare a template file. Usually it can be a single-line file with just an include directive, in our case
    #include "JobConfig/cd3/pions/pions_g4s1.fcl"
but one can also add e.g. geometry file overrides, or even write a completely new fcl configuration and use it as a template. In this example we use template.fcl as the template file name.
Note that the include directive should specify the include file pathname relative to the Offline directory that you set up earlier. (More precisely, relative to a directory listed in the FHICL_FILE_PATH.) Absolute filenames do not work in fhicl #include.
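Before generating, it can help to check that the relative path in the include directive actually resolves. The sketch below walks the colon-separated FHICL_FILE_PATH and prints the first matching file. It is an approximation of the lookup, written for illustration; the helper name find_fcl_include is made up.

```shell
# Resolve a relative include path against FHICL_FILE_PATH, which is a
# colon-separated list of directories. Prints the first match.
# find_fcl_include is a hypothetical helper, not part of mu2etools.
# Usage: find_fcl_include JobConfig/cd3/pions/pions_g4s1.fcl
find_fcl_include() {
    rel=$1
    old_ifs=$IFS
    IFS=:
    for dir in ${FHICL_FILE_PATH:-}; do
        if [ -n "$dir" ] && [ -f "$dir/$rel" ]; then
            IFS=$old_ifs
            echo "$dir/$rel"
            return 0
        fi
    done
    IFS=$old_ifs
    echo "$rel not found in FHICL_FILE_PATH" >&2
    return 1
}
```

If the helper reports that the file is not found, the include in template.fcl is unlikely to work either.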
Now generate the files:
    generate_fcl --description=my-test-s1 \
                 --dsconf=v0 \
                 --run=2700 \
                 --events=1000 \
                 --njobs=5 \
                 template.fcl
After the command completes, we will see something like
    > ls
    000  template.fcl  seeds.gandr.my-test-s1.v0.Td6j.txt
    > ls 000
    cnf.gandr.my-test-s1.v0.002700_00000000.fcl       cnf.gandr.my-test-s1.v0.002700_00000002.fcl.json
    cnf.gandr.my-test-s1.v0.002700_00000000.fcl.json  cnf.gandr.my-test-s1.v0.002700_00000003.fcl
    cnf.gandr.my-test-s1.v0.002700_00000001.fcl       cnf.gandr.my-test-s1.v0.002700_00000003.fcl.json
    cnf.gandr.my-test-s1.v0.002700_00000001.fcl.json  cnf.gandr.my-test-s1.v0.002700_00000004.fcl
    cnf.gandr.my-test-s1.v0.002700_00000002.fcl       cnf.gandr.my-test-s1.v0.002700_00000004.fcl.json
The generated fcl files and their corresponding json files are written into subdirectories 000, 001, etc, with up to 1000 fcl files per subdirectory. Random number seeds used for all the fcl files have been dumped into the "seeds" file.
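A quick sanity check on the output is to count the fcl files across the numbered subdirectories and confirm that each one has its json companion. The following is a minimal sketch, assuming the three-digit subdirectory names and the .fcl.json suffix shown in the listing; the helper name check_fcl_json_pairs is made up.

```shell
# Count generated fcl files under the numbered subdirectories (000, 001, ...)
# and report any that lack the matching .fcl.json file.
# check_fcl_json_pairs is a hypothetical helper, not part of mu2etools.
# Usage: check_fcl_json_pairs [top_dir]     (default: current directory)
check_fcl_json_pairs() {
    top=${1:-.}
    n_fcl=0
    n_missing=0
    for f in "$top"/[0-9][0-9][0-9]/*.fcl; do
        [ -e "$f" ] || continue    # the glob matched nothing
        n_fcl=$((n_fcl + 1))
        if [ ! -e "$f.json" ]; then
            echo "missing json for $f" >&2
            n_missing=$((n_missing + 1))
        fi
    done
    echo "$n_fcl fcl files, $n_missing without json"
    # Succeed only if every fcl file has its json companion.
    [ "$n_missing" -eq 0 ]
}
```

Run in the working directory of Example 1, it should report 5 fcl files.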
Example 2 - mixing
A digitization+reconstruction job on a conversion electron file, with background mixing.
Prepare a template file. We want to use JobConfig/cd3/beam/dra_mix_baseline.fcl, but in many Offline releases this file is, strictly speaking, not a valid fcl because it references a variable bgHitFiles that is not defined. (A legacy of the mu2eart way of running grid jobs.) To fix that, we define the variable before including the "baseline" file. This example template.fcl also shows how to set names of output histogram files.
    BEGIN_PROLOG
    bgHitFiles: @nil
    END_PROLOG
    #include "JobConfig/cd3/beam/dra_mix_baseline.fcl"
    services.TFileService.fileName: "nts.owner.my-ce-reco.ver.seq.root"
We also need a list of input files, and a list of background overlay files. We want to run 5 jobs with one input file per job, so we need to shorten the conversion list to have only 5 input files. We only need one background hits file per job, but there is no harm in listing more background overlay files than necessary, so we will use a complete detmix-cut dataset:
    setup mu2efiletools
    mu2eDatasetFileList sim.mu2e.cd3-beam-g4s4-detconversion.v566.art | head -n 5 > inputs.txt
    mu2eDatasetFileList sim.mu2e.cd3-detmix-cut.v566b.art > backgrounds.txt
We are ready to generate the fcl dataset. Note the '@' symbol in the --aux parameter: it says that bgHitFiles should be defined in a PROLOG, as the included file expects, instead of being appended at the end.
    generate_fcl --desc=my-reco-test \
                 --dsconf=v0 \
                 --inputs=inputs.txt \
                 --merge=1 \
                 --aux=1:@bgHitFiles:backgrounds.txt \
                 template.fcl
Take a look:
    > ls
    000  backgrounds.txt  template.fcl  inputs.txt  seeds.gandr.my-reco-test.v0.CRdP.txt
    > ls 000
    cnf.gandr.my-reco-test.v0.004001_00000000.fcl       cnf.gandr.my-reco-test.v0.004001_00000002.fcl.json
    cnf.gandr.my-reco-test.v0.004001_00000000.fcl.json  cnf.gandr.my-reco-test.v0.004001_00000003.fcl
    cnf.gandr.my-reco-test.v0.004001_00000001.fcl       cnf.gandr.my-reco-test.v0.004001_00000003.fcl.json
    cnf.gandr.my-reco-test.v0.004001_00000001.fcl.json  cnf.gandr.my-reco-test.v0.004001_00000004.fcl
    cnf.gandr.my-reco-test.v0.004001_00000002.fcl       cnf.gandr.my-reco-test.v0.004001_00000004.fcl.json
Test fcl
It is highly recommended to test a newly generated fcl file by running a small interactive job. Following up on Example 1 above, one can do
    mkdir test
    cd test
    /usr/bin/time mu2e -c `ls ../000/*.fcl | head -1`
to run a full size job, or add a -n 10 option to the mu2e command line to quickly make sure that there are no obvious problems with the configuration.
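The short test can be wrapped in a small function so that it is easy to repeat for each new fcl dataset. This is only a sketch: it uses the mu2e -n and -c options mentioned above, picks the first fcl file in lexical order, and omits the timing. The helper name quick_fcl_test and its defaults are made up.

```shell
# Run a short interactive test on the first generated fcl file.
# quick_fcl_test is a hypothetical helper, not part of mu2etools.
# Usage: quick_fcl_test [fcl_dir] [nevents]   (defaults: ../000 and 10)
quick_fcl_test() {
    fcl_dir=${1:-../000}
    nevents=${2:-10}
    # The *.fcl pattern does not match the companion .fcl.json files.
    first_fcl=$(ls "$fcl_dir"/*.fcl 2>/dev/null | head -n 1)
    if [ -z "$first_fcl" ]; then
        echo "no fcl files found in $fcl_dir" >&2
        return 1
    fi
    mu2e -n "$nevents" -c "$first_fcl"
}
```

Called without arguments from the test directory, it runs 10 events of the first job in ../000.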