ProductionProceduresMC

From Mu2eWiki

Latest revision as of 17:28, 5 May 2025

POMS MC Production Guide

Introduction

This guide describes how to run Monte Carlo (MC) production using POMS. Jobs fall into two categories:

Base-template jobs

  • Definition: Process a single input file to produce one output via a standard FCL template.
  • Driver: Production/Scripts/run_RecoEntuple.py (consider renaming to run_DigiReco.py).
  • Output storage: Write results to persistent storage to avoid many small tape files; later concatenate before archiving.
  • Examples: digitization, reconstruction, event-ntupling.
  • Example campaign: POMS Campaign 24194 (https://pomsgpvm02.fnal.gov/poms/campaign_stage_info/mu2e/production?campaign_stage_id=24194)

Stage Parameter Overrides

Param_Overrides = [
   ['-Oglobal.dataset=',    '%(dataset)s'],
   ['--stage=',             'digireco_digi_list'],
   ['-Oglobal.release_v_o=','au'],
   ['-Oglobal.dbversion=',  'v1_3'],
   ['-Oglobal.fcl=',        'Production/JobConfig/digitize/OnSpill.fcl'],
   ['-Oglobal.nevent=',     '-1'],
]
  • %(dataset)s – placeholder for POMS slice names (e.g. dts.sophie.ensembleMDS2a.MDC2020at.art_slice_72935_stage_5)
  • digireco_digi_list – stage definition from …/poms_includes/mdc2020ar.cfg
  • The remaining overrides are passed to run_RecoEntuple.py
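
The way the override pairs turn into command-line arguments can be sketched as follows. This is an illustration only: it assumes each [flag, value] pair is concatenated into one argument and that %(dataset)s is filled in with Python-style mapping substitution. The function name render_overrides is hypothetical and is not part of POMS.

```python
# Override pairs as they appear in the stage configuration above.
param_overrides = [
    ["-Oglobal.dataset=", "%(dataset)s"],
    ["--stage=", "digireco_digi_list"],
    ["-Oglobal.release_v_o=", "au"],
    ["-Oglobal.dbversion=", "v1_3"],
    ["-Oglobal.fcl=", "Production/JobConfig/digitize/OnSpill.fcl"],
    ["-Oglobal.nevent=", "-1"],
]


def render_overrides(overrides, dataset):
    """Join each [flag, value] pair and substitute the dataset placeholder.

    Hypothetical helper: the concatenation rule is an assumption about
    how the pairs are assembled, not a description of POMS internals.
    """
    values = {"dataset": dataset}
    # Values without a placeholder pass through the substitution unchanged.
    return [flag + (value % values) for flag, value in overrides]


if __name__ == "__main__":
    slice_name = "dts.sophie.ensembleMDS2a.MDC2020at.art_slice_72935_stage_5"
    for arg in render_overrides(param_overrides, slice_name):
        print(arg)
```

Printing the rendered arguments for one slice name is a quick way to check an override list before saving it in the campaign editor.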

Split Types

POMS offers several split types; we mostly use drainingn and nfiles.

  • drainingn(n) – pulls at most n files from the dataset per iteration and tracks the delivered files via snapshots (for example, drainingn(500)).
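
The batching behaviour described above can be modelled with a short sketch. It is a toy model, not POMS code: the set named delivered stands in for the snapshot, and the function name next_batch is hypothetical.

```python
def next_batch(dataset_files, n, delivered):
    """Return at most n files that have not been delivered yet.

    `delivered` plays the role of the snapshot: it is updated in place
    so the next call skips everything handed out so far.
    """
    batch = [f for f in dataset_files if f not in delivered][:n]
    delivered.update(batch)
    return batch


if __name__ == "__main__":
    files = [f"file{i}.art" for i in range(5)]
    snapshot = set()
    iteration = 1
    while True:
        batch = next_batch(files, 2, snapshot)
        if not batch:
            break  # dataset drained
        print(iteration, batch)
        iteration += 1
```

With five files and n = 2, the model delivers batches of two, two, and one file, then stops once nothing undelivered remains.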


To modify a campaign, the preferred option is the GUI editor on the main page, which brings up the view below:

GUI.png

Then double-click the digi cell to modify the campaign parameters.

Extended-template jobs

  • Definition: Require unique, job-specific parameters and configurations.
  • Examples: stage-1 processing, stage-2 resampling, mixing.
  • Example campaign: POMS Campaign 24200 (https://pomsgpvm02.fnal.gov/poms/campaign_stage_info/mu2e/production?campaign_stage_id=24200)

Primaries

Primaries are resampled from particle stops. Generate the resampler parameter (par) file with gen_Resampler.sh.

Example:

gen_Resampler.sh --json /exp/mu2e/app/users/oksuzian/muse_080224/Production/data/primary_dio.json --json_index 0

Sample JSON entry:

   {
       "dsconf": "MDC2020at",
       "desc": "DIOtail95",
       "fcl": "Production/JobConfig/primary/DIOtail.fcl",
       "resampler_name": "TargetStopResampler",
       "resampler_data": "sim.mu2e.MuminusStopsCat.MDC2020p.art",
       "events": 5000,
       "njobs": 2000,
       "start_mom": 95,
       "end_mom": 1000,
       "run": 1202,
       "simjob_setup": "/cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020at/setup.sh"
   }

The JSON entry sets the parameters for gen_Resampler.sh, which then produces:

cnf.mu2e.DIOtail95.MDC2020at.0.tar
cnf.mu2e.DIOtail95.MDC2020at.fcl
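
The output names follow from the desc and dsconf fields of the JSON entry. The sketch below reproduces that pattern as inferred from the two file names listed above. The owner "mu2e" and the fixed "0" field are assumptions read off those names, and par_file_names is a hypothetical helper, not part of gen_Resampler.sh.

```python
import json

# Only the two fields that appear in the output names are needed here.
entry = json.loads('{"dsconf": "MDC2020at", "desc": "DIOtail95"}')


def par_file_names(entry, owner="mu2e", seq="0"):
    """Build the expected par (.tar) and test (.fcl) file names.

    The pattern cnf.<owner>.<desc>.<dsconf> is inferred from the example
    output; owner and seq defaults are assumptions for illustration.
    """
    stem = f"cnf.{owner}.{entry['desc']}.{entry['dsconf']}"
    return f"{stem}.{seq}.tar", f"{stem}.fcl"


if __name__ == "__main__":
    tar_name, fcl_name = par_file_names(entry)
    print(tar_name)
    print(fcl_name)
```

Knowing the expected names in advance makes it easy to confirm that a generation step produced the files for the intended entry.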

The .fcl file is intended for testing.

After a successful test, upload the par file to disk by adding --pushout:

gen_Resampler.sh --json Production/data/primary_dio.json --json_index 0 --pushout


The JSON files are located in Production/data and can be inspected with jq.

List all available primaries:

jq 'map(.desc)' data/resampler.json
[
 "RMCFlatGammaStops",
 "RMCFlatGammaResampling",
 "RMCIPAFlatGammaResampling",
 "RMCWireFlatGammaResampling",
 "RMCInternalEndpoint",
 "RMCExternalEndpoint",
 "RMCInternal",
 "RMCExternal",
 "DIOtail95",
 "IPAMuminusMichel",
 "CePlusEndpoint",
 "CeEndpoint",
 "CePLeadingLog",
 "CeMLeadingLog",
 "RPCInternal",
 "RPCExternal"
]

Find the array index that corresponds to CeEndpoint:

jq 'map(.desc) | index("CeEndpoint")' data/resampler.json 
11
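
The same lookup can be done in Python when jq is not available. The sketch below is self-contained: it builds an in-memory stand-in for data/resampler.json that contains only the desc fields listed above. In practice the list would be read from the file with json.load.

```python
import json

# Stand-in for data/resampler.json: only the "desc" fields shown above.
descs_in_file = [
    "RMCFlatGammaStops", "RMCFlatGammaResampling",
    "RMCIPAFlatGammaResampling", "RMCWireFlatGammaResampling",
    "RMCInternalEndpoint", "RMCExternalEndpoint",
    "RMCInternal", "RMCExternal", "DIOtail95", "IPAMuminusMichel",
    "CePlusEndpoint", "CeEndpoint", "CePLeadingLog", "CeMLeadingLog",
    "RPCInternal", "RPCExternal",
]
resampler_json = json.dumps([{"desc": d} for d in descs_in_file])


def find_index(json_text, desc):
    """Equivalent of jq 'map(.desc) | index(desc)' for a JSON array."""
    descs = [entry["desc"] for entry in json.loads(json_text)]
    return descs.index(desc)  # raises ValueError if desc is absent


if __name__ == "__main__":
    print(find_index(resampler_json, "CeEndpoint"))  # prints 11
```

The returned value is the number to pass as --json_index.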

Merging

Example:

gen_Merge.sh --json Production/data/merge_filter.json --json_index 4

The entry at index 4 of the JSON file looks like:

   {
       "desc": "ensembleMDS1eOnSpillTriggered-noMC",
       "dsconf": "MDC2020au_best_v1_3",
       "append": ["physics.trigger_paths: []", "outputs.strip.fileName: \"dig.owner.dsdesc.dsconf.seq.art\""],
       "extra_opts": "--override-output-description",
       "fcl": "Production/JobConfig/digitize/StripMC.fcl",
       "dataset": "dig.mu2e.ensembleMDS1eOnSpillTriggered.MDC2020aq_best_v1_3.art",
       "merge-factor": 1,
       "simjob_setup": "/cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020au/setup.sh"
   }

The JSON entry sets the parameters for gen_Merge.sh, which then produces:

cnf.mu2e.ensembleMDS1eOnSpillTriggered-noMC.MDC2020au_best_v1_3.0.tar
cnf.mu2e.ensembleMDS1eOnSpillTriggered-noMC.MDC2020au_best_v1_3.fcl

Upload the par file by adding --pushout:

gen_Merge.sh --json Production/data/merge_filter.json --json_index 4 --pushout

Index datasets

Extended-template jobs run over index-dataset definitions such as:

$ samdes idx_map042425.txt
Definition Name: idx_map042425.txt
  Definition Id: 208459
  Creation Date: 2025-04-25T15:58:50+00:00
       Username: oksuzian
          Group: mu2e
     Dimensions: dh.dataset etc.mu2e.index.000.txt and dh.sequencer < 0003892

The definitions are created from a list of par files like:

cnf.mu2e.MuonIPAStopSelector.MDC2020at.tar -1
cnf.mu2e.RMCInternal.MDC2020at.tar 2000 
cnf.mu2e.RMCExternal.MDC2020at.tar 8000
cnf.mu2e.IPAMuminusMichel.MDC2020at.tar 2000
cnf.mu2e.CeMLeadingLog.MDC2020at.tar 2000
cnf.mu2e.CePLeadingLog.MDC2020at.tar 2000
cnf.mu2e.DIOtail95.MDC2020at.tar 2000

The first column is the parameter (par) file and the second column is the number of jobs (-1 means the number of jobs is extracted from the par file itself).

Then using the list above, we can create a definition:

gen_MergeMap.py /exp/mu2e/data/users/oksuzian/poms_map/map041025.txt

Scripts/run_JITfcl.py

This script drives extended-template jobs from the index definitions. On the grid it:

  • Extracts the parameter filename and local index from the map, e.g. /exp/mu2e/data/users/oksuzian/poms_map/merged_map042425.txt
  • Downloads the par file and extracts the fcl file
  • Runs the job and pushes out (pushOut) all relevant output: art, root, log
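
The first step, going from an index file name to a par file and a local job index, can be sketched as below. This is not the code in run_JITfcl.py. It assumes that the sequencer field of the index file name is a global job number and that par files occupy consecutive ranges in the order they are listed. Entries with -1 are not handled, because resolving them requires reading the par file. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

# A shortened map in the two-column format shown earlier.
MAP_TEXT = """\
cnf.mu2e.RMCInternal.MDC2020at.tar 2000
cnf.mu2e.RMCExternal.MDC2020at.tar 8000
cnf.mu2e.DIOtail95.MDC2020at.tar 2000
"""


def parse_map(text):
    """Parse 'parfile njobs' lines into (name, njobs) tuples."""
    return [(name, int(n)) for name, n in
            (line.split() for line in text.strip().splitlines())]


def global_index(fname):
    """Read the sequencer field of a name like etc.mu2e.index.000.0000000.txt."""
    return int(fname.split(".")[4])


def locate(entries, index):
    """Return (par file, local index) for a global job index."""
    bounds = list(accumulate(n for _, n in entries))  # cumulative job counts
    if not 0 <= index < bounds[-1]:
        raise IndexError(f"index {index} outside 0..{bounds[-1] - 1}")
    pos = bisect_right(bounds, index)        # which par file owns this index
    start = bounds[pos - 1] if pos else 0    # first global index of that file
    return entries[pos][0], index - start


if __name__ == "__main__":
    entries = parse_map(MAP_TEXT)
    idx = global_index("etc.mu2e.index.000.0002500.txt")
    print(locate(entries, idx))
```

A binary search over the cumulative counts keeps the lookup cheap even when the map lists many par files.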

Monitoring

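  • Recently produced datasets: listNewDatasets.sh
  • Official datasets: MDC2020#Current_Datasets (https://mu2ewiki.fnal.gov/wiki/MDC2020#Current_Datasets)

These web pages are generated by nightly cron jobs: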

/exp/mu2e/app/home/mu2epro/cron/datasetMon/

Running jobs locally

Both drivers can be run locally once the required environment variables are set, for example:

setup mu2egrid
export fname=etc.mu2e.index.000.0000000.txt
...plus any other variables the script reports as missing