ProductionProceduresMC

From Mu2eWiki
= POMS MC Production Guide =
__TOC__
 
== Introduction ==
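This guide describes how to run Monte Carlo (MC) production using POMS (Production Operations Management Service). Jobs fall into two categories, each described in its own section below: base-template jobs and extended-template jobs.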
 
== Base-template jobs ==
* Definition: Process a single input file to produce one output via a standard FCL template. 
* Driver: <code>Production/Scripts/run_RecoEntuple.py</code> (consider renaming to <code>run_DigiReco.py</code>). 
* Output storage: Write results to persistent storage rather than directly to tape, to avoid many small tape files; concatenate them later before archiving. 
* Examples: digitization, reconstruction, event-ntupling. 
* Example campaign: [https://pomsgpvm02.fnal.gov/poms/campaign_stage_info/mu2e/production?campaign_stage_id=24194 POMS Campaign 24194]
 
=== Stage Parameter Overrides ===
 
<pre>
Param_Overrides = [
    ['-Oglobal.dataset=',     '%(dataset)s'],
    ['--stage=',              'digireco_digi_list'],
    ['-Oglobal.release_v_o=', 'au'],
    ['-Oglobal.dbversion=',   'v1_3'],
    ['-Oglobal.fcl=',         'Production/JobConfig/digitize/OnSpill.fcl'],
    ['-Oglobal.nevent=',      '-1'],
]
</pre>
 
* <code>%(dataset)s</code> – placeholder for POMS slice names (e.g. <code>dts.sophie.ensembleMDS2a.MDC2020at.art_slice_72935_stage_5</code>) 
* <code>digireco_digi_list</code> – stage definition from <code>…/poms_includes/mdc2020ar.cfg</code> 
* Remaining overrides feed into <code>run_RecoEntuple.py</code>
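A minimal Python sketch of how these pairs could turn into driver options, assuming each <code>[key, value]</code> pair is concatenated into a single option and that the <code>%(dataset)s</code> placeholder behaves like Python <code>%</code>-style substitution. The helper name <code>expand_overrides</code> is hypothetical; this illustrates the idea and is not the POMS implementation.

<syntaxhighlight lang="python">
# Hypothetical sketch: expanding a POMS-style Param_Overrides list into
# command-line options. Assumes each [key, value] pair is concatenated and
# %(name)s placeholders use Python %-style substitution.

PARAM_OVERRIDES = [
    ["-Oglobal.dataset=",     "%(dataset)s"],
    ["--stage=",              "digireco_digi_list"],
    ["-Oglobal.release_v_o=", "au"],
    ["-Oglobal.dbversion=",   "v1_3"],
    ["-Oglobal.fcl=",         "Production/JobConfig/digitize/OnSpill.fcl"],
    ["-Oglobal.nevent=",      "-1"],
]


def expand_overrides(overrides, **values):
    """Join each [key, value] pair and fill %(name)s placeholders from values."""
    # With a mapping on the right of %, strings without placeholders are
    # returned unchanged, so only the dataset option is actually modified.
    return [(key + value) % values for key, value in overrides]


if __name__ == "__main__":
    slice_name = "dts.sophie.ensembleMDS2a.MDC2020at.art_slice_72935_stage_5"
    for option in expand_overrides(PARAM_OVERRIDES, dataset=slice_name):
        print(option)
</syntaxhighlight>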
 
=== Split Types ===
POMS supports several split types; we have mostly used <code>drainingn</code> and <code>nfiles</code>.
* <code>draining(n)</code> – pulls at most <code>n</code> files per iteration and tracks delivered files via snapshots.
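To illustrate the <code>draining(n)</code> behaviour, the Python sketch below simulates it on a static file list: each iteration hands out at most <code>n</code> files that were not delivered before, and a set stands in for the snapshots. The function <code>draining_slices</code> is hypothetical, and real campaigns work against datasets that can grow between iterations, which this sketch ignores.

<syntaxhighlight lang="python">
# Hypothetical simulation of the draining(n) split type on a static file list.


def draining_slices(dataset_files, n, delivered=None):
    """Yield successive slices of at most n files that were not delivered before.

    `delivered` plays the role of the snapshots: every file handed out is
    recorded there, so no file is delivered twice.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    delivered = set() if delivered is None else set(delivered)
    while True:
        pending = [f for f in dataset_files if f not in delivered]
        if not pending:
            return
        snapshot = pending[:n]
        delivered.update(snapshot)  # remember what this iteration handed out
        yield snapshot


if __name__ == "__main__":
    files = [f"dts.example.{i:03d}.art" for i in range(7)]
    for iteration, snapshot in enumerate(draining_slices(files, 3)):
        print(iteration, snapshot)
</syntaxhighlight>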
 
 
To modify a campaign, the preferred option is the GUI editor on the main page, which opens the view shown below:

[[File:GUI.png|300px]]

Then double-click the <mark>digi</mark> cell to modify the campaign parameters.
 
== Extended-template jobs ==
* Definition: Require unique, job-specific parameters and configurations. 
* Examples: stage-1 processing, stage-2 resampling, mixing. 
* Example campaign: [https://pomsgpvm02.fnal.gov/poms/campaign_stage_info/mu2e/production?campaign_stage_id=24200 POMS Campaign 24200]
 
=== Primaries ===
 
Primaries are produced by resampling particle stops.
Generate the resampler par file with <code>gen_Resampler.sh</code>.
 
Example:
gen_Resampler.sh --json /exp/mu2e/app/users/oksuzian/muse_080224/Production/data/primary_dio.json --json_index 0
 
Sample JSON entry:
    {
        "dsconf": "MDC2020at",
        "desc": "DIOtail95",
        "fcl": "Production/JobConfig/primary/DIOtail.fcl",
        "resampler_name": "TargetStopResampler",
        "resampler_data": "sim.mu2e.MuminusStopsCat.MDC2020p.art",
        "events": 5000,
        "njobs": 2000,
        "start_mom": 95,
        "end_mom": 1000,
        "run": 1202,
        "simjob_setup": "/cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020at/setup.sh"
    }
This entry sets the parameters for '''gen_Resampler.sh'''.
 
Produces:
cnf.mu2e.DIOtail95.MDC2020at.0.tar
cnf.mu2e.DIOtail95.MDC2020at.fcl
 
The fcl file above is intended for testing.

After testing, upload the par file to disk with <code>--pushout</code>:
gen_Resampler.sh --json Production/data/primary_dio.json --json_index 0 --pushout
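The names listed after "Produces:" appear to follow the pattern <code>cnf.mu2e.{desc}.{dsconf}</code>, so the expected outputs can be checked against a JSON entry before running the script. A minimal Python sketch; the helper <code>summarize</code> is hypothetical, and treating <code>events</code> as the number of events per job is an assumption not stated above.

<syntaxhighlight lang="python">
import json

# Copy of the sample entry above; a real check would load primary_dio.json.
ENTRIES_JSON = r"""
[
  {
    "dsconf": "MDC2020at",
    "desc": "DIOtail95",
    "fcl": "Production/JobConfig/primary/DIOtail.fcl",
    "resampler_name": "TargetStopResampler",
    "resampler_data": "sim.mu2e.MuminusStopsCat.MDC2020p.art",
    "events": 5000,
    "njobs": 2000,
    "start_mom": 95,
    "end_mom": 1000,
    "run": 1202,
    "simjob_setup": "/cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020at/setup.sh"
  }
]
"""


def summarize(entries, index):
    """Predict output names and total events for entry `index`.

    Assumes names follow cnf.mu2e.{desc}.{dsconf} and that "events" is the
    number of events per job (an assumption).
    """
    entry = entries[index]
    stem = f"cnf.mu2e.{entry['desc']}.{entry['dsconf']}"
    return {
        "par_file": f"{stem}.0.tar",
        "test_fcl": f"{stem}.fcl",
        "total_events": entry["events"] * entry["njobs"],
    }


if __name__ == "__main__":
    info = summarize(json.loads(ENTRIES_JSON), 0)
    for key, value in info.items():
        print(f"{key}: {value}")
</syntaxhighlight>

With the sample entry this predicts <code>cnf.mu2e.DIOtail95.MDC2020at.0.tar</code> and <code>cnf.mu2e.DIOtail95.MDC2020at.fcl</code>, matching the listing above.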
 
 
The JSON files are located in '''Production/data''' and can be inspected with <code>jq</code>:
 
List all available primaries:
jq 'map(.desc)' data/resampler.json
[
  "RMCFlatGammaStops",
  "RMCFlatGammaResampling",
  "RMCIPAFlatGammaResampling",
  "RMCWireFlatGammaResampling",
  "RMCInternalEndpoint",
  "RMCExternalEndpoint",
  "RMCInternal",
  "RMCExternal",
  "DIOtail95",
  "IPAMuminusMichel",
  "CePlusEndpoint",
  "CeEndpoint",
  "CePLeadingLog",
  "CeMLeadingLog",
  "RPCInternal",
  "RPCExternal"
]
 
Find the index of the entry whose <code>desc</code> is '''CeEndpoint''' (indices start at 0):
 
jq 'map(.desc) | index("CeEndpoint")' data/resampler.json
11
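If <code>jq</code> is not available, the same lookup can be done with the Python standard library. A minimal sketch; <code>find_index</code> is a hypothetical helper that mirrors <code>jq 'map(.desc) | index(...)'</code> and returns <code>None</code> where <code>jq</code> prints <code>null</code>.

<syntaxhighlight lang="python">
import json
import sys


def find_index(entries, desc):
    """Python equivalent of: jq 'map(.desc) | index("<desc>")'.

    Returns the 0-based position of the first entry with that desc, or None
    (jq prints null) when no entry matches.
    """
    descs = [entry.get("desc") for entry in entries]
    return descs.index(desc) if desc in descs else None


def main(argv):
    if len(argv) != 3:
        print("usage: find_index.py <file.json> <desc>")
        return 1
    with open(argv[1]) as handle:
        index = find_index(json.load(handle), argv[2])
    print("null" if index is None else index)
    return 0


if __name__ == "__main__":
    main(sys.argv)
</syntaxhighlight>

For example, <code>python3 find_index.py data/resampler.json CeEndpoint</code> should print the same index as the <code>jq</code> command (the script name is arbitrary).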
 
=== Merging ===
Example:
 
gen_Merge.sh --json Production/data/merge_filter.json --json_index 4
 
The entry selected by <code>--json_index 4</code> looks like:
    {
        "desc": "ensembleMDS1eOnSpillTriggered-noMC",
        "dsconf": "MDC2020au_best_v1_3",
        "append": ["physics.trigger_paths: []", "outputs.strip.fileName: \"dig.owner.dsdesc.dsconf.seq.art\""],
        "extra_opts": "--override-output-description",
        "fcl": "Production/JobConfig/digitize/StripMC.fcl",
        "dataset": "dig.mu2e.ensembleMDS1eOnSpillTriggered.MDC2020aq_best_v1_3.art",
        "merge-factor": 1,
        "simjob_setup": "/cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020au/setup.sh"
    }
This entry sets the parameters for '''gen_Merge.sh'''.
 
Produces:
cnf.mu2e.ensembleMDS1eOnSpillTriggered-noMC.MDC2020au_best_v1_3.0.tar
cnf.mu2e.ensembleMDS1eOnSpillTriggered-noMC.MDC2020au_best_v1_3.fcl
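The <code>append</code> list holds FHiCL lines added after the base template; since later assignments in FHiCL override earlier ones, <code>physics.trigger_paths: []</code> replaces the template's trigger paths. The Python sketch below assumes the generated job fcl is simply an <code>#include</code> of the template followed by those lines; the real <code>gen_Merge.sh</code> output may contain more, and <code>build_fcl</code> is a hypothetical helper.

<syntaxhighlight lang="python">
import json

# Copy of the merge entry above (the raw string keeps the \" escapes for JSON).
MERGE_ENTRY_JSON = r"""
{
    "desc": "ensembleMDS1eOnSpillTriggered-noMC",
    "dsconf": "MDC2020au_best_v1_3",
    "append": ["physics.trigger_paths: []", "outputs.strip.fileName: \"dig.owner.dsdesc.dsconf.seq.art\""],
    "extra_opts": "--override-output-description",
    "fcl": "Production/JobConfig/digitize/StripMC.fcl",
    "dataset": "dig.mu2e.ensembleMDS1eOnSpillTriggered.MDC2020aq_best_v1_3.art",
    "merge-factor": 1,
    "simjob_setup": "/cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020au/setup.sh"
}
"""


def build_fcl(entry):
    """Return FHiCL text: #include of the base template, then the append lines.

    The owner/dsdesc/dsconf/seq fields in the output file name look like
    placeholders filled in by later tooling (assumption); they are left as-is.
    """
    lines = [f'#include "{entry["fcl"]}"']
    lines.extend(entry.get("append", []))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_fcl(json.loads(MERGE_ENTRY_JSON)), end="")
</syntaxhighlight>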
 
Upload the par file with <code>--pushout</code>:
gen_Merge.sh --json Production/data/merge_filter.json --json_index 4 --pushout
 
=== Index datasets ===
Extended-template jobs run over index datasets, for example:
$ samdes idx_map042425.txt
Definition Name: idx_map042425.txt
  Definition Id: 208459
  Creation Date: 2025-04-25T15:58:50+00:00
      Username: oksuzian
          Group: mu2e
    Dimensions: dh.dataset etc.mu2e.index.000.txt and dh.sequencer < 0003892
 
The definitions are created from a list of par files like:
 
cnf.mu2e.MuonIPAStopSelector.MDC2020at.tar -1
cnf.mu2e.RMCInternal.MDC2020at.tar 2000
cnf.mu2e.RMCExternal.MDC2020at.tar 8000
cnf.mu2e.IPAMuminusMichel.MDC2020at.tar 2000
cnf.mu2e.CeMLeadingLog.MDC2020at.tar 2000
cnf.mu2e.CePLeadingLog.MDC2020at.tar 2000
cnf.mu2e.DIOtail95.MDC2020at.tar 2000
 
The first column is the par file dataset name and the second is the number of jobs (-1 means the number of jobs is taken from the par file itself).

Using this list, create the definition with:
gen_MergeMap.py /exp/mu2e/data/users/oksuzian/poms_map/map041025.txt
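Conceptually, the list assigns each par file a contiguous block of job indices. The Python sketch below computes those blocks and builds a SAM dimension of the same shape as the <code>samdes</code> output above. It assumes jobs are numbered from 0 in list order and that the <code>dh.sequencer</code> bound equals the total job count; <code>build_ranges</code>, <code>sam_dimension</code> and the <code>njobs_from_par</code> callback (a stand-in for reading the count from a par file) are all hypothetical.

<syntaxhighlight lang="python">
PAR_LIST = """\
cnf.mu2e.MuonIPAStopSelector.MDC2020at.tar -1
cnf.mu2e.RMCInternal.MDC2020at.tar 2000
cnf.mu2e.RMCExternal.MDC2020at.tar 8000
cnf.mu2e.IPAMuminusMichel.MDC2020at.tar 2000
cnf.mu2e.CeMLeadingLog.MDC2020at.tar 2000
cnf.mu2e.CePLeadingLog.MDC2020at.tar 2000
cnf.mu2e.DIOtail95.MDC2020at.tar 2000
"""


def build_ranges(lines, njobs_from_par):
    """Return ([(parfile, first_index, njobs), ...], total_jobs)."""
    rows = []
    offset = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parfile, njobs_text = line.split()
        njobs = int(njobs_text)
        if njobs == -1:
            # -1: the job count comes from the par file itself.
            njobs = njobs_from_par(parfile)
        rows.append((parfile, offset, njobs))
        offset += njobs
    return rows, offset


def sam_dimension(total_jobs, index_dataset="etc.mu2e.index.000.txt"):
    """Select the first total_jobs index files (7-digit sequencer, as above)."""
    return f"dh.dataset {index_dataset} and dh.sequencer < {total_jobs:07d}"


if __name__ == "__main__":
    # Hypothetical stand-in: pretend the selector par file declares 500 jobs.
    rows, total = build_ranges(PAR_LIST.splitlines(), lambda parfile: 500)
    for parfile, first, njobs in rows:
        print(f"{parfile:45s} first={first:6d} njobs={njobs}")
    print(sam_dimension(total))
</syntaxhighlight>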
 
=== Scripts/run_JITfcl.py ===
This script drives the extended-template jobs that run over the index definitions.
On the grid it:
* Extracts the par file name and local index from the map, e.g. <code>/exp/mu2e/data/users/oksuzian/poms_map/merged_map042425.txt</code>
* Downloads the par file and extracts the fcl file
* Runs the job and pushes out all relevant output: art, root, and log files
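The index-to-par-file step can be pictured as a search over contiguous job blocks. A minimal Python sketch, assuming the merged map can be reduced to <code>(parfile, first_index, njobs)</code> rows sorted by <code>first_index</code>; the real map format, the job counts and the helper name <code>locate</code> are assumptions.

<syntaxhighlight lang="python">
import bisect

# Hypothetical merged map reduced to (parfile, first_index, njobs) rows,
# sorted by first_index. The job counts here are illustrative.
EXAMPLE_ROWS = [
    ("cnf.mu2e.RMCInternal.MDC2020at.tar", 0, 2000),
    ("cnf.mu2e.RMCExternal.MDC2020at.tar", 2000, 8000),
    ("cnf.mu2e.DIOtail95.MDC2020at.tar", 10000, 2000),
]


def locate(rows, global_index):
    """Map a global job index to (parfile, local_index) via binary search."""
    starts = [first for _, first, _ in rows]
    # Last block whose first_index is <= global_index.
    pos = bisect.bisect_right(starts, global_index) - 1
    if pos < 0:
        raise IndexError(f"job index {global_index} is below the first block")
    parfile, first, njobs = rows[pos]
    local_index = global_index - first
    if local_index >= njobs:
        raise IndexError(f"job index {global_index} is not covered by the map")
    return parfile, local_index


if __name__ == "__main__":
    for index in (0, 2500, 11999):
        print(index, locate(EXAMPLE_ROWS, index))
</syntaxhighlight>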
 
== Monitoring ==
* Recently produced datasets: <code>listNewDatasets.sh</code> 
* Official datasets: [https://mu2ewiki.fnal.gov/wiki/MDC2020#Current_Datasets MDC2020#Current_Datasets] 
 
These pages are generated by nightly cron jobs in:
/exp/mu2e/app/home/mu2epro/cron/datasetMon/
 
==Running jobs locally==


Both drivers can be run locally after setting up the environment and defining the variables they expect, for example:
setup mu2egrid
export fname=etc.mu2e.index.000.0000000.txt
...and any other variables the script reports as missing
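Before starting a driver locally, it can help to confirm that <code>fname</code> is set and well formed. The Python sketch below parses the index file name using the <code>tier.owner.description.configuration.sequencer.format</code> layout visible in names such as <code>dig.owner.dsdesc.dsconf.seq.art</code>, and assumes (not confirmed above) that the sequencer is the job index used to look up the map. <code>index_from_fname</code> is a hypothetical helper, not part of the drivers.

<syntaxhighlight lang="python">
import os
import re

# tier.owner.description.configuration.sequencer.format, e.g.
# etc.mu2e.index.000.0000000.txt
INDEX_NAME = re.compile(r"^etc\.mu2e\.index\.([^.]+)\.(\d+)\.txt$")


def index_from_fname(fname):
    """Return the sequencer field of an index file name as an integer."""
    match = INDEX_NAME.match(os.path.basename(fname))
    if match is None:
        raise ValueError(f"not an index file name: {fname!r}")
    return int(match.group(2))


if __name__ == "__main__":
    fname = os.environ.get("fname")
    if fname is None:
        print("fname is not set, e.g. export fname=etc.mu2e.index.000.0000000.txt")
    else:
        try:
            print(f"{fname} -> job index {index_from_fname(fname)}")
        except ValueError as err:
            print(err)
</syntaxhighlight>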


[[Category:Computing]]
[[Category:Workflows]]

Latest revision as of 17:28, 5 May 2025
