ProductionProcedures

From Mu2eWiki
Revision as of 20:29, 31 July 2025 by Rlc (talk | contribs)

Introduction

Running jobs locally

In production log files, there are configuration dump stanzas like:

************** control summary exe ***************
MOO_CAMPAIGN_STAGE=Reco
MOO_SOURCE=v00_03_02
MOO_DATASET=CRVWB-000
MOO_VERBOSE=1
MOO_OUTDIR=production
MOO_APPEND_NAME=none
MOO_CFG=CRVWB-008
MOO_CONFIG=CRVWB-000-008-000
MOO_CAMPAIGN=CRVWB-000-0
MOO_SCRIPT=CRVWB/reco.sh
MOO_CRVTESTSTAND=v17
************** control summary exe ***************

Jobs are run in a generic wrapper script and are completely controlled by these variables and the input files. The variables are set through the process of interpreting the POMS campaign configuration, the cfg files, and the wrapper script itself. The easiest way to get a complete set of control variables is from a log file. If no log file is available (for example, because no jobs ran), there is currently no simple, verified way to extract them from the sources. To rerun a job locally, you only need to write a small script that sets these variables and then provides one more:

export MOO_LOCAL_INPUT=https://fndcadoor.fnal.gov:2880/pnfs/fnal.gov/usr/mu2e/tape/phy-raw/raw/mu2e/CRV_wideband_cosmics/crvled-001/dat/3b/a7/raw.mu2e.CRV_wideband_cosmics.crvled-001.001303_056.dat
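
When a log file is available, the control summary stanza can be converted into export lines mechanically. The following is a minimal sketch, not part of the production tools: the function name moo_exports_from_log and the file names in the usage comment are hypothetical, and it assumes the stanza lines appear in the log exactly as in the example above (delimiter lines containing "control summary", with one NAME=value per line between them).

```shell
# Hypothetical helper: print export lines for the MOO_ variables found in the
# first control-summary stanza of a job log.
# Assumption: the stanza is formatted exactly like the example on this page.
moo_exports_from_log() {
  awk '
    /control summary/ { stanza++; next }   # opening and closing delimiter lines
    stanza >= 2 { exit }                   # stop once the first stanza has closed
    stanza == 1 && /^MOO_[A-Za-z0-9_]+=/ {
      eq = index($0, "=")                  # split at the first "="
      printf "export %s=\"%s\"\n", substr($0, 1, eq - 1), substr($0, eq + 1)
    }
  ' "$1"
}

# Possible usage (file names are placeholders):
#   moo_exports_from_log old_job.log > rerun_env.sh
#   . ./rerun_env.sh
```

Values are wrapped in double quotes, so a value containing a dollar sign, backquote, or double quote would still need to be checked by hand. MOO_LOCAL_INPUT is not part of the stanza and has to be added separately.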

Go to an area with some free space:

cd /exp/mu2e/data/users/mu2epro/production_recovery
# pick a subdirectory
cd 1
# clean up, make it look like a grid dir
rm -f * jsb_tmp/*
mkdir -p jsb_tmp

If the job needs to run in SL7, start the container first:

mu2einit
sl7container
mu2einit

Run the job script:

export MOO_CAMPAIGN_STAGE=Reco
export MOO_SOURCE=v00_03_02
export MOO_DATASET=CRVWB-000
export MOO_VERBOSE=1
export MOO_OUTDIR=production
export MOO_APPEND_NAME=none
export MOO_CFG=CRVWB-008
export MOO_CONFIG=CRVWB-000-008-000
export MOO_CAMPAIGN=CRVWB-000-0
export MOO_SCRIPT=CRVWB/reco.sh
export MOO_CRVTESTSTAND=v17

export MOO_LOCAL_INPUT=https://fndcadoor.fnal.gov:2880/pnfs/fnal.gov/usr/mu2e/tape/phy-raw/raw/mu2e/CRV_wideband_cosmics/crvled-001/dat/3b/a7/raw.mu2e.CRV_wideband_cosmics.crvled-001.001303_056.dat

nice /cvmfs/mu2e.opensciencegrid.org/bin/OfflineOps/wrapper.sh  \
  1> jsb_tmp/JOBSUB_LOG_FILE 2> jsb_tmp/JOBSUB_ERR_FILE

# optionally put in the background
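
Because the wrapper is controlled entirely through the environment, a variable that is misspelled or set without export can silently change what the job does. A minimal sketch of a pre-flight check follows. The function name require_exported is hypothetical, and the variable list in the usage comment is simply the one from the example stanza; other campaigns may use a different set.

```shell
# Hypothetical helper: fail if any named variable is missing from the
# environment (unset, set but not exported, or empty).
require_exported() {
  local status=0 name
  for name in "$@"; do
    # printenv only sees exported variables, which is what the wrapper sees too
    if [ -z "$(printenv "$name")" ]; then
      echo "not exported or empty: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Possible usage before calling the wrapper:
#   require_exported MOO_CAMPAIGN_STAGE MOO_SOURCE MOO_DATASET MOO_CFG \
#     MOO_CONFIG MOO_CAMPAIGN MOO_SCRIPT MOO_LOCAL_INPUT || exit 1
```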

Keepup scripts

The keepup scripts drive production scripts that have to run constantly. The keepup technique keeps the scripts running continuously and uses a cron job only to check that all the scripts are still running. This is a little easier to maintain, and it has a nice feature: if the workload is heavy and a script takes longer than its repeat interval, the script can be rerun immediately.
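
The timing behavior described above can be illustrated with a short sketch. This is not the actual keepup code: the function names remaining_wait and keepup_loop are hypothetical, and writing the completion time into a heartbeat file is an assumption about how a heartbeat might be recorded.

```shell
# Seconds left to wait in this cycle; 0 if the work overran the interval.
remaining_wait() {
  local interval=$1 elapsed=$2
  if [ "$elapsed" -ge "$interval" ]; then
    echo 0
  else
    echo $(( interval - elapsed ))
  fi
}

# Hypothetical loop: run a command forever, starting at most once per interval.
# Usage: keepup_loop INTERVAL_SECONDS HEARTBEAT_FILE COMMAND [ARGS...]
keepup_loop() {
  local interval=$1 heartbeat=$2 start elapsed
  shift 2
  while true; do
    start=$(date +%s)
    "$@"                               # one pass of the real work
    date +%s > "$heartbeat"            # assumed heartbeat: time the pass finished
    elapsed=$(( $(date +%s) - start ))
    sleep "$(remaining_wait "$interval" "$elapsed")"
  done
}
```

With a fixed cron schedule, a pass that overruns its slot either overlaps the next one or waits for the following slot. In the loop, an overrun simply yields a wait of zero, so the next pass starts right away.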

The framework scripts are in ~mu2epro/cron/production. A cron job on mu2eprodgpvm01 runs keepup.sh, which checks that the scripts are running and have recent heartbeats. Each procedure is called a "service", and each service can have multiple independent instances, each labeled by a different "name". The master list of services and instances is in keepup.txt. Each row of the file represents one instance of a service, and notes at the top of the file explain the meaning of each row.

Each service has a code and configuration subdirectory (named after the service) with a main worker script called run.sh. The service can customize how its instances are named and configured. Each instance of each service has its own working area under /exp/mu2e/data/users/mu2epro/production/logs. The keepup scripts monitor the heartbeat of each service and trigger an alarm (currently just an email) if a service is missing or its script has stalled.
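
A stalled-script check of the kind described above might look like the following sketch. It is illustrative only: the function name heartbeat_is_fresh, the heartbeat path in the usage comment, and the assumption that a heartbeat file contains a Unix timestamp in seconds are all hypothetical, since the real heartbeat format is not described here.

```shell
# Hypothetical check: succeed if the heartbeat file holds a timestamp that is
# no older than MAX_AGE seconds.
# Usage: heartbeat_is_fresh HEARTBEAT_FILE MAX_AGE_SECONDS
heartbeat_is_fresh() {
  local file=$1 max_age=$2 beat now
  [ -r "$file" ] || return 1               # missing heartbeat counts as stale
  beat=$(cat "$file")
  case $beat in
    ''|*[!0-9]*) return 1 ;;               # empty or non-numeric content
  esac
  now=$(date +%s)
  [ $(( now - beat )) -le "$max_age" ]
}

# Possible usage from a cron-driven checker (path is a placeholder):
#   heartbeat_is_fresh /path/to/instance/heartbeat 3600 || echo "ALARM: stalled"
```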