ProductionProcedures
Latest revision as of 20:40, 31 July 2025
Introduction
Running jobs locally
In production log files, there are configuration dump stanzas like:
************** control summary exe ***************
MOO_CAMPAIGN_STAGE=Reco
MOO_SOURCE=v00_03_02
MOO_DATASET=CRVWB-000
MOO_VERBOSE=1
MOO_OUTDIR=production
MOO_APPEND_NAME=none
MOO_CFG=CRVWB-008
MOO_CONFIG=CRVWB-000-008-000
MOO_CAMPAIGN=CRVWB-000-0
MOO_SCRIPT=CRVWB/reco.sh
MOO_CRVTESTSTAND=v17
************** control summary exe ***************
Jobs are run in a generic wrapper script and are completely controlled by these variables and the input files. The variables are set through the process of interpreting the POMS campaign configuration, the cfg files, and the wrapper script itself. The easiest way to get a complete set of control variables is from a log file, but if that's not available, there is currently no simple verified way to extract them from the sources (if, for example, no jobs have run). To rerun a job locally, you only need to write a little script that sets these variables, then provide one more:
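Since the log file is the recommended source of the control variables, one way to recover them is a small text filter. This is a hypothetical helper, not part of the production tools; it assumes the "control summary exe" stanza format shown above, and the sample log here is made up for illustration.

```shell
# Hypothetical helper: pull the MOO_* control variables out of a job log
# and turn them into export statements that can be sourced before a local
# rerun. The sample stanza mimics the format shown above.
cat > job.log <<'EOF'
************** control summary exe *************** MOO_CAMPAIGN_STAGE=Reco MOO_DATASET=CRVWB-000 MOO_VERBOSE=1 ************** control summary exe ***************
EOF

grep -o 'MOO_[A-Z_]*=[^ ]*' job.log | sort -u | sed 's/^/export /' > moo_env.sh
cat moo_env.sh
```

The resulting moo_env.sh can then be sourced in the shell where the wrapper is run.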
export MOO_LOCAL_INPUT=https://fndcadoor.fnal.gov:2880/pnfs/fnal.gov/usr/mu2e/tape/phy-raw/raw/mu2e/CRV_wideband_cosmics/crvled-001/dat/3b/a7/raw.mu2e.CRV_wideband_cosmics.crvled-001.001303_056.dat
Go to an area with some space:
cd /exp/mu2e/data/users/mu2epro/production_recovery
# pick a subdirectory
cd 1
# clean up, make it look like a grid dir
rm -f * jsb_tmp/*
mkdir -p jsb_tmp
If needed, to run in SL7:
mu2einit
sl7container
mu2einit
Run the job script:
export MOO_CAMPAIGN_STAGE=Reco
export MOO_SOURCE=v00_03_02
export MOO_DATASET=CRVWB-000
export MOO_VERBOSE=1
export MOO_OUTDIR=production
export MOO_APPEND_NAME=none
export MOO_CFG=CRVWB-008
export MOO_CONFIG=CRVWB-000-008-000
export MOO_CAMPAIGN=CRVWB-000-0
export MOO_SCRIPT=CRVWB/reco.sh
export MOO_CRVTESTSTAND=v17
export MOO_LOCAL_INPUT=https://fndcadoor.fnal.gov:2880/pnfs/fnal.gov/usr/mu2e/tape/phy-raw/raw/mu2e/CRV_wideband_cosmics/crvled-001/dat/3b/a7/raw.mu2e.CRV_wideband_cosmics.crvled-001.001303_056.dat

nice /cvmfs/mu2e.opensciencegrid.org/bin/OfflineOps/wrapper.sh \
  1> jsb_tmp/JOBSUB_LOG_FILE 2> jsb_tmp/JOBSUB_ERR_FILE
# optionally put in the background
Keepup scripts
The keepup scripts drive production scripts that have to run constantly. The keepup technique keeps the scripts running continuously and uses a cron job only to check that all the scripts are still running. This is a little easier to maintain and has the nice feature that if a script runs past its repeat interval because the workload is heavy, it can be rerun again immediately.
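The check-and-restart idea can be sketched as follows. This is a minimal illustration, not the real keepup.sh; the service name and restart line are made up.

```shell
# Minimal sketch of the keepup check (illustrative only): the cron side
# never does the work itself, it only verifies that the worker process
# exists and flags it for restart when it is missing.
check_service() {
    service="$1"
    if pgrep -f "$service" > /dev/null 2>&1; then
        echo "$service: running"
    else
        echo "$service: missing, needs restart"
        # nohup "./$service" >> wrapper.log 2>&1 &   # restart step (sketch)
    fi
}
check_service "keepup-demo-instance"
```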
The framework scripts are in ~mu2epro/cron/production. A cron job on mu2eprodgpvm01 running keepup.sh checks that the scripts are running and have recent heartbeats. Each procedure is called a "service", and each service can have multiple independent instances labeled by a different "name" and can run on a requested node. The master list of services and instances is in keepup.txt, which controls which services are running. There are notes at the top of the file on the meaning of each row, which represents an instance of a service. The keepup scripts monitor the heartbeat of the service and will trigger an alarm (currently just an email) if a service is missing or the script is stalled.
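The staleness test behind the alarm can be sketched like this. It is an illustration with a made-up threshold, not the actual keepup.sh logic, and it assumes GNU stat for the file modification time.

```shell
# Sketch of a heartbeat staleness check (illustrative; MAX_AGE is a
# made-up threshold, and GNU stat is assumed for the mtime query).
MAX_AGE=600                                  # seconds
date > heartbeat.txt                         # normally written by the wrapper
age=$(( $(date +%s) - $(stat -c %Y heartbeat.txt) ))
if [ "$age" -gt "$MAX_AGE" ]; then
    echo "ALARM: heartbeat is ${age}s old"   # real version sends an email
else
    echo "heartbeat ok (${age}s old)"
fi
```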
Each service has a code and configuration subdirectory (named for the service) with a main worker script called run.sh. The service can customize how to name and configure its instances. Each instance of each service has its own working area under /exp/mu2e/data/users/mu2epro/production/logs. In the working directory, there are standard files and directories:

- log files by date - the output of the service script
- an executable file named "<service>-<name>" - a wrapper script calling run.sh, renamed for convenience
- work directory - scratch space for the service
- heartbeat.txt - heartbeats from the wrapper confirming the service script is active
- wrapper.log - output from the wrapper script
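One cycle of the wrapper's job can be sketched as follows. This is a toy illustration, not the real wrapper (which lives under ~mu2epro/cron/production and repeats this cycle forever at the service's repeat interval); the run.sh here is a stand-in.

```shell
# Sketch of one wrapper cycle (illustrative only; the production wrapper
# loops forever). A stand-in run.sh is created just for the demo.
mkdir -p work                                    # scratch space for the service
printf '#!/bin/sh\necho doing work\n' > run.sh
chmod +x run.sh

one_cycle() {
    date > heartbeat.txt                         # heartbeat checked by keepup.sh
    ./run.sh >> "log.$(date +%Y-%m-%d)" 2>&1     # log file by date
}
one_cycle
```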