ProductionLogic

Introduction

A discussion of production logic patterns, with the goal of running patterns that generate complete and correct output.

Assumptions

We assume that jobs are run by POMS and data_dispatcher (ddisp), and that output is recorded in metacat and Rucio. We assume that all stages are driven by ddisp and an input dataset. The model for discussion is that stage0 takes dataset A, with files A0, A1, ..., as input, and produces files for two datasets, B and C. A job in stage0 receives a file A0 and produces B0 and C0. Stage1 consumes dataset B via another ddisp project. If a job produces files with a version tag (unique to a particular recovery iteration), they will be labeled B0-T0, B0-T1, etc.

We assume that POMS will not allow a file in B to be passed to the stage1 ddisp project until it is a metacat child of a file from A which was consumed by a successful stage0 project worker. (This needs to be confirmed.)
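The gating assumption above can be written as a small check. This is only a sketch: the function name, the `parents` mapping and the `stage0_succeeded` set are hypothetical in-memory stand-ins, not calls into POMS, metacat or ddisp.

```python
def eligible_for_stage1(b_file, parents, stage0_succeeded):
    """Return True if b_file may be handed to the stage1 project.

    parents: dict mapping an output file name to the set of its parent
             file names (stand-in for metacat parentage; hypothetical).
    stage0_succeeded: set of dataset A file names that were consumed by
             successful stage0 workers (stand-in for ddisp bookkeeping).
    """
    # A file with no recorded parent is never eligible.
    return any(p in stage0_succeeded for p in parents.get(b_file, ()))


# Example: B0 has a parent that succeeded, B1 does not.
parents = {"B0": {"A0"}, "B1": {"A1"}}
stage0_succeeded = {"A0"}
print(eligible_for_stage1("B0", parents, stage0_succeeded))
print(eligible_for_stage1("B1", parents, stage0_succeeded))
```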

Patterns

Two "sub-patterns" are used in the patterns below.

In the "default" job submission sub-pattern,

a job contacts the ddisp project to get a file from dataset A
ddisp will not allow any other job to operate on A0 until this job has completed (failed or succeeded) or timed out
a file which has timed out goes to the ddisp project "retry" category (confirmed)
a file from a job which reports a failure is put in the "retry" category (retry can be set by the job)
POMS runs recovery jobs
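The file life cycle in the default sub-pattern can be pictured with a toy state model. This is a sketch, not the data_dispatcher API: the class, its methods and the state names other than "retry" are invented for illustration, and recovery jobs are modeled simply as later calls to `next_file` that pick up files in the retry category.

```python
from enum import Enum


class FileState(Enum):
    AVAILABLE = "available"
    RESERVED = "reserved"   # held by exactly one job
    DONE = "done"
    RETRY = "retry"


class ToyProject:
    """In-memory stand-in for a ddisp project (hypothetical)."""

    def __init__(self, files):
        self.state = {name: FileState.AVAILABLE for name in files}

    def next_file(self):
        """Reserve and return a file no other job holds, or None."""
        for name, st in self.state.items():
            if st in (FileState.AVAILABLE, FileState.RETRY):
                self.state[name] = FileState.RESERVED
                return name
        return None

    def report(self, name, ok):
        """The job reports success or failure for its file."""
        self.state[name] = FileState.DONE if ok else FileState.RETRY

    def timeout(self, name):
        """The project times out a file that is still reserved."""
        if self.state[name] is FileState.RESERVED:
            self.state[name] = FileState.RETRY


project = ToyProject(["A0", "A1"])
first = project.next_file()      # a job takes A0
project.timeout(first)           # the job never reports back
print(project.state[first])      # A0 is now in the retry category
```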

In the "strict" sub-pattern we add the following requirement,

the job must not run past the ddisp timeout. This can be achieved with the jobsub expected-lifetime, the jobsub script timeout switch, and a timeout internal to the job.
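A timeout internal to the job might look like the following sketch, which runs the payload with a time budget slightly shorter than the ddisp timeout. The function name, the `margin_s` safety margin and the idea of passing the ddisp timeout as a number of seconds are assumptions for illustration; only the standard-library `subprocess.run` timeout is used.

```python
import subprocess
import sys


def run_with_deadline(cmd, ddisp_timeout_s, margin_s=60.0):
    """Run cmd and stop it before the (assumed) ddisp timeout expires.

    Returns True if the payload exited with status 0, False if it
    failed or was killed for exceeding the budget.
    """
    budget = ddisp_timeout_s - margin_s
    if budget <= 0:
        raise ValueError("margin must be smaller than the ddisp timeout")
    try:
        # subprocess.run kills the child when the timeout expires.
        result = subprocess.run(cmd, timeout=budget)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    ok = run_with_deadline(
        [sys.executable, "-c", "print('payload ran')"],
        ddisp_timeout_s=3600,
    )
    print("success" if ok else "failed or timed out")
```

A job that gets False back would report failure to the project, so the file lands in the "retry" category before ddisp itself times it out.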

OneP

the strict sub-pattern is run
the ddisp project runs as a "virtual project", in which it does not contact Rucio for locations; the job script knows how to find a file based on the file name and POMS settings
the job writes output in the B0 and C0 final location, overwriting previous output, if any
the job creates a metacat record, overwriting the record if it exists
the job does not write Rucio records

post-processing:

a cron job searches recent ddisp projects, determines files A0 which were run in successful ddisp workers, and declares the files B0 and C0 to Rucio.
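The selection step of this cron job can be sketched as follows. The tuple format for worker results, the "done" status string and the A0 to B0/C0 naming rule are assumptions made for the example; the real job would query ddisp for worker status and call Rucio to declare the files.

```python
def onep_outputs(a_name):
    """Assumed naming rule: input A<n> produces B<n> and C<n>."""
    suffix = a_name[1:]
    return [f"B{suffix}", f"C{suffix}"]


def outputs_to_declare(workers, declared, outputs_for=onep_outputs):
    """Return output files that still need to be declared to Rucio.

    workers:  iterable of (input_file, status) pairs gathered from
              recent ddisp projects (hypothetical format).
    declared: set of output names already declared.
    """
    succeeded = {name for name, status in workers if status == "done"}
    todo = []
    for name in sorted(succeeded):
        for out in outputs_for(name):
            if out not in declared:
                todo.append(out)
    return todo


workers = [("A0", "failed"), ("A0", "done"), ("A1", "failed"), ("A2", "done")]
print(outputs_to_declare(workers, declared={"B0"}))
```

Skipping names that are already declared keeps the cron job safe to rerun over the same projects.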

Notes

only one dCache i/o for output
only one version of output files, with fixed names
Rucio is updated a few hours after processing
stage1 ddisp can't count on using Rucio locations unless it is delayed, or also virtual

Mixed

the strict sub-pattern is run
the ddisp project runs a default project with Rucio locations
the job searches for output from previous iterations (B0-T0, C0-T0) and, if it exists, deprecates the files (possibly metacat retired, Rucio removed from dataset and RSE, or files deleted)
the job writes output unique to each recovery iteration, B0-T1 and C0-T1, in its final location
the job creates metacat and Rucio records for the T1 versions
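The search for earlier iterations can be sketched with the B0-T0 naming used in this page. The helper names are hypothetical, the set of existing files is passed in rather than looked up in metacat, and the tag is assumed to be "-T" followed by an integer iteration number.

```python
def versioned_name(base, iteration):
    """Name of the output for one recovery iteration, e.g. B0-T1."""
    return f"{base}-T{iteration}"


def previous_versions(base, existing, current_iteration):
    """Return existing names base-T<n> with n < current_iteration.

    existing: iterable of file names already recorded (stand-in for a
              metacat query; hypothetical).
    """
    prefix = f"{base}-T"
    found = []
    for name in existing:
        if name.startswith(prefix):
            tag = name[len(prefix):]
            if tag.isdigit() and int(tag) < current_iteration:
                found.append((int(tag), name))
    return [name for _, name in sorted(found)]


existing = {"B0-T0", "B0-T1", "C0-T0", "B1-T0"}
print(versioned_name("B0", 2))
print(previous_versions("B0", existing, current_iteration=2))
```

The job would deprecate every name returned by `previous_versions` before writing and declaring the new version.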

post-processing:

None

Notes

only one dCache i/o for output
only one version of the output files exists at a time, with non-fixed names
Rucio features may be used as soon as the file is created
stage1 ddisp can count on no duplicates and correct Rucio locations

Afterburner

the default sub-pattern is run
the job writes only files unique to the recovery iteration, like B0-T0
the job copies the files to a unique location
the job creates new metacat and Rucio records
jobs are recovered until the ddisp project sees success

post-processing:

a cron job searches recent ddisp projects, determines files A0 which were run in successful ddisp workers. If multiple copies exist, the latest will be declared the correct copy and earlier copies will be deprecated (possibly metacat retired, Rucio removed from dataset and RSE, or files deleted)
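Choosing the copy to keep can be sketched as below. Two assumptions are made for the example: "latest" is taken to mean the highest iteration number in the "-T<n>" tag, and the input list already contains only outputs whose parent file ran in a successful ddisp worker. The function name is hypothetical, and the deprecation itself (metacat, Rucio, deletion) is left out.

```python
from collections import defaultdict


def split_copies(names):
    """Return (keep, deprecate) for a list of versioned output names.

    keep:      dict mapping a base name such as B0 to its latest copy.
    deprecate: sorted list of all earlier copies.
    """
    by_base = defaultdict(list)
    for name in names:
        base, sep, tag = name.rpartition("-T")
        if sep and tag.isdigit():      # ignore names without a version tag
            by_base[base].append((int(tag), name))
    keep, deprecate = {}, []
    for base, copies in by_base.items():
        copies.sort()
        keep[base] = copies[-1][1]
        deprecate.extend(name for _, name in copies[:-1])
    return keep, sorted(deprecate)


keep, deprecate = split_copies(["B0-T0", "B0-T1", "C0-T1", "B1-T0", "C0-T0"])
print(keep)
print(deprecate)
```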

Notes

no fixed file names
for a time, multiple versions of output files
Rucio features may be used immediately after file upload
stage1 ddisp can't proceed on stage0 ddisp success, since there may be multiple files B0-T0, B0-T1 until post-processing is done
