Mock Data (MDS): Difference between revisions

From Mu2eWiki

Revision as of 21:37, 3 July 2024

MDC 2024: Mock Data samples

Introduction

Mock data samples can be helpful in two ways:

  • to help prepare physics analysis efforts;
  • to help us understand the size of our data .art files and ntuples.

Streams

The two purposes above (physics studies, trigger studies) will result in different samples, with differing complexity.

Physics stream

Includes all major components and pile-up. The DIOtail momentum cut can be higher (nominally p > 95 MeV/c will be used as a starting point). Three samples will be made: signal just below the current limit (1e-13), a closed sample (random signal choice), and no signal.

Trigger stream

Here all backgrounds and pile-up will be included but no signal. The DIOtail cut is reduced below the trigger threshold to p > 75 MeV/c.

Inputs

There are several assumptions made when we choose a livetime:

Booster Batch Mode

The Booster Batch (BB) mode describes the operational mode of the Booster, which feeds beam through our Delivery Ring, which in turn passes protons to Mu2e.

There are two "run modes" in Mu2e: 1BB and 2BB. In the low-intensity running mode (1BB) the mean intensity is 1.6E7 protons/pulse; in the higher-intensity mode (2BB) this becomes 3.9E7 protons/pulse.

Batch | Time (s) | T Cycle (s) | T Spill (s) | Spills | frac | On Spill Time | N-cycles | POT per cycle
1 | 9.52E+06 | 1.33 | 1.07E-01 | 4 | 0.323 | 3.07E+06 | 7.16E+06 | 4.00E+12
2 | 1.58E+06 | 1.4 | 4.31E-02 | 8 | 0.246 | 3.89E+05 | 1.13E+06 | 8.00E+12
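
The derived columns of the table above can be cross-checked with a short calculation. This is a minimal sketch: the formulas (on-spill fraction = Spills x T Spill / T Cycle, on-spill time = Time x fraction, N-cycles = Time / T Cycle) are inferred from the table values and are an assumption, not an official definition. The names `MODES` and `derived_columns` are hypothetical.

```python
# Inputs copied from the Booster Batch table; formulas are inferred (assumption).
MODES = {
    "1BB": {"time_s": 9.52e6, "t_cycle_s": 1.33, "t_spill_s": 1.07e-1, "spills": 4},
    "2BB": {"time_s": 1.58e6, "t_cycle_s": 1.4, "t_spill_s": 4.31e-2, "spills": 8},
}


def derived_columns(mode):
    """Return the on-spill fraction, on-spill time and number of cycles for a BB mode."""
    m = MODES[mode]
    frac = m["spills"] * m["t_spill_s"] / m["t_cycle_s"]  # fraction of a cycle spent on spill
    on_spill_time_s = m["time_s"] * frac                   # total on-spill time
    n_cycles = m["time_s"] / m["t_cycle_s"]                # number of cycles in the running time
    return {"frac": frac, "on_spill_time_s": on_spill_time_s, "n_cycles": n_cycles}


if __name__ == "__main__":
    for mode in MODES:
        d = derived_columns(mode)
        print(f"{mode}: frac={d['frac']:.3f} "
              f"on-spill={d['on_spill_time_s']:.2e} s N-cycles={d['n_cycles']:.2e}")
```

With the rounded inputs shown in the table, the results agree with the tabulated values to within rounding in the last digit.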

Expected DIOs

The expected number of muon stops per POT is 1.56E-3 muons/POT (from MDC2020p). The decay : capture ratio for Al is 0.39:0.61.

In our simulation, we tend to focus on simulating the higher momentum tail. A cut of p > 75 MeV/c samples a fraction of 4.19E-07 of the entire DIO spectrum, and a cut of p > 95 MeV/c samples a fraction of 3.64E-11 of the entire DIO spectrum.

livetime | BB | POT | Stopped Muons | DIOs | DIOs (p>75MeV/c) | DIOs (p>95MeV/c)
1 hour | 1BB | 3.50E+15 | 5.45E+12 | 2.12E+12 | 8.90E+05 | 77
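
The expected DIO yields follow from the numbers quoted in this section: POT x stops per POT x decay fraction x tail fraction. The sketch below is illustrative only; the function and constant names are hypothetical and this is not the production calculation.

```python
# Values quoted in the text above.
STOPS_PER_POT = 1.56e-3                      # muon stops per POT (MDC2020p)
DECAY_FRACTION = 0.39                        # decay share of the decay:capture ratio for Al
TAIL_FRACTION = {75: 4.19e-7, 95: 3.64e-11}  # fraction of the DIO spectrum above each cut (MeV/c)


def expected_dios(pot, p_cut=None):
    """Expected number of DIOs for a given POT, optionally above a momentum cut in MeV/c."""
    n_dio = pot * STOPS_PER_POT * DECAY_FRACTION
    if p_cut is not None:
        n_dio *= TAIL_FRACTION[p_cut]
    return n_dio


if __name__ == "__main__":
    pot_1_hour_1bb = 3.50e15  # from the table above
    print(f"all DIOs:     {expected_dios(pot_1_hour_1bb):.2e}")
    print(f"p > 75 MeV/c: {expected_dios(pot_1_hour_1bb, 75):.2e}")
    print(f"p > 95 MeV/c: {expected_dios(pot_1_hour_1bb, 95):.1f}")
```

For one hour of 1BB running this gives roughly 2.1E12 DIOs in total, about 8.9E5 above 75 MeV/c, and about 77 above 95 MeV/c.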

Expected CEs

As part of this study, we begin with a conversion rate of 1e-13, just below the current upper limit (7e-13).


livetime | BB | POT | Stopped Muons | Surviving Muons | CE rate | NCEs
1 week | 1BB | 5.88E+17 | 9.16E+14 | 5.59E+14 | 1E-13 | 55
1 month | 1BB | 2.35E+18 | 3.66E+15 | 2.23E+15 | 1E-13 | 222
1 year | 1BB | 3.00E+19 | 4.67E+16 | 2.85E+16 | 1E-13 | 2.85E+03

It should be noted that this is a favorable choice of signal rate. We will also simulate lower rates and a no-signal scenario.
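
The NCEs column can be reproduced in the same spirit. In this sketch the "Surviving Muons" column is assumed to be the stopped muons multiplied by the capture fraction (0.61), since that reproduces the tabulated values. The names are hypothetical, and the lower Rmue value in the loop is purely illustrative.

```python
STOPS_PER_POT = 1.56e-3   # muon stops per POT (MDC2020p)
CAPTURE_FRACTION = 0.61   # capture share of the decay:capture ratio for Al (assumed normalization)

# POT per livetime in 1BB mode, copied from the table above.
POT_1BB = {"1 week": 5.88e17, "1 month": 2.35e18, "1 year": 3.00e19}


def expected_ces(pot, rmue):
    """Expected number of generated conversion electrons for a given POT and Rmue."""
    return pot * STOPS_PER_POT * CAPTURE_FRACTION * rmue


if __name__ == "__main__":
    # 1e-13 is the rate used in the text; 1e-14 is an illustrative lower rate; 0 is no signal.
    for rmue in (1e-13, 1e-14, 0.0):
        for livetime, pot in POT_1BB.items():
            print(f"Rmue={rmue:.0e} {livetime:>7}: {expected_ces(pot, rmue):.1f} CEs")
```

These are generated counts, before any selection or reconstruction efficiency, and they match the table to within rounding of the inputs.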

Combining primaries

To create Mock Data the process is as follows:

  • Each primary (DIOtail, CE, Cosmics etc.) is simulated separately. The simulated sample must correspond to a livetime equal to (or greater than) that of the Mock Data sample (we do not resample);
  • The run_si.py script is run with several input arguments:
 stdpath = the path to the output, which is also where the filelists for each input are located
 BB = 1, 2 or averaged booster batch mode
 livetime = livetime in seconds
 prc = the list of processes to be included
 rmue = signal branching rate
  • This creates a template .fcl file with filenames and weights. The weights are relative to the chosen input parameters (livetime, Rmue etc.)

Configuring

As part of MDC2024 we developed a grid-based ensembling workflow. This allowed larger samples to be created.

The configuration is made in the following steps:

1) Make the input .art files. One thing to be aware of is that the Sampling Input technology requires the same number of files per process. The best way to work with this is to use the hardest job (cosmics) as the standard and to create the jobs for the other processes with a compatible number of jobs.

2) Run the Python script make_si.py, which builds a template fcl for a given livetime and Rmue (plus a few other variables).

3) Use the Mu2e tool mu2ejobdef to make a tar, with sampling input for the various input streams, e.g.:


mu2ejobdef --desc=test-ensemble --dsconf=v0 \
--run=1002 \
--setup /cvmfs/mu2e.opensciencegrid.org/Musings/SimJob/MDC2020ae/setup.sh \
--sampling=1:CE:ensemble/CE.txt \
--sampling=1:DIO:ensemble/DIO.txt \
--sampling=1:CRYCosmic:ensemble/CRYCosmic.txt \
--embed samplingtest.fcl --verb


4) Launch this job as you would any other.
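
The file-count constraint in step 1 can be illustrated with a small planning helper. This is a sketch under stated assumptions: the function names and the example numbers are hypothetical, and the production scripts may handle the bookkeeping differently. The idea is to fix the number of files to the number of cosmics jobs and derive the events per file for every other process.

```python
def events_per_file(total_events, n_files):
    """Smallest per-file event count such that n_files files hold at least total_events."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    return -(-total_events // n_files)  # integer ceiling division


def plan_jobs(n_cosmic_files, required_events):
    """Give every process the same number of files as the cosmics sample."""
    return {proc: events_per_file(n, n_cosmic_files) for proc, n in required_events.items()}


if __name__ == "__main__":
    # Illustrative numbers only: 100 cosmics jobs and the event counts needed per process.
    print(plan_jobs(100, {"CE": 56, "DIO": 890000}))
```

Rounding up means a process may end up with slightly more events than required, which is consistent with the "equal to or greater than" requirement on the inputs.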

Production Scripts

To automate parts of the process, a number of scripts have been written. These reside in the Production repo: Production/ensembles/scripts.

There are three main scripts:

  • calculateInputs.sh - this script is useful for making the input samples. It uses the DBs to calculate the expected number of events for each process for the chosen livetime, beam mode etc.
  • MakeTemplateFcl.sh - this script takes in the data sets and must be run interactively. The result is the template fcl with appropriate weights for each process.
  • genEnsemblesGrid.sh - this script deploys the grid scripts to run the ensembling process. All the math is done relative to the number of cosmics jobs, and this is automated.

Once you have the samples produced and declared in the data file catalog (SAM currently), digi and reco can be run in the same way as for any other primary sample.


Components

DIO tail

The DIO tail is simulated from stopped muons using the SingleProcessGenerator defined in the Offline EventGenerator directory. The DIOGenerator tool is used to provide the correct momentum distribution based on the 5-8 polynomial derived by Czarnecki et al.

A filter called GenFilter is used to remove events unlikely to produce viable events in the reconstruction. The filter improves the time performance by 40%, with no loss of efficiency.

Two DIO tail samples are included as primaries in two sets of samples for MDC2024: one has a cut at p > 95 MeV/c (a fraction of 3.64e-11 of the entire DIO momentum spectrum) and another has a lower cut, below the trigger threshold, of p > 75 MeV/c (a fraction of 4.19e-7 of the entire DIO spectrum).

In previous simulation studies, DIOs of all momenta were included in the pile-up stream and not as primaries. Including them as primaries has the advantage of giving us a large sample of events and therefore increased realism.

Conversion and Conversion Leading Log

CeEndpoints are a standard part of production. The Leading Log campaign includes the leading log corrections calculated by Szafron. This results in about 10% of electrons being in a lower momentum tail (as opposed to all being at 104.97 MeV/c in the case of the CeEndpoint).

[Figure: comparison of CE and CELL.]

Cosmics

As part of SU2020, a campaign that used the CORSIKA generator was built and exercised, providing 1.1e7 s of cosmic events to be sampled from. A campaign of similar size using the CRY generator is also available.

The CRY sample is used for pass 0, but the CORSIKA one is used for the later campaigns.

Pile-up

For pass 0 the existing pile-up streams were used. These were mixed with the combined primary sample as if it were any other primary sample.

This will introduce some inaccuracies, as we are mixing in two DIO samples (one as a primary for p > 95 MeV/c and one which is part of the MuStopPileup sample and covers all momentum ranges up to the endpoint). This could introduce some double counting, but it is unlikely to significantly affect the outcome of any physics analysis applied to these samples.

For future passes, custom pile-up samples will be combined as primaries in the same way we have done the DIO tails.

RPC

RPC is simulated using the RPCGun generator. Both internal and external RPC can be simulated using the same generator.

A timing filter on the arrival proper time of the stopped pions is used to improve the performance of the simulation. This must be factored in when normalizing the samples.

DIO 75MeV/c short tests

A set of 1 minute samples with a p > 75 MeV/c cut on the DIO tail was generated to get a feel for the size of this sample and the time taken to generate it.


Tag | Processes | BB | livetime | Rmue | conditions | Comments | sam name
testa | CE+DIO(75MeV/c) | 1BB | 1 min | 1e-13 | perfect | dts,dig,mcs | ensemble-1BB-CEDIO-60s-p75MeVc
testb | CE+DIO(75MeV/c)+CRY | 1BB | 1 min | 1e-13 | perfect | dts,dig,mcs | ensemble-1BB-CEDIOCRYCosmic-60s-p75MeVc
testc | CE+DIO(75MeV/c) | 2BB | 1 min | 1e-13 | perfect | dts,dig,mcs | ensemble-2BB-CEDIO-60s-p75MeVc
testd | CE+DIO(75MeV/c)+CRY | 2BB | 1 min | 1e-13 | perfect | dts,dig,mcs | ensemble-2BB-CEDIOCRYCosmic-60s-p75MeVc
teste | CE+DIO(75MeV/c)+CRY | 1BB | 1 hour | 1e-13 | perfect | dts only | ensemble-1BB-CEDIOCRYCosmic-3600s-p75MeVc
testf | CE+DIO(75MeV/c)+CRY+PU | 1BB | 1 min | 1e-13 | perfect | dts only | ensembles-1BB-CEDIOCRYCosmic-60s-p75MeVc-OnSpillMix1BBTriggered

Mock-Dataset-0 (MDS0) (95 MeV/c)

The MDS0 samples all include DIO tail events with the 95 MeV/c cut. Two sample sizes are chosen: 1 week livetime and 1 month livetime.

All components except the RPC are included. Two Rmue values are used. One is 1e-13, which is just below the present upper limit (7e-13) and gives around 55 generated CE events for the 1 week sample and 222 generated CE events for the 1 month sample (before any selection or reconstruction efficiency is factored in).

The samples available are listed below:


Tag | Processes | BB | livetime | Rmue | conditions | sam name | Comments
MDS0a | CE+DIO(95MeV/c) | Mixed | 1 month | 1e-13 | best,perfect | ensemble-MixBB-CEDIO-1month-p95MeVc-Triggered | simple test
MDS0b | CE+DIO(95MeV/c)+CRY | 1BB | 1 week | 1e-13 | best,perfect | ensemble-1BB-CEDIOCRYCosmic-600000s-p95MeVcTrigger- | simple test including cosmics
MDS0c | CE+DIO(95MeV/c)+CRY | 1BB | 1 month | 1e-13 | best,perfect | ensemble-1BB-CEDIOCRYCosmic-2400000s-p95MeVc-Trigger- | simple test including more cosmics
MDS0d | CE+DIO(95MeV/c)+CRY+PU | 1BB | 1 week | 1e-13 | perfect | ensemble-1BB-CEDIOCRYCosmic-600000s-p95MeVcMix1BBTriggered | simple test including old pile-up streams
MDS0e | CE+DIO(95MeV/c)+CRY | 1BB | 1 year | 1e-13 | perfect | dts only: ensemble-1BB-CEDIOCRYCosmic-31000000s-p95MeVc | largest simple sample

The dts, digi, mcs and TrkAna ntuples are available in the usual locations. In most cases the digi and reco stages were run with perfect and best conditions.

The component samples which went into these streams are listed here:

process | tag | Comments
CeEndpoint | MDC2020ac | 100K CEs simulated
DIOtail (95MeV/c) | MDC2020ad | 1 month DIO equiv.
DIOtail (75MeV/c) | MDC2020ad_sm0 | 1 week DIO equiv.
CRY Cosmic | MDC2020s | 1 year sample, signal stream
pile-up/stops | MDC2020p | most recently made mu beam sample

Mock Dataset 1 (MDS1)

MDS1 will inherit from the MDC2020ae (Cosmics) and MDC2024a_* releases and will be classified as MDC2024a.

Several updates are made for MDS1:

  • CeEndpoint now including the leading log too;
  • DIO tail momentum cut moved to 75 MeV/c for triggered stream only;
  • CORSIKA generator used for cosmics;
  • PU streams upgraded (might move to pass2).


process | tag | events
CeMLeadingLog | MDC2024a_sm4 | 800K
DIOtail (95MeV/c) | MDC2024a_sm4 | 1 year
DIOtail (75MeV/c) | MDC2024a_sm3 | 1 week
CORSIKA | MDC2020ae |
pile-up/stops | MDC2020p | -

Mock Dataset 2 (MDS2)

Here we add in the RPC/RMC streams and also provide positron samples ... TBC