Data Products and Processing Tutorial

Tutorial Session Goal

This tutorial will explore the data products used in Mu2e and the modules and algorithms which create them. It is part of the June 2019 Computing and Software tutorial.

Session Prerequisites and Advance Preparation

This tutorial assumes knowledge of art and the Mu2e detector. You will need to understand basic principles of how modules and event processing function in art. You will need to understand C++ data structures and fundamental types. You should have completed the following tutorials:

Session Introduction

The information content of Mu2e is stored in the form of art data products. There are several levels of information:

Monte Carlo generator information
Geant4 information
Digitized detector data, or digis (Offline format)
Reconstructed data

We will explore a few of these, and the algorithms which create them.

Exercises

General

On your machine, set up an area for this tutorial, and launch Docker interactively with that area mounted inside the container. The docker command below is written for macOS; modify the DISPLAY setting as needed for Windows or Linux (see Docker for instructions). Replace /Users/brownd with the path to your own home directory.

> cd $HOME
> mkdir Tutorials
> mkdir Tutorials/DataExploration
> docker run -it --rm -v /Users/brownd/Tutorials/DataExploration:/home/DataExploration -e DISPLAY=$ip:0 mu2e/user:tutorial_1-02

Inside the docker window, set up a satellite release for the data exploration exercises:

[root@80c41be82418 home]# source /Tutorials_2019/setup_container.sh
[root@80c41be82418 home]# cp -r /Tutorials_2019/DataExploration/* /home/DataExploration/
[root@80c41be82418 home]# cd /home/DataExploration/
[root@80c41be82418 DataExploration]# $TUTORIAL_OFFLINE/v7_4_1/SLF6/prof/Offline/bin/createSatelliteRelease --directory .
[root@80c41be82418 DataExploration]# ls
[root@80c41be82418 DataExploration]# source setup.sh 
[root@80c41be82418 DataExploration]# scons -j4

Monte Carlo Generators

Mu2e generators and GenParticle class

Geant4 and Detector Simulation

The G4 Mu2e Detector description text files
Examine the SimParticle and StepPointMC classes
Virtual detectors

Digitized signals

The term 'digi' refers to the digitized detector data stored during Mu2e operations by the Data Acquisition (DAQ) system.

Exercise 1: Tracker digis

Run the Ex01 example: this creates a few histograms of the straw digis. First, process a pure μ- → e- conversion sample:

[root@80c41be82418 DataExploration]# mu2e -c Examples/fcl/Ex01.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint.MDC2018b.001002_00000001.art
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis.root
root [1] ESD->Get("NStrawDigis")->Draw();

You should see ~40 StrawDigis/event on average. Now try with a μ- → e- conversion sample with beam backgrounds mixed in:

[root@80c41be82418 DataExploration]# mu2e -c Examples/fcl/Ex01.fcl --TFile ExploreStrawDigis_mix.root $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art 
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis_mix.root

You should see around 2300 StrawDigis/event. The signal/noise ratio for raw data is below 2%! This is why we need background rejection and pattern recognition. Now look at the TDC and ADC spectra:

[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis.root
root [] ESD->Get("tdc")->Draw();
root [] ESD->Get("deltatdc")->Draw();
root [] ESD->Get("tot")->Draw();
root [] ESD->Get("adc")->Draw();
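As a quick check of the numbers quoted above for the two samples, the following standalone Python sketch computes the fraction of digis that come from the signal particle. It uses only the two averages given in the text (about 40 and about 2300 StrawDigis per event); the function name signal_fraction is made up for this illustration.

```python
# Average StrawDigi counts per event, as quoted in the text above.
signal_digis_per_event = 40     # pure conversion-electron sample
mixed_digis_per_event = 2300    # conversion electron plus beam backgrounds


def signal_fraction(n_signal, n_total):
    """Return the fraction of digis in an event produced by the signal particle."""
    return n_signal / n_total


fraction = signal_fraction(signal_digis_per_event, mixed_digis_per_event)
print(f"signal fraction = {fraction:.1%}")
```

The result is below 2%, which is why the raw digis cannot be used directly for track fitting.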

Examine the module source file using your favorite editor. For instance, if you use the 'vi' editor, the command is:

[root@80c41be82418 DataExploration]# vi Examples/src/ExploreStrawDigis_module.cc
...

Substitute 'emacs' for 'vi' if that's your preferred editor. You can also edit the file on your native machine using whatever editor you have installed there. For instance, in a native terminal window:

> vim $HOME/Tutorials/DataExploration/Examples/src/ExploreStrawDigis_module.cc

Look to see how collections are retrieved from the event, and which include files are pulled in. Also look at the fcl script Examples/fcl/Ex01.fcl.

Questions:

What quantities do the different fields in the StrawDigi correspond to?
What differences do you see between the different distributions comparing pure signal and signal plus background? Can you explain them?

Exercise 2: Calo digis

In this exercise, we'll explore calo crystal and calo cluster digis. Some supporting slides are available in Mu2e doc 26766.

We will start with a few questions:

1) Which data product contains crystal hits? How can you find the time / energy of a hit?
Answer: The data product is CaloCrystalHit, described in RecoDataProducts/inc/CaloCrystalHit.hh. The member functions time() and energy() give the corresponding information.

2) Which modules produce calorimeter clusters? What is the difference between them? Which data product should you use?
Answer: CaloProtoClusterFromCrystalHits and CaloClusterFromProtoCluster. CaloProtoClusterFromCrystalHits forms simply connected clusters from calorimeter hits. CaloClusterFromProtoCluster combines proto clusters close in time / distance into final clusters. You should use CaloClusters, unless you want to study how the proto-clusters are merged together.

3) Which data member indicates whether a cluster contains several proto-clusters or a single one?
Answer: The boolean variable isSplit is true if the cluster contains several proto-clusters.

4) How do I access the list of crystal hits contained in a cluster?
Answer: The caloCrystalHitsPtrVector is a vector containing a list of art::Ptr to the CaloCrystalHits.

Now that you are all warmed up, we'll make a few plots (I know, this is getting so exciting!). First run the following commands to produce the required data, then load the TTree into memory (the last line is typed at the ROOT prompt). The TTree name is DumpCaloDigis/Calo.

> mu2e -c Examples/fcl/Ex02.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint.MDC2018b.001002_00000001.art
> root -l ExploreCaloDigis.root
> TTree *calo = (TTree*) _file0->Get("DumpCaloDigis/Calo")

What is available in this ntuple?
Hint: Look at the file Examples/src/ExploreCaloDigis_module.cc and the corresponding data products. Most of the names are self-explanatory, but a few others are more cryptic!

Tip: you can use the TBrowser to inspect the content of a file; simply type (at the ROOT prompt):

> TBrowser tb;

Now histogram the energy of the crystal hits (switching to log scale is a good idea here):

 > calo->Draw("cryEnergyDep")

You should see a rapidly falling distribution, as most hits are low energy. Now let's plot the crystal hits only in the second disk (first disk ID=0, second disk ID=1):

> calo->Draw("cryEnergyDep","cryDiskId==1")

Can you histogram the position of each crystal hit?

> calo->Draw("cryPosY:cryPosX","","box")

You should see 674 boxes... the bigger the box, the larger the number of hits in that crystal. As expected, there are more hits in the central region.

You should be on fire at this point, so we'll look at the clusters. Let's plot the number of crystals in each cluster:

> calo->Draw("cluNumCrystals")

Next, draw the energy of all clusters whose center-of-gravity radius is greater than 400 mm, then less than 400 mm:

> calo->Draw("cluEnergy","sqrt(cluCogX**2+cluCogY**2)>400")
> calo->Draw("cluEnergy","sqrt(cluCogX**2+cluCogY**2)<400")

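There is a lot of noise below 400 mm! What about clusters in disk 0 and disk 1? Can we plot both on the same plot? Note that the selection below, as written, uses the cluIsSplit flag (single versus several proto-clusters); to compare the two disks, select on a cluster disk-ID branch instead, if the ntuple provides one (calo->Print() lists the available branch names).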

> calo->Draw("cluEnergy","cluIsSplit==0")
> calo->Draw("cluEnergy","cluIsSplit==1","same")

As expected, disk 1 is cleaner!

Bonus: If you feel audacious, try to write an analysis module to do the following:

> Plot the energy and time of all crystal hits in the microbunch
> Plot the energy of all clusters with a radial location greater than 400 mm. 
> Plot the energy of clusters containing a single proto-cluster or several proto-clusters in two separate histograms   
> Plot the energy of the most energetic hit in the cluster

An implementation is shown in Examples/src/ExploreCaloDigis_module.cc
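If you want to check the logic of the radial selection before writing the module, the cut used in the cluster plots above (center-of-gravity radius above or below 400 mm) can be sketched in standalone Python. This is only an illustration: the cluster list, its dictionary keys and the energy values are invented for the example and are not read from the ntuple.

```python
import math


def cog_radius(cog_x, cog_y):
    """Radial distance of the cluster center of gravity from the disk axis (mm)."""
    return math.hypot(cog_x, cog_y)


def split_by_radius(clusters, r_cut=400.0):
    """Return (inner, outer) lists of cluster energies, split at r_cut in mm."""
    inner = [c["energy"] for c in clusters if cog_radius(c["x"], c["y"]) < r_cut]
    outer = [c["energy"] for c in clusters if cog_radius(c["x"], c["y"]) > r_cut]
    return inner, outer


# Hypothetical clusters: positions in mm, energies in arbitrary units.
clusters = [
    {"energy": 95.0, "x": 300.0, "y": 400.0},   # radius 500 mm
    {"energy": 12.0, "x": 100.0, "y": 200.0},   # radius about 224 mm
    {"energy": 60.0, "x": -450.0, "y": 0.0},    # radius 450 mm
]

inner, outer = split_by_radius(clusters)
print("inner energies:", inner)
print("outer energies:", outer)
```

In the analysis module the same two lists would be filled into two histograms instead of being printed.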

Reconstruction

Digi data objects must be processed through calibration algorithms to convert raw digital values (ADC, TDC, StrawId ...) into quantities with physical units (energy, time, position, ...). Additionally, digis created by the same particle need to be identified and grouped together to increase the amount of information we can extract from the data. The following exercises go through reconstruction for the tracker data.

Exercise 3: StrawHits and ComboHits

StrawHit and ComboHit objects are created from the StrawDigis by applying calibration algorithms. StrawHits have a 1-1 relation with StrawDigis, and are only used for calibration. A ComboHit can represent either a single straw, or a group of contiguous straws in the same panel, in which case the physical properties are averaged. Panel-based ComboHits are used in the track pattern recognition which is discussed in the following exercises.
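To make the idea of a panel-based ComboHit more concrete, here is a simplified standalone Python sketch that groups hits from contiguous straws and averages their properties. It is not the CombineStrawHits algorithm: the grouping rule (straw numbers differing by at most 1), the plain unweighted average, the tuple layout and the hit values are all assumptions made for illustration.

```python
def average_group(group):
    """Average the time and wire position of a group of single-straw hits."""
    n = len(group)
    return {
        "straws": [hit[0] for hit in group],
        "time": sum(hit[1] for hit in group) / n,
        "wdist": sum(hit[2] for hit in group) / n,
    }


def combine_contiguous(hits):
    """Combine hits from one panel.

    hits is a list of (straw_number, time_ns, position_along_wire_mm).
    Hits whose straw numbers differ by at most 1 from the previous hit
    (after sorting) are merged into one averaged hit.
    """
    combos = []
    group = []
    for hit in sorted(hits):
        if group and hit[0] - group[-1][0] > 1:
            combos.append(average_group(group))
            group = []
        group.append(hit)
    if group:
        combos.append(average_group(group))
    return combos


# Hypothetical hits in one panel: two neighbouring straws and one isolated straw.
hits = [(11, 604.0, 30.0), (10, 600.0, 20.0), (40, 900.0, -100.0)]
combos = combine_contiguous(hits)
for combo in combos:
    print(combo)
```

With these inputs the two neighbouring straws are merged into one combined hit and the isolated straw stays on its own.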

To study the hit reconstruction, process a collection of StrawDigis from a sample of signal events with mixed beam backgrounds, and look at the properties of the StrawHits and ComboHits that get produced using dedicated diagnostic modules and their associated TTrees.

[root@d7d1258ed46e DataExploration]# mu2e -c Examples/fcl/Ex03.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art
[root@d7d1258ed46e DataExploration]# root -l ExploreComboHits.root
root [2] shd = (TTree*)SHD->Get("shdiag")
root [3] chd = (TTree*)CHD->Get("chdiag");

The following projection shows the correlation between the tot (time over threshold) and the MC true path length of a signal electron through the straw gas:

root [3] shd->Draw("tot:mcplen>>totvplen(50,0,10,50,0,50)","mcpdg==11&&mcgen==2","colorz")

The following compares the energy deposition for signal electrons, low-energy electrons produced by beam background processes, and protons produced in nuclear breakup following muon nuclear capture on Al:

root [4] shd->Draw("edep>>edep(100,0,0.01)","mcpdg==11&&mcgen==2")
root [4] shd->Draw("edep>>edep(100,0,0.01)","mcpdg==11&&mcgen<0")
root [4] shd->Draw("edep>>edep(100,0,0.01)","mcpdg==2212")

Questions:

Why does the energy deposition depend on the path length through the straw gas?
Why does the energy deposition depend on the particle species? On signal vs background electrons?

The following plots show how the time difference can be used to measure the particle position along the length of the straw (longitudinal position). Plot the time difference against the MC true particle position along the straw for different energy depositions.

root [3] shd->Draw("tcal-thv:mcshlen","edep<0.0005");
root [4] shd->Draw("tcal-thv:mcshlen","edep<0.004");

Questions:

What is the slope of the relationship between time and distance? (time is measured in nanoseconds, distance in mm)
What is the physical significance of this relationship? If you interpret it as a velocity, how does it compare to the speed of light?
How does the relationship change for different energy depositions? Can you guess why?

Now look at the ComboHits. These use a model of the straw signal longitudinal propagation velocity to convert delta-t into a longitudinal position estimate. Compare the difference between the measured and true longitudinal positions, and fit the difference to a Gaussian to extract the longitudinal resolution. Try this for ComboHits made from exactly 1 straw first:

root [] chd->Draw("wdist:mcdist");
root [] chd->Draw("wdist-mcdist>>wres(100,-250,250)");
root [] chd->Draw("wdist-mcdist>>wres(100,-250,250)","edep<0.0005");
root [] wres->Fit("gaus");
root [] chd->Draw("wdist-mcdist>>wres(100,-250,250)","edep>0.004");
root [] wres->Fit("gaus");
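The conversion from time difference to longitudinal position described above can be sketched in standalone Python. The sketch assumes a single constant propagation velocity and a position measured from the wire center; the sign convention, the function names and the velocity value are placeholders chosen for illustration, not Mu2e calibration constants.

```python
def longitudinal_position(delta_t_ns, v_prop_mm_per_ns):
    """Position along the wire (mm from the wire center) from the end-to-end time difference.

    A signal created at position x on a wire of half-length L reaches the two
    ends after (L + x)/v and (L - x)/v, so the time difference is 2*x/v and
    therefore x = v * delta_t / 2.
    """
    return 0.5 * v_prop_mm_per_ns * delta_t_ns


def position_resolution(sigma_dt_ns, v_prop_mm_per_ns):
    """Propagate a time-difference resolution into a position resolution (mm)."""
    return 0.5 * v_prop_mm_per_ns * sigma_dt_ns


V_PROP = 200.0  # mm/ns, placeholder value only

print(longitudinal_position(1.0, V_PROP))   # position for a 1 ns time difference
print(position_resolution(0.5, V_PROP))     # resolution for a 0.5 ns time resolution
```

The second function shows why the width of the wdist-mcdist distribution tracks the time-difference resolution: in this simple model the two are proportional.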

Now, edit Examples/fcl/Ex03.fcl and change the diagnostics to look at panel-based ComboHits. Re-run mu2e with the edited script, and look again at the longitudinal resolution.

Questions:

What is the longitudinal resolution? Can you guess why it depends on the energy deposition?
How does the resolution change between single-straw and panel-based ComboHits?

To assist the downstream pattern recognition, we flag ComboHits according to some of their properties. The following plots show how those flags relate to the MC true particle origins. Make sure you are using the panel-based ComboHits when you make these plots:

root [] chd->Draw("bkgclust","mcpdg==11&&mcgen==2");
root [] chd->Draw("bkg","mcpdg==11&&mcgen==2");
root [] chd->Draw("bkgclust","mcpdg==11&&mcgen<0");
root [] chd->Draw("bkg","mcpdg==11&&mcgen<0");

The 'bkgclust' flag says this hit is part of a group of ComboHits tightly clustered in space and time. The 'bkg' flag says this cluster was identified as likely to be from a beam background process (Compton scattering, delta-ray, ...), and will be ignored in subsequent pattern recognition.

Questions:

Why do both signal (mcgen==2) and beam background (mcgen<0) electrons form clusters?
What properties of a cluster might be used to distinguish beam background electrons from signal?

Pattern Recognition

A signal particle takes 10-20 nsec to cross the tracker and the maximum drift time for electrons generated in a straw to reach the wire is ~40 nsec, so all the hits generated by a signal particle are closely clustered in time. We identify potential signal candidates by making a histogram of ComboHit times and finding local maxima. Signal particles describe a helix as they move through the tracker. We refine the signal candidates by fitting the positions of ComboHits in a time cluster to a helix: a circle in XY and a line in Z.
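The time-clustering idea in the paragraph above can be sketched in a few lines of standalone Python. This is only an illustration, not the TimeClusterFinder algorithm: the function name find_time_peaks, the 20 nsec bin width, the 5-hit threshold and the hit times are all made up for the example.

```python
from collections import Counter


def find_time_peaks(hit_times_ns, bin_width_ns=20.0, min_hits=5):
    """Histogram the hit times and return (bin_center_ns, n_hits) for local maxima.

    A bin is a peak if it holds at least min_hits hits, more hits than the
    bin before it, and at least as many hits as the bin after it.
    """
    counts = Counter(int(t // bin_width_ns) for t in hit_times_ns)
    peaks = []
    for bin_index, n_hits in sorted(counts.items()):
        left = counts.get(bin_index - 1, 0)
        right = counts.get(bin_index + 1, 0)
        if n_hits >= min_hits and n_hits > left and n_hits >= right:
            peaks.append(((bin_index + 0.5) * bin_width_ns, n_hits))
    return peaks


# Hypothetical event: isolated background hits plus 8 signal hits near 650 nsec.
background = [100.0, 250.0, 410.0, 555.0, 700.0, 830.0, 990.0, 1100.0, 1250.0]
signal = [641.0, 643.0, 645.0, 647.0, 649.0, 651.0, 653.0, 655.0]
times = background + signal

print(find_time_peaks(times))
```

Only the bin containing the eight closely spaced hits passes the threshold; the isolated background hits do not form a candidate.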

Exercise 4: Time clustering and Helix Finding

The following exercises show the results of this signal identification. You will run the pattern recognition on ComboHits looking for downstream electrons, and make diagnostic plots of the time spectra and hit positions, with the signal candidates overlaid. First, process the mixed signal + background sample:

[root@d7d1258ed46e DataExploration]# mu2e -c Examples/fcl/Ex04.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art
[root@d7d1258ed46e DataExploration]# root -l ExploreTrackReco.root

Plot the hit time spectra. Each histogram is a separate event. The individual hits are in yellow and green, candidate clusters are outlined in blue and the central time is indicated by a triangle. The Monte Carlo true signal particle hit times are shown in red.

root [1] .L Examples/test/PlotTimeSpectra.C
root [3] PlotTimeSpectra(TCD)

Questions:

How many hits are in a typical signal time cluster?
Why do you think there are so many time clusters found in each event? What particles might make the other clusters?
Why does the number of time clusters vary so widely between events?

Plot the candidate helices. Each histogram is a separate event. The top plot shows the XY projection of the ComboHit positions, with their estimated errors drawn as ellipses: the time-difference position resolution defines the longitudinal error, and the straw diameter (5 mm) defines the transverse error. Yellow and orange hits are flagged as 'bkg' and are not used in the fit; red hits come from the time cluster and are used in the helix fit. The reconstructed helix is shown as a red circle, the Monte Carlo true signal particle hit positions are shown as blue dots, and the (approximate) true helix is shown in blue.

root [4] .L Examples/test/PlotHelices.C 
root [5] PlotHelices ph(HD)
root [6] ph.plot()
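To make the "circle in XY" part of the helix fit concrete, here is a small standalone Python sketch of an algebraic least-squares circle fit. It is a textbook method chosen for brevity, not the algorithm used by RobustHelixFinder, and the hit positions are invented points lying on a known circle. The "line in Z" part of the helix is not covered.

```python
import math


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (center_x, center_y, radius).

    Works in coordinates centered on the mean hit position, where the
    normal equations reduce to a 2x2 linear system for the circle center.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in points:
        u, v = x - mean_x, y - mean_y
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are collinear; no unique circle")
    bu = 0.5 * (suuu + suvv)
    bv = 0.5 * (svvv + svuu)
    uc = (bu * svv - bv * suv) / det
    vc = (bv * suu - bu * suv) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mean_x + uc, mean_y + vc, radius


# Hypothetical hits on an arc of a circle with center (100, -50) mm and radius 250 mm.
true_cx, true_cy, true_r = 100.0, -50.0, 250.0
angles = [0.2, 0.9, 1.7, 2.4, 3.1]
hits = [(true_cx + true_r * math.cos(a), true_cy + true_r * math.sin(a)) for a in angles]

cx, cy, r = fit_circle(hits)
print(f"center = ({cx:.3f}, {cy:.3f}) mm, radius = {r:.3f} mm")
```

A plain least-squares fit like this one is pulled by every hit it is given, so a background hit far from the true circle would bias the result. Keep that in mind when you answer the efficiency and purity questions below.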

Questions:

How well does the pattern recognition do in using all the signal hits (efficiency)?
How well does the pattern recognition do in rejecting beam background hits (purity)?
Which of these do you think is more important for correctly estimating the helix?

Kalman filter fit

If a helix is found, its hits and parameters are passed to a Kalman filter fit. The Kalman filter fit uses a detailed model of how electrons drift inside the straw to provide a much more accurate estimate of the position of the particle, but only transverse to the straw. It also accounts for the effects of energy loss and scattering as the particle goes through the straws, and for the small inhomogeneities in the DS solenoidal field (the helix fit used in pattern recognition assumes a perfect solenoidal field). You will study the results of the Kalman fit using detailed TrkAna TTrees. These are produced from the KalSeed objects that are the end product of the full reconstruction chain.

Exercise 5: KalSeed Properties

In this exercise you will look at the properties of hits that are used in the Kalman fit. First, prepare some TrkAna trees with detailed hit information using a sample of conversion electron signal events with beam backgrounds mixed in:

[root@d7d1258ed46e DataExploration]# mu2e -c Examples/fcl/Ex05.fcl $TUTORIAL_DATA/mcs.mu2e.CeEndpoint-mix.MDC2018h.001002_00000000.art

Then, compare the reconstructed drift time (= hit time - hit t0) with the MC true distance of closest approach between the particle and the wire. These are related through the finite time it takes the ionization electrons produced in the gas to reach the wire. Electrons are attracted to the wire because it is held at a positive potential compared to the outer straw wall.

[root@2971d331fb1e DataExploration]# root -l TrkAnaReco.root 
root [2] t = (TTree*)TrkAnaNeg->Get("trkana");
root [3] t->Draw("detsh._tdrift:detshmc._dist>>dvel(50,0,2.5,50,-10,50)","detsh._active","colorz");

Questions:

Why is the relationship between the time and the distance roughly linear? What does the slope represent physically? (Hint: the electron is moving through the straw gas)
Why isn't it exactly linear? (Hint: think about the electric field produced in the straw)
Why is the relationship more smeared out when the particle comes close to the wire? (Hint: it has to do with ionization statistics)

The drift time can be converted into a distance using a model of the electron drift velocity. We can also compute the geometric distance between the reconstructed track and the wire, which is called DOCA (distance of closest approach). Note that DOCA is a signed quantity, with the sign given by the angular momentum between the track and the wire. Let's compare these against each other:

root [8] t->Draw("detsh._doca:detsh._rdrift>>doca(50,0,2.5,50,-3,3)","detsh._active","colorz")
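The conversion from drift time to drift distance mentioned above can be illustrated with the simplest possible model, a constant drift velocity. This is a deliberate simplification (the questions above hint that the true relationship is not exactly linear), and the velocity used here is just the 2.5 mm straw radius divided by the ~40 nsec maximum drift time quoted earlier, not a calibrated value. The function name drift_radius is made up for this sketch.

```python
STRAW_RADIUS_MM = 2.5       # half of the 5 mm straw diameter quoted in the text
MAX_DRIFT_TIME_NS = 40.0    # approximate maximum drift time quoted in the text

# Average drift velocity implied by those two numbers (mm/ns); an assumption.
V_DRIFT_MM_PER_NS = STRAW_RADIUS_MM / MAX_DRIFT_TIME_NS


def drift_radius(t_drift_ns, v_drift=V_DRIFT_MM_PER_NS):
    """Convert a drift time into an unsigned drift distance (mm).

    The result is clamped to the physical range [0, straw radius], since
    measured drift times can come out negative or too large because of
    resolution effects.
    """
    radius = v_drift * t_drift_ns
    return min(max(radius, 0.0), STRAW_RADIUS_MM)


for t in (-5.0, 16.0, 100.0):
    print(f"t_drift = {t:6.1f} ns -> r_drift = {drift_radius(t):.3f} mm")
```

Note that the drift distance obtained this way is always positive, while DOCA is signed.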

Questions:

Why are there 2 branches to this relationship?

To use the hit to constrain the position of the track in the fit, we must assign a left-right ambiguity to each drift distance. This tells the fit whether the track is passing on one side or the other of the wire. The hit ambiguity must be determined from data. Let's compare this to the MC true drift position, signed in the same way. Note that some of the hits are assigned a null (0) ambiguity; in that case the drift distance is taken to be 0.

root [11] t->Draw("detsh._rdrift*detsh._ambig:detshmc._dist*detshmc._ambig","detsh._active","colorz")
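The role of the ambiguity can be sketched in standalone Python. The signed drift distance is the unsigned drift radius multiplied by the ambiguity (+1, -1, or 0 for a null assignment), as in the expression plotted above. The residual definition used here (signed DOCA minus signed drift distance) and the numerical values are assumptions made for illustration, not the definitions used inside the Kalman fit.

```python
def signed_drift(r_drift_mm, ambig):
    """Signed drift distance: ambig is +1 or -1, or 0 for a null ambiguity.

    With a null ambiguity the drift distance is taken to be 0, so the hit
    only constrains the track to pass near the wire.
    """
    if ambig not in (-1, 0, 1):
        raise ValueError("ambiguity must be -1, 0 or +1")
    return r_drift_mm * ambig


def residual(doca_mm, r_drift_mm, ambig):
    """Illustrative residual: signed track-to-wire distance minus signed drift distance."""
    return doca_mm - signed_drift(r_drift_mm, ambig)


# Hypothetical hit: the track passes 1.2 mm from the wire, drift radius is 1.0 mm.
doca, r_drift = 1.2, 1.0
for ambig in (+1, -1, 0):
    print(f"ambig = {ambig:+d}: residual = {residual(doca, r_drift, ambig):.2f} mm")
```

In this example the wrong ambiguity produces a residual of roughly twice the drift radius, while the null ambiguity gives a smaller residual.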

Questions:

Why are there anti-correlated entries?
Why do our algorithms sometimes choose a null ambiguity? (Hint: it's related to the answer to the previous question)

To give an accurate estimate of the momentum of the particle at its production point, the Kalman filter fit must correct for the energy the particle loses through ionization and radiation as it traverses the straws. This estimate is computed for each straw that intersects the trajectory of the fit. Plot the estimate as a function of the DOCA to the wire:

root [15] t->Draw("detsm._dp:detsm._doca","detsm._active");

Questions:

Assuming a typical particle traverses 40 straws, what is the total change in momentum expected over the length of the tracker?
Why is the momentum change larger when the particle passes near the edges of the straw?

Future Exercises

Calorimeter reconstruction algorithms and data products
- Readout hit reconstruction
- Crystal hit
- Simply connected clusters
- Full clusters
CRV reconstruction algorithms and data products

Reference Materials

Mu2e doc 4914 Packet Definition
Mu2e doc 22693 Mock Data Challenge 2018

Glossary of Raw and Reconstructed Data Products

class	description	contents
StrawDigi	Offline format of a single Tracker hit	TDC and TOT from both straw ends, ADC waveform
ComboHit	Calibrated Tracker hit, or an aggregate of several hits	position in space, time, and time differences
TimeCluster	Collection of ComboHits nearby in time and (roughly) space	average time and error
HelixSeed	Helix interpretation of a subset of hits in a TimeCluster	Helix parameters, t0, ComboHits with position along the helix
KalRep	Full Kalman filter fit result: not persistable	Complete set of weight and parameter matrices and vectors used in the fit
KalSeed	Compact summary of the Kalman filter fit result	Sampled fit segments, associated straw hits and straws
KalSegment	KalSeed component: local fit result	Fit parameters and covariance at a particular point
TrkStrawHitSeed	KalSeed component: straw hit as used in fit	hit position, residual, time, drift radius, errors, ...
TrkStraw	KalSeed component: straw intersected by the fit	strawID, DOCA to wire, radiation length, energy loss, ...
CaloRecoDigi	Calorimeter readout reconstructed hit	readout ID, energy, time, chi2 of fit to waveform,...
CaloCrystalHit	Calorimeter crystal reconstructed hit	crystal ID, energy, time and associated errors
CaloCluster	Full cluster of calorimeter crystal hits	Total energy, center of gravity (COG), energy moments
CrvCoincidenceCluster	Cluster of adjacent CRV reco pulses	position, PE count, start and end times

Glossary of Principal Reconstruction Modules

module	category	description
StrawDigisFromStepPointMCs	Simulation	Converts G4 straw energy deposits into StrawDigis
StrawHitReco	Reconstruction	Converts StrawDigis into single-straw ComboHits
CombineStrawHits	Reconstruction	Combines adjacent ComboHits in a panel into aggregate ComboHits
FlagBkgHits	Reconstruction	Identify (flag) panel ComboHits likely produced by low-energy Compton or delta-ray electrons
TimeClusterFinder	Reconstruction	Group time-adjacent panel ComboHits (and calorimeter cluster if available) into a cluster
RobustHelixFinder	Reconstruction	Fit a cluster of panel ComboHits to a simple helix using space-point positions
CalTimePeakFinder	Reconstruction	Group panel ComboHits near a calorimeter cluster in time into a cluster
CalHelixFinder	Reconstruction	Fit the calorimeter cluster position, target position and panel ComboHits to a simple helix
KalSeedFit	Reconstruction	Fit single-straw transverse wire positions to a helix, using a simple helix as starting point
KalFinalFit	Reconstruction	Kalman filter fit of single-straw drift ellipses, constrained with calorimeter cluster time (if present)
CaloRecoDigiFromDigis	Reconstruction	Extract hit from the waveform digitized by each calorimeter readout
CaloCrystalHitFromHits	Reconstruction	Combine readout hits from the same crystal to form crystal hits
CaloProtoClusterFromCrystalHits	Reconstruction	Build simply connected cluster of crystal hits
CaloClusterFromProtoCluster	Reconstruction	Associates proto-clusters separated by small distance into full clusters
