Data Products and Processing Tutorial
Tutorial Session Goal
This tutorial explores the data products used in Mu2e and the modules and algorithms that create them. It is part of the June 2019 Computing and Software tutorial.
Session Prerequisites and Advance Preparation
This tutorial assumes knowledge of art and the Mu2e detector. You will need to understand basic principles of how modules and event processing function in art. You will need to understand C++ data structures and fundamental types. You should have completed the following tutorials:
- Mu2e detector overview
- Mu2e_Offline_Tutorial
- Running_Art_Tutorial
Session Introduction
The information content of Mu2e is stored in the form of art data products. There are several levels of information:
- Monte Carlo generator information
- Geant4 information
- Digitized detector data, or digis (Offline format)
- Reconstructed data
We will explore a few of these, and the algorithms which create them.
Exercises
General
On your machine, set up an area for this tutorial, and launch Docker interactively with a link to it. The docker command shown is for macOS; modify the DISPLAY setting as needed for Windows or Linux (see Docker for instructions). In the docker command, replace /Users/brownd with your own home directory.
> cd $HOME
> mkdir Tutorials
> mkdir Tutorials/DataExploration
> docker run -it --rm -v /Users/brownd/Tutorials/DataExploration:/home/DataExploration -e DISPLAY=$ip:0 mu2e/user:tutorial_1-02
Inside the docker window, set up a satellite release for the data exploration exercises:
[root@80c41be82418 home]# source /Tutorials_2019/setup_container.sh
[root@80c41be82418 home]# cp -r /Tutorials_2019/DataExploration/* /home/DataExploration/
[root@80c41be82418 home]# cd /home/DataExploration/
[root@80c41be82418 DataExploration]# $TUTORIAL_OFFLINE/v7_4_1/SLF6/prof/Offline/bin/createSatelliteRelease --directory .
[root@80c41be82418 DataExploration]# ls
[root@80c41be82418 DataExploration]# source setup.sh
[root@80c41be82418 DataExploration]# scons -j4
Monte Carlo Generators
- Mu2e generators and GenParticle class
Geant4 and Detector Simulation
- The G4 Mu2e Detector description text files
- Examine the SimParticle and StepPointMC classes
- Virtual detectors
Digitized signals
The term 'digi' refers to the digitized detector data stored during Mu2e operations by the Data Acquisition (DAQ) system.
Exercise 1: Tracker digis
Run the Ex01 example: this creates a few histograms of the straw digis. First, process a pure μ- → e- conversion sample:
[root@80c41be82418 DataExploration]# mu2e -c Examples/fcl/Ex01.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint.MDC2018b.001002_00000001.art
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis.root
root [1] ESD->Get("NStrawDigis")->Draw();
You should see ~40 StrawDigis/event on average. Now try with a μ- → e- conversion sample with beam backgrounds mixed in:
[root@80c41be82418 DataExploration]# mu2e -c Examples/fcl/Ex01.fcl --TFile ExploreStrawDigis_mix.root $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis_mix.root
You should see around 2300 StrawDigis/event. With only ~40 of those coming from the signal electron, the signal/noise for raw data is < 2%! This is why we need background rejection and pattern recognition. Now look at the TDC and ADC spectra:
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis.root
root [] ESD->Get("tdc")->Draw();
root [] ESD->Get("deltatdc")->Draw();
root [] ESD->Get("tot")->Draw();
root [] ESD->Get("adc")->Draw();
Examine the module source file using your favorite editor. For instance, if you use the 'vi' editor, the command is:
[root@80c41be82418 DataExploration]# vi Examples/src/ExploreStrawDigis_module.cc ...
Substitute 'emacs' for 'vi' if that's your preferred editor. You can also edit the file on your native machine using whatever editor you have installed there. For instance, in a native terminal window:
> vim $HOME/Tutorials/DataExploration/Examples/src/ExploreStrawDigis_module.cc
Look at how collections are retrieved from the event, and at the include files that are pulled in. Also look at the fcl script Examples/fcl/Ex01.fcl.
Questions:
- What quantities do the different fields in the StrawDigi correspond to?
- What differences do you see between the different distributions comparing pure signal and signal plus background? Can you explain them?
Exercise 2: Calo digis
In this exercise, we'll explore calo crystal and calo cluster digis. Some supporting slides are available in Mu2e doc 26766.
We will start with a few questions:
1) Which data product contains crystal hits? How can you find the time / energy of a hit?
Answer: The data product is CaloCrystalHit, described in RecoDataProducts/inc/CaloCrystalHit.hh. The member functions time() and energy() give the corresponding information.
2) Which modules produce calorimeter clusters? What is the difference between them? Which data product should you use?
Answer: CaloProtoClusterFromCrystalHits and CaloClusterFromProtoCluster. CaloProtoClusterFromCrystalHits forms simply connected clusters from calorimeter hits. CaloClusterFromProtoCluster combines proto clusters close in time / distance into final clusters. You should use CaloClusters, unless you want to study how the proto-clusters are merged together.
3) Which data member indicates whether a cluster contains several proto-clusters or a single one?
Answer: The boolean variable isSplit is true if the cluster contains several proto-clusters.
4) How do I access the list of crystal hits contained in a cluster?
Answer: The caloCrystalHitsPtrVector is a vector containing a list of art::Ptr to the CaloCrystalHits.
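The access pattern described in these answers can be sketched in plain C++. This is only an illustration: CrystalHit and Cluster are hypothetical stand-ins for the Mu2e classes, std::shared_ptr replaces art::Ptr so that the sketch compiles without art, and the accessor form of isSplit() and caloCrystalHitsPtrVector() is an assumption. Check the real headers before relying on any signature.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for CaloCrystalHit: only time() and energy()
// mirror names mentioned in the text.
class CrystalHit {
public:
  CrystalHit(double time, double energy) : time_(time), energy_(energy) {}
  double time() const { return time_; }
  double energy() const { return energy_; }
private:
  double time_;
  double energy_;
};

// std::shared_ptr stands in for art::Ptr (assumption made so that the
// sketch is self-contained).
using CrystalHitPtr = std::shared_ptr<const CrystalHit>;

// Hypothetical stand-in for CaloCluster.
class Cluster {
public:
  Cluster(std::vector<CrystalHitPtr> hits, bool isSplit)
      : hits_(std::move(hits)), isSplit_(isSplit) {}
  bool isSplit() const { return isSplit_; }
  const std::vector<CrystalHitPtr>& caloCrystalHitsPtrVector() const { return hits_; }
private:
  std::vector<CrystalHitPtr> hits_;
  bool isSplit_;
};

// Return the pointer to the hit with the largest energy in the cluster.
CrystalHitPtr mostEnergeticHit(const Cluster& cluster) {
  const auto& hits = cluster.caloCrystalHitsPtrVector();
  assert(!hits.empty());
  return *std::max_element(
      hits.begin(), hits.end(),
      [](const CrystalHitPtr& a, const CrystalHitPtr& b) {
        return a->energy() < b->energy();
      });
}
```

Dereferencing the returned pointer gives the hit itself, so mostEnergeticHit(cluster)->time() is the time of the highest-energy crystal in the cluster.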
Now that you are all warmed up, we'll make a few plots (I know, this is getting so exciting!). First run the following snippet to produce the required data, then load the TTree in memory. The TTree name is DumpCaloDigis/Calo.
> mu2e -c Examples/fcl/Ex02.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint.MDC2018b.001002_00000001.art
> root -l ExploreCaloDigis.root
> TTree *calo = (TTree*) _file0->Get("DumpCaloDigis/Calo")
What is available in this ntuple?
Hint: Look at the file Examples/src/ExploreCaloDigis_module.cc and the corresponding data products. Most of the names are self-explanatory, but a few others are more cryptic!
Tip: you can use the TBrowser to inspect the content of a file. Simply type:
> TBrowser tb;
Now histogram the energy of the crystal hits (switching to log scale is a good idea here):
> calo->Draw("cryEnergyDep")
You should see a rapidly falling distribution, as most hits are low energy. Now let's plot the crystal hits only in the second disk (first disk ID=0, second disk ID=1):
> calo->Draw("cryEnergyDep","cryDiskId==1")
Can you histogram the position of each crystal hit?
> calo->Draw("cryPosY:cryPosX","","box")
You should see 674 boxes... the bigger the box, the larger the number of hits in that crystal. As expected, there are more hits in the central region.
You should be on fire at this point, so we'll look at the clusters. Let's plot the number of crystals in each cluster:
> calo->Draw("cluNumCrystals")
Next, draw the energy of all clusters whose center-of-gravity radius is greater than 400 mm, then less than 400 mm:
> calo->Draw("cluEnergy","sqrt(cluCogX**2+cluCogY**2)>400")
> calo->Draw("cluEnergy","sqrt(cluCogX**2+cluCogY**2)<400")
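There is a lot of noise below 400 mm! What about clusters in disk 0 and disk 1? Can we plot both on the same plot? Note that the commands below select on cluIsSplit (single versus merged proto-clusters); to compare the two disks, select on the cluster disk ID branch instead (check Examples/src/ExploreCaloDigis_module.cc for its name).
> calo->Draw("cluEnergy","cluIsSplit==0")
> calo->Draw("cluEnergy","cluIsSplit==1","same")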
As expected, disk 1 is cleaner!
Bonus: If you feel audacious, try to write an analysis module to do the following:
- Plot the energy and time of all crystal hits in the microbunch
- Plot the energy of all clusters with a radial location greater than 400 mm
- Plot the energy of clusters containing a single proto-cluster or several proto-clusters in two separate histograms
- Plot the energy of the most energetic hit in the cluster
An implementation is shown in Examples/src/ExploreCaloDigis_module.cc
Reconstruction
Digi data objects must be processed through calibration algorithms to convert raw digital values (ADC, TDC, StrawId ...) into quantities with physical units (energy, time, position, ...). Additionally, digis created by the same particle need to be identified and grouped together to increase the amount of information we can extract from the data. The following exercises go through reconstruction for the tracker data.
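As a rough picture of what a calibration step does, the sketch below applies a simple linear conversion from raw counts to physical units. It is not the Mu2e calibration: the StrawCalibration struct, its fields and both functions are hypothetical, and real constants come from the experiment's conditions data rather than from hard-coded numbers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-channel calibration constants (illustration only).
struct StrawCalibration {
  double tdcLsbNs;      // width of one TDC count, in ns
  double timeOffsetNs;  // channel time offset, in ns
  double adcPedestal;   // ADC baseline, in counts
  double adcToMeV;      // energy per ADC count above the pedestal
};

// Convert a raw TDC value into a time in ns using a linear model.
double calibratedTimeNs(std::uint16_t tdc, const StrawCalibration& cal) {
  assert(cal.tdcLsbNs > 0.0);
  return tdc * cal.tdcLsbNs - cal.timeOffsetNs;
}

// Convert a raw ADC peak value into an energy in MeV: subtract the
// pedestal, then scale by the gain.
double calibratedEnergyMeV(std::uint16_t adcPeak, const StrawCalibration& cal) {
  return (adcPeak - cal.adcPedestal) * cal.adcToMeV;
}
```

The point of the sketch is the separation between the raw integer payload and the constants that give it meaning: the same digi yields different physical values when the calibration changes.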
Exercise 3: StrawHits and ComboHits
StrawHit and ComboHit objects are created from the StrawDigis by applying calibration algorithms. StrawHits have a 1-1 relation with StrawDigis, and are only used for calibration. A ComboHit can represent either a single straw, or a group of contiguous straws in the same panel, in which case the physical properties are averaged. In this exercise we will process a collection of StrawDigis into StrawHits and ComboHits, and look at their properties in the associated TTrees.
[root@d7d1258ed46e DataExploration]# mu2e -c Examples/fcl/Ex03.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art
[root@d7d1258ed46e DataExploration]# root -l ExploreComboHits.root
root [2] shd = (TTree*)SHD->Get("shdiag")
root [3] chd = (TTree*)CHD->Get("chdiag");
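The averaging of contiguous single-straw hits into one ComboHit, mentioned in the introduction to this exercise, can be pictured with the sketch below. It assumes a plain unweighted mean of each property; the actual combination may well weight hits by their errors. SingleStrawHit, CombinedHit and combine() are hypothetical names, not Mu2e classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical single-straw hit; field names are illustrative only.
struct SingleStrawHit {
  double timeNs;      // calibrated hit time
  double wireDistMm;  // position along the wire
  double edepMeV;     // energy deposition
};

// Hypothetical aggregate of one or more single-straw hits.
struct CombinedHit {
  double timeNs;
  double wireDistMm;
  double edepMeV;
  int nStraws;  // number of straws that were combined
};

// Combine hits by taking the unweighted mean of each property
// (assumption: the real algorithm may use a weighted average).
CombinedHit combine(const std::vector<SingleStrawHit>& hits) {
  assert(!hits.empty());
  CombinedHit out{0.0, 0.0, 0.0, static_cast<int>(hits.size())};
  for (const auto& h : hits) {
    out.timeNs += h.timeNs;
    out.wireDistMm += h.wireDistMm;
    out.edepMeV += h.edepMeV;
  }
  const double n = static_cast<double>(hits.size());
  out.timeNs /= n;
  out.wireDistMm /= n;
  out.edepMeV /= n;
  return out;
}
```

A combined hit made from a single straw simply reproduces that straw's values, which matches the statement that a ComboHit can represent either one straw or a group.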
The following projection shows the correlation between the tot (TimeOverThreshold) and the MC true path length of a signal electron particle through the straw gas:
root [3] shd->Draw("tot:mcplen>>totvplen(50,0,10,50,0,50)","mcpdg==11&&mcgen==2","colorz")
The following compares the energy deposition for signal electrons, low-energy electrons produced by beam background processes, and protons produced in nuclear breakup following muon nuclear capture on Al:
root [4] shd->Draw("edep>>edep(100,0,0.01)","mcpdg==11&&mcgen==2")
root [4] shd->Draw("edep>>edep(100,0,0.01)","mcpdg==11&&mcgen<0")
root [4] shd->Draw("edep>>edep(100,0,0.01)","mcpdg==2212")
Questions:
- Why does the energy deposition depend on the path length through the straw gas?
- Why does the energy deposition depend on the particle species? On signal vs background electrons?
The following plots show how the time difference can be used to measure the particle position along the length of the straw (longitudinal position). Plot the time difference against the MC true particle position along the straw for different energy depositions.
root [3] shd->Draw("tcal-thv:mcshlen","edep<0.0005");
root [4] shd->Draw("tcal-thv:mcshlen","edep<0.004");
Questions:
- What is the slope of the relationship between time and distance? (time is measured in nanoseconds, distance in mm)
- What is the physical significance of this relationship? If you interpret it as a velocity, how does it compare to the speed of light?
- How does the relationship change for different energy depositions? Can you guess why?
Now look at the ComboHits. These use a model of the straw signal longitudinal propagation velocity to convert delta-t into a longitudinal position estimate. Compare the difference between the measured and true longitudinal positions, and fit the difference to a Gaussian to extract the longitudinal resolution. Try this for ComboHits made from exactly 1 straw first:
root [] chd->Draw("wdist:mcdist");
root [] chd->Draw("wdist-mcdist>>wres(100,-250,250)");
root [] chd->Draw("wdist-mcdist>>wres(100,-250,250)","edep<0.0005");
root [] wres->Fit("gaus");
root [] chd->Draw("wdist-mcdist>>wres(100,-250,250)","edep>0.004");
root [] wres->Fit("gaus");
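The conversion from time difference to longitudinal position can be sketched as follows. A signal created at distance w from the straw center travels (L/2 - w) to one end and (L/2 + w) to the other, so the two arrival times differ by 2w/v, where v is the effective propagation velocity. The function names, the sign convention and the idea of a single constant velocity are assumptions made for illustration; the model used for ComboHits may be more detailed.

```cpp
#include <cassert>

// Speed of light in mm/ns, for comparing a fitted velocity to c.
constexpr double kSpeedOfLightMmPerNs = 299.792458;

// Estimate the position along the wire, measured from the straw center,
// from the times recorded at the two ends. The sign convention (which
// end counts as positive) is an assumption of this sketch.
double longitudinalPositionMm(double tCalNs, double tHvNs, double velocityMmPerNs) {
  assert(velocityMmPerNs > 0.0);
  return 0.5 * velocityMmPerNs * (tCalNs - tHvNs);
}

// Express a propagation velocity as a fraction of the speed of light.
double velocityFractionOfC(double velocityMmPerNs) {
  return velocityMmPerNs / kSpeedOfLightMmPerNs;
}
```

In this model the slope of the time difference versus position is 2/v, so a slope read off the shd plot can be turned into a velocity and compared with c using velocityFractionOfC.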
Now, edit Examples/fcl/Ex03.fcl and change the diagnostics to look at panel-based ComboHits. Re-run mu2e with the edited script, and look again at the longitudinal resolution.
Questions:
- What is the longitudinal resolution? Can you guess why it depends on the energy deposition?
- How does the resolution change between single-straw and panel-based ComboHits?
Pattern Recognition
A signal particle takes 10-20 nsec to cross the tracker and the maximum drift time for electrons generated in a straw to reach the wire is ~40 nsec, so all the hits generated by a signal particle are closely clustered in time. We identify potential signal candidates by making a histogram of ComboHit times and finding local maxima. Signal particles describe a helix as they move through the tracker. We refine the signal candidates by fitting the positions of ComboHits in a time cluster to a helix: a circle in XY and a line in Z.
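The time-clustering idea described above (histogram the hit times, then look for local maxima) can be sketched in a few lines. This is a simplified illustration, not the TimeClusterFinder implementation: the function names, the bin width and the minimum-hit threshold are all hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate time peak: bin-center time and number of hits in the bin.
struct TimePeak {
  double timeNs;
  int nHits;
};

// Fill a fixed-width histogram of hit times over [tMin, tMax).
std::vector<int> fillTimeHistogram(const std::vector<double>& timesNs,
                                   double tMin, double tMax, double binWidth) {
  assert(binWidth > 0.0 && tMax > tMin);
  const std::size_t nBins = static_cast<std::size_t>((tMax - tMin) / binWidth);
  std::vector<int> bins(nBins, 0);
  for (double t : timesNs) {
    if (t < tMin || t >= tMax) continue;  // ignore out-of-range hits
    const std::size_t i = static_cast<std::size_t>((t - tMin) / binWidth);
    if (i < nBins) ++bins[i];
  }
  return bins;
}

// A bin is reported as a peak if it has at least minHits entries, exceeds
// its left neighbor, and is not below its right neighbor. Bins beyond the
// histogram edges are treated as empty.
std::vector<TimePeak> findTimePeaks(const std::vector<int>& bins, double tMin,
                                    double binWidth, int minHits) {
  std::vector<TimePeak> peaks;
  for (std::size_t i = 0; i < bins.size(); ++i) {
    const int left = (i > 0) ? bins[i - 1] : 0;
    const int right = (i + 1 < bins.size()) ? bins[i + 1] : 0;
    if (bins[i] >= minHits && bins[i] > left && bins[i] >= right) {
      peaks.push_back({tMin + (i + 0.5) * binWidth, bins[i]});
    }
  }
  return peaks;
}
```

Because all hits from one particle arrive within a few tens of ns, a bin width of that order collects them into one or two adjacent bins, while uncorrelated background hits spread roughly uniformly and rarely pass the threshold.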
Exercise 4: Time clustering and Helix Finding
The following exercise shows the results of this signal identification. You will run the pattern recognition on ComboHits looking for downstream electrons, and make diagnostic plots of the time spectra and hit positions, with the signal candidates overlaid. First, process the mixed signal + background sample:
[root@d7d1258ed46e DataExploration]# mu2e -c Examples/fcl/Ex04.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art
[root@d7d1258ed46e DataExploration]# root -l ExploreTrackReco.root
Plot the hit time spectra. Each histogram is a separate event. The individual hits are in yellow and green, candidate clusters are outlined in blue and the central time is indicated by a triangle. The Monte Carlo true signal particle hit times are shown in red.
root [1] .L Examples/test/PlotTimeSpectra.C
root [3] PlotTimeSpectra(TCD)
Plot the candidate helices. Each histogram is a separate event. The top plot shows the XY projection of the ComboHit positions, with their estimated errors drawn as ellipses: yellow and orange hits are not used in the fit, while red hits are from the time cluster and used in the helix fit. The fitted helix is shown in red. The Monte Carlo true signal particle hit positions are blue circles, and the (approximate) true helix is shown in blue.
root [4] .L Examples/test/PlotHelices.C
root [5] PlotHelices ph(HD)
root [6] ph.plot()
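To get a feel for the circle part of the helix shown in the XY projection, the sketch below fits a circle to a set of points with an algebraic least-squares (Kasa) fit. This technique is swapped in purely for illustration and is not the algorithm used by RobustHelixFinder, which must also cope with outliers and hit errors. Point, Circle, det3 and fitCircle are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

struct Point { double x, y; };
struct Circle { double cx, cy, r; };

using Matrix3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix by cofactor expansion along the first row.
double det3(const Matrix3& m) {
  return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
       - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
       + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Algebraic circle fit: minimize sum (x^2 + y^2 + D*x + E*y + F)^2 over
// D, E, F, then convert to center and radius. Needs at least three
// non-collinear points.
Circle fitCircle(const std::vector<Point>& pts) {
  assert(pts.size() >= 3);
  double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0, sxz = 0, syz = 0, sz = 0;
  for (const auto& p : pts) {
    const double z = p.x * p.x + p.y * p.y;
    sx += p.x;        sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
    sxz += p.x * z;   syz += p.y * z;   sz += z;
  }
  const double n = static_cast<double>(pts.size());
  // Normal equations A * (D, E, F)^T = b.
  const Matrix3 a = {{{sxx, sxy, sx}, {sxy, syy, sy}, {sx, sy, n}}};
  const std::array<double, 3> b = {-sxz, -syz, -sz};
  const double d = det3(a);
  assert(d != 0.0);  // zero determinant: the points are collinear
  // Solve by Cramer's rule: replace one column at a time with b.
  std::array<double, 3> sol{};
  for (int col = 0; col < 3; ++col) {
    Matrix3 m = a;
    for (int row = 0; row < 3; ++row) m[row][col] = b[row];
    sol[col] = det3(m) / d;
  }
  const double cx = -0.5 * sol[0];
  const double cy = -0.5 * sol[1];
  return {cx, cy, std::sqrt(cx * cx + cy * cy - sol[2])};
}
```

An algebraic fit like this is fast because it reduces to a linear system, but it weights all points equally, which is one reason a dedicated helix finder needs more than this when background hits are present.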
Kalman filter fit
- Calorimeter reconstruction algorithms and data products
- Readout hit reconstruction
- Crystal hit
- Simply connected clusters
- Full clusters
- CRV reconstruction algorithms and data products
Reference Materials
- Mu2e doc 4914 Packet Definition
- Mu2e doc 22693 Mock Data Challenge 2018
Glossary of Raw and Reconstructed Data Products
class | description | contents |
---|---|---|
StrawDigi | Offline format of a single Tracker hit | TDC and TOT from both straw ends, ADC waveform |
ComboHit | Calibrated Tracker hit, or an aggregate of several hits | position in space, time, and time differences |
TimeCluster | Collection of ComboHits nearby in time and (roughly) space | average time and error |
HelixSeed | Helix interpretation of a subset of hits in a TimeCluster | Helix parameters, t0, ComboHits with position along the helix |
KalRep | Full Kalman filter fit result: not persistable | Complete set of weight and parameter matrices and vectors used in the fit |
KalSeed | Compact summary of the Kalman filter fit result | Sampled fit segments, associated straw hits and straws |
KalSegment | KalSeed component: local fit result | Fit parameters and covariance at a particular point |
TrkStrawHitSeed | KalSeed component: straw hit as used in fit | hit position, residual, time, drift radius, errors, ... |
TrkStraw | KalSeed component: straw intersected by the fit | strawID, DOCA to wire, radiation length, energy loss, ... |
CaloRecoDigi | Calorimeter readout reconstructed hit | readout ID, energy, time, chi2 of fit to waveform,... |
CaloCrystalHit | Calorimeter crystal reconstructed hit | crystal ID, energy, time and associated errors |
CaloCluster | Full cluster of calorimeter crystal hits | Total energy, center of gravity (COG), energy moments |
CrvCoincidenceCluster | Cluster of adjacent CRV reco pulses | position, PE count, start and end times |
Glossary of Principal Reconstruction Modules
module | category | description |
---|---|---|
StrawDigisFromStepPointMCs | Simulation | Converts G4 straw energy deposits into StrawDigis |
StrawHitReco | Reconstruction | Converts StrawDigis into single-straw ComboHits |
CombineStrawHits | Reconstruction | Combines adjacent ComboHits in a panel into aggregate ComboHits |
FlagBkgHits | Reconstruction | Identify (flag) panel ComboHits likely produced by low-energy Compton or delta-ray electrons |
TimeClusterFinder | Reconstruction | Group time-adjacent panel ComboHits (and calorimeter cluster if available) into a cluster |
RobustHelixFinder | Reconstruction | Fit a cluster of panel ComboHits to a simple helix using space-point positions |
CalTimePeakFinder | Reconstruction | Group panel ComboHits near a calorimeter cluster in time into a cluster |
CalHelixFinder | Reconstruction | Fit the calorimeter cluster position, target position and panel ComboHits to a simple helix |
KalSeedFit | Reconstruction | Fit single-straw transverse wire positions to a helix, using a simple helix as starting point |
KalFinalFit | Reconstruction | Kalman filter fit of single-straw drift ellipses, constrained with calorimeter cluster time (if present) |
CaloRecoDigiFromDigis | Reconstruction | Extract hit from the waveform digitized by each calorimeter readout |
CaloCrystalHitFromHits | Reconstruction | Combine readout hits from the same crystal to form crystal hits |
CaloProtoClusterFromCrystalHits | Reconstruction | Build simply connected cluster of crystal hits |
CaloClusterFromProtoCluster | Reconstruction | Associates proto-clusters separated by small distance into full clusters |