Data Products and Processing Tutorial

From Mu2eWiki
Revision as of 18:54, 19 June 2019 by Echenard (talk | contribs)

Tutorial Session Goal

This tutorial will explore the data products used in Mu2e and the modules and algorithms that create them. It is part of the June 2019 Computing and Software tutorial.

Session Prerequisites and Advance Preparation

This tutorial assumes knowledge of art and the Mu2e detector. You will need to understand basic principles of how modules and event processing function in art. You will need to understand C++ data structures and fundamental types. You should have completed the following tutorials:

Session Introduction

The information content of Mu2e is stored in the form of art data products. There are several levels of information:

  • Monte Carlo generator information
  • Geant4 information
  • Digitized detector data, or digis (Offline format)
  • Reconstructed data

We will explore a few of these, and the algorithms which create them.

Exercises

General

On your machine, set up an area for this tutorial and launch Docker interactively with a link to it. The docker command below is for macOS; modify the DISPLAY setting as needed for Windows or Linux (see Docker for instructions).

> cd $HOME
> mkdir Tutorials
> mkdir Tutorials/DataExploration
> docker run -it --rm -v $HOME/Tutorials/DataExploration:/home/DataExploration -e DISPLAY=$ip:0 mu2e/user:tutorial_1-02
 

Inside the docker window, set up a satellite release for the data exploration exercises:

[root@80c41be82418 home]# source /Tutorials_2019/setup_container.sh
[root@80c41be82418 home]# cp -r /Tutorials_2019/DataExploration/* /home/DataExploration/
[root@80c41be82418 home]# cd /home/DataExploration/
[root@80c41be82418 DataExploration]# $TUTORIAL_OFFLINE/v7_4_1/SLF6/prof/Offline/bin/createSatelliteRelease --directory .
[root@80c41be82418 DataExploration]# ls
[root@80c41be82418 DataExploration]# source setup.sh 
[root@80c41be82418 DataExploration]# scons -j4
 

Monte Carlo Generators

  • Mu2e generators and GenParticle class

Geant4 and Detector Simulation

  • The G4 Mu2e Detector description text files
  • Examine the SimParticle and StepPointMC classes
  • Virtual detectors

Digitized signals

The term 'digi' refers to the digitized detector data stored during Mu2e operations by the Data Acquisition (DAQ) system.

Exercise 1: Tracker digis

Run the Ex01 example: this creates a few histograms of the straw digis. First, process a pure μ- → e- conversion sample:

[root@80c41be82418 DataExploration]# mu2e -c Examples/fcl/Ex01.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint.MDC2018b.001002_00000001.art
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis.root
root [1] ESD->Get("NStrawDigis")->Draw();
 

You should see ~40 StrawDigis/event on average. Now try with a μ- → e- conversion sample with beam backgrounds mixed in:

[root@80c41be82418 DataExploration]# mu2e -c Examples/fcl/Ex01.fcl --TFile ExploreStrawDigis_mix.root $TUTORIAL_DATA/dig.mu2e.CeEndpoint-mix-subset.MDC2018d.001002_00000000.art 
[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis_mix.root
 

You should see around 2300 StrawDigis/event, so the signal-to-noise ratio for raw data is below 2% (about 40 signal digis out of roughly 2300)! This is why we need background rejection and pattern recognition.

Question: what is the format of the data collection file name, what are its fields, and what do they mean? Hint: use the Mu2e wiki!

Now look at the TDC and ADC spectra:

[root@80c41be82418 DataExploration]# root -l ExploreStrawDigis.root
root [] ESD->Get("tdc")->Draw();
root [] ESD->Get("deltatdc")->Draw();
root [] ESD->Get("tot")->Draw();
root [] ESD->Get("adc")->Draw();
 

The histograms will not have the correct range. Edit the module source file with your favorite editor to correct the histograms marked with FIXME!, and look at how the values are accessed from the data product. For instance, if you use the 'vi' editor, the command is:

[root@80c41be82418 DataExploration]# vi Examples/src/ExploreStrawDigis_module.cc
...

Substitute 'emacs' for 'vi' if that's your preferred editor. You can also edit the file on your native machine using whatever editor you have installed there. For instance, in a native terminal window:

> vim $HOME/Tutorials/DataExploration/Examples/src/ExploreStrawDigis_module.cc
 

While you are in the module source, look at how collections are retrieved from the event, and at the include files that are pulled in. Also look at the fcl script Examples/fcl/Ex01.fcl.

Questions:

  • What is the physical meaning of deltatdc, tot, cal and hv? (hint: look at the files #included by StrawDigi.hh)
  • What is the name of the StrawDigi collection produced by the simulation sequence?
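
As a starting point for the first question, the sketch below shows how per-end quantities can be stored and combined. It uses a stand-in struct, not the real StrawDigi class: the names MockStrawDigi, kCal, kHV and deltaTdc are hypothetical, and the sign convention (cal minus hv) is an assumption. The actual accessors and conventions are in StrawDigi.hh and the files it includes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-in for a straw digi: one TDC and one TOT value per straw end.
// This is NOT the Mu2e StrawDigi class; all names here are hypothetical.
enum End : std::size_t { kCal = 0, kHV = 1 };

struct MockStrawDigi {
  std::array<std::uint16_t, 2> tdc;  // TDC counts recorded at each end
  std::array<std::uint16_t, 2> tot;  // time-over-threshold counts at each end
};

// Signed difference of the two end TDC values.
// The ordering (cal minus hv) is an assumed convention for this sketch.
inline int deltaTdc(const MockStrawDigi& digi) {
  return static_cast<int>(digi.tdc[kCal]) - static_cast<int>(digi.tdc[kHV]);
}
```

The values are converted to a signed int before subtracting, so the difference can be negative even though the stored counts are unsigned.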

Exercise 2: Calo digis

In this exercise, we'll explore calo crystal and calo cluster digis. Some supporting slides are available in Mu2e doc 26766.

We will start with a few questions:

1) Which data product contains crystal hits? How can you find the time / energy of a hit?
Answer: The data product is CaloCrystalHit, described in RecoDataProduct/inc/CaloCrystalHit.hh. The member functions time() and energy() give the corresponding information.

2) Which modules produce calorimeter clusters? What is the difference between them? Which data product should you use?
Answer: CaloProtoClusterFromCrystalHits and CaloClusterFromProtoCluster. CaloProtoClusterFromCrystalHits forms simply connected clusters from calorimeter hits. CaloClusterFromProtoCluster combines proto clusters close in time / distance into final clusters. You should use CaloClusters, unless you want to study how the proto-clusters are merged together.

3) Which data member indicates whether a cluster contains several proto-clusters or a single one?
Answer: The boolean variable isSplit is true if the cluster contains several proto-clusters.

4) How do I access the list of crystal hits contained in a cluster?
Answer: The caloCrystalHitsPtrVector is a vector containing a list of art::Ptr to the CaloCrystalHits.
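
The answer to question 4 can be illustrated with a small stand-alone sketch of looping over a vector of pointers to hits. The types below are stand-ins, not the Mu2e classes: MockCrystalHit, MockCluster and mostEnergeticHit are hypothetical names, and std::shared_ptr plays the role of art::Ptr, which is dereferenced with -> in the same way.

```cpp
#include <memory>
#include <vector>

// Stand-ins for the calorimeter data products; NOT the Mu2e classes.
struct MockCrystalHit {
  int    id;      // crystal identifier
  double time;    // hit time
  double energy;  // deposited energy
};

// std::shared_ptr stands in for art::Ptr in this sketch.
using HitPtr = std::shared_ptr<const MockCrystalHit>;

struct MockCluster {
  std::vector<HitPtr> hits;  // plays the role of caloCrystalHitsPtrVector
};

// Return the most energetic hit in the cluster,
// or a null pointer if the cluster has no hits.
inline HitPtr mostEnergeticHit(const MockCluster& cluster) {
  HitPtr best;
  for (const auto& hit : cluster.hits) {
    if (!best || hit->energy > best->energy) best = hit;
  }
  return best;
}
```

In a real module the loop body would call the energy() and time() member functions of CaloCrystalHit instead of reading struct fields.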


Now that you are all warmed up, we'll make a few plots (I know, this is getting so exciting!). First run the following snippet to produce the required data, then load the TTree in memory. The TTree name is DumpCaloDigis/Calo.

> mu2e -c Examples/fcl/Ex02.fcl $TUTORIAL_DATA/dig.mu2e.CeEndpoint.MDC2018b.001002_00000001.art
> root -l ExploreCaloDigis.root
> TTree *calo = (TTree*) _file0->Get("DumpCaloDigis/Calo")
 

What is available in this ntuple?
Hint: Look at the file Examples/src/ExploreCaloDigis_module.cc and the corresponding data products. Most of the names are self-explanatory, but a few others are more cryptic!

Tip: you can use the TBrowser to inspect the content of a file, simply type

> TBrowser tb;
 

Now histogram the energy of the crystal hits (switching to log scale is a good idea here):

 > calo->Draw("cryEnergyDep")
 

You should see a rapidly falling distribution, as most hits are low energy. Now let's plot the crystal hits only in the second disk (first disk ID=0, second disk ID=1):

> calo->Draw("cryEnergyDep","cryDiskId==1")
 

Can you histogram the position of each crystal hit?

> calo->Draw("cryPosY:cryPosX","","box")
 

You should see 674 boxes... the bigger the box, the larger the number of hits in that crystal. As expected, there are more hits in the central region.

You should be on fire at this point, so we'll look at the clusters. Let's plot the number of crystals in each cluster:

> calo->Draw("cluNumCrystals")
 

Next, draw the energy of all clusters whose center of gravity lies at a radius greater than 400 mm, and then of those at a radius less than 400 mm:

> calo->Draw("cluEnergy","sqrt(cluCogX**2+cluCogY**2)>400")
> calo->Draw("cluEnergy","sqrt(cluCogX**2+cluCogY**2)<400")
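
The radial selection in the Draw commands above can also be written out explicitly, which is the form it would take inside an analysis module. This is a minimal sketch under stated assumptions: MockClusterSummary, cogRadius and energiesOutside are hypothetical names, and the struct simply mirrors the cluCogX, cluCogY and cluEnergy ntuple branches.

```cpp
#include <cmath>
#include <vector>

// Stand-in for one cluster entry of the ntuple; NOT a Mu2e class.
struct MockClusterSummary {
  double cogX;    // center-of-gravity x (same units as the radius cut)
  double cogY;    // center-of-gravity y
  double energy;  // cluster energy
};

// Transverse radius of the cluster center of gravity.
inline double cogRadius(const MockClusterSummary& c) {
  return std::hypot(c.cogX, c.cogY);
}

// Collect the energies of clusters whose radius is strictly greater than rMin.
inline std::vector<double> energiesOutside(
    const std::vector<MockClusterSummary>& clusters, double rMin) {
  std::vector<double> selected;
  for (const auto& c : clusters) {
    if (cogRadius(c) > rMin) selected.push_back(c.energy);
  }
  return selected;
}
```

Calling energiesOutside(clusters, 400.0) selects the same clusters as the first Draw command; std::hypot computes sqrt(x*x + y*y).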
 

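There is a lot of noise below 400 mm! What about clusters in disk 0 and disk 1? Can we plot both on the same plot? The selection below assumes the cluster disk branch is named cluDiskId, by analogy with cryDiskId; check the branch list in the TBrowser if the name differs.

> calo->Draw("cluEnergy","cluDiskId==0")
> calo->Draw("cluEnergy","cluDiskId==1","same")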
 

As expected, disk 1 is cleaner!

Bonus: If you feel audacious, try to write an analysis module to do the following:

> Plot the energy and time of all crystal hits in the microbunch
> Plot the energy of all clusters with a radial location greater than 400 mm. 
> Plot the energy of clusters containing a single proto-cluster or several proto-clusters in two separate histograms   
> Plot the energy of the most energetic hit in the cluster
 

An implementation is shown in Examples/src/ExploreCaloDigis_module.cc
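
For the third bonus item, the logic of filling two separate histograms according to the split flag can be sketched without any framework code. Everything below is a stand-in: MockCaloCluster, SimpleHist and fillBySplit are hypothetical names, and SimpleHist is a toy replacement for a ROOT histogram such as TH1F, which an actual module would use.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a calorimeter cluster; NOT the Mu2e CaloCluster class.
struct MockCaloCluster {
  double energy;   // cluster energy
  bool   isSplit;  // true if the cluster contains several proto-clusters
};

// Toy fixed-width histogram over [lo, hi); out-of-range values are ignored.
struct SimpleHist {
  double lo, hi;
  std::vector<int> counts;
  SimpleHist(std::size_t nbins, double lo_, double hi_)
      : lo(lo_), hi(hi_), counts(nbins, 0) {}
  void fill(double x) {
    if (x < lo || x >= hi) return;
    auto bin = static_cast<std::size_t>((x - lo) / (hi - lo) * counts.size());
    if (bin >= counts.size()) bin = counts.size() - 1;  // guard against rounding
    ++counts[bin];
  }
};

// Fill one histogram for single proto-cluster clusters, another for split ones.
inline void fillBySplit(const std::vector<MockCaloCluster>& clusters,
                        SimpleHist& single, SimpleHist& several) {
  for (const auto& c : clusters) {
    (c.isSplit ? several : single).fill(c.energy);
  }
}
```

Comparing this with ExploreCaloDigis_module.cc shows which parts are bookkeeping (booking and filling histograms) and which parts depend on the data product interface.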

Hit Reconstruction

  • Track reconstruction algorithms and data products
    • Hit Reconstruction
    • Time Clusters
    • Helices
    • Kalman Fit
  • Calorimeter reconstruction algorithms and data products
    • Readout hit reconstruction
    • Crystal hit
    • Simply connected clusters
    • Full clusters
  • CRV reconstruction algorithms and data products

Reference Materials

Glossary of Raw and Reconstructed Data Products

class | description | contents
StrawDigi | Offline format of a single Tracker hit | TDC and TOT from both straw ends, ADC waveform
ComboHit | Calibrated Tracker hit, or an aggregate of several hits | position in space, time, and time differences
TimeCluster | Collection of ComboHits nearby in time and (roughly) space | average time and error
HelixSeed | Helix interpretation of a subset of hits in a TimeCluster | Helix parameters, t0, ComboHits with position along the helix
KalRep | Full Kalman filter fit result: not persistable | Complete set of weight and parameter matrices and vectors used in the fit
KalSeed | Compact summary of the Kalman filter fit result | Sampled fit segments, associated straw hits and straws
KalSegment | KalSeed component: local fit result | Fit parameters and covariance at a particular point
TrkStrawHitSeed | KalSeed component: straw hit as used in fit | hit position, residual, time, drift radius, errors, ...
TrkStraw | KalSeed component: straw intersected by the fit | strawID, DOCA to wire, radiation length, energy loss, ...
CaloRecoDigi | Calorimeter readout reconstructed hit | readout ID, energy, time, chi2 of fit to waveform, ...
CaloCrystalHit | Calorimeter crystal reconstructed hit | crystal ID, energy, time and associated errors
CaloCluster | Full cluster of calorimeter crystal hits | Total energy, center of gravity (COG), energy moments
CrvCoincidenceCluster | Cluster of adjacent CRV reco pulses | position, PE count, start and end times

Glossary of Principal Reconstruction Modules

module | category | description
StrawDigisFromStepPointMCs | Simulation | Converts G4 straw energy deposits into StrawDigis
StrawHitReco | Reconstruction | Converts StrawDigis into single-straw ComboHits
CombineStrawHits | Reconstruction | Combines adjacent ComboHits in a panel into aggregate ComboHits
FlagBkgHits | Reconstruction | Identify (flag) panel ComboHits likely produced by low-energy Compton or delta-ray electrons
TimeClusterFinder | Reconstruction | Group time-adjacent panel ComboHits (and calorimeter cluster if available) into a cluster
RobustHelixFinder | Reconstruction | Fit a cluster of panel ComboHits to a simple helix using space-point positions
CalTimePeakFinder | Reconstruction | Group panel ComboHits near a calorimeter cluster in time into a cluster
CalHelixFinder | Reconstruction | Fit the calorimeter cluster position, target position and panel ComboHits to a simple helix
KalSeedFit | Reconstruction | Fit single-straw transverse wire positions to a helix, using a simple helix as starting point
KalFinalFit | Reconstruction | Kalman filter fit of single-straw drift ellipses, constrained with calorimeter cluster time (if present)
CaloRecoDigiFromDigis | Reconstruction | Extract hit from the waveform digitized by each calorimeter readout
CaloCrystalHitFromHits | Reconstruction | Combine readout hits from the same crystal to form crystal hits
CaloProtoClusterFromCrystalHits | Reconstruction | Build simply connected cluster of crystal hits
CaloClusterFromProtoCluster | Reconstruction | Associates proto-clusters separated by small distance into full clusters