TrkAnaTutorial: Difference between revisions

From Mu2eWiki
Jump to navigation Jump to search
No edit summary
 
(84 intermediate revisions by 2 users not shown)
Line 1: Line 1:
==Under Construction!==
The TrkAna tutorial is hosted within the GitHub repo [https://github.com/Mu2e/TrkAna/blob/main/tutorial/README.md here]
This tutorial is currently being written
 
== Session Prerequisites ==
This tutorial is aimed at anyone starting ntuple analysis.
 
Before starting this tutorial you should:
* know about the physics of Mu2e;
* have the appropriate docker container set up; and,
* know how to run the Mu2e Offline software and ROOT.
 
== Session Introduction ==
One of the final outputs of the Mu2e reconstruction are fits to the tracks in the tracker. These are stored as <code>KalSeed</code>s in <code>KalSeedCollection</code>s. For each fit hypothesis, we have a different <code>KalSeedCollection</code> (e.g. downstream electrons, downstream muons, upstream positrons).
 
In order to do analyses with these tracks, we have an art module that creates a ROOT [https://root.cern.ch/doc/master/classTTree.html TTree] of these <code>KalSeed</code>s called '''TrkAna'''. Each entry in the tree corresponds to a single track.
 
In this tutorial you will:
* create [[TrkAna]] trees using the Mu2e Offline software and MDC2018 datasets; and,
* analyze them using the ROOT command line and ROOT macros.
 
== Basic Exercises ==
These exercises are designed to be run on v7_4_1 from $TUTORIAL_BASE/TrkAna
<nowiki>> whatever command to setup in docker
> cd $TUTORIAL_BASE/TrkAna
</nowiki>
 
=== Exercise 1: Creating the simplest TrkAna tree ===
In this exercise, we will create the simplest TrkAna tree and investigate it with the ROOT command line.
 
<ol style="list-style-type:lower-alpha">
<li>First, run <code>mu2e</code> on a single CeEndpoint-mix reco art file:</li>
<nowiki> > mu2e -c fcl/TrkAnaTutEx01.fcl -S filelists/mcs.mu2e.CeEndpoint-mix-cat.MDC2018h.1-file.lst </nowiki>
<li> Now let's have a look at the TrkAna tree with the ROOT command line</li>
<nowiki> > root -l trkana-ex01.root
root[n]: TrkAnaEx01->cd()
root[n]: trkana->Print()</nowiki>
You will see the TrkAna tree structure. Here is a brief description of the branches:
<ol style="list-style-type:lower-roman">
  <li> <code>evtinfo</code>: '''ev'''en'''t''' level information (e.g. event ID of the <code>art::Event</code> this track is from) </li>
  <li> <code>hcnt</code>: '''h'''it '''c'''ou'''nt''' of different types of hit (e.g. number that pass certain selections) </li>
  <li> <code>tcnt</code>: '''t'''rack '''c'''ou'''nt''' of different track types </li>
  <li> <code>trk</code>: global fit information for the track (e.g. fit status, ranges of validity, number of hits) </li>
  <li> <code>trk(ent/mid/xit)</code>: local fit information for the track at the '''ent'''trance of the tracker, the '''mid'''dle of the tracker and e'''xit''' of the tracker (e.g. fit momentum, pitch angle) </li>
  <li> <code>trktch</code>: calorimeter hit information for the calorimeter function associated to the track (tch = '''T'''rk'''C'''alo'''H'''it)</li>
  <li> <code>crvinfo</code>: information of associated hits in the CRV </li>
  </ol>
Note that the "trk" parts of the branch names are configurable -- you will see this is a minute
<li> Now we can plot some simple things:
<ol style="list-style-type:lower-roman">
  <li> the track momentum at the tracker entrance </li>
  <nowiki>root[n]: trkana->Draw("trkent.mom")</nowiki>
  <li> the calorimeter cluster energy </li>
  <nowiki>root[n]: trkana->Draw("trktch.edep")</nowiki>
  <li> With this last command you will see some entries at -1000. This means that there is no associated calorimeter cluster for this track. To exclude these we want to want to add a cut on the <code>trktch.active</code> flag (0 = there is no TrkCaloHit, 1 = there is TrkCaloHit):</li>
  <nowiki>root[n]: trkana->Draw("trktch.edep", "trktch.active==1")</nowiki>
</ol>
<li> Let's take a quick look at the fcl file to see how the TrackAnalysisReco module has been configured. Open it up in your favourite text editor and look at these important lines: </li>
<nowiki>TrkAnaEx01 : { @table::TrackAnalysisReco }
physics.analyzers.TrkAnaEx01.candidate.input : "KFFDeM"
physics.analyzers.TrkAnaEx01.candidate.branch : "trk"
physics.analyzers.TrkAnaEx01.diagLevel : 0
physics.analyzers.TrkAnaEx01.FillMCInfo : false</nowiki>
In order, these lines:
<ol style="list-style-type:lower-roman">
  <li>import an example TrkAna module configuration (you can find it in $MU2E_BASE_RELEASE/TrkDiag/fcl/prolog.fcl);</li>
  <li>define the input KalSeedCollection that we want a TrkAna tree for (KFFDeM = '''K'''al'''F'''inal'''F'''it '''D'''ownstream '''eM'''inus);</li>
  <li>configure the name of the output branches;</li>
  <li>set TrkAna to use the lowest diagnostic level (1 = simple list of tracks, 2 = hit level diagnostics); and,</li>
  <li>make sure we are not touching the MC truth</li>
</ol>
</ol>
 
That's the end of this exercise -- you can now create a simple TrkAna tree! Try some of the following optional exercises to explore further:
<ol style="list-style-type:lower-alpha">
<li>(Optional): Run on a CeplusEndpoint-mix file (filelist/mcs.mu2e.CeplusEndpoint-mix-cat.MDC2018h.1-file.lst) and get a list of positively-charged tracks. What is the momentum of these tracks? </li>
<li>(Optional): Create a second instance of the TrackAnalysisReco module. Have one instance set to look at negatively-charged tracks and the other set to look at positively charged tracks. Run on muplusgamma-mix (filelist/mcs.mu2e.flatmugamma-mix-cat.MDC2018h.1-file.lst) and count how many tracks of each type are found</li>
</ol>
 
=== Exercise 2: Calculating the Ce efficiency ===
Now that we can create a TrkAna tree, let's calculate how efficient our reconstruction is for conversion electrons with some signal cuts. There's a lot of concepts introduced in this exercise, so don't worry if it all doesn't make sense at first. I'll start with some quick notes on '''event counting''', '''event weighting''', and '''reco quality'''.
 
'''Event Counting'''
 
In the simulation, we generate a certain number of events. However, in order to save space, we only write out events that will produce a reconstructed tracks. We have various ways of filtering events but the result is the same -- the number of events in the output art files do not correspond to the number of events that were generated. We need to know the total number of generated events in order to calculate an absolute efficiency. We keep track of the number of events that were generated by creating a <code>GenEventCount</code> object for each SubRun. Then in our TrkAna job, we run the <code>genCountLogger</code> module to read the actual number of generated events.
 
'''Event Weighting'''
 
In each "mixed" event, we add a single "primary" particle onto a set of "background frames", which represent the background hits from other processes. We want to simulate the variable intensity of the proton beam at the production target and so we scale the number of background hits when we create the mixed event. However, we still only add a single primary particle and so we record the scale factor used in a <code>ProtonBunchIntensity</code> object for use later. In our TrkAna job, we will add a new module to the trigger path (<code>PBIWeight</code>), which translates the scale factor used for the proton bunch intensity into an <code>EventWeight</code> object. The TrackAnalysisReco module will write out these proton beam scale factors to a new branch (<code>evtwt.PBIWeight</code>). Event weighting is explored in more detail in Advanced Exercise 4.
 
'''Reco Quality'''
 
We use various algorithms to check the "quality" of a track in some dimension. Currently there are two we consider:
* the track quality (i.e. how will-reconstructed a track is); and
* the particle ID (PID) quality (i.e. how closely a track resembles an electron rather than a muon).
 
In our TrkAna job, we will add two artificial neural network (ANN) based algorithms to determine these reco qualities (TrkQual and TrkCaloHitPID). In this exercise, we only care that the output of both of these modules is a <code>RecoQualCollection</code> with one <code>RecoQual</code> object per <code>KalSeed</code>. A <code>RecoQual</code> is essentially a float and, in these two algorithms, is between 0 (poorly-reconstructed, XXX PID) and 1 (well-reconstructed, YYY PID). TrkQual is explored in more detail in Advanced Exercise 3.
 
'''The Exercise'''
 
Now onto the exercise:
<ol style="list-style-type:lower-alpha">
<li>Create a TrkAna tree with CeEndpoint-mix tracks and include the <code>genCountLogger</code>, <code>PBIWeight</code>, <code>TrkQual</code> and <code>TrkCaloHitPID</code> modules</li>
<nowiki> mu2e -c fcl/TrkAnaTutEx02.fcl -S filelists/mcs.mu2e.CeEndpoint-mix-cat.MDC2018h.1-file.lst</nowiki>
If you read the file you will see that we have added:
<nowiki>physics.TrkAnaTrigPath : [ @sequence::TrkAnaReco.TrigSequence ]</nowiki>
which adds a standard set of modules including <code>PBIWeight</code> etc. (you can look in $MU2E_BASE_RELEASE/TrkDiag/fcl/prolog.fcl for more details).
<li>Also in TrkAnaTutEx02.fcl, you will see that we have changed the <code>input</code> parameter to just <code>"KFF"</code>and added the <code>suffix</code> parameter:</li>
<nowiki>physics.analyzers.TrkAnaEx02.candidate.input : "KFF"
physics.analyzers.TrkAnaEx02.candidate.branch : "trk"
physics.analyzers.TrkAnaEx02.candidate.suffix : "DeM"</nowiki>
In the Mu2e reconstruction jobs, we keep consistency between our fit hypotheses with standard suffixes to the module labels. If you look in $MU2E_BASE_RELEASE/TrkDiag/fcl/prolog.fcl, you will see that we have TrkQual'''DeM''', TrkQual'''DeP''' etc.
<li>If you open up the ROOT file and print the tree structure, you will notice the following new branches:</li>
<ol style="list-style-type:lower-roman">
  <li><code>evtwt</code>: stores the values of all <code>EventWeight</code> objects in the art event</li>
  <li><code>trkqual</code>: stores the values of all '''relevant''' <code>RecoQual</code> objects in the art event</li>
</ol>
Defining a <code>suffix</code> in the TrkAna configuration allows us to store only the '''relevant''' <code>RecoQual</code> values.
<li>(Optional): Try running without a suffix parameter and set <code>input</code> back to <code>"KFFDeM"</code>. What do you see in the output <code>trkqual</code> branch</li>
<li>Run this example ROOT macro that plots the track momentum onto a histogram with 0.5 MeV wide bins</li>
<nowiki> root -l scripts/TrkAnaTutEx02.C </nowiki>
<li>Change the bin width to 0.05 MeV</li>
<li>Add the following signal cuts to the Draw function</li>
<ol style="list-style-type:lower-roman">
  <li>the fit is successful (<code>trk.status > 0</code>)</li>
  <li>the track is in the time window of 700 ns -- 1695 ns (<code>trk.t0</code>)</li>
  <li>the tan-dip of the track is consistent with coming from the target: 0.577350 -- 1.000 (<code>trkent.td</code>)</li>
  <li>the impact parameter of the track is consistent with coming from the target: -80 mm -- 105 mm (<code>trkent.d0</code>)</li>
  <li>the maximum radius of the track is OK: 450 mm -- 680 mm (<code>trkent.d0 + 2./trkent.om</code>)</li>
  <li>the track is of good quality (<code>trkqual.TrkQualDeM > 0.8</code>)</li>
</ol>
<li>Because we simulated each event with a different proton bunch intensity, each track should be weighted by the PBIWeight. To do this you will want to modify the cut command to add the event weighting:</li>
<nowiki> evtwt.PBIWeight*(cuts) </nowiki>
<li>Now we can count the number of tracks that pass all these cuts</li>
<nowiki> hRecoMom->Integral() </nowiki>
<li>We can also integrate in the momentum signal region with the same function. Be careful TH1F::Integral takes bin numbers as its arguments and not x-values. Some hints:</li>
<ol style="list-style-type:lower-roman">
  <li>you can find a bin for a given x-value with hist->GetXaxis()->FindBin(x-value)</li>
  <li>make sure you aren't off by one bin by checking the bin low edges and bin high edges with TAxis::GetBinLowEdge() and TAxis::GetBinUpEdge().</li>
</ol>
<li>To calculate the efficiency you need to know the number of events generated for this simulation. This is stored in the output of the <code>genCountLogger</code> module</li>
<nowiki> TH1F* hNumEvents = (TH1F*) file->Get("genCountLogger/numEvents");
double n_generated_events = hNumEvents->GetBinContent(1);</nowiki>
<li>Now you can calculate the absolute Ce efficiency. What's the answer?</li>
</ol>
 
Now that you can calculate the Ce efficiency, try some of the following exercises:
<ol style="list-style-type:lower-alpha">
<li>(Optional): Make the plot prettier by:</li>
<ol style="list-style-type:lower-roman">
  <li>adding appropriate axis labels</li>
  <li>writing the Ce efficiency on the plot with a TLatex</li>
  <li>adding dashed lines to show the momentum window with TLines (extra bonus: have the lines move when the momentum window values change)</li>
</ol>
<li>(Optional): Add a second momentum plot but with a higher track quality cut and include a TLegend.</li>
</ol>
 
=== Exercise 3: Adding MC truth ===
In this exercise, we will add MC truth information to the TrkAna tree and see how close our reconstruction matches the truth.
 
<ol style="list-style-type:lower-alpha">
<li>Create a TrkAna tree with CeEndpoint-mix tracks and include the MC truth information:</li>
<nowiki>mu2e -c fcl/TrkAnaTutEx03.fcl -S filelists/mcs.mu2e.CeEndpoint-mix-cat.MDC2018h.1-file.lst</nowiki>
If you look in the fcl file, you will notice that there is a global switch <code>fillMCInfo</code> and a local switch for the candidate track <code>candidate.fillMC</code>. This will be important when we add supplemental tracks in Exercise 5.
<li>You can open up the file and Print the tree structure. For this exercise, we will focus on the <code>trkmcent</code>, which contains the MC information for the step that crossed the entrance into the tracker. The other <code>trkmc*</code> branches will be the focus of Exercise 4.
<li>Run this example ROOT macro that plots the intrinsic tracker momentum for tracks that pass our signal cuts except for the trkqual and momentum window cuts</li>
<nowiki> root -l scripts/TrkAnaTutEx03.C </nowiki>
<li>Add the following line to the start of the macro:</li>
<nowiki>gStyle->SetOptStat(111111);</nowiki>
This will add the "Overflow" and "Underflow" values of the histogram to the stats box. This shows the number of events that fall outside of the axis range of the histogram. In ROOT the overflow bin is at bin number n_bins+1 and the underflow bin is bin number 0.
<li>Using what you learned in Exercise 2, count the number of events above +3 MeV, including the events in the overflow bin</li>
<li>Add the trkqual cut and play with it. How does the number of events above +3 MeV change?</li>
</ol>
 
<ol style="list-style-type:lower-alpha">
<li>(Optional): Do the same comparison at the middle or exit of the tracker</li>
<li>(Optional): Perform a fit to a double-sided crystal ball function. This function is a Gaussian with polynomial tails. The function can be found here: scripts/dscb.h. You will need to create your own TF1 ([https://root.cern.ch/doc/v614/classTF1.html#F4 hint]) and use the TH1::Fit() function ([https://root.cern.ch/doc/master/classTH1.html#a7e7d34c91d5ebab4fc9bba3ca47dabdd hint]). What is the core resolution?</li>
</ol>
 
=== Exercise 4: Following genealogy ===
With TrkAna, we also have access to important steps in the genealogy of the track (i.e. which particles produced the track). Before looking at the branches themselves, we need to discuss a little about the simulation.
 
For each event in the simulation we instantiate a <code>GenParticle</code>, which represents the particle we want to start the simulation with. For some of our samples, we create more than one of these per event (e.g. for cosmic rays, we take all the particles created in the shower at a certain altitude). We then pass these <code>GenParticles</code> to the physics simulation and create a <code>SimParticle</code> for each one. We then let Geant4 simulate these <code>SimParticles</code> and create more <code>SimParticles</code> until we end up with <code>StepPointMCs</code> in the detectors. For the tracker, there will be many different <code>StepPointMCs</code> from different <code>SimParticles</code> in every straw. We assign a <code>SimParticle</code> to be responsible for each straw hit if it had the most <code>StepPointMCs</code> in that hit.
 
Now here is a description of the branches we have in TrkAna:
 
; trkmc : contains information about the <code>SimParticle</code> that produced the most hits on the track
; trkmcgen : contains information about the actual <code>'''Gen'''Particle</code> in our simulation that ultimately produced the track (e.g. for cosmic rays, this will be the particle in the shower that ultimately produced the track)
; trkmcpri : contains information about the '''pri'''mary particle that produced the <code>trkmcgen</code> particle (e.g. for cosmic rays, this will correspond to the cosmic ray that produced the shower (i.e. the proton))
 
For most of our samples, these are all the same particle. In order to explore the differences, we will run on a sample of cosmic rays made by the CRY generator.
 
<ol style="list-style-type:lower-alpha">
<li>Create a TrkAna tree with CRY-cosmic-general-mix tracks and include the MC truth information:</li>
<nowiki>mu2e -c fcl/TrkAnaTutEx04.fcl -S filelists/mcs.mu2e.CRY-cosmic-general-mix-cat.MDC2018h.1-file.lst</nowiki>
<li>Open the ROOT file and look at the PDG ID codes of the MC particles (defined [http://pdg.lbl.gov/2019/reviews/rpp2018-rev-monte-carlo-numbering.pdf here] but some important ones are 11 = electron, -11 = positron, 13 = muon, -13 = positive muon, 2212 = proton).</li>
<nowiki> root -l trkana-ex04.root
root[n]: TrkAnaEx04->cd()
root[n]: trkana->Scan("trkmc.pdg:trkmcgen.pdg:trkmcpri.pdg")</nowiki>
Here are the first few rows:
<nowiki>************************************************
*    Row  * trkmc.pdg * trkmcgen. * trkmcpri. *
************************************************
*        0 *        13 *        13 *      2212 *
*        1 *        13 *        13 *      2212 *
*        2 *        11 *        13 *      2212 *
*        3 *        13 *        13 *      2212 *
*        4 *      -13 *      -13 *      2212 *
*        5 *        11 *      -13 *      2212 *</nowiki>
So the "primary" particle is a proton, this is the cosmic ray proton and doesn't actually appear in our simulation. The <code>GenParticles</code> that started our simulation are either positive or negative muons. The particles that are responsible for the tracks are either muons or electrons.
<li>However just because we have two particles of the same type at the different stages, does not mean that they are the same particle. We need look at the <code>trkmc.prel</code>, which encodes the relationship between the track particle and the <code>GenParticle</code>:</li>
<nowiki>root[n]: trkana->Scan("trkmc.pdg:trkmc.prel:trkmcgen.pdg:trkmcgen.pdg")</nowiki>
You can look in $MU2E_BASE_RELEASE/MCDataProducts/inc/MCRelationship.hh for the definitions (0 = same, 1 = direct child, -1 = unrelated).
<li>'''Anything else to do here?'''</li>
</ol>
 
That's it for this exercise. As you can tell, this isn't an exhaustive genealogy tree so if you need to look at something more complex, then you can run in the full Offline framework where we (currently) store every step in the genealogy for saved tracks. However, you can try this optional exercise for another example:
<ol style="list-style-type:lower-alpha">
<li>(Optional): Run on the flatmugamma-mix (internalRMC would be better...) file list and look at the genealogy of those tracks</li>
</ol>
 
=== Exercise 5: Adding supplemental tracks ===
From the last exercise, we can see that we get muon tracks. But we were only looking at the result of the electron-hypothesis fit. We can have TrkAna write out the results of "supplement" fits (e.g. downstream muon track)
 
<ol style="list-style-type:lower-alpha">
<li>Create a TrkAna tree with CRY-cosmic-general-mix tracks and include the result of downstream mu-minus fits:</li>
<nowiki>mu2e -c fcl/TrkAnaTutEx05.fcl -S filelists/mcs.mu2e.CRY-cosmic-general-mix-cat.MDC2018h.1-file.lst</nowiki>
<li>Open up the fcl file and you will see that we have created a couple of blocks to handle branch definitions:</li>
<nowiki>DeM : { input : "KFF"
        branch : "de"
        suffix : "DeM"
        fillMC : true
}
DmuM : { input : "KFF"
        branch : "dm"
        suffix : "DmuM"
        fillMC : false
}</nowiki>
We have kept the same definition for the downstream electron (although we have changed the branch name to <code>de</code>) and we have added a definition for the downstream mu-minus tracks. These are then used here:
<nowiki>physics.analyzers.TrkAnaEx05.candidate : @local::DeM
physics.analyzers.TrkAnaEx05.supplements : [ @local::DmuM ]</nowiki>
where we are making the DmuM fits a "supplement" track. This means that for each <code>KalSeed</code> in the DeM collection, TrkAna will look in the DmuM collection and write out the track information for the DmuM track that is closest in time to the DeM track.
<li>Open up the ROOT file and look in the tree. You will see that we now have both <code>de*</code> branches and <code>dm*</code> branches</li>
<li>Plot resolution for DeM with and without demc==11 and demc==13</li>
<li>add truth for DmuM and do the same thing</li>
<li>Compare dequal.TrkQualDeM and dmqual.TrkQualDmuM (or TrkCaloHitPIDs?)</li>
<li>(Optional): swap candidate and supplement</li>
<li>(Optional): add ue</li>
</ol>
 
=== Exercise 6: TrkAnaReco ===
Run TrkAnaReco.fcl and explain branches. The detrkqual and detrkpid branches will be exaplained in an advanced exercise.
 
<ol style="list-style-type:lower-alpha">
<li>Run TrkAnaReco.fcl</li>
<nowiki>mu2e -c $MU2E_BASE_RELEASE/TrkDiag/fcl/TrkAnaReco.fcl -S filelists/mcs.mu2e.CRY-cosmic-general-mix-cat.MDC2018h.1-file.lst</nowiki>
<li>look for reflected tracks? compare DeM to DmuM?</li>
<li>check for CRV coincidence? </li>
</ol>
 
== Advanced Exercises ==
These exercises are designed to be run on v7_4_1 from $TUTORIAL_BASE/TrkAna
<nowiki>> whatever command to setup in docker
> cd $TUTORIAL_BASE/TrkAna
</nowiki>
 
=== Exercise 1: Looking at the hits in the track ===
With TrkAna we can get information about the individual hits by setting diagLevel to 2
 
This #includes TrkAnaReco.fcl and then edits the diagLevel
 
Very simple event display?
 
=== Exercise 2: Analyzing a TrkAna tree with a compiled macro ===
Write a compiled macro that loops through each event and uses the TrkInfo branches. This can just run TrkAnaReco.fcl
 
=== Exercise 3: Retraining TrkQual ===
This can just run TrkAnaReco.fcl but want to highlight that the TrkQualInfo is written out with the candidate.trkqual
Write out trkqual branch and retrain with a different background weight expression (or a different sample?)
Run in parallel with original TrkQual
 
=== Exercise 4: Estimating the DIO background (with event weighting) ===
Run on flateminus-mix with DIO weights and plot that
 
=== Exercise 5: Creating your own "TrkAna" tree using InfoStructs ===
For this exercise, you should know how to write an art module.
 
Explain info structs and info struct helper to create a new module to create a simple tree (need to know module writing)?
 
== Reference Materials ==
* Use this place to add links to reference materials.
* [[TrkAna]] wiki page
 
=== A Useful Glossary ===
; ROOT : data analysis framework developed at CERN
; KalSeed : data product that represents a track
; CeEndpoint-mix : dataset name for CeEndpoint (i.e. mono-energetic electrons) with background frames mixed in
; CeplusEndpoint-mix : dataset name for CeplusEndpoint (i.e. mono-energetic positrons) with background frames mixed in
; flatmugamma-mix : dataset name for flatmugamma (i.e. flate energy photons generated at muon stopping positions) with background frames mixed in
; KalFinalFit : the module name for the final stage of the Kalman filter fit for the track
; TrkQual : an artificial neural network (ANN) that takes parameters from the track and outputs a value between 0 (poorly reconstructed) and 1 (well-reconstructed)

Latest revision as of 18:21, 7 October 2023

The TrkAna tutorial is hosted within the GitHub repo here