CetMapVector

From Mu2eWiki
Jump to navigation Jump to search

Introduction

In November of 2010 a change was made to the SimParticleCollection class. This note discusses why the change was made and how it was implemented using a new Mu2e class template, MapVector<T>. This page also uses SimParticleCollection as an example to introduce the class template MapVector, which has a more general use than just for SimParticleCollection. First some background information is necessary.

Background

Particle Management in G4

For each event, G4 internally maintains a list of all the particles remaining to be tracked; this is called the "stack". Processing of one event begins when G4 calls a user supplied class that adds all of the generator level particles to the stack. Geant4 then pops the top particle off of the stack and begins tracking it. During tracking, Geant4 navigates the particle through the geometry, using knowledge of the magnetic fields, and applies all enabled physics processes, multiple scattering, energy loss, decay, bremstrahlung, hadronic interactions and so on. These physics process may create new particles, each of which is added to the stack as it is created. When G4 completes tracking a particle, it pops the top particle off of the stack and begins tracking it.

G4 has a mode in which tracking of a particle can be interrupted, its daughters tracked, then its tracking continued. Mu2e does not use this mode.

In Mu2e the generated particles are read from the event and added to the stack by the class Mu2eG4/src/PrimaryGeneratorAction.cc.

G4 also provides several opportunities for user code to filter particles and to collect information about them. Mu2e uses three of these. When a new particle is pushed onto the stack, G4 calls the method

ClassifyNewTrack(const G4Track* trk)

of the Mu2e class Mu2eG4/src/StackingAction.cc. This method decides whether or not a particles is interesting and it kills uninteresting particles; particles that have been killed will never be tracked. For example one can kill low energy particles that are created by cosmic rays in the earth overburden. This allows G4 to run more quickly, sometimes much more quickly. yet still produce results with acceptable accuracy. The other two methods are found in the class Mu2eG4/src/TrackingAction.cc:

  void TrackingAction::PreUserTrackingAction(const G4Track* trk); 
  void TrackingAction::PostUserTrackingAction(const G4Track* trk);

The first of these is called just before G4 begins tracking of a track and the second is called just after G4 finishes tracking of a track. Mu2e uses these methods to record information about the track: this is where the SimParticle objects are created and filled ( see ToyDP/inc/SimParticle.hh). In the PreUserTrackingAction, a SimParticle object is made for the new track and it is added to the SimParticleCollection; in the PostUserTrackingAction, the appropriate SimParticle is located and the end-of-tracking information is added.

Particle Numbering in G4

When G4 stacks a new track it assigns that track a unique identifier; this identifier is an integer (G4Int) that is reset to 1 at the start of each event and is incremented each time that G4 stacks a track. This identifier is called the ID of the track and is accessed by calling the GetTrackID() method of the class G4Track. The user stacking action is called after this identifier has been assigned. If the user action kills the track, the identifier is considered used and the next particle to be stacked will have an ID with the next number in the sequence. Geant4 uses this ID to maintain the parent/child information in the event.

Mu2e records this ID in the SimParticle class; it is accessed using the method id(). Mu2e also uses this identifier to use the parent/child information in the class SimParticle.


Stating the Problem

Mu2e creates a SimParticle for a G4 ID when tracking for that particle starts. Therefore, if some particles are killed by the user stacking action, there will be missing IDs in the SimParticleCollection. The point of this discussion is to understand how to deal with those missing IDs.

In early versions of the G4 software, we never killed particles in the user stacking action; therefore this issue did not come up and SimParticleCollection was just a typedef to std::vector<SimParticle>. That worked well and was sufficient for simple events that were used to debug geometry and other code issues. But we are now running simulations that produce many thousands of particles, most of which are uninteresting. So we need a solution that allows us to create a shortened list of SimParticles.


To make the problem a little more concrete, consider an event with four SimParticles with the IDs { 1, 9, 15, 16 }; also let 9 and 15 be children of 1 and 16 be a child of 15.

So the question is what is the right C++ container to hold these objects? Whatever the container, we would like to be able to use operator[] to cross-reference parent/child information.


What about std::vector<SimParticle>?

Could one use std::vector<SimParticle> as the collection type? This does not work because of the cross referencing problem. Consider that I have one of the Mu2e MC hit objects, for example a StepPointMC object. This object can return the G4 ID of the track that created it; similarly for other sorts of hit objects. So one might write:

    typedef std::vector<SimParticle> SimParticleCollection;
    
    StepPointMC const& hit =  ...;                  // get this from somewhere.
    SimParticle const& sim = sims.at(hit.trackId());

Consider the example event with 4 SimParticles that was described in the previous section. Suppose that the hit in the code fragment above was created by the particle with G4 ID=9. In this case the last line tries to evaluate sims[9], which is off of the end of the array and will throw an exception. It would even be worse if the line were coded as,

    SimParticle const& sim = sims[hit.trackId()];

This might core dump but, more likely, it would silently produce erroneous results.

One could fix this by padding out SimParticleCollection with empty information filling all of the missing slots. Or we could fix it by doing a massive renumbering job at the end of every event. Or we could fix it by providing a function to find the SimParticle that has the requested ID; but users can easily forget to use this function. None of these seems reasonable;

What about std::map<int,SimParticle>?

In this picture the first element of the map would be the G4 ID for which the SimParticle information is valid. Again using the above example, you would first try to write:

    typedef std::map<int,SimParticle> SimParticleCollection;

    StepPointMC const& hit =  ...;                  // Get it from somewhere.
    SimParticle const& sim = sims[hit.trackId()];

But this will not compile! The reason has to do with the behaviour of operator []. The C++ standard states that, if the key exists, operator[](key_type const& key) will return a reference to the value mapped to that key; if the key does not exist, operator[](key_type const& key) will default construct a SimParticle and add it to the map, mapped to the key given as an argument to operator[]; it will return a reference to that value. Therefore it is not possible to use operator[] on a const SimParticle collection! One would have to write:

   SimParticleCollection::const_iterator i = sims.find(hit.trackId());
   if ( i == sims.end() ){
     throw ....;
   }
   SimParticle const& sim = i->second;

This works but it is very verbose and it will make code hard to read.

The solution: MapVector<SimParticle>

The solution to the problem is to define a new class template, MapVector, that has the desired behaviour. Basically it behaves like a const map except that it redefines the behaviour of operator[] on const objects.

    typedef MapVector<SimParticle> SimParticleCollection;
    
    StepPointMC const& hit =  ...;                  // Get it from somewhere.
    SimParticle const& sim = sims[hit.trackId()];

This behaves as expected. Moreover, if there is a bug in the code that created the hits and a hit comes from a track that is not present in the SimParticleCollection, the operator[] will throw. The class template MapVector provides a few other useful accessor functions that are not present in a map:


  // The same as operator []( key_type );
  VALUE const& at( key_type key ) const;

  // Return true if the key exists in the map.
  // Otherwise return false.
  bool has( key_type key ) const;

  // If the key exists, return a pointer to the mapped value.
  // Otherwise return a zero pointer.
  VALUE const* findOrNull( key_type key ) const;

Here VALUE is the template parameter, which is SimParticle for purposes of this example. Please do not use the last accessor unless there is a very good reason to do so. Our intention is that the SimParticles corresponding to all hits should always be present in SimParticleCollection; so please just use operator[](key_type) and let it throw if the collection was not properly constructed. The one good reason to use the last accessor is when it is legal for a key to be missing from the map; in such a case one could also use the combination of has(key_type) and operator[](key_type) however that would require searching twice, which might be prohibitively expensive.


What is the key_type of MapVector<SimParticle>?

I decided not to make the key_type of a MapVector an integer or a type that can be automatically converted to an integer. The reason is that the following code is almost always wrong:

    typedef MapVector<SimParticle> SimParticleCollection;

    SimParticleCollection const& sims = ...; // Get it from somewhere

    for ( size_t i=0; i<sims.size(); ++i){
       SimParticle const& sim = sims.at(i);   <font color=red>// Wrong !!!!!!!</font>
    }

If some G4 particles were killed in user stacking action, then this will, eventually, throw.

The correct way to write a loop over a MapVector is,

    typedef MapVector<SimParticle> SimParticleCollection;

    SimParticleCollection const& sims = ...; // Get it from somewhere
    for ( SimParticleCollection::const_iterator i=simParticles->begin();
          i!=simParticles->end(); ++i ){

      SimParticle const& sim = i->second;
    }


Because it will be very tempting to write the incorrect form of the loop, it is desirable to make sure that it generates a compiler error. The solution is redefine the key_type of the MapVector. The class provided for this purpose is, GeneralUtilities/inc/MapVectorKey.hh. If you look at the source code, you will see that it is just an integer, but one cannot automatically convert an integer to at MapVectorKey ( this is the meaning of explicit in the constructor ) or automatically convert a MapVector to an integer. The argument of all of the accessor functions of MapVector<T> is an object of type MapVectorKey.


To make this work, the classes that can return a G4 ID must also be modified to return the correct data types. In ToyDP/inc/StepPointMC.hh:

    MapVectorKey             trackId()    const { return _trackId;   }

In ToyDP/inc/SimParticles.hh:

    typedef MapVectorKey key_type;

    key_type id()       const {return _id;}
    key_type parentId() const { return _parentId;}

    std::vector<key_type> const& daughterIds() const { return _daughterIds;}

Finally user code must be modified to use the correct data types. For example, in Mu2eG4/src/ReadBack.cc:

    <font color=red>SimParticleCollection::key_type trackId</font>(hit.trackId());

    // ...

    if ( haveSimPart ){
        SimParticle const& sim = simParticles->at(<font color=red>trackId</font>);
        // ...
    }

    // ...

    // Fill the ntuple.
    nt[0]  = event.id().event();
    nt[1]  = hit.<font color=red>trackId()</font>.asInt();

    // ...

    if ( _nAnalyzed < _maxFullPrint ){

        cerr << "Readback"
             // ...
             << hit.<font color=red>trackId()</font>   << "   "
             // ...
    }

If you follow through the definition of SimParticleCollection::key_type you will see that it resolves to MapVectorKey. Prefer to write it out in the long form: if we ever decide to change this type, then your code will not need to change. If you need find yourself writing out the long form a lot within one file you can save some typing by:

    typedef SimParticleCollection::key_type key_type;
    key_type trackId(hit.trackId());

The first line need only appear once inside the relevant scope. In the second fragment above, the local variable trackId is of the correct type to be used as an argument to simParticles->at(); so that line reads naturally. The third fragment shows that if you really need to convert a MayVectorKey to an integer, you can; the only use for this should be to fill histograms and ntuples. The fourth code fragment shows that you can print a MapVectorKey without needing to turn it into an int.

A final comment. Many people have noticed that a SimParticle that was created by a generator, not created internally by G4, has a parentId of -1. They look for SimParticles that correspond to generated particles by doing something like:

     SimParticle const& sim = ...; // Get this from somewhere

     if ( sim.parentId() == -1 ){    // <font color=red>Do NOT do this!!!</font>
     }

This will no longer compile, since one may not compare a MapVectorKey to an int. Even if it did compile, a more informative way to write these sorts of tests is to use the functions provided:

     if ( sim.fromGenerator() ){     // <font color=red>Prefer this</font>
     }

     if ( sim.madeInG4() ){          // <font color=red>Or this</font>
     }

     if ( sim.hasParent() ){         // <font color=red>Or this</font>
     }


To DO List

At present the MapVector template does have methods to insert. These will go away. It will only have a constructor to let it be created from an existing map. We will also change the internal representation ( now a map ) to either a pair of vectors or a vector of pairs. This is because these can be better optimized for persistency and lookup ( and, be definition, we do need insert).