ReadProducts

From Mu2eWiki
Revision as of 20:19, 23 March 2017

Introduction

You can think of a "data product" as a piece of information to which you may get access by calling an appropriate get function on the event object. Equivalently, you can think of the art::Event object as an art::EventId plus a collection of data products. Most data products are collections of simpler objects but a few are just a single object; an example of a single object is the StatusG4 object that describes the completion status of Geant4.

Actually, the definition of a data product is slightly broader than this. The art::Run object is just an art::RunId object plus a collection of data products; similarly for the art::SubRun object. An example of a data product in the Run record is PhysicalVolumeInfoCollection which summarizes geometry information that applies to all events in the Run.

A data product is defined by a class, for example RecoDataProducts/inc/StrawHit.hh, which contains variables defined for one hit in the straws. Another class, RecoDataProducts/inc/StrawHitCollection.hh, represents all the straw hits in an event, and this object is what is actually written and accessed by the user. Art stores each data product as a ROOT branch. When you look at a class like StrawHitCollection, you will see that it is just a typedef to std::vector<StrawHit>. Why do we do this? Suppose that one day we wish to extend StrawHitCollection so that it behaves like a std::vector<StrawHit> plus some additional features. When we do this, your code should continue to work without any editing.

The definitions of the data products can be found in the header files in DataProducts/inc, MCDataProducts/inc or RecoDataProducts/inc.

Identifiers of a Data Product

Each data product within an event is uniquely identified by a four-part identifier, with the parts separated by an underscore character:

 DataType_ModuleLabel_InstanceName_ProcessName
  1. DataType is a "friendly" version of the name of the data type that is stored in the product. The name includes all namespace information. The friendly part is the way that it deals with collection types:
    • If a product is of type T, then the friendly name is "T".
    • If a product is of type mu2e::T, then the friendly name is "mu2e::T".
    • If a product is of type std::vector<mu2e::T>, then the friendly name is "mu2e::Ts".
    • If a product is of type std::vector< std::vector<mu2e::T> >, then the friendly name is "mu2e::Tss".
    • If a product is of type cet::map_vector<mu2e::T>, then the friendly name is "mu2e::Tmv". See below for a discussion about where underscores may not be used; this example is safe because of the substitution of mv for map_vector.
  2. ModuleLabel identifies the module that created the product; this is the module label, which distinguishes multiple instances of the same module within a process; it is not the class name of the module.
  3. InstanceName is a label for the data product that distinguishes two or more data products of the same type that were produced by the same module, in the same process. If a data product is already unique within this scope, it is legal to leave this field blank. The instance label is the optional argument of the call to "produces" in the constructor of the module (xxxx below):
    produces<T>("xxxx");
    
  4. ProcessName is the name of the process that created this product. It is specified in the fcl file that specifies the run time configuration for the job (ReadBack02 below):
    process_name : ReadBack02
        

Because the full name of the product uses the underscore character to delimit fields, it is forbidden to use underscores in any of the names of the fields. Therefore none of the following may contain underscores:

  • The class name of a class that is a data product; the exception is the cet::map_vector template; when creating the friendly name, art internally recognizes this case and protects against it.
  • The namespace in which a data product class lives.
  • Module labels.
  • Data product instance names.
  • Process names.
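
The underscore rule can be illustrated with a minimal sketch. The helper makeProductName below is hypothetical and written only for this page; it is not how art builds names internally. It assumes the friendly type name has already been worked out, joins the four fields with underscores, and rejects any field that itself contains an underscore:

```cpp
#include <initializer_list>
#include <stdexcept>
#include <string>

// Hypothetical helper, for illustration only; not part of art.
// Joins the four fields with underscores and rejects any field that
// itself contains an underscore, since that would make the full name
// impossible to split back into its parts.
std::string makeProductName(std::string const& friendlyType,
                            std::string const& moduleLabel,
                            std::string const& instanceName,
                            std::string const& processName) {
  for (std::string const* field :
       {&friendlyType, &moduleLabel, &instanceName, &processName}) {
    if (field->find('_') != std::string::npos) {
      throw std::invalid_argument("underscore not allowed in field: " + *field);
    }
  }
  return friendlyType + "_" + moduleLabel + "_" + instanceName + "_" +
         processName;
}
```

With these assumptions, the fields "mu2e::StepPointMCs", "g4run", "tracker" and "ReadBack02" join to mu2e::StepPointMCs_g4run_tracker_ReadBack02, while a module label such as "straw_hit_maker" is rejected.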

When accessing data products, or referring to them in fcl, it may be possible to use wildcards in some fields or to omit them. See below for details.


Getting Information by Module Label

There are several ways to get a data product out from an event. The recommended way is to ask for it by specifying the module label of the module that created it.

#include "RecoDataProducts/inc/StrawHitCollection.hh"

art::Handle<StrawHitCollection> hitsHandle;
event.getByLabel("strawHitMaker",hitsHandle);
StrawHitCollection const& hits = *hitsHandle;

In this example, it is presumed that the hits that we want were created by a module with the label "strawHitMaker". After a successful call to getByLabel the variable hitsHandle will hold a valid handle to a collection of StrawHits. You can read more about handles below. If getByLabel cannot successfully and uniquely do its job, it will return a handle that points to nothing. If a user attempts to use such a handle, the handle will throw. It will throw if any of the following is true:

  1. the event contains no collection of StrawHits.
  2. the event contains one or more collections of StrawHits but none were created by the module with the label strawHitMaker.
  3. the event contains more than one collection of StrawHits made by the module with the label strawHitMaker.
  4. the event contains data products made by the module with the label strawHitMaker but none of those data products are of type StrawHitCollection.

The last line in the example strips away the handle-ness and leaves a const reference to a StrawHitCollection; check the header file to see that StrawHitCollection is just a typedef for std::vector<StrawHit>.

This line is not strictly necessary; you can use hitsHandle as if it were a pointer to the StrawHitCollection. This is OK if you only want to use it a few times. But every time that you dereference a handle it does the safety check to see that the product really exists. If you want to write a multiply nested loop over the StrawHitCollection, perhaps finding all triplets of hits, you do not want to do this safety check every time that you access a hit. By construction of the framework, if the hits are there the first time you dereference the handle, they will always be there for the rest of the event. So you need only do the check once.

For beginners: the & in the following:

StrawHitCollection const & hits = *hitsHandle;
 

makes the variable hits a reference; this means that hits is a compile time alias which imposes no additional run time cost in either memory or CPU. If you forget the & the code will compile but the variable hits will be a copy of the hits that are found in the event; this may create a significant CPU and memory penalty.
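
The difference can be seen in a small standalone sketch that uses a plain std::vector<int> in place of a StrawHitCollection. The function names are made up for this example; the first function shows that a const reference is the same object as the original, the second that omitting the & produces a separate copy with its own storage:

```cpp
#include <vector>

// Returns true when 'alias' is the very same object as 'original'.
bool referenceAliasesOriginal() {
  std::vector<int> original{1, 2, 3};
  std::vector<int> const& alias = original;  // no copy: another name for 'original'
  return &alias == &original;
}

// Returns true when forgetting the & produced an independent copy.
bool missingAmpersandCopies() {
  std::vector<int> original{1, 2, 3};
  std::vector<int> const copy = original;    // copies every element
  return &copy != &original && copy.data() != original.data();
}
```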

Handles

Introduction

When a method of a class wishes to return some information held by that class, the author of the class has to make a choice about how to return that information. The choices are:

  • Return by value
  • Return by some sort of pointer
    • bare pointer or bare pointer to const
    • reference or reference to const
    • handle or some other sort of safe pointer.

The full exploration of this topic is much too broad to be discussed here: is the information being returned many bytes or few? What is the lifetime of the information being returned relative to the lifetime of the caller? And so on. This section will discuss the limited case of what to do when the information being returned is guaranteed to have a lifetime longer than that of the caller. For example, if you are writing the produce, analyze or filter method of a module, all information available from the art services, and all information available in the art::Event, have a lifetime longer than one invocation of your method. Moreover this information is owned by the Service or the art::Event and your code will never take ownership of it and will never be allowed to modify it. Under these restricted circumstances the answer is:

  • If the object is a native type ( int, float, double ) return it by value.
  • Otherwise, if the object is guaranteed to exist, return it by reference to const. Do this even for "smallish" things like CLHEP::Hep3Vectors and std::strings.
  • If it is possible that, under some circumstances, there is nothing to return, then return the information via some sort of handle. Several options for this are discussed below.
  • There are very few circumstances in which it is acceptable to return information by bare pointer. There are only two examples that come to mind: when you are writing a Geant4 UserAction class, for which the G4 interface specifies return by bare pointer; or when you are writing code for which ROOT specifies an interface of a bare pointer.


A Practice Discouraged in Mu2e Code

In both ROOT and Geant4 a common practice is that information is returned by bare pointer and, if there is no valid information to return, then the value of the returned pointer is zero. The end user must check that the pointer is non-zero before using it.

In Mu2e we discourage this practice because of the following scenario. Early in code development it often happens that exceptional circumstances are not modeled. So the function which returns the bare pointer is guaranteed to always return valid information; in this case most people will not write code to check for a non-zero pointer. As the experiment matures and the level of fidelity in the simulation increases, it can happen that the code will sometimes return a zero pointer. At that time all code that never bothered to check for a valid pointer will fail; debugging this sort of error can be time consuming.

Instead we encourage you to use one of the techniques below to ensure that, if a user forgets to check for validity, and if the pointer is indeed bad, the system itself will throw. The recommended choice is maybe_ref.


Introduction to Handles

This section discusses what handles are, discusses several kinds of them and presents some ideas about using them wisely.

The short answer is that a handle is a type of safe pointer. It is safe in the following sense: if a handle points to invalid data, and, if you try to follow the handle to access the underlying data, then the handle will throw an exception. This will trigger the framework to shut down the job as gracefully as possible, which usually means that all output files will be properly flushed and closed. Therefore, if the problem arises 19 hours into a 20 hour job, you will have the results from your first 19 hours of work.

Contrast this with returning the information by bare pointer. If a bare pointer points to invalid data, and, if you forget to check for a non-zero pointer, this will normally crash the program, leaving the output files unreadable and most of your work will be lost. And that's if you are lucky! If you are unlucky, you might silently get a subtly incorrect answer.

The one downside of some handles is that the check for validity is done every time you use them. If this sort of handle is used inside nested loops the code might run significantly more slowly. The solution is to extract the bare pointer or bare reference from the handle immediately upon getting the handle; then use the bare pointer or bare reference in downstream code. This guarantees that the check is performed only once. This is safe because we are only considering cases in which the lifetime of the pointee of the handle is longer than the lifetime of the handle. In non-Mu2e environments, it might happen that the pointee of a handle has a lifetime shorter than that of the handle; in such a case it is necessary to check for validity with every access and one should always access information via the handle.
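
The cost of repeated checks can be made visible with a toy handle that counts them. ToyHandle below is a stand-in written for this page, not art::Handle; it only assumes the behaviour described above (check on every dereference, throw if invalid). The two functions compute the same sum over all pairs, once through the handle and once through a reference extracted up front:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-in for a checked handle (hypothetical; not art::Handle).
// Every dereference performs the validity check and counts it.
template <class T>
class ToyHandle {
public:
  ToyHandle() = default;
  explicit ToyHandle(T const* p) : ptr_(p) {}

  T const& operator*() const {
    ++checks_;
    if (ptr_ == nullptr) throw std::runtime_error("ToyHandle: invalid handle");
    return *ptr_;
  }

  std::size_t checks() const { return checks_; }

private:
  T const* ptr_ = nullptr;
  mutable std::size_t checks_ = 0;
};

// Dereferences the handle inside the nested loops: one check per access.
long sumPairsViaHandle(ToyHandle<std::vector<int>> const& h) {
  long sum = 0;
  for (std::size_t i = 0; i < (*h).size(); ++i) {
    for (std::size_t j = 0; j < (*h).size(); ++j) {
      sum += (*h)[i] * (*h)[j];
    }
  }
  return sum;
}

// Extracts the reference once: exactly one check, however deep the loops.
long sumPairsViaReference(ToyHandle<std::vector<int>> const& h) {
  std::vector<int> const& v = *h;
  long sum = 0;
  for (int a : v) {
    for (int b : v) {
      sum += a * b;
    }
  }
  return sum;
}
```

Both functions return the same value; only the number of validity checks recorded by the handle differs.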

Finally, in Mu2e, all handles are implemented as inline templated classes so your code will not normally incur function call overhead on top of the validity checks.


Examples of Handles

When you get information from the framework or from the event, it is always returned as some sort of handle to the information. Consider, for example, the following code fragments abstracted from Mu2eG4/src/ReadBack.cc:

void ReadBack::beginJob( ){
  // Get a handle to the TFile service.
  art::ServiceHandle<art::TFileService> tfs;

  // Lots of code deleted.
}

void ReadBack::analyze(const art::Event& event ) {
  // Get a handle to the geometry for the TTracker.
  GeomHandle<TTracker> ttracker;

  // Ask the event to give us a handle to the requested hits.
  art::Handle<StepPointMCCollection> hitsHandle;
  event.getByLabel("g4run","tracker",hitsHandle);
  StepPointMCCollection const& hits(*hitsHandle);

  // Lots of code deleted.

  SimParticle const& sim = ... ;

  // Get particle properties of the simulated particle.
  ConditionsHandle<ParticleDataTable> pdt("ignored");
  ParticleDataTable::maybe_ref particle = pdt->particle(sim.pdgId());

  string pname = particle ? particle.ref().name() : unknownPDGIdName(pdgId);
}

This code fragment illustrates five different sorts of handles:

  1. Handle to a Service
  2. Handle to a data product
  3. Handle to a geometry subsystem
  4. Handle to a Conditions entity
  5. maybe_ref: a very general but primitive sort of handle

Why do we have five different handle classes? The first four have special knowledge about some other part of the system, either Services, data products, the geometry system or the conditions system. The last class has no special knowledge and can be used in many places.


Handle to a Service

The syntax:

    art::ServiceHandle<T> handle;

contacts the Service registry and asks for a handle to the service whose class is specified by the template argument T.

If the service already exists, the service registry simply returns a pointer to the service; the ServiceHandle holds this pointer. If the service does not already exist, the service registry will attempt to construct it; if construction is successful it will return a bare pointer to the service; if construction fails the service registry will throw an exception. Therefore the constructor of a ServiceHandle will either return a valid handle or it will throw an exception. One can use a ServiceHandle as a pointer to the service:

      handle->someMethodOftheService(argument1, argument2);
    (*handle).someMethodOftheService(argument1, argument2);

The run-time cost of using a ServiceHandle is exactly the same as the cost of using a bare pointer. This last behaviour has changed compared to the equivalent code in the pre-art framework; in that framework, ServiceHandles did a lookup in the registry for every dereference operation.


Art has been designed so that ServiceHandles may be cached. That is, you may have a ServiceHandle as member data of your class or as a static local variable of a method or free function; the ServiceHandle need only be initialized once at the start of the job and it can be used for the remainder of the job. This property is unique to ServiceHandles; all other handle types are dangerous to cache. In addition, if you extract information from a service, it is not likely to be safe to cache that information.

Handle to a Data Product

There are many ways to get a handle to a data product; these handles are different from those used for accessing a Service but their main features are similar. This section will discuss the recommended pattern for accessing a data product:

    art::Handle<T> handle;
    event.getByLabel("moduleLabel","instanceName",handle);
    T const& t(*handle);

    // Use t in the code below.

The example above first constructs an empty handle to a data product of type T. This handle does not know anything about the data product and it will throw if you try to use it. The second line asks the event to fill the handle; the arguments "moduleLabel" and "instanceName" are part of the unique identifier of a data product; this is discussed further in the section about the identifiers of data products. The third line extracts a const reference to the data product from the handle; at first this might seem unnecessary but there is a good reason to do this, as is discussed below. If the event successfully resolves the request and fills the handle, then the handle can be used to access the underlying data product. If the event cannot successfully resolve the request the handle remains in an invalid state and any attempt to use the handle will cause the handle to throw.

The rest of this section will use a more concrete example:

    #include "MCDataProducts/inc/StepPointMCCollection.hh"

    art::Handle<StepPointMCCollection> stepsHandle;
    event.getByLabel("g4run","tracker",stepsHandle);
    StepPointMCCollection const& steps(*stepsHandle);

    cout << "The number of tracker StepPointMC's in this event is: " << steps.size() << endl;

In the line that defines the variable steps, note the const&. The presence of the & requests that the variable steps be a reference to the StepPointMCCollection that resides in the event, not a copy of it. A reference behaves like a compile-time alias; it does not occupy any memory of its own. Because a StepPointMCCollection can be quite large, a copy would be wasteful of both CPU time and memory. A common novice mistake is to forget the &, which causes the variable steps to be a copy of the data product in the event. If you make this mistake, your code will compile without errors and will happily waste resources; so please double check this. Because the event grants only read-only access to data products, the reference must also be const. Another novice mistake is to omit the const; fortunately the compiler will catch this mistake, issue a diagnostic and abort the compilation. Try removing the const and observing the diagnostic.


If one tried to use the handle before filling it, the handle would be invalid and would throw. If the event does not find the requested data product it will leave the handle in an invalid state and the code fragment above would throw when initializing the reference steps. If you know that all events will contain the requested data product, there is no need to explicitly test for validity. If, on the other hand, some events may not contain the requested data product, then one may test the handle for validity, before dereferencing it, by,

    if ( !stepsHandle.isValid() ){
       cerr << "No tracker StepPointMC's.  Skipping this event: " << event.id() << endl;
       return;
    }

The Syntax of getByLabel


When the framework was designed, why was the handle passed as an argument into the accessor function and not returned by the accessor function? The short answer is that this would have resulted in a syntax that is even uglier than the existing syntax:

    art::Handle<StepPointMCCollection> stepsHandle = event.getByLabel<StepPointMCCollection>("g4run","tracker");

The method getByLabel needs to know the data type of the product that it is looking for. In the accepted alternative, the data type can be deduced from the data type of the third argument. In the rejected alternative, there is no third argument from which to learn the data type; therefore the user must write the template parameter two times, once in the declaration of the handle and once as a template argument to getByLabel. When thinking this through, remember that the rules of C++ forbid getByLabel to infer a type based on the type of the return value; this is one of the consequences of a rule usually stated as "the signature of a function does not include its return type".
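
The two call styles can be compared in a standalone sketch. SketchHandle, SketchEvent and SketchSteps are toy types invented for this example and do not reproduce the art interfaces; the point is only where the template argument comes from in each style:

```cpp
#include <string>

// Toy types, hypothetical and unrelated to the real art classes.
template <class T>
struct SketchHandle {
  std::string label;  // remembers which label was requested
};

struct SketchSteps {};

struct SketchEvent {
  // Accepted style: T is deduced from the type of the handle argument.
  template <class T>
  bool getByLabel(std::string const& label, SketchHandle<T>& handle) const {
    handle.label = label;
    return true;
  }

  // Rejected style: no argument mentions T, so the caller must spell it out.
  template <class T>
  SketchHandle<T> getByLabel(std::string const& label) const {
    return SketchHandle<T>{label};
  }
};

std::string deducedStyle(SketchEvent const& event) {
  SketchHandle<SketchSteps> handle;
  event.getByLabel("g4run", handle);  // T deduced as SketchSteps
  return handle.label;
}

std::string explicitStyle(SketchEvent const& event) {
  // The type SketchSteps has to be written twice on this line.
  SketchHandle<SketchSteps> handle = event.getByLabel<SketchSteps>("g4run");
  return handle.label;
}
```

Removing the explicit <SketchSteps> from the second style makes the call fail to compile, because the compiler cannot deduce T from the type of the variable being initialized.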


Isn't the third line extraneous?


In the above example there is an apparently extraneous line,

    StepPointMCCollection const& steps(*stepsHandle);

After all, one can write stepsHandle->size() just as easily as steps.size(). Why bother with the extra step?


The answer is that whenever one uses stepsHandle there is a check that the handle is valid. In the recommended pattern, that check is done exactly once. If, on the other hand, one uses stepsHandle deep inside multiply nested loops, the check will be repeated many times. Because data in the event is guaranteed to exist and to be unchanged throughout the rest of the event processing, it is wasteful to do the test more than once. Any one mistake of this sort will be insignificant but there are many, many places to make mistakes of comparable magnitude and the overall impact can be significant.

Having said this, if your code uses the handle only a handful of times, and if you feel that omitting the third line in favour of using the handle directly makes your code easier to follow, then do so. The key is to consciously make a choice and to be aware of those occasions when the choice is important.


On References vs Pointers


Why did I choose to write the third line as:

    StepPointMCCollection const& steps(*stepsHandle);

and not as

    StepPointMCCollection const* steps(stepsHandle.product());

With optimization enabled, both versions should have exactly the same run-time performance, so why not just use the familiar pointer? The ultimate reason is that the majority of problems with both C and C++ arise from careless use of bare pointers: memory leaks, double deletes and memory corruption. Therefore we encourage the practice of only using bare pointers when no other reasonable solution exists; we hope that this will make it easier to identify misuse of bare pointers.

Having said that, if, in this context, you use pointers correctly it's OK with me.

GeomHandle

Another handle illustrated above is a handle to an entity within the GeometryService.

    GeomHandle<TTracker> ttracker;

The GeomHandle class is actually a quick and dirty Mu2e-written handle class. Compared to the previously discussed handle classes the safety is implemented differently and the class is used slightly differently: if the requested detector element is not present, then the constructor of the GeomHandle will throw. There is no safety check when using a GeomHandle so the variable ttracker in the above fragment behaves exactly like a TTracker const*, with no additional run-time overhead.

In the rare case that one wishes to write code that will work even when some element of the geometry is missing, one can model the code on the fragment below:

   art::ServiceHandle<GeometryService> geom;

   if ( geom->hasElement<TTracker>() ){
       GeomHandle<TTracker> ttracker;
       // do something for the TTracker

   }

The constructor of the GeomHandle class contacts the Service registry and gets a handle to the geometry service. It then asks the geometry service to return a pointer to the detector component specified by the template argument. If no such component exists, or if the geometry service is not present, the constructor of the GeomHandle class will throw. Because a GeomHandle does all of its checking in the constructor, there is no checking upon dereferencing; therefore there is no run-time penalty to using a GeomHandle like a pointer.
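
The check-in-the-constructor design can be sketched with toy types. ToyTracker, ToyGeometry and ToyGeomHandle are invented for this example and are not the Mu2e classes; in particular the real GeomHandle finds the geometry service itself, whereas this sketch is simply handed a toy geometry object:

```cpp
#include <stdexcept>

// Toy detector element and toy geometry holder (hypothetical stand-ins).
struct ToyTracker {
  int nStraws = 0;
};

struct ToyGeometry {
  ToyTracker const* tracker = nullptr;  // null when the element is absent
};

// All checking happens once, in the constructor.
// Dereferencing afterwards is as cheap as using a bare pointer.
class ToyGeomHandle {
public:
  explicit ToyGeomHandle(ToyGeometry const& geom) : ptr_(geom.tracker) {
    if (ptr_ == nullptr) {
      throw std::runtime_error("ToyGeomHandle: element not present");
    }
  }

  ToyTracker const* operator->() const { return ptr_; }
  ToyTracker const& operator*() const { return *ptr_; }

private:
  ToyTracker const* ptr_;
};

int strawCount(ToyGeometry const& geom) {
  ToyGeomHandle tracker(geom);  // throws here if the tracker is missing
  return tracker->nStraws;      // no further checking on use
}
```

Because a successfully constructed handle is always valid, there is no invalid state for later code to test for, which is why no check is needed on dereference.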

Do not cache GeomHandles. The Mu2e geometry is permitted to change at Run and SubRun boundaries. Therefore one must not cache geometry information across these boundaries. Getting a GeomHandle is a relatively light-weight operation so we encourage you to get these handles on a per event basis.

ConditionsHandle

Like a GeomHandle, a ConditionsHandle is a Mu2e-written class that knows how to find entities within the ConditionsService and to return these by handle. The class behaves the same as a GeomHandle; that is, it will return a valid handle or it will throw in the constructor. There is no run-time penalty for using this class like a pointer.

The text string argument to the ConditionsHandle constructor is a placeholder for future development.


maybe_ref

The class template cet::maybe_ref acts like a very primitive handle. The namespace cet:: indicates that this class lives in a utility library maintained by the "Computing Enabling Technologies" group within the Fermilab Computing Division; this is the group that develops and maintains the framework. The library cetlib is a home for utilities that do not yet have a better home; it is used by art, among other projects.

This class template was created to work around a common but unsafe coding practice. A very common convention is that accessor functions return the requested information by pointer; if the requested information is not available the function will return a null pointer. In this convention, it is the responsibility of the end user to check the pointer before using it. The problem is that, when the information is almost always available, people forget to check for validity and the program has a segmentation fault whenever the information is not available. Many of the packages that are used by Mu2e, particularly ROOT, GEANT4 and HEPPDT, often use this convention.

Mu2e accesses HEPPDT through a thin Mu2e-written interface layer. In this layer, the functions that returned information by bare pointer were changed to return information using this new type, cet::maybe_ref.


The class template cet::maybe_ref internally holds a bare pointer and provides only a few functions: one to check for validity, one to access the pointee by const reference, one to access the pointee by reference and one to change the object to which it points. The methods that return a reference always check for validity and they throw an exception if the pointer is invalid. Unlike the other handle classes, this class has no special knowledge about any other part of the software. The advantage of using maybe_ref over the bare pointer convention is that, when the requested information is not available, and if the code does not check for validity, then the maybe_ref object will throw an exception. This will trigger art to begin an orderly shutdown of the program: the endSubRun, endRun and endJob methods will be called, buffers will be flushed, and so on.
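To make the description concrete, here is a minimal sketch of a class with the same four capabilities. It is not the cetlib implementation; ToyMaybeRef, ToyParticleData and lookup are hypothetical names invented for this illustration, and the member names were chosen only to read naturally.

```cpp
#include <stdexcept>

// Minimal stand-in for the idea behind cet::maybe_ref; not the cetlib code.
template <typename T>
class ToyMaybeRef {
public:
  ToyMaybeRef() : ptr_(nullptr) {}
  explicit ToyMaybeRef(T& t) : ptr_(&t) {}

  // Check for validity, either by name or by conversion to bool.
  bool isValid() const { return ptr_ != nullptr; }
  explicit operator bool() const { return isValid(); }

  // Change the object to which this refers.
  void reseat(T& t) { ptr_ = &t; }

  // Accessors always check; an invalid reference throws instead of crashing.
  T& ref() {
    if (!isValid()) throw std::logic_error("ToyMaybeRef: invalid reference");
    return *ptr_;
  }
  T const& ref() const {
    if (!isValid()) throw std::logic_error("ToyMaybeRef: invalid reference");
    return *ptr_;
  }
private:
  T* ptr_;
};

// Hypothetical particle record and lookup function.
struct ToyParticleData { double mass; };

// Returns a valid reference only for the one ID this toy table knows.
inline ToyMaybeRef<ToyParticleData const> lookup(int pdgId) {
  static ToyParticleData const electron{0.000511};  // GeV
  if (pdgId == 11) return ToyMaybeRef<ToyParticleData const>(electron);
  return ToyMaybeRef<ToyParticleData const>();
}
```

A caller that forgets to test the result of lookup for an unknown ID gets an exception from ref(), not a segmentation fault.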


In the fragment above, from ReadBack.cc, the code has a reference to a SimParticle ( the variable sim ) and wants to know some information about what sort of particle it is. To do this it extracts the PDG ID from the SimParticle and asks the Particle Data Table for its information about that PDG ID. However this operation can fail: the SimParticles are created by Geant4 which has many particle ID codes that are not found in the PDG list; the additional ID codes represent nuclei, ions and their excited states.

When one asks the ParticleDataTable class for a particle, the information is returned as ParticleDataTable::maybe_ref. One can see in ConditionsService/inc/ParticleDataTable.hh that this is a typedef to cet::maybe_ref<HepPDT::ParticleData const>.

The example above shows how to check for validity of a maybe_ref; it is convertible to a bool. It also shows how to extract the reference to the underlying data, using the ref() method.


OrphanHandle has been Removed

Another class template from the CMS framework, OrphanHandle<T>, has been removed from art. That class was required in order to create, within one module, two data products, one of which contains persistent pointers to objects in the other data product. A more robust solution has been provided within art. See the discussion about [InterProductReferences.shtml art::Ptr].


Private Interfaces

The above discussion considered the public interface of classes. For private interfaces that are in non-time critical parts of the code, please use the same safe methods discussed above. For the parts of the private interface that are in time critical code, feel free to use bare pointers for accessors and modifiers; but, for reasons of exception safety, do not use bare pointers to manage the lifetime of objects. In all cases where bare pointers are used, be sure to document ownership and lifetime of the pointees.
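The distinction between using a bare pointer for access and using one to manage lifetime can be illustrated with standard C++ only. In this sketch, ToyBuffer, process and safeCaller are hypothetical names; the point is that the owner is a std::unique_ptr while the time-critical helper receives a non-owning bare pointer.

```cpp
#include <memory>
#include <stdexcept>

// Counts live ToyBuffer objects so that a leak would be observable.
struct ToyBuffer {
  static int& liveCount() { static int n = 0; return n; }
  ToyBuffer() { ++liveCount(); }
  ~ToyBuffer() { --liveCount(); }
  ToyBuffer(ToyBuffer const&) = delete;
  ToyBuffer& operator=(ToyBuffer const&) = delete;
};

// Hypothetical time-critical helper: takes a non-owning bare pointer.
// Ownership stays with the caller; the pointee must outlive the call.
inline void process(ToyBuffer const* buf, bool fail) {
  if (buf == nullptr || fail) throw std::runtime_error("process failed");
}

// Lifetime is managed by unique_ptr, so the buffer is released even
// when process() throws.
inline void safeCaller(bool fail) {
  std::unique_ptr<ToyBuffer> owner(new ToyBuffer);
  process(owner.get(), fail);
}
```

Had safeCaller held the buffer in a bare pointer with a delete after the call to process, the delete would be skipped whenever process throws.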


Pointers to Products

Introduction

One of the longstanding technical problems facing HEP software has been how to make pointers persistable; that is, if my code uses a pointer to express a relationship between two objects, how can I write those two objects to disk or tape, read them back in and have the pointer restored so that it correctly points at the other object? The issue is that the pointee will have one address in memory when the job is first run but it will have a different address in memory when it is read back in a later job; so the value of the pointer must be reset to match the new location of the pointee. Ideally the pointer should be restored in an automatic way that requires no special action on the part of the physicist reading the persistent pointer and only minimal special action on the part of the physicist creating the persistent pointer.

Since the adoption of C++ by the HEP community, many HEP experiments have developed their own solutions to this problem, all of which work at some level but none of which are entirely satisfactory. The art team, working with Mu2e, NOvA and the Liquid Argon TPC experiments, has developed a solution that builds on the experiences, good and bad, of many previous experiments. This solution has two main components, each described below, art::Ptr<T> and art::Assns<A,B,D>; there is also a special case of art::Ptr<T> that, in some cases, allows a more compact persistent form, art::PtrVector<T>. A critical feature of all of these tools is that, when the pointee exists in memory, the persistent pointer behaves as a naive user would expect; if, for any reason, the pointee does not exist, then the persistent pointer will throw an exception; it is possible to test for the existence of the pointee.



art::Ptr<T>

The art::Ptr<T> will be introduced by way of a concrete example from Mu2e code.

When Mu2e runs Geant4, it exports information out of Geant4 and stores that information as art data products. One of the data products is a full mother-daughter history of all particles known to Geant4; this includes all of the particles created by event generators and imported into Geant4, and it includes all of the particles created by Geant4. The information about one such particle is stored as an object of type SimParticle, MCDataProducts/inc/SimParticle.hh . The ensemble of all such particles is stored as an object of type SimParticleCollection, MCDataProducts/inc/SimParticleCollection.hh . A SimParticleCollection is an art data product and can be obtained directly from the art::Event. An equivalent statement is that it is a first tier object within the event-data model.


Another data product is a collection of objects that record that a particular SimParticle took a step of some length inside a particular sensitive volume. One such step is stored as an object of type StepPointMC, MCDataProducts/inc/StepPointMC.hh . Each StepPointMC records the starting point of the step, the length of the step, the energy deposited in the material during the step and so on. It also records the identity of the SimParticle making the step, which is recorded as a persistable pointer to the appropriate SimParticle. If one has a StepPointMC object, one may access the SimParticle that took the step as follows:

   StepPointMC const& step = ....; // get this from somewhere
   SimParticle const& sim  = *step.simParticle();

As usual in most Mu2e code, objects in the event have a lifetime that is long compared to your code that is reading the event, so it is recommended to receive these objects by const reference, not by value.


The class MCDataProducts/inc/StepPointMC.hh has a data member and an accessor:

  private:
  art::Ptr<SimParticle> _track;

  public:
  art::Ptr<SimParticle> const& simParticle() const { return _track; }

The data member is the persistable pointer and the template argument tells us that it points to an object of type SimParticle. To learn what else one can do with an art::Ptr, inspect the header file, art/Persistency/Common/Ptr.h . Some examples include:

   StepPointMC const& step = ....; // get this from somewhere
   art::Ptr<SimParticle> const & simPtr(step.simParticle());

   if ( simPtr.isAvailable() ) {
      // Do something only if the pointee is available
   } else {
      // Do something else if the pointee is not available
   }

   // These accessors throw if the pointee does not exist.
   SimParticle const&  sim   = *simPtr;
   SimParticle const*  sim1  = simPtr.operator->();

   art::ProductID const& id            = simPtr.id();
   art::Ptr<SimParticle>::key_type key = simPtr.key();

   // Return a pointer to the pointee, if it exists;
   // if it does not exist, return a null pointer.
   // It is the end user's responsibility to check for non-null.
   SimParticle const* sim2(simPtr.get());

The SimParticle class also uses art::Ptr's to implement its mother/daughter history and to link to generated particles. Each SimParticle either has a mother SimParticle or it has an associated GenParticle, always one but never both.


   private:
     // Data members
     art::Ptr<GenParticle>               _genParticle;
     art::Ptr<SimParticle>               _parentSim;
     std::vector<art::Ptr<SimParticle> > _daughterSims;

   public:
     // Accessors
     art::Ptr<GenParticle> const&               genParticle() const { return _genParticle;  }
     art::Ptr<SimParticle> const&               parent()      const { return _parentSim;    }
     std::vector<art::Ptr<SimParticle> > const& daughters()   const { return _daughterSims; }

     // Where was this particle created: in the event generator or in G4?
     bool isSecondary()   const { return _parentSim.isNonnull(); }
     bool isPrimary()     const { return _genParticle.isNonnull(); }

     // Some synonyms for the previous two accessors.
     bool hasParent()     const { return _parentSim.isNonnull(); }
     bool fromGenerator() const { return _genParticle.isNonnull(); }
     bool madeInG4()      const { return _genParticle.isNull();    }

An art::Ptr has an operator->() that behaves just like that of any other pointer type:

   if ( step.simParticle()->isPrimary() ){
     cout << "Generator id is: " << step.simParticle()->genParticle()->generatorId() << endl;
   }

The two code fragments above illustrate two different notions of validity. The test simPtr.isAvailable() checks both that the requested data product is available and that the requested key is present in the data product. The tests _genParticle.isNull() and _genParticle.isNonnull() only check whether or not the value of the key is the reserved value used to indicate a default constructed art::Ptr. This latter check is sufficient for the isPrimary() and isSecondary() methods of SimParticle; the isAvailable() test must be used when the existence of the data product, or of the key within the data product, is in doubt.


In the examples shown so far, all of the art::Ptr objects have been found inside data products. But this is not necessary; one may create and use an art::Ptr as a variable or argument in any code; the art::Ptr need not be in a data product but the object to which it points must be.

An art::Ptr object is copyable so one may create a collection of art::Ptr's, for example:

  std::vector<art::Ptr<T> >

In such a collection, the pointees may live in many different data products ( all of which must be of the same data type ). This is done, for example, in /HitMakers/src/MakeStrawHit_module.cc , which looks at many StepPointMCCollections and forms one StrawHitCollection from the ensemble of all StepPointMC.


The art::Ptr technology has, by design, several limitations. One can write an art::Ptr to an object if and only if two things are true:

  1. The object is an element of one of the supported collection types, such as std::vector, std::map and cet::map_vector.
  2. The collection type is a data product; that is, it is a first tier object within the art::Event.

An equivalent statement is that one can only write an art::Ptr to a second tier object and one may do so only if the first tier object is an appropriate collection type. One cannot write an art::Ptr that points to a first tier object; instead one should get the data product directly from the event. And one cannot write an art::Ptr that points to a third or lower tier event-data object; to access such objects, follow the art::Ptr to the second tier object and call the appropriate method of the second tier object; and so on to access lower tier objects.

These limitations are present in order to keep art::Ptr simple enough that it is robust and is maintainable by a small staff. Moreover, it is straightforward for user code to access objects that cannot be directly accessed using an art::Ptr.

There is an important new feature planned for art::Ptr. In the near future it will be possible to write an art::Ptr that points at a second tier object that lives within either the Run or SubRun objects associated with the current art::Event object. At present, the pointee must live within the art::Event object.


Creating an art::Ptr

There are two distinct use cases for making an art::Ptr object.

  • Creating an art::Ptr into a data product that was created by a previous module.
  • Creating an art::Ptr into a data product that is being created in the same module as is the art::Ptr.

Both use cases are illustrated in the saveSimParticleStart method of

    [http://cdcvs.fnal.gov/cgi-bin/public-cvs/cvsweb-public.cgi/mu2e/Offline/Mu2eG4/inc/TrackingAction.hh?rev=HEAD&content-type=text/x-cvsweb-markup Mu2eG4/inc/TrackingAction.hh] 
    [http://cdcvs.fnal.gov/cgi-bin/public-cvs/cvsweb-public.cgi/mu2e/Offline/Mu2eG4/src/TrackingAction.cc?rev=HEAD&content-type=text/x-cvsweb-markup Mu2eG4/src/TrackingAction.cc] 

This method is called from the PreUserTrackingAction method and it creates a new SimParticle inside the SimParticleCollection. If a SimParticle is a primary particle, then it will hold an art::Ptr<GenParticle> that points to the corresponding GenParticle in the GenParticleCollection, which is already in the event. If a SimParticle is a secondary particle it will have an art::Ptr<SimParticle> that points to its mother particle within the same SimParticleCollection; however the SimParticleCollection will not exist as a data product until after G4 is finished with the event. Therefore two different constructors are used. If a particle is a primary particle, it will have a null art::Ptr<SimParticle>; if it is a secondary particle, it will have a null art::Ptr<GenParticle>.


After the ... lines, the following code is taken from TrackingAction.cc:

   // Passed in as an argument on each call to saveSimParticleStart
   const G4Track* trk = ...;

   // The next three items are computed elsewhere and passed into
   // the TrackingAction class at the start of each event.

   // The productId for the SimParticleCollection was reserved in the constructor of
   // G4_module.cc before event processing began.
   art::ProductID _simID = ...;

   art::Event const * _event = ...;
   art::Handle<GenParticleCollection> const* _gensHandle = ...;

   // The remainder of this example is taken verbatim ( but with irrelevant code deleted ):

   int id       = trk->GetTrackID();
   int parentId = trk->GetParentID();

   // GenParticle numbers start at 0 but G4 track IDs start at 1.
   int generatorIndex = ( parentId == 0 ) ? id-1: -1;

   art::Ptr<GenParticle> genPtr;
   art::Ptr<SimParticle> parentPtr;
   if ( parentId == 0 ){
     genPtr = art::Ptr<GenParticle>(*_gensHandle,generatorIndex);
   } else{
     parentPtr = art::Ptr<SimParticle>( _simID, parentId, _event->productGetter(_simID));
   }

Note that genPtr and parentPtr require different constructors because there is no handle to the SimParticleCollection at this stage in the execution of the program. One should prefer the genPtr form of the constructor because, as soon as genPtr is instantiated, it is fully formed and is available to be used. However parentPtr, as constructed, is not usable and it will not be usable until the start of the next module to be executed.

If you create a class that has an art::Ptr as a data member, and if that class will be part of a data product, remember to add the required lines to classes_def.xml and classes.h, as discussed in the [DataProducts.shtml instructions for making data products] . For examples look at MCDataProducts/src/classes_def.xml and MCDataProducts/src/classes.h ; remember that the Wrapper lines are only needed for data products ( ie first tier objects ), not for objects within data products.


art::PtrVector<T>

It was discussed in the previous section that one may create an std::vector of art::Ptr<T> objects. In the general case, each art::Ptr object in the vector may point to an object in a different data product. In the special case that all of the art::Ptr objects point to objects in a single data product, art provides a specialized class, art::PtrVector<T>. This class has a persistent representation that is smaller than that of a std::vector<art::Ptr<T> >; it is smaller because it only needs to store the art::ProductID once. The transient representation does not have a smaller memory footprint than does a std::vector<art::Ptr<T> >; indeed, under the covers, an art::PtrVector<T> holds a std::vector<art::Ptr<T> >. For more details see art/Persistency/Common/PtrVector.h .
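The saving in the persistent form can be seen in a toy model. The types below are hypothetical and the 32-bit field widths are an assumption made only for this illustration; they are not the layout that art actually writes to disk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy persistent form of one pointer; field widths are assumptions.
struct ToyPersistentPtr {
  std::uint32_t productId;  // which data product
  std::uint32_t key;        // which element within it
};

// General case: every entry repeats the product ID.
using ToyPtrList = std::vector<ToyPersistentPtr>;

// Single-product case: the product ID is stored once.
struct ToyPtrVector {
  std::uint32_t productId;
  std::vector<std::uint32_t> keys;
};

inline std::size_t payloadBytes(ToyPtrList const& v) {
  return v.size() * sizeof(ToyPersistentPtr);
}

inline std::size_t payloadBytes(ToyPtrVector const& v) {
  return sizeof(v.productId) + v.keys.size() * sizeof(std::uint32_t);
}

// Build the compact form; fails if the entries span several products.
inline bool compact(ToyPtrList const& in, ToyPtrVector& out) {
  out.keys.clear();
  out.productId = in.empty() ? 0 : in.front().productId;
  for (auto const& p : in) {
    if (p.productId != out.productId) return false;
    out.keys.push_back(p.key);
  }
  return true;
}
```

In this toy model the compact form approaches half the size of the general form as the number of entries grows; the real ratio depends on the actual sizes of the product ID and the key.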

In an earlier implementation of art::PtrVector, the one inherited from CMS, the transient representation also benefited from a reduced memory footprint, but at the expense of greater execution time. When the art development team started to add new features to art::Ptr and art::PtrVector, they decided to sacrifice the transient memory footprint in favour of faster execution. Mu2e signed off on this trade-off.


art::Assns<A,B,D>

This class template will be introduced by reference to a use case that will soon come up in the Mu2e reconstruction code.

Consider a reconstruction job in which one module finds and fits tracks in the TTracker, another module finds and classifies clusters in the calorimeter and a third module determines if any of the tracks, when extrapolated to the calorimeter, intersect any of the calorimeter clusters. Two key features of this use case are that the track-cluster match is done in a separate module and that the module operates on track and cluster data products already present in the event. It would be convenient to represent track-cluster matches using some sort of bidirectional persistable pointer.

Art provides a solution in the art::Assns class template. For definiteness of notation, let's presume that the track and cluster classes produced by the first two modules are named RecoTrack and RecoCalCluster; and also presume that these objects live in data products named RecoTrackCollection and RecoCalClusterCollection; none of these classes currently exist in the Mu2e code. The module that computes track-cluster matches can choose to store its output as a data product of type

art::Assns<RecoTrack,RecoCalCluster>

This data product is a collection of objects, each of which expresses an association between one RecoTrack and one RecoCalCluster; under the covers it holds pairs of art::Ptr<RecoTrack> and art::Ptr<RecoCalCluster>. The art::Assns class template supports 1:1, 1:many, many:1 and many:many associations; it implicitly supports 1:0 and 0:1 associations via the absence of an association object.

A physicist who wants to inspect track-cluster matches can choose to do so in one of three ways. One may loop over all associations; one may write an outer loop over RecoTracks and then an inner loop over all matched RecoCalClusters; or one may write an outer loop over RecoCalClusters and then an inner loop over all matched RecoTracks. When writing these loops, it is not important whether the art::Assns object was declared as shown above or with its template arguments reversed:

art::Assns<RecoCalCluster,RecoTrack>

When using an art::Assns object, one asks for each side of the relationship by type, not by ordinal number of template argument; that is, unlike std::pair, it does not have the notion of first and second.

The art::Assns class template provides one more important feature. The module that computes track-cluster matches could instead have chosen to store its output as a data product of type

art::Assns<RecoTrack,RecoCalCluster,MatchInfo>

where MatchInfo is an arbitrary user defined class. Presumably one would use it to store information such as the footprint of the track on the calorimeter, the chi-squared of the match and so on. The third template argument is optional.
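The three looping models can be illustrated with a toy association list. This is not the art::Assns interface: ToyAssn, ToyAssns, ToyMatchInfo, clustersByTrack and tracksByCluster are hypothetical names, and plain indices stand in for art::Ptr<RecoTrack> and art::Ptr<RecoCalCluster>.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical match payload, playing the role of the optional third argument.
struct ToyMatchInfo { double chi2; };

// One association between one track and one cluster.
struct ToyAssn {
  std::size_t track;
  std::size_t cluster;
  ToyMatchInfo info;
};

// Looping over this vector directly is the "all associations" model.
using ToyAssns = std::vector<ToyAssn>;

using ToyIndex = std::map<std::size_t, std::vector<std::size_t>>;

// Outer loop over tracks, inner loop over matched clusters.
inline ToyIndex clustersByTrack(ToyAssns const& assns) {
  ToyIndex out;
  for (auto const& a : assns) out[a.track].push_back(a.cluster);
  return out;
}

// Outer loop over clusters, inner loop over matched tracks.
inline ToyIndex tracksByCluster(ToyAssns const& assns) {
  ToyIndex out;
  for (auto const& a : assns) out[a.cluster].push_back(a.track);
  return out;
}
```

A track or cluster with no match simply never appears as a key, which mirrors the way 1:0 and 0:1 relationships are expressed by the absence of an association object.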

Another use case for which an art::Assns would be a good solution is the following: form a simulated event, creating many data products; reconstruct the event as if it were real data, creating many data products; in a final module, or modules, determine which RecoTracks and RecoCalClusters match to which SimParticles. These results are naturally stored as data products of type:

art::Assns<RecoTrack,SimParticle,MatchInfoSimTrack>
art::Assns<RecoCalCluster,SimParticle,MatchInfoSimCluster>

where the two MatchInfo classes are arbitrary user defined classes that hold some information about the quality of the match. As with the track-cluster use case, the code that finds the relationships between the simulated and reconstructed particles is done in a separate module that operates on data products already present in the event.

Mu2e is not yet using art::Assns but we expect to soon. As we get experience, we will write additional documentation, including examples of creating and using art::Assns. In the meantime, if you would like to learn more about art::Assns, consult the art documentation on Inter-Product References.

Comments on Some Rejected Ideas

A later section, on choosing between art::Ptr and art::Assns, explains why it is illegal to create a data product with an empty art::Ptr that is to be filled in later by a different module. This section will comment on some other ideas for the track-cluster match use case and explain why the art::Assns solution is preferred.

Rejected Option 1

One could expand the MatchInfo class to include an art::Ptr<RecoTrack> and an art::Ptr<RecoCalCluster>; with this change the matching code could add an std::vector<MatchInfo> to the event. This would have worked and, provided one only wanted to loop over associations, it would be very close to the features provided by art::Assns. The big difference is that art::Assns provides code to simplify the other two looping models: an outer loop over RecoTracks with an inner loop over matched RecoCalClusters, and vice versa. Experience in other experiments has shown that writing such loops from first principles is a common source of errors that produce incomplete but otherwise correct output; such errors are notoriously hard to recognize. With art::Assns, these looping constructs are written correctly, in one place, for all types of associations.


Rejected Option 2

One could expand the RecoCalCluster class by adding an art::PtrVector<RecoTrack> and a std::vector<MatchInfo>, running the track reconstruction first and then integrating the track-cluster matching into the cluster finding algorithm. The main problem with this approach is that cluster finding and track-cluster matching are logically separate operations that should not be coupled through accidental constraints of the event-data model. The recommended approach uses three modules to implement three logically separate steps in the data reconstruction chain and each step puts its own output into the event. With the rejected approach, as one evolves the MatchInfo class, all code that knows about RecoCalCluster objects must be recompiled; this always complicates the code development cycle.

The rejected approach also introduces an artificial asymmetry in looping over matches. The code to implement an outer loop over RecoCalClusters with an inner loop over matched RecoTracks will look completely different than the code to implement an outer loop over RecoTracks with an inner loop over matched RecoCalClusters. Experience with other experiments has shown that artificial asymmetries that exist only because of accidental code constraints are a common source of errors.

Another feature of the recommended approach is that it simplifies test driving alternate track-cluster matching algorithms; one may run several such modules in one job, with each algorithm operating on exactly the same input and each algorithm having a well defined place to put its output. In the rejected alternative, there is just one well defined place to write the output of the matching algorithm, as part of the RecoCalCluster object. This can be made to work but the symmetries present in the ideas are not reflected in the code.



Should I Use an art::Ptr or an art::Assns?

There is conflicting advice on this.

Rob Kutschke's advice:

When possible, and when the reference is really a one-directional thing, prefer a Ptr over an Assns. This choice makes the end user code much simpler: the end user just follows a Ptr as if it were a bare pointer; therefore it's very easy to teach. I think that hitting a new user with an art Assns early in the teaching process will be very difficult but, to be fair, I have not yet tried.

One downside of this approach is that it complicates event mixing. If a data product class contains embedded art::Ptr objects, then event mixing code needs to know where the Ptr's are located and, at mix-in time, update them by hand.

So this really boils down to a choice of where to take the pain: at the end user code or in code written by experts. My advice is to let the experts take the pain.


When is it possible or not possible to use a Ptr? One of the fundamental design rules of art is that, once a data product is put into an event, that data product may never be modified. This rule is present to help ensure a robust audit trail of how data products were created. Therefore, it is illegal to create a data product that includes an empty art::Ptr and, in a later module, to fill that art::Ptr with real information. Therefore, if one wishes to associate objects from two data products that are already in the event, the only choice is art::Assns.

Otherwise, if an art::Ptr<T> or an art::PtrVector<T> will completely solve the problem at hand, both now and in the future, then it should be preferred over art::Assns. The reason is that an art::Ptr is simpler both to create and to read; presumably this makes it less error prone. The only difficult part in making the choice is looking into possible future uses for your code; provided each piece of code does a small, well defined thing, the choice should usually be clear.


Marc Paterno's advice:

Always prefer an Assns over a Ptr. Similar questions are well researched in the database world. The unanimous conclusion of the database experts is that an Assns is the right answer.

Technical Appendix

Inside an art::Ptr

Few Mu2e physicists will need to understand the insides of an art::Ptr; this section is provided for reference. An art::Ptr has two parts, a persistent part, which behaves exactly like any other persistable event-data object, and a transient part that is divorced from the persistency mechanism. The persistent part consists of an art::ProductID and a key; the art::ProductID uniquely identifies the data product in which the pointee lives; the key uniquely identifies the pointee within the data product. Under the covers, the product ID and the key are simply a tuple of integral types and are persisted as are any other integral data.

The transient part of an art::Ptr also has two parts, a bare pointer to const that points to the pointee and a pointer to an object that can compute the bare pointer given the persistent information. When an art::Ptr is read back from an event-data file, the bare pointer is set to zero and the function pointer is properly initialized. When an art::Ptr is used, the Ptr code first checks to see if the bare pointer is non-null; if it is non-null, the Ptr simply returns it; if it is null, the Ptr calls the function to initialize the pointer and then returns the pointer; it is this function that will throw if the pointee cannot be found.
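The lazy resolution described above can be sketched as follows. ToyPtr and its Getter are hypothetical stand-ins written for this page; the real art::Ptr differs in its types and in how the resolving object is supplied.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>

template <typename T>
class ToyPtr {
public:
  // Resolver: maps (productId, key) to an address, or nullptr if not found.
  using Getter = std::function<T const*(int, std::size_t)>;

  ToyPtr(int productId, std::size_t key, Getter getter)
    : productId_(productId), key_(key),
      cache_(nullptr), getter_(std::move(getter)) {}

  T const& operator*() const { return *resolve(); }
  T const* operator->() const { return resolve(); }

private:
  // First use calls the getter and caches the result; later uses
  // return the cached bare pointer directly.
  T const* resolve() const {
    if (cache_ == nullptr) {
      cache_ = getter_ ? getter_(productId_, key_) : nullptr;
      if (cache_ == nullptr) {
        throw std::runtime_error("ToyPtr: pointee not found");
      }
    }
    return cache_;
  }

  // Persistent part: plain integral data.
  int productId_;
  std::size_t key_;

  // Transient part: never written to disk, rebuilt on first use.
  mutable T const* cache_;
  Getter getter_;
};
```

In this sketch a ToyPtr read back from a file would start with a null cache and a valid getter, so the first dereference pays for the lookup and every later dereference is a simple pointer return.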


Requirements on Container Types

Once the art team provides appropriate documentation, this section should be changed to point there.

The section describing art::Ptr<T> states that an art::Ptr may only point at a second tier object within an art::Event and that it may only do so if the first tier object (the data product) is of a container type that satisfies certain requirements. The requirements on the collection type are:

  1. It must have a begin method that returns an appropriate iterator type.
  2. The iterator must be a normal iterator in the sense of a call to std::advance(iterator,n); the iterator must have traits that describe it as an input iterator, or better, such as a random access iterator.
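As an illustration of why these two requirements are enough, the sketch below locates an element using only begin() and std::advance. The function names elementAt and exampleSum are hypothetical, and the bounds check through size() is an extra assumption of this sketch, not one of the requirements listed above.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Returns the address of the element at position `key`, or nullptr if
// the key is out of range. The lookup itself needs only begin() and
// std::advance; size() is used here only for the range check.
template <typename Collection>
typename Collection::value_type const*
elementAt(Collection const& c, std::size_t key) {
  if (key >= c.size()) return nullptr;
  using Diff = typename Collection::difference_type;
  auto it = c.begin();
  std::advance(it, static_cast<Diff>(key));
  return &*it;
}

// Usage: the same code works for a random access container
// (std::vector) and for a bidirectional one (std::list).
inline int exampleSum() {
  std::vector<int> v = {4, 5, 6};
  std::list<int> l = {7, 8, 9};
  return *elementAt(v, 1) + *elementAt(l, 2);  // 5 + 9
}
```

For a random access iterator std::advance is a constant time jump; for weaker iterators it steps one element at a time, so lookups by key get slower as the key grows.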