Handles

From Mu2eWiki

Introduction

When a method of a class wishes to return some information held by that class, the author of the class has to make a choice about how to return that information. The choices are:

  • Return by value
  • Return by some sort of pointer
    • bare pointer or bare pointer to const
    • reference or reference to const
    • handle or some other sort of safe pointer.

The full exploration of this topic is much too broad to be discussed here: is the information being returned many bytes or few? What is the lifetime of the information being returned relative to the lifetime of the caller? And so on. This section will discuss the limited case of what to do when the information being returned is guaranteed to have a lifetime longer than that of the caller. For example, if you are writing the produce, analyze or filter method of a module, all information available from the art services, and all information available in the art::Event, have a lifetime longer than one invocation of your method. Moreover, this information is owned by the Service or the art::Event; your code will never take ownership of it and will never be allowed to modify it. Under these restricted circumstances the answer is:

  • If the object is of a built-in type (int, float, double), return it by value.
  • Otherwise, if the object is guaranteed to exist, return it by reference to const. Do this even for "smallish" things like CLHEP::Hep3Vectors and std::strings.
  • If it is possible that, under some circumstances, there is nothing to return, then return the information via some sort of handle. Several options for this are discussed below.
  • There are very few circumstances in which it is acceptable to return information by bare pointer. Only two examples come to mind: when you are writing a Geant4 UserAction class, for which the G4 interface specifies return by bare pointer; or when you are writing code for which ROOT specifies an interface of a bare pointer.
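The guidelines above can be illustrated with a small, self-contained sketch. The class ToyDetector, its members and the helper CheckedRef are hypothetical and exist only for this example; CheckedRef is a minimal stand-in for a real handle class such as cet::maybe_ref, not its actual interface.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Minimal stand-in for a handle (hypothetical): it may hold a null pointer,
// but it throws on access instead of letting the caller dereference null.
template <typename T>
class CheckedRef {
public:
  explicit CheckedRef(T const* p = nullptr) : p_(p) {}
  bool isValid() const { return p_ != nullptr; }
  T const& ref() const {
    if (!p_) throw std::logic_error("CheckedRef: nothing to return");
    return *p_;
  }
private:
  T const* p_;
};

// Hypothetical class showing the three recommended return conventions.
class ToyDetector {
public:
  // Built-in type: return by value.
  int nPlanes() const { return nPlanes_; }

  // Always present: return by reference to const, even for a std::string.
  std::string const& name() const { return name_; }

  // Possibly absent: return a checked handle rather than a bare pointer.
  CheckedRef<std::string> alignmentTag(int plane) const {
    auto it = tags_.find(plane);
    return CheckedRef<std::string>(it == tags_.end() ? nullptr : &it->second);
  }

private:
  int nPlanes_ = 4;                                  // invented value
  std::string name_ = "ToyTracker";                  // invented value
  std::map<int, std::string> tags_{{0, "nominal"}};  // only plane 0 has a tag
};
```

A caller that is sure a tag exists can write d.alignmentTag(0).ref() directly; if that assumption is ever wrong, the result is an exception rather than a null-pointer dereference.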


A Practice Discouraged in Mu2e Code

In both ROOT and Geant4 a common practice is that information is returned by bare pointer and, if there is no valid information to return, then the value of the returned pointer is zero. The end user must check that the pointer is non-zero before using it.

In Mu2e we discourage this practice because of the following scenario. Early in code development it often happens that exceptional circumstances are not modeled. So the function which returns the bare pointer is guaranteed to always return valid information; in this case most people will not write code to check for a non-zero pointer. As the experiment matures and the level of fidelity in the simulation increases, it can happen that the code will sometimes return a zero pointer. At that time all code that never bothered to check for a valid pointer will fail; debugging this sort of error can be time consuming.

Instead we encourage you to use one of the techniques below to ensure that, if a user forgets to check for validity, and if the pointer is indeed bad, the system itself will throw. The recommended choice is maybe_ref.


Introduction to Handles

This section discusses what handles are, discusses several kinds of them and presents some ideas about using them wisely.

The short answer is that a handle is a type of safe pointer. It is safe in the following sense: if a handle points to invalid data, and, if you try to follow the handle to access the underlying data, then the handle will throw an exception. This will trigger the framework to shut down the job as gracefully as possible, which usually means that all output files will be properly flushed and closed. Therefore, if the problem arises 19 hours into a 20 hour job, you will have the results from your first 19 hours of work.

Contrast this with returning the information by bare pointer. If a bare pointer points to invalid data, and, if you forget to check for a non-zero pointer, this will normally crash the program, leaving the output files unreadable and most of your work will be lost. And that's if you are lucky! If you are unlucky, you might silently get a subtly incorrect answer.
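The contrast can be sketched in plain C++. SafePtr and runJob below are hypothetical: SafePtr checks validity on every dereference and throws, and runJob is a toy stand-in for a framework event loop that catches the exception and still writes out the results accumulated so far. It does not reproduce what art actually does during shutdown.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical safe pointer: every dereference checks validity and throws
// std::runtime_error instead of following a null pointer.
template <typename T>
class SafePtr {
public:
  explicit SafePtr(T const* p = nullptr) : p_(p) {}
  bool isValid() const { return p_ != nullptr; }
  T const& operator*() const { return *check(); }
  T const* operator->() const { return check(); }
private:
  T const* check() const {
    if (!p_) throw std::runtime_error("SafePtr: dereferenced an invalid handle");
    return p_;
  }
  T const* p_;
};

// Toy event loop: event number badEvent yields an invalid handle. The
// exception stops the loop, but the results written for earlier events are
// kept and the output stream is still flushed before returning.
int runJob(int nEvents, int badEvent, std::ostringstream& out) {
  std::string const value = "ok";
  int processed = 0;
  try {
    for (int i = 0; i < nEvents; ++i) {
      SafePtr<std::string> h(i == badEvent ? nullptr : &value);
      out << i << ':' << h->size() << '\n';  // throws when h is invalid
      ++processed;
    }
  } catch (std::runtime_error const&) {
    out << "shutting down after " << processed << " events\n";
  }
  out.flush();
  return processed;
}
```

With a bare pointer in place of SafePtr, the failing event would instead dereference null, which is undefined behaviour and typically kills the process before any output is written out.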

The one downside of using some handles is that every time you use one, the check for validity is done. If this sort of handle is used inside nested loops the code might run significantly more slowly. The solution is to extract the bare pointer or reference from the handle immediately upon getting the handle; then use the bare pointer or reference in downstream code. This guarantees that the check is performed only once. This is safe because we are only considering cases in which the lifetime of the pointee of the handle is longer than the lifetime of the handle. In non-Mu2e environments, it might happen that the pointee of a handle has a lifetime shorter than that of the handle; in such a case it is necessary to check for validity with every access and one should always access information via the handle.
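A minimal sketch of the cost argument, assuming a hypothetical CheckedHandle that counts how many validity checks it performs. sumSlow goes through the handle inside nested loops; sumFast extracts a reference once and then uses only that reference.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical handle that checks validity on every dereference and counts
// the number of checks performed.
template <typename T>
class CheckedHandle {
public:
  explicit CheckedHandle(T const* p = nullptr) : p_(p) {}
  T const& operator*() const {
    ++nChecks_;
    if (!p_) throw std::runtime_error("CheckedHandle: invalid");
    return *p_;
  }
  long nChecks() const { return nChecks_; }
private:
  T const* p_;
  mutable long nChecks_ = 0;
};

// Dereferences the handle in every loop condition and twice per inner iteration.
double sumSlow(CheckedHandle<std::vector<double>> const& h) {
  double sum = 0;
  for (std::size_t i = 0; i < (*h).size(); ++i)
    for (std::size_t j = 0; j < (*h).size(); ++j)
      sum += (*h)[i] * (*h)[j];
  return sum;
}

// Extracts a reference once; the loops then touch only the reference.
double sumFast(CheckedHandle<std::vector<double>> const& h) {
  std::vector<double> const& v = *h;  // the only validity check
  double sum = 0;
  for (double a : v)
    for (double b : v)
      sum += a * b;
  return sum;
}
```

For three elements, sumSlow performs 34 checks (every loop-condition test plus two dereferences per inner iteration) while sumFast performs one; the number of checks in sumSlow grows quadratically with the size of the collection.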

Finally, in Mu2e, all handles are implemented as inline templated classes so your code will not normally incur function call overhead on top of the validity checks.


Examples of Handles

When you get information from the framework or from the event, it is always returned as some sort of handle to the information. Consider, for example, the following code fragments abstracted from Mu2eG4/src/ReadBack.cc:

void ReadBack::beginJob( ){

  // Get a handle to the TFile service.
  art::ServiceHandle<art::TFileService> tfs;

  // Lots of code deleted.
}

void ReadBack::analyze(const art::Event& event ) {

  // Get a handle to the geometry for the TTracker.
  GeomHandle<TTracker> ttracker;

  // Ask the event to give us a handle to the requested hits.
  art::Handle<StepPointMCCollection> hitsHandle;
  event.getByLabel("g4run","tracker",hitsHandle);
  StepPointMCCollection const& hits(*hitsHandle);

  // Lots of code deleted.

  SimParticle const& sim = ... ;

  // Get particle properties of the simulated particle.
  ConditionsHandle<ParticleDataTable> pdt("ignored");
  ParticleDataTable::maybe_ref particle = pdt->particle(sim.pdgId());

  string pname = particle ? particle.ref().name() : unknownPDGIdName(sim.pdgId());

}

This code fragment illustrates five different sorts of handles:

  1. Handle to a Service
  2. Handle to a data product
  3. Handle to a geometry subsystem
  4. Handle to a Conditions entity
  5. maybe_ref: a very general but primitive sort of handle

Why do we have five different handle classes? The first four have special knowledge about some other part of the system, either Services, data products, the geometry system or the conditions system. The last class has no special knowledge and can be used in many places.


Handle to a Service

The syntax:

art::ServiceHandle<T> handle;

contacts the Service registry and asks for a handle to the service whose class is specified by the template argument T.

If the service already exists, the service registry simply returns a pointer to the service; the ServiceHandle holds this pointer. If the service does not already exist, the service registry will attempt to construct it; if construction is successful it will return a bare pointer to the service; if construction fails the service registry will throw an exception. Therefore the constructor of a ServiceHandle will either produce a valid handle or throw an exception. One can use a ServiceHandle as a pointer to the service:

handle->someMethodOftheService(argument1, argument2);
(*handle).someMethodOftheService(argument1, argument2);

The run-time cost of using a ServiceHandle is exactly the same as the cost of using a bare pointer. This is a change from the pre-art framework, in which ServiceHandles did a lookup in the registry for every dereference operation.


Art has been designed so that ServiceHandles may be cached. That is, you may have a ServiceHandle as member data of your class or as a static data member of a method or free function; the ServiceHandle need only be initialized once at the start of the job and it can be used for the remainder of the job. This property is unique to ServiceHandles; all other handle types are dangerous to cache. In addition, if you extract information from a service, it is not likely to be safe to cache that information.
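The behaviour described in this section can be modelled with a toy registry and handle. findOrCreateService, ToyServiceHandle, CounterService, MisconfiguredService and ToyAnalyzer are hypothetical names, not art classes; the sketch only mirrors the properties stated above: construction on first request, an exception if construction fails, pointer-cost dereferencing, and safe caching as member data. It is not thread-safe.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical service registry: at most one instance per service type,
// constructed on first request. If the constructor throws, the exception
// propagates to whoever asked for the service.
template <typename S>
S* findOrCreateService() {
  static std::unique_ptr<S> instance;
  if (!instance) instance = std::make_unique<S>();
  return instance.get();
}

// Toy ServiceHandle: the registry is consulted once, in the constructor.
// After that, operator-> and operator* are plain pointer operations.
template <typename S>
class ToyServiceHandle {
public:
  ToyServiceHandle() : p_(findOrCreateService<S>()) {}
  S* operator->() const { return p_; }
  S& operator*() const { return *p_; }
private:
  S* p_;
};

// Example services (hypothetical).
struct CounterService {
  static int nConstructed;
  CounterService() { ++nConstructed; }
  void touch() { ++nCalls; }
  int nCalls = 0;
};
int CounterService::nConstructed = 0;

struct MisconfiguredService {
  MisconfiguredService() { throw std::runtime_error("MisconfiguredService: bad parameters"); }
};

// Caching: a handle held as member data is initialized once, when the
// module is constructed, and stays valid for the rest of the job.
class ToyAnalyzer {
public:
  void analyze() { counter_->touch(); }
private:
  ToyServiceHandle<CounterService> counter_;
};
```

Every ToyServiceHandle<CounterService> refers to the same object, so the two direct calls and the two calls made through ToyAnalyzer all land on a single CounterService instance.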

Handle to a Data Product

There are many ways to get a handle to a data product; these handles are different from those used for accessing a Service, but their main features are similar. This section will discuss the recommended pattern for accessing a data product:

art::Handle<T> handle;
event.getByLabel("moduleLabel","instanceName",handle);
T const& t(*handle);
// Use t in the code below.

The example above first constructs an empty handle to a data product of type T. This handle does not know anything about the data product and it will throw if you try to use it. The second line asks the event to fill the handle; the arguments "moduleLabel" and "instanceName" are part of the unique identifier of a data product, which is discussed further in the section on data product identifiers. The third line extracts a const reference to the data product from the handle; at first this might seem unnecessary, but there is a good reason to do this, as is discussed below. If the event successfully resolves the request and fills the handle, then the handle can be used to access the underlying data product. If the event cannot resolve the request, the handle remains in an invalid state and any attempt to use the handle will cause the handle to throw.
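The fill-then-use protocol can be sketched with a toy event and handle. ToyEvent, ToyHandle, put and fill are hypothetical; art::Event and art::Handle have much richer interfaces. The sketch keeps only the behaviour described above: the handle starts empty, getByLabel fills it only if a product with the requested type, module label and instance name exists, and any use of an unfilled or unresolved handle throws.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <tuple>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Toy data-product handle: starts empty and throws on use until it has been
// filled with a valid product.
template <typename T>
class ToyHandle {
public:
  bool isValid() const { return p_ != nullptr; }
  T const& operator*() const { return *check(); }
  T const* operator->() const { return check(); }
  void fill(T const* p) { p_ = p; }  // a real design would restrict this to the event
private:
  T const* check() const {
    if (!p_) throw std::runtime_error("ToyHandle: no valid product");
    return p_;
  }
  T const* p_ = nullptr;
};

// Toy event: products are identified by (type, module label, instance name).
class ToyEvent {
public:
  template <typename T>
  void put(std::string const& label, std::string const& instance, T product) {
    products_[Key(typeid(T), label, instance)] = std::make_shared<T>(std::move(product));
  }

  // T is deduced from the handle; the handle is left invalid if nothing matches.
  template <typename T>
  void getByLabel(std::string const& label, std::string const& instance,
                  ToyHandle<T>& handle) const {
    auto it = products_.find(Key(typeid(T), label, instance));
    handle.fill(it == products_.end() ? nullptr
                                      : static_cast<T const*>(it->second.get()));
  }

private:
  using Key = std::tuple<std::type_index, std::string, std::string>;
  std::map<Key, std::shared_ptr<void>> products_;
};
```

Usage follows the recommended three-line pattern: declare the handle, call getByLabel, then bind a const reference; an isValid() test before the last step guards against events that lack the product.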

The rest of this section will use a more concrete example:

#include "MCDataProducts/inc/StepPointMCCollection.hh"

art::Handle<StepPointMCCollection> stepsHandle;
event.getByLabel("g4run","tracker",stepsHandle);
StepPointMCCollection const& steps(*stepsHandle);

cout << "The number of tracker StepPointMC's in this event is: " << steps.size() << endl;

In the third line of this fragment, note the const&. The & requests that the variable steps be a reference to the StepPointMCCollection that resides in the event, not a copy of it. A reference is an alias for an existing object, so no data is copied. Because a StepPointMCCollection can be quite large, a copy would waste both CPU time and memory. A common novice mistake is to forget the &, which makes the variable steps a copy of the data product in the event. If you make this mistake, your code will compile without errors and will happily waste resources, so please double check this. Because the event grants only read-only access to data products, the reference must also be const. Another novice mistake is to omit the const; fortunately the compiler will catch this mistake, issue a diagnostic and stop the compilation. Try removing the const and observing the diagnostic.
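A small sketch of the difference between the two declarations, with a hypothetical function productInEvent standing in for the data product owned by the event. The reference version aliases the product; the version without & makes a full copy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a large data product owned by the event.
std::vector<double> const& productInEvent() {
  static std::vector<double> const product(100000, 1.0);
  return product;
}

// With the &: steps is another name for the product; nothing is copied.
bool referenceAliasesProduct() {
  std::vector<double> const& steps(productInEvent());
  return &steps == &productInEvent();
}

// Without the &: steps is a new vector holding a copy of all 100000 elements.
bool copyAliasesProduct() {
  std::vector<double> const steps(productInEvent());
  return &steps == &productInEvent();
}

// Omitting const does not compile, because a non-const reference cannot
// bind to a const object:
//   std::vector<double>& bad(productInEvent());  // error
```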


If one tried to use the handle before filling it, the handle would be invalid and would throw. If the event does not find the requested data product, it will leave the handle in an invalid state and the code fragment above would throw when initializing the reference steps. If you know that all events will contain the requested data product, there is no need to explicitly test for validity. If, on the other hand, some events may not contain the requested data product, then one may test for validity as follows:

if ( !stepsHandle.isValid() ){

  cerr << "No tracker StepPointMC's.  Skipping this event: " << event.id() << endl;
  return;
}


The Syntax of getByLabel

When the framework was designed, why was the handle passed as an argument into the accessor function and not returned by the accessor function? The short answer is that this would have resulted in a syntax that is even uglier than the existing syntax:

art::Handle<StepPointMCCollection> stepsHandle = event.getByLabel<StepPointMCCollection>("g4run","tracker");

The method getByLabel needs to know the data type of the product that it is looking for. In the accepted syntax, the data type is deduced from the type of the third argument. In the rejected alternative, there is no third argument from which to learn the data type; therefore the user must write the template parameter twice, once in the declaration of the handle and once as a template argument to getByLabel. When thinking this through, remember that C++ template argument deduction never uses the type of the variable that receives the return value, so getByLabel cannot infer the product type from it; this is a consequence of a rule usually stated as "the signature of a function does not include its return type".
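The deduction rule can be seen in a toy event that offers both spellings. ToyEvent::getByLabel mirrors the accepted style, in which T is deduced from the handle argument; ToyEvent::get is a hypothetical version of the rejected style, in which T appears only in the return type and must be written explicitly. All names here are illustrative, not art's API.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <vector>

template <typename T>
struct ToyHandle {
  T const* product = nullptr;  // null means "not found"
};

class ToyEvent {
public:
  // Accepted style: T is deduced from the type of the handle argument.
  template <typename T>
  void getByLabel(std::string const& label, ToyHandle<T>& handle) const {
    handle.product = static_cast<T const*>(find(typeid(T), label));
  }

  // Rejected style (hypothetical): T appears only in the return type, so it
  // cannot be deduced and the caller must write get<T>(...).
  template <typename T>
  ToyHandle<T> get(std::string const& label) const {
    ToyHandle<T> handle;
    getByLabel(label, handle);
    return handle;
  }

private:
  // Toy lookup: this event holds a single std::vector<int> labelled "tracker".
  void const* find(std::type_info const& type, std::string const& label) const {
    if (type == typeid(std::vector<int>) && label == "tracker") return &steps_;
    return nullptr;
  }
  std::vector<int> steps_{1, 2, 3};
};
```

With the rejected style, the caller writes the type twice, as in ToyHandle<std::vector<int>> h = event.get<std::vector<int>>("tracker"); writing event.get("tracker") without the template argument does not compile, because T cannot be deduced.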


Isn't the third line extraneous?

In the above example there is an apparently extraneous line,

StepPointMCCollection const& steps(*stepsHandle);

After all, one can write stepsHandle->size() just as easily as steps.size(). Why bother with the extra step?

The answer is that whenever one uses stepsHandle there is a check that the handle is valid. In the recommended pattern, that check is done exactly once. If, on the other hand, one uses stepsHandle deep inside multiply nested loops, the check will be repeated many times. Because data in the event is guaranteed to exist and to be unchanged throughout the rest of the event processing, it is wasteful to do the test more than once. Any one mistake of this sort will be insignificant, but there are many, many places to make mistakes of comparable magnitude and the overall impact can be significant.

Having said this, if your code uses the handle only a handful of times, and if you feel that omitting the third line in favour of using the handle directly makes your code easier to follow, then do so. The key is to consciously make a choice and to be aware of those occasions when the choice is important.


On References vs Pointers

Why did I choose to write the third line as:

StepPointMCCollection const& steps(*stepsHandle);

and not as

StepPointMCCollection const* steps(stepsHandle->product());

With optimization enabled, both versions should have exactly the same run-time performance, so why not just use the familiar pointer? The ultimate reason is that the majority of problems with both C and C++ arise from careless use of bare pointers: memory leaks, double deletes and memory corruption. Therefore we encourage the practice of only using bare pointers when no other reasonable solution exists; we hope that this will make it easier to identify mis-use of bare pointers.

Having said that, if, in this context, you use pointers correctly it's OK with me.

GeomHandle

Another handle illustrated above is a handle to an entity within the GeometryService.

GeomHandle<TTracker> ttracker;

The GeomHandle class is actually a quick and dirty Mu2e-written handle class. Compared to the previously discussed handle classes, the safety is implemented differently and the class is used slightly differently: if the requested detector element is not present, then the constructor of the GeomHandle will throw. There is no safety check when using a GeomHandle, so the variable ttracker in the above fragment behaves exactly like a TTracker const*, with no additional run-time overhead.

In the rare case that one wishes to write code that will work even when some element of the geometry is missing, one can model the code on the fragment below:

 art::ServiceHandle<GeometryService> geom;
 
 if ( geom->hasElement<TTracker>() ){
    GeomHandle<TTracker> ttracker;
    // do something for the TTracker
 }

The constructor of the GeomHandle class contacts the Service registry and gets a handle to the geometry service. It then asks the geometry service to return a pointer to the detector component specified by the template argument. If no such component exists, or if the geometry service is not present, the constructor of the GeomHandle class will throw. Because a GeomHandle does all of its checking in the constructor, there is no checking upon dereferencing; therefore there is no run-time penalty to using a GeomHandle like a pointer.
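A sketch of the GeomHandle design, assuming a hypothetical ToyGeometryService reached through a global accessor toyGeometry() in place of the art Service registry. All checking happens in the ToyGeomHandle constructor; operator-> is an unchecked pointer operation. The element types and their contents are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical geometry service holding at most one element per type.
class ToyGeometryService {
public:
  template <typename T>
  void add(std::shared_ptr<T> element) {
    elements_[std::type_index(typeid(T))] = std::move(element);
  }

  template <typename T>
  bool hasElement() const {
    return elements_.count(std::type_index(typeid(T))) != 0;
  }

  template <typename T>
  T const* getElement() const {
    auto it = elements_.find(std::type_index(typeid(T)));
    if (it == elements_.end())
      throw std::runtime_error("ToyGeometryService: requested element is absent");
    return static_cast<T const*>(it->second.get());
  }

private:
  std::map<std::type_index, std::shared_ptr<void const>> elements_;
};

// Stand-in for the Service registry lookup.
ToyGeometryService& toyGeometry() {
  static ToyGeometryService service;
  return service;
}

// Toy GeomHandle: all checking is done in the constructor; afterwards the
// handle behaves exactly like a T const*.
template <typename T>
class ToyGeomHandle {
public:
  ToyGeomHandle() : p_(toyGeometry().getElement<T>()) {}  // throws if absent
  T const* operator->() const { return p_; }              // no check here
  T const& operator*() const { return *p_; }
private:
  T const* p_;
};

// Invented detector elements.
struct ToyTTracker { int nStations = 3; };
struct ToyCalorimeter { int nDisks = 2; };

// Pattern for code that must also run when an element is missing.
int nDisksOrZero() {
  if (toyGeometry().hasElement<ToyCalorimeter>()) {
    ToyGeomHandle<ToyCalorimeter> calo;
    return calo->nDisks;
  }
  return 0;
}
```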

Do not cache GeomHandles. The Mu2e geometry is permitted to change at Run and SubRun boundaries; therefore one must not cache geometry information across these boundaries. Getting a GeomHandle is a relatively lightweight operation, so we encourage you to get these handles on a per-event basis.

ConditionsHandle

Like a GeomHandle, a ConditionsHandle is a Mu2e-written class that knows how to find entities within the ConditionsService and to return these by handle. The class behaves the same as a GeomHandle; that is, its constructor either produces a valid handle or throws. There is no run-time penalty for using this class like a pointer.

The text string argument to the ConditionsHandle constructor is a placeholder for future development.


maybe_ref

The class template cet::maybe_ref acts like a very primitive handle. The namespace cet:: indicates that this class lives in a utility library maintained by the "Computing Enabling Technologies" group within the Fermilab Computing Division; this is the group that develops and maintains the framework. The library cetlib is a home for utilities that do not yet have a better home; it is used by art, among other projects.

This class template was created to work around a common but unsafe coding practice. A very common convention is that accessor functions return the requested information by pointer; if the requested information is not available, the function will return a null pointer. In this convention, it is the responsibility of the end user to check the pointer before using it. The problem is that, when the information is almost always available, people forget to check for validity and the program has a segmentation fault whenever the information is not available. Many of the packages that are used by Mu2e, particularly ROOT, GEANT4 and HEPPDT, often use this convention.

Mu2e accesses HEPPDT through a thin Mu2e-written interface layer. In this layer, the functions that returned information by bare pointer were changed to return information using this new type, cet::maybe_ref.


The class template cet::maybe_ref internally holds a bare pointer and provides only a few functions: one to check for validity, one to access the pointee by const reference, one to access the pointee by reference and one to change the object to which it points. The methods that return a reference always check for validity and throw an exception if the pointer is invalid. Unlike the other handle classes, this class has no special knowledge about any other part of the software. The advantage of using maybe_ref over the bare pointer convention is that, when the requested information is not available and the code does not check for validity, the maybe_ref object will throw an exception. This will trigger art to begin an orderly shutdown of the program: the endSubRun, endRun and endJob methods will be called, buffers will be flushed, and so on.
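A minimal sketch of a class with the four operations described above. MaybeRef and its member names isValid and reseat are assumptions made for illustration and are not claimed to match the cetlib header; only the use of ref() and of a bool test is taken from the surrounding text. Consult cetlib for the real interface.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in modelled on the description of cet::maybe_ref.
template <typename T>
class MaybeRef {
public:
  MaybeRef() = default;             // starts invalid
  explicit MaybeRef(T& t) : p_(&t) {}

  // Validity check, usable directly in an if or ?: expression.
  bool isValid() const { return p_ != nullptr; }
  explicit operator bool() const { return isValid(); }

  // Checked access by const reference and by reference.
  T const& ref() const { return *check(); }
  T& ref() { return *check(); }

  // Change the object referred to (or make the MaybeRef invalid again).
  void reseat(T& t) { p_ = &t; }
  void reseat() { p_ = nullptr; }

private:
  T* check() const {
    if (!p_) throw std::logic_error("MaybeRef: attempt to use an invalid reference");
    return p_;
  }
  T* p_ = nullptr;
};
```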


In the fragment above, from ReadBack.cc, the code has a reference to a SimParticle ( the variable sim ) and wants to know some information about what sort of particle it is. To do this it extracts the PDG ID from the SimParticle and asks the Particle Data Table for its information about that PDG ID. However this operation can fail: the SimParticles are created by Geant4 which has many particle ID codes that are not found in the PDG list; the additional ID codes represent nuclei, ions and their excited states.

When one asks the ParticleDataTable class for a particle, the information is returned as ParticleDataTable::maybe_ref. One can see in ConditionsService/inc/ParticleDataTable.hh that this is a typedef to cet::maybe_ref<HepPDT::ParticleData const>.

The example above shows how to check for validity of a maybe_ref: it is convertible to a bool. It also shows how to extract the reference to the underlying data, using the ref() method.
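The ReadBack pattern can be reproduced with a toy particle table. ToyParticleDataTable, ToyParticleData, unknownName and the minimal MaybeRef are hypothetical; only the shape of the lookup (test the returned object as a bool, then call ref()) follows the fragment above. The code 1000020040, a nuclear code in the PDG 10LZZZAAAI convention, is simply absent from this toy table.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical minimal stand-in for cet::maybe_ref.
template <typename T>
class MaybeRef {
public:
  MaybeRef() = default;
  explicit MaybeRef(T& t) : p_(&t) {}
  explicit operator bool() const { return p_ != nullptr; }
  T& ref() const {
    if (!p_) throw std::logic_error("MaybeRef: invalid");
    return *p_;
  }
private:
  T* p_ = nullptr;
};

struct ToyParticleData {
  std::string name_;
  std::string const& name() const { return name_; }
};

class ToyParticleDataTable {
public:
  using maybe_ref = MaybeRef<ToyParticleData const>;
  ToyParticleDataTable() {
    table_[11] = ToyParticleData{"e-"};
    table_[2212] = ToyParticleData{"proton"};
  }
  // Returns an invalid maybe_ref, not a null pointer, for unknown codes.
  maybe_ref particle(int pdgId) const {
    auto it = table_.find(pdgId);
    return it == table_.end() ? maybe_ref() : maybe_ref(it->second);
  }
private:
  std::map<int, ToyParticleData> table_;
};

// Hypothetical fallback used when the table has no entry.
std::string unknownName(int pdgId) { return "unknown_" + std::to_string(pdgId); }

std::string particleName(ToyParticleDataTable const& pdt, int pdgId) {
  ToyParticleDataTable::maybe_ref particle = pdt.particle(pdgId);
  return particle ? particle.ref().name() : unknownName(pdgId);
}
```

If the bool test were forgotten, particle.ref() on the nuclear code would throw std::logic_error instead of dereferencing a null pointer.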


OrphanHandle has been Removed

Another class template from the CMS framework, OrphanHandle<T>, has been removed from art. That class was required in order to create, within one module, two data products, one of which contains persistent pointers to objects in the other data product. A more robust solution has been provided within art. See the discussion about art::Ptr<T> below.


Private Interfaces

The above discussion considered the public interface of classes. For private interfaces that are in non-time-critical parts of the code, please use the same safe methods discussed above. For the parts of the private interface that are in time-critical code, feel free to use bare pointers for accessors and modifiers; but, for reasons of exception safety, do not use bare pointers to manage the lifetime of objects. In all cases where bare pointers are used, be sure to document the ownership and lifetime of the pointees.
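A sketch of these rules, using a hypothetical ToyTracker: ownership is held by std::unique_ptr, the public accessor returns by reference to const, and an unchecked private accessor returns a non-owning bare pointer whose ownership and lifetime are documented where it is declared.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

struct ToyStraw { double length; };  // invented detector element

class ToyTracker {
public:
  explicit ToyTracker(int nStraws) {
    for (int i = 0; i < nStraws; ++i)
      straws_.push_back(std::make_unique<ToyStraw>(ToyStraw{1.0 + i}));
  }

  // Public interface: safe, bounds-checked, returns by reference to const.
  ToyStraw const& straw(std::size_t i) const { return *straws_.at(i); }

  // Time-critical loop using the unchecked private accessor.
  double totalLength() const {
    double sum = 0;
    for (std::size_t i = 0; i < straws_.size(); ++i) sum += strawFast(i)->length;
    return sum;
  }

private:
  // Ownership: each ToyStraw is owned by straws_ and is destroyed together
  // with this ToyTracker; nothing else may delete it. Using unique_ptr keeps
  // this exception safe: no leak if construction throws part way through.
  std::vector<std::unique_ptr<ToyStraw>> straws_;

  // Unchecked, non-owning bare pointer for time-critical code.
  // Lifetime: valid only while *this is alive and straws_ is not modified;
  // i must be less than straws_.size().
  ToyStraw const* strawFast(std::size_t i) const { return straws_[i].get(); }
};
```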