MakeProducts: Difference between revisions

From Mu2eWiki

Latest revision as of 02:37, 31 March 2023


Introduction

A Data Product is anything that you can add to an event or see in an event. Examples include the generated particles, the simulated particles produced by Geant4, the hits produced by Geant4, tracks found by the reconstruction algorithms, clusters found in the calorimeters and so on.

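This page contains a short description of how to add a data product to an event and how to define a new type of data product. For more complete information, consult the CMS documentation for making new products (https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCreatingNewProducts).

In 2022, art provided an updated method for putting products in an event, explained here: https://indico.fnal.gov/event/53206/contributions/234465/attachments/151953/196516/2022-2-15.pdf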

A Minimal Module

The code fragment below shows a minimal example of an EDProducer module that adds a StrawHitCollection to the event. If you look back through the nested header files, you will see the StrawHitCollection is just a typedef for std::vector<mu2e::StrawHit> and that StrawHit is a very simple class, really nothing more than a simple struct.

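#include <memory>

#include "art/Framework/Core/EDProducer.h"
#include "art/Framework/Core/ModuleMacros.h"
#include "art/Framework/Principal/Event.h"
#include "fhiclcpp/ParameterSet.h"
#include "RecoDataProducts/inc/StrawHitCollection.hh"

namespace mu2e{

 class MyClass : public art::EDProducer {

  public:
    // Recent versions of art require the parameter set to be passed to the base class.
    explicit MyClass(fhicl::ParameterSet const& pSet)
      : art::EDProducer{pSet}
    {
      // Tell the framework which data product this module makes.
      produces<StrawHitCollection>();
    }
    virtual ~MyClass() { }
    virtual void produce(art::Event& event);
 };

 void MyClass::produce(art::Event& event) {

   // Step 1: create a unique_ptr to an empty collection.
   std::unique_ptr<StrawHitCollection> p(new StrawHitCollection);

   // Step 2: some sort of loop to fill the collection:
   for ( int i=0; i<10; ++i){
     p->push_back(StrawHit(...));
   }

   // Step 3: give the collection to the event; p is empty after this call.
   event.put(std::move(p));
 }

} // end of namespace mu2e

using mu2e::MyClass;
DEFINE_ART_MODULE(MyClass);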

In the above fragment there is a member function of MyClass named produce (singular) and a member function of the base class named produces (plural); the second function is called in the constructor of MyClass. The following text refers to both - so pay attention to which of the two is being discussed. The following pattern describes any producer module:

  1. The class must inherit from art::EDProducer.
  2. The constructor must tell the framework what it produces; it does so via the call to produces<StrawHitCollection>(). This is described in more depth below.
  3. Data products are added to an event inside the produce method. A three step pattern is used:
    1. Create a std::unique_ptr to an empty object.
    2. Fill the object.
    3. Give the unique_ptr to the event.

    It might be possible to create a fully formed object in step 1; in that case there is no step 2.

  4. The code must invoke the macro DEFINE_ART_MODULE as shown in the last two lines. This line may appear anywhere in the file after the definition of the class. Mu2e has adopted the convention of putting it at the end of the file.
  5. After the call to event.put(...), the variable p no longer points at anything; dereferencing it is an error. Therefore you should run diagnostics and anything else that reads your data product before the call to event.put(...).

You might try the following: call event.put(...) and then get the data product out of the event using one of the get methods. This will not work. The reason is that a data product is not actually registered with the event until the produce method of the module returns. The logic behind this restriction is that if a module fails, then none of its data products should be available via the get interface; therefore event.put(...) only schedules the data product for addition to the event, and that addition occurs when the module returns from the produce call.
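The behaviour described above can be mimicked with a toy model in plain C++. This is only a sketch and does not use art: ToyHit, ToyEvent, commit() and toyProduce() are hypothetical names invented for the illustration, and commit() stands in for whatever the framework does after produce returns.

```cpp
// Toy model only: nothing here is part of art.
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct ToyHit { int channel; double energy; };
using ToyHitCollection = std::vector<ToyHit>;

class ToyEvent {
public:
  // Takes ownership; the product is only scheduled for addition.
  void put(std::unique_ptr<ToyHitCollection> product) {
    pending_ = std::move(product);
  }
  // Returns nullptr until commit() has run.
  ToyHitCollection const* get() const { return committed_.get(); }
  // Stands in for what happens after the produce method returns.
  void commit() {
    if (pending_) committed_ = std::move(pending_);
  }
private:
  std::unique_ptr<ToyHitCollection> pending_;
  std::unique_ptr<ToyHitCollection> committed_;
};

// Mimics the body of a produce method; returns the number of hits seen
// by the diagnostics that run before put.
inline std::size_t toyProduce(ToyEvent& event) {
  auto p = std::make_unique<ToyHitCollection>();
  for (int i = 0; i < 10; ++i) p->push_back(ToyHit{i, 0.1 * i});
  std::size_t const nHits = p->size();  // diagnostics go here, before put
  event.put(std::move(p));
  assert(p == nullptr);                 // p no longer owns anything
  assert(event.get() == nullptr);       // not yet visible through get
  return nHits;
}
```

In this model, calling get() inside toyProduce always returns nullptr; the collection only becomes visible after the caller invokes commit().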


More about produces<T>();

In the constructor, there is a call to a function template produces<T>(). This tells the framework that when the produce method of this class is called, it is expected to add a data product of type T to the event. If the produce method is expected to add more than one data product to the event, then there must be a corresponding call to produces for each data product.

If the produce method tries to add a product for which it did not make a produces<T>() call, then the framework will throw. The default response to this exception is to stop event processing and to shut down as gracefully as possible; normally this means that your histogram files and log files will be flushed and closed properly.

One natural question is "what should I do if this particular event has no StrawHits"? One needs to distinguish two cases here. If it is perfectly normal that some events will produce no StrawHits, then you should put an empty StrawHitCollection into the event. The event data model is perfectly happy to hold empty collections. If it is an error for any event to produce no StrawHits, then you should issue an appropriate error message using the message logger. If it is a sufficiently severe error, then you should throw an appropriate exception.

In an earlier version of this document it was stated that the framework would throw an exception if a module failed to produce one of its data products advertised via produces calls. This is not true and never was true - the older document was wrong. It remains true that, for objects that are collection types, the recommended procedure is to put an empty collection into the event rather than to put nothing into the event; this greatly simplifies code that reads your output.
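As a small illustration of why an empty collection simplifies downstream code, consider the following sketch. DemoHit, DemoHitCollection and totalEnergy are hypothetical names used only here; the point is that reader code written as a loop needs no special case when the collection is empty.

```cpp
#include <vector>

// Hypothetical stand-in for a simple hit class.
struct DemoHit { double energy; };
using DemoHitCollection = std::vector<DemoHit>;

// Reader code: the loop body simply never runs for an empty collection,
// so no "is the product missing?" branch is needed.
inline double totalEnergy(DemoHitCollection const& hits) {
  double sum = 0.0;
  for (auto const& hit : hits) sum += hit.energy;
  return sum;
}
```

If the producer had put nothing into the event instead, every reader would first have to check whether the product exists before it could run the same loop.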


If you are wondering where the produces function lives, it comes from deep down in an inheritance chain. First look in the header file for the base class, EDProducer. That class inherits from some other class; check its header file. After several levels you will find the base class that defines produces.

If one module wishes to produce two or more data products of the same data type, these can be distinguished using the instance name argument to produces and put:

SampleProducer::SampleProducer(fhicl::ParameterSet const& ps)
  : art::EDProducer{ps}
{
 produces<SampleCollection>("version1");
 produces<SampleCollection>("version2");
}

void SampleProducer::produce(art::Event& e ){
   std::unique_ptr<SampleCollection> result1(new SampleCollection);
   std::unique_ptr<SampleCollection> result2(new SampleCollection);
   // ... fill the collections ...
   e.put(std::move(result1),"version1");
   e.put(std::move(result2),"version2");
}

where the text strings must be unique but have no other requirements.
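The bookkeeping described in this section can be pictured with a toy model in plain C++. This is a sketch under stated assumptions, not a description of how art is implemented: ToyProducerBase and nFilled() are hypothetical names, and the model only shows that a product is identified by its type together with its instance name, and that a put without a matching produces call fails.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Toy bookkeeping only; not part of art.
class ToyProducerBase {
public:
  using Key = std::pair<std::type_index, std::string>;

  // Declare that a product of type T with this instance name will be made.
  template <class T>
  void produces(std::string const& instance = "") {
    declared_[Key{std::type_index(typeid(T)), instance}] = false;
  }

  // Accept a product; throw if no matching produces<T>() call was made.
  template <class T>
  void put(std::unique_ptr<T> product, std::string const& instance = "") {
    auto it = declared_.find(Key{std::type_index(typeid(T)), instance});
    if (it == declared_.end()) {
      throw std::logic_error("put without matching produces: '" + instance + "'");
    }
    it->second = (product != nullptr);
  }

  // Number of declared products that have been filled so far.
  int nFilled() const {
    int n = 0;
    for (auto const& entry : declared_) {
      if (entry.second) ++n;
    }
    return n;
  }

private:
  std::map<Key, bool> declared_;
};
```

In this model, two products of type std::vector<int> declared as "version1" and "version2" occupy separate slots, while a put with the undeclared name "version3" throws.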

Declaring new Data Products

In the above description it was presumed that the class to be added to the event was already known to the framework. A class is made known to art by creating a ROOT dictionary by using ROOT's genreflex system, as described below.

Making a ROOT dictionary requires two files, named classes_def.xml and classes.h. The convention used by the Mu2e build system is that these files are located in the src subdirectory of each data product package, for example RecoDataProducts/src/classes_def.xml and RecoDataProducts/src/classes.h. If the only data product we had were StrawHitCollection, then classes_def.xml would look like:

<lcgdict>
 <class name="mu2e::StrawHit"/>
 <class name="mu2e::StrawHitCollection"/>
 <class name="art::Wrapper<mu2e::StrawHitCollection>"/>
</lcgdict>

and classes.h would look like:

#include <vector>
#include "RecoDataProducts/inc/StrawHitCollection.hh"

The rule for classes_def.xml is that the art::Wrapper line must be present for every class that can be given to the event using a call to event.put(...); in this case that is just art::Wrapper<mu2e::StrawHitCollection>. You also need a line for the class that is the template argument of art::Wrapper, in this case StrawHitCollection. And you need a line for every type that is used within StrawHitCollection. This applies recursively until only primitive types are found (that is, we do not need lines for int, double, float, char and so on).

There is one exception to the rule that you must recursively declare all classes that are data members of your class. You must not declare them if they are already found in another dictionary that is known to art. For example, none of the Mu2e dictionaries includes a reference to CLHEP::Hep3Vector or CLHEP::HepLorentzVector; these are found in

$CANVAS_ROOT_IO_DIR/source/canvas_root_io/Dictionaries/clhep/classes_def.xml.

Mu2e has adopted the convention that if we need a dictionary entry for a type that is used in more than one dictionary, that type should be defined in DataProducts/src/classes_def.xml and DataProducts/src/classes.h.

You can learn what other dictionaries are defined by art by giving the following unix command:

find $CANVAS_ROOT_IO_DIR -name classes_def.xml

If we had decided that it made sense to add a single StrawHit as a data product, then we would also need to write the wrapper line for StrawHit. Instead we decided that if you would like to store a single StrawHit, you need to store it by creating a collection with only one member and storing that collection.

Every header file that is needed to recursively resolve classes declared in classes_def.xml must be #include'd in classes.h. In earlier versions of ROOT, it was necessary to include some explicit template instantiations in classes.h, but these are no longer needed and should not be present.

There is a syntax to make only a subset of the data members of a class persistent. There is also a syntax to tell the framework to make a data product purely transient: that is, it can be added to the event so that other modules may use it, but it will never be written out. For details see the next two sections and see also the CMS documentation for making new products.

Transient Data Products

It is possible to tell the framework that it should allow data products of a certain type to be added to the event but that it should never write out data products of that type. This is useful, for example, for data products that are full of bare pointers. To declare the class MyClass as a transient data product you need to add one line to classes_def.xml

 <class name="MyClass" persistent="false"/>

and one line to classes.h,

#include "MyClass.hh"

One should not provide the lines for the art::Wrapper<MyClass> to either of these files. Moreover it is not necessary to provide lines in classes_def.xml that describe the classes used as data members inside MyClass. When an output module encounters this data product it will not try to persist the data product.

See also the CMS documentation for making new products.


Transient Data Members within a Persistable Class

It is also possible to declare that a data member of a class is transient. This is done in classes_def.xml. Suppose that the class MyClass has a data member with the name _field of type T. The data member can be declared transient using the syntax:

 <class name="MyClass">
    <field name="_field" transient="true"/>
 </class>

In this case the data member _field will not be written to the output file but the remaining data members of MyClass will be. When these objects are read back, the data member _field will be invalid and the user needs to know not to access this data member until it can be properly initialized by some other method. Ideally MyClass should protect against illegal access either by initializing on demand or by throwing.
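The "initialize on demand" option can be sketched in plain C++ as follows. This is only an illustration of the in-memory pattern and does not exercise ROOT I/O: LazyTrack and its members are hypothetical names, pt_ and ptValid_ play the role of data members that would be marked transient="true" in classes_def.xml, and the sketch assumes those members hold their default values when an object is read back.

```cpp
#include <cmath>

// Hypothetical data class: px_ and py_ would be persistent, while pt_ and
// ptValid_ would be declared transient in classes_def.xml.
class LazyTrack {
public:
  LazyTrack() = default;  // a default constructor is typically needed for I/O
  LazyTrack(double px, double py) : px_(px), py_(py) {}

  // Initialize on demand: compute the cached value the first time it is used.
  double pt() const {
    if (!ptValid_) {
      pt_ = std::hypot(px_, py_);
      ptValid_ = true;
    }
    return pt_;
  }

  // True once the cache has been filled.
  bool cached() const { return ptValid_; }

private:
  double px_ = 0.0;
  double py_ = 0.0;
  mutable double pt_ = 0.0;       // transient cache, not meant to be written out
  mutable bool ptValid_ = false;  // transient flag guarding the cache
};
```

Callers never touch pt_ directly, so they cannot read it before it has been computed. The alternative mentioned above, throwing on illegal access, would replace the lazy computation in pt() with a check of ptValid_ followed by a throw.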

If there are no persisted objects of type T in any of the data products, then it is not necessary to declare the type T in classes_def.xml.

See also the CMS documentation for making new products.

Identifiers of a Data Product

Please see the documentation on the art naming convention for products.

Writing only Selected Events and Selected Data Products

It is possible to configure an art job so that it writes selected events to one or more different output files. It is also possible to configure each output file so that only selected data products are written to that file. These operations are described in the web page on configuring output files.