ArtServices


Introduction

This page describes services that are distributed as part of art. It does not discuss services that are part of the Mu2e Offline software.



Tracer

This service will print an informational message at the start and end of every call to a module or to a user service; it indicates depth within the event loop state machine using "++" for the top level, "++++" for the next level, and so on.

The service can be enabled or disabled from the mu2e command line:

  > mu2e --trace   -c input.fcl
  > mu2e --notrace -c input.fcl

It can also be enabled by adding the following fragment within the services parameter set in your .fcl file:

services : {
  scheduler: { wantTracer : true }
}

If there is both a wantTracer: true/false parameter in the .fcl file and a command line argument, the command line argument takes precedence.

Under the covers, the tracer registers a callback for every state machine transition for which a service may register. At each transition, it will call the registered callback, which will print an informational message to the log file. For the underlying code see:

  • Framework/Services/Optional/Tracer_service.cc

One can use this code as a model for how to register service code to respond to state machine transitions.
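
As a rough illustration, the sketch below shows the shape of a user service that registers such callbacks. It is a minimal sketch only: the class name is invented, and the signal names, callback signatures, and service macros vary between art versions, so treat Tracer_service.cc in your art release as the authoritative model.

// Minimal sketch of a service that watches state machine transitions.
// The signal names (sPostBeginJob, sPostEndJob) and the LEGACY scope
// argument are assumptions that may differ in your art version.
#include "art/Framework/Services/Registry/ActivityRegistry.h"
#include "art/Framework/Services/Registry/ServiceMacros.h"
#include "fhiclcpp/ParameterSet.h"

#include <iostream>

namespace mu2e {

  class TransitionLogger {
  public:
    TransitionLogger(fhicl::ParameterSet const&, art::ActivityRegistry& reg){
      // Ask art to call these member functions at the named transitions.
      reg.sPostBeginJob.watch(this, &TransitionLogger::postBeginJob);
      reg.sPostEndJob.watch(this, &TransitionLogger::postEndJob);
    }

  private:
    void postBeginJob(){ std::cout << "TransitionLogger: beginJob complete" << std::endl; }
    void postEndJob()  { std::cout << "TransitionLogger: endJob complete"   << std::endl; }
  };

} // namespace mu2e

DECLARE_ART_SERVICE(mu2e::TransitionLogger, LEGACY)
DEFINE_ART_SERVICE(mu2e::TransitionLogger)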


TimeTracker

The art TimeTracker service provides you with information about the time used by each module in an art job. It can be configured to provide just summary information in the job log or to write an SQLite database that has detailed information about how long each module spent on each event.

For documentation, see the art wiki page describing the TimeTracker service.

Caveats About Event Timing

There are two important caveats about event timing. When running Mu2e GEANT4 simulations, there are outlier events that take up to a few hundred times longer than the median event. These events occur at the percent level. When you run a short grid job in order to estimate the time for a much longer job, be aware that your estimates could be off because too many or too few of these events are in the small sample. Be sure to run test jobs long enough to average this out; a test job that executes for 30 minutes should usually be safe.

I presume that a similar issue exists for running full reconstruction on simulated events but I have not yet quantified it.

A corollary of the above is that the numbers from the timing report should be more robust than the estimate you would get by running two test jobs, one with few events and one with many events, in order to measure the startup overhead. This method is vulnerable to one of the events in the first job being a long-cpu-time outlier.


The second issue is that different grid slots have different CPU power. Most of the GP Fermigrid nodes have the same CPU power per slot as the interactive nodes, mu2egpvm*. But there are some older machines around, some of which are slower per core and some of which are faster per core (but with fewer cores per chip). If you run jobs opportunistically, which the mu2egrid scripts do by default, then you may be exposed to machines with either more or less CPU power than those on which you ran your test jobs.

Taken together, these two effects can lead to a factor of two difference in execution time among grid processes in one grid job cluster.

Temporary Files Used by TimeTracker

Most of the Mu2e production and example jobs enable the use of the art TimeTracker service to get the end of job timing summary. If you use one of these jobs as the base for your development work, the information below is important to you.

If you enable the TimeTracker but do not specify an output database file, SQLITE will create a database in a temporary file in a location described below; art uses the temporary file to accumulate information that it will use to create the end of job report. The temporary file will be deleted at the end of the job. The location of the temporary file is governed by SQLITE's policy on temporary files: see the list in item 5 on https://www.sqlite.org/tempfiles.html . art is built so that option 1 from that list does not apply. The standard Mu2e environment does not define either SQLITE_TMPDIR or TMPDIR; so options 2 and 3 do not apply. The result on our interactive machines is that the temporary file is written to /var/tmp. I am not sure about grid machines but it will be in one of the tmp areas.

If you run a short job, the temporary database will be memory resident. If the in-memory database grows too big, then the temporary file will be created. However, the temporary file is not visible when you do an ls of /var/tmp. It is visible using lsof (list open files); however, a regular user can only see their own open files, not those belonging to others.

If you run a long job interactively, which you should not be doing, it may create a multi-GB temporary file in /var/tmp. Depending on other concurrent use of /var/tmp, the database may fill /var/tmp, causing the program to throw an exception and exit.

There are three solutions (sketched below):

  1. Disable the TimeTracker, either in fcl or using the --no-timing command line option.
  2. Define SQLITE_TMPDIR to point to your space on /exp/mu2e/data/users. The temporary file will be written there and then deleted at the end of the job.
  3. Specify an output file name for the TimeTracker database, either in fcl or using the "--timing-db file.db" command line option.
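
For example, assuming a bash shell and using the option spellings quoted above (which may differ in your art version), the three options look like:

  > mu2e --no-timing -c input.fcl                       # option 1: disable the TimeTracker
  > export SQLITE_TMPDIR=/exp/mu2e/data/users/$USER     # option 2: redirect the temporary file
  > mu2e --timing-db timing.db -c input.fcl             # option 3: name the output database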


You can read how to do this in fcl on the art wiki page describing the TimeTracker service.
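
A sketch of the corresponding fcl fragment is shown below; the parameter names (printSummary, dbOutput, filename, overwrite) are those used by recent art releases and may differ in older versions, so check the art wiki page for your release:

services : {
  TimeTracker : {
    printSummary : true                 # end-of-job summary in the log
    dbOutput : {
      filename  : "timing.db"           # detailed per-event timing database
      overwrite : true
    }
  }
}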

MemoryTracker

The art MemoryTracker service provides you with information about the memory used by each module in an art job. It can be configured to provide just summary information in the job log or to write an SQLite database that has detailed information about how much memory was used by each module for each event.

For documentation, see the art wiki page describing the MemoryTracker service.
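
A sketch of the fcl fragment for writing the detailed database is shown below; the parameter names are assumed to parallel those of the TimeTracker and should be checked against the art wiki page for your release:

services : {
  MemoryTracker : {
    dbOutput : {
      filename  : "memory.db"           # detailed per-event memory database
      overwrite : true
    }
  }
}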



TFileService

When you want to make histograms inside a module you will normally use a package named ROOT. Here "histograms" is shorthand for any of histograms, ntuples, TTrees, TGraphs and all other sorts of data presentation and data summary objects. When other people wish their modules to do the same, they will also use ROOT. In addition, the art event-data IO system uses ROOT's IO subsystem. You are free to use other data presentation tools but the only supported tool is ROOT.

ROOT, however, is fragile and, when several independent pieces of code all use ROOT, it is very easy for them to collide with each other. Sometimes this will crash the program but more often it will produce subtly incorrect results.

The remainder of this section presumes that you have a minimal familiarity with ROOT: you know how to create, fill and write out histograms and ntuples.

Art supplies a service, named TFileService, that, if used as intended, will automatically avoid these problems without any need for anyone to be aware of what others are doing. TFileService opens one ROOT output file to hold all of the histograms, ntuples, TTrees, TGraphs, TCanvases, etc. that are produced by the modules; this file is distinct from the event-data output files that are created by output modules; this file is only created as needed and will not be created if no histograms, ntuples etc. are created.

Within the ROOT output file, TFileService makes a subdirectory for each module instance; the subdirectory is named using the module label. When a module instance runs, any histograms that it creates are automatically created inside the correct subdirectory. In this way two modules can make a histogram named "multiplicity" and these two histograms will automatically be distinguished. Moreover, if an art job contains two or more instances of the same module, the histograms from each module instance are distinguished because each is created in a uniquely named directory. Finally, TFileService ensures that user created histograms do not interfere with event-data IO.

The name TFileService comes from TFile, the ROOT class that manages disk files.

There are two parts to the pattern of using TFileService. The first part is to include the following fragments in your .fcl file.

services :
{
  TFileService : { fileName : "readback.root" }
}
physics :
{
  analyzers:{
    mylabel1 : {
       module_type : MyClass
       eMin        : 0.001
    }
  }
}

The fileName parameter to TFileService specifies the name of the file in which the histograms for all modules will be found. If none of the modules in your job create histograms, then the TFileService parameter set may be omitted; that, however, should be very, very rare. In this example, the module that makes the histograms happens to be an analyzer, but nothing significant changes if it is a producer or a filter.

The second part of the pattern is inside your module class:

// Headers needed by this example. The exact paths vary between art and
// Offline releases; in particular, TFileService.h lives in art_root_io
// in recent art releases.
#include "art/Framework/Core/EDAnalyzer.h"
#include "art/Framework/Core/ModuleMacros.h"
#include "art/Framework/Principal/Event.h"
#include "art/Framework/Principal/Handle.h"
#include "art/Framework/Services/Registry/ServiceHandle.h"
#include "art_root_io/TFileService.h"
#include "fhiclcpp/ParameterSet.h"
#include "RecoDataProducts/inc/StrawHitCollection.hh"   // path depends on your Offline release
#include "TH1F.h"

#include <string>

namespace mu2e {

  class MyClass : public art::EDAnalyzer {
  public:

    explicit MyClass(fhicl::ParameterSet const& pset);
    void beginJob() override;
    void analyze(const art::Event& event) override;

  private:
    double eMin;
    std::string _makerModuleLabel;
    TH1F* _hNSimulated;
  };

  MyClass::MyClass(fhicl::ParameterSet const& pset):
     art::EDAnalyzer(pset),
     eMin(pset.get<double>("eMin")),
     _makerModuleLabel("makeSH"),
     _hNSimulated(nullptr){
  }

  // At the start of the job, create histograms.
  void MyClass::beginJob(){
    art::ServiceHandle<art::TFileService> tfs;
    _hNSimulated = tfs->make<TH1F>( "hNSimulated", "Number of SimParticles", 100, 0., 300. );
  }

  void MyClass::analyze(const art::Event& event) {

    art::Handle<StrawHitCollection> hitsHandle;
    event.getByLabel(_makerModuleLabel,hitsHandle);
    StrawHitCollection const& hits = *hitsHandle;

    // Count hits above the energy threshold and fill the histogram.
    int n(0);
    for ( StrawHit const& hit : hits ){
       if ( hit.energyDep() > eMin ){
          ++n;
       }
    }

    _hNSimulated->Fill(n);
  }

} // namespace mu2e

DEFINE_ART_MODULE(mu2e::MyClass)

This example creates and fills one histogram. That histogram will be found in the output file with the name /mylabel1/hNSimulated.

In the beginJob method, the first step is to get a ServiceHandle to the TFileService. The next step is to use the TFileService to create a histogram; compared to native ROOT, the only real difference is wrapping the "new TH1F" call inside "tfs->make<TH1F>". You may create additional ROOT objects using a similar syntax. You may also make ROOT subdirectories and create new histogram objects in those directories.
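
For example, an ntuple and a tree could be booked the same way. The object and variable names below are purely illustrative; in a real module the pointers would be held as member data, just like _hNSimulated, and the TNtuple.h and TTree.h headers would be included.

   art::ServiceHandle<art::TFileService> tfs;
   TNtuple* _ntup = tfs->make<TNtuple>( "ntup", "Example ntuple", "e:t:n" );
   TTree*   _tree = tfs->make<TTree>( "tree", "Example tree" );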

Once the histogram has been created, access to it is available via the bare pointer that is held as member data of the class. In the above example, the analyze method fills the histogram using this pointer. Another standard ROOT behaviour remains unchanged: the histogram will automatically be written to the ROOT output file when that file is closed.

During each call to tfs->make<TH1F>, TFileService checks if it needs to open the ROOT output file and checks if it needs to create the per-module-label subdirectory. If either of these actions is necessary, TFileService will perform them. Therefore, if a module creates no histograms, there will be no corresponding per-module-label subdirectory in the output ROOT file; and, if an entire art job creates no histograms, no ROOT output file will be created. This is part of the art philosophy that art will have no unnecessary side-effects.

Behind the scenes, TFileService does the following work:

  1. Before calling modules, save selected parts of the state of ROOT.
  2. On any call to tfs->make<TH1F>:
    • If not already done, open the ROOT output file.
    • If not already done, create, in the ROOT output file, a subdirectory named /mylabel1.
    • Forward the call to ROOT; in this example it creates the requested histogram.
  3. On return from the call to MyClass::beginJob, restore the state of ROOT to that saved in step 1.
  4. Before subsequent calls to most methods of the module labeled mylabel1, cd to the ROOT directory /mylabel1; the methods that TFileService knows about are the constructor, analyze/produce/filter, beginJob, endJob, beginRun, endRun, beginSubRun, and endSubRun. TFileService does not do this for module methods that respond to opening/closing of input/output event-data files.
  5. On return from subsequent calls to the listed methods of MyClass, restore the state of ROOT to that saved in step 1.

In the above example, the histograms were created in the beginJob method. Alternatively, they could have been created in the beginRun method; at present, the end result would be the same. We are considering adding an option to TFileService such that histograms created in beginRun will be maintained on a per-run, not per-job, basis.

One could also have chosen to create the histograms in the constructor of MyClass. For this simple example, creating histograms in the constructor would have worked. But be aware that the geometry and conditions data will, in general, only be defined on a per run, or per subRun basis; therefore histograms that require this sort of information at creation time can only be created after that information is defined.


Now consider the following change to the run time configuration:

physics :
{
  analyzers:{
    mylabel1 : {
       module_type : MyClass
       eMin        : 0.001
    }
    mylabel2 : {
       module_type : MyClass
       eMin        : 0.002
    }
  }
}

In this case TFileService will make two subdirectories, /mylabel1 and /mylabel2; there will be two histograms, which will be named /mylabel1/hNSimulated and /mylabel2/hNSimulated.


We strongly advise that you do not open and manage your own ROOT files; it is very easy to cause conflicts with the art IO subsystem. If you do need to do this, speak with the Mu2e Software team before you do so.


Making ROOT Subdirectories

In your module it is possible to create subdirectories of the directory that TFileService has made for you. The syntax is,

   art::ServiceHandle<art::TFileService> tfs;
   art::TFileDirectory tfdir           = tfs->mkdir( "EjectedNeutronGun" );
   _hMultiplicity                      = tfdir.make<TH1D>( "hMultiplicity", "Neutron Multiplicity", 20, 0, 20  );

where _hMultiplicity is a member datum of type TH1D*.


RandomNumberService

This is important enough to have a whole page of its own.