FclIntro

From Mu2eWiki
Revision as of 17:40, 19 December 2017 by Rlc

Introduction

This page will explain the system used by art to provide run-time configuration; that is, it will explain what you see inside the .fcl files.

Run-time configuration for art is written in the Fermilab Hierarchical Configuration Language (FHiCL, pronounced "fickle"), a language that was developed at Fermilab to support run-time configuration for several projects, including art. By convention, the names of FHiCL files end in .fcl. The FHiCL documentation is still under development, but draft copies of two documents are available; in addition, the source code and development notes are available via Redmine.

FHICL-CPP is the C++ toolkit that we use to read FHiCL documents within art.

This page will discuss both features that are part of the FHiCL language itself and features that are defined by art. A few features of FHiCL that are not important for Mu2e will be skipped over.

At present, FHiCL documents are just .fcl files in the file system. In the future there will be an option to store FHiCL documents in a database and for art programs to access them from that database; this will aid in maintenance of the audit trail of which file was produced using which code with which configuration.


FHiCL Basics

An art run-time configuration is just a FHiCL document. A FHiCL document is a file that contains a collection of definitions of the form

 name : value

where many types of values are possible, from simple atomic values to highly structured values; a value may also be a reference to a previously defined value. The whitespace on either side of the : is optional; whitespace is defined to be any of the space, tab, newline or carriage return characters. Consecutive definitions are separated by whitespace of any kind and amount. FHiCL provides a C++ API (see below) that lets users ask for values by name.

The fragment below will be used to illustrate some of the basics of FHiCL:

# A comment.
// Also a comment.

 name0  : 123                    # A numeric value.  Trailing comments work too.

 name00 : "A quoted comment prefix, # or //, is just part of a quoted string, not a comment"

 name1:456.                      # Another numeric value; whitespace is not important within a definition
 name2 : -1.e-6
 name3 : true                    # A boolean value
 NAME3 : false                   # The other boolean value; names are case sensitive.
 name4 : red                     # Simple strings need not be quoted
 name5 : "a quoted string"
 name6 : 'another quoted string'

 name7 : 1 name8 : 2            # Two definitions on one line, separated by whitespace.

 name9                          # Same as name9:3 ; newlines are just whitespace, which is not important.
 :
 3

 namea : [ abc, def, ghi, 123 ]  # A sequence of atomic values. FHiCL allows heterogeneous sequences.
                                 # But heterogeneous sequences are not usable via the C++ API.

 nameb :                         # A table of definitions; tables may nest.
 {
    name0: 456
    name1: [7, 8, 9, 10 ]
    name2:
    {
      name0: 789
    }
 }

 namec : [ { a:1 b:2 }, { a:3 c:4 } ]   # A sequence of tables.

 named : [] # An empty sequence
 namee : {} # An empty table

 namef : @nil   # An atomic value that is undefined.

Comments begin either with the shell-style pound/hash character (#) or with the C++-style //. In both cases, all text to the end of the line is part of the comment. Comments may start in any column, including after valid document text on the same line (trailing comments). If a comment prefix appears within a quoted string, it is just part of that string, not a comment prefix. The following characters, including the two-character sequence ::, are reserved to FHiCL:

 , : :: @ [ ] { } ( )

In addition the following strings have special meaning to FHiCL:

  true, false, @nil, infinity, +infinity, -infinity, BEGIN_PROLOG, END_PROLOG

The first six strings are reserved identifiers when they are lower case and unquoted; the last two are reserved identifiers only when they are upper case, unquoted and at the start of a line. Otherwise they are just strings. One may include the reserved characters and identifiers listed above in a string by quoting the string.

In the above, the example names were deliberately boring. A FHiCL name may be any string that begins with a letter or an underscore and contains only letters, digits and underscores; in particular, it may not contain whitespace. Names are case sensitive. FHiCL names are hierarchical and understand scope, so the top level name0 is distinct from nameb.name0, and both are distinct from nameb.name2.name0.

While FHiCL supports the underscore character as just another letter, art explicitly forbids the use of the underscore character in process names, module labels and data product instance names.
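The naming rules above can be summarised in a short C++ sketch. The functions isValidFhiclName and isValidArtLabel are hypothetical helpers written for this page, not part of fhiclcpp or art, and they assume plain ASCII names.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not fhiclcpp): a FHiCL name starts with a letter or an
// underscore and contains only letters, digits and underscores, so it can
// never contain whitespace.
bool isValidFhiclName(const std::string& name) {
    if (name.empty()) return false;
    const unsigned char first = static_cast<unsigned char>(name[0]);
    if (!std::isalpha(first) && first != '_') return false;
    for (char c : name) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && u != '_') return false;
    }
    return true;
}

// Hypothetical helper (not art): process names, module labels and data
// product instance names must be valid FHiCL names AND contain no underscore.
bool isValidArtLabel(const std::string& label) {
    return isValidFhiclName(label) && label.find('_') == std::string::npos;
}
```

For example, module_type is a legal FHiCL name but would be rejected as a module label by the second check.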


A value may be one of:

  • an atom;
  • a sequence of values;
  • a table of definitions;
  • a reference to a previously defined value.

Atomic Values

An atom may be one of:

  • a number;
  • a string;
  • a boolean value;
  • one of the reserved identifiers: true, false, @nil, infinity, +infinity, -infinity.

By definition, atoms may not contain internal whitespace. Atoms are separated by whitespace or by one of the delimiters reserved to FHiCL, discussed above. The six identifiers reserved to FHiCL must be lower case and unquoted.

Numbers may be represented in the usual ways. For example,

  a : 100
  b : 100.
  c : 1.E2
  d : +1000.E-1

all mean the same thing. FHiCL has an internal notion of an atom that represents a mal-formed numeric value; if it detects such an atom, it throws an exception as soon as it encounters it. See below for additional details about numeric values, including a discussion of signed and unsigned infinity.

The two boolean values are represented by the identifiers true and false; only lower case is recognized as a boolean value. There are no other valid representations of boolean values; while some languages permit numerical representations, FHiCL does not.


If a string contains no embedded whitespace and begins with a letter or underscore, the string may be unquoted; or it may be quoted as one prefers. If a string does contain embedded whitespace, or if it begins with some other type of character, it must be quoted. If an atom would trigger FHiCL's internal sense of a mal-formed numeric value, then it must also be quoted. A string may be quoted with single or double quotes; these differ in their treatment of escaped characters but that will not be important for Mu2e; you may read about it in the FHiCL documentation.

The case-sensitive atomic value @nil means that the FHiCL name is present in the FHiCL document but its value is undefined. In the C++ API, if one attempts to get the value of a parameter whose value is @nil, FHiCL will throw an exception. Mu2e uses @nil for the following use case. A parameter set often has many values that are normally set by experts and one value that must be set by the end user. In this case it is convenient to provide a default value for the parameter set in a PROLOG; the issue is how to handle the parameter that must be set by the end user. One could simply leave it out, but that documents the parameter set poorly; one could also define it and comment it out. The recommended alternative is to provide the parameter and set its value to @nil: if the end user fails to set the parameter, the code that reads the parameter set will throw an exception.

Sequences

A FHiCL sequence is a list of values surrounded by square brackets; it uses a comma as an internal delimiter,

 namea : [ abc, def, ghi, 123 ]

Because FHiCL is not typed, sequences may mix numbers and strings (and more, see below). Heterogeneous sequences are natural in many scripting languages but they are not natural in C++. It is not possible to access heterogeneous FHiCL sequences via FHiCL's C++ API, so Mu2e will not use heterogeneous sequences.


Tables

A FHiCL table is a collection of definitions that is surrounded by braces and internally delimited by whitespace:

 name :
 {
   name0 : 123
   name1 : 456.
 }


ReDefinitions

There are two forms of redefinition supported by FHiCL, a repeated definition within the same scope and a redefinition using the fully qualified name of a parameter.

Redefinitions at intermediate scope are not supported; they do not cause a parse error but they do produce undefined results.

If a definition is repeated within the same scope, the later definition wins. For example,

abc : 1
abc : 2
def : [ 1, 2, 3 ]
def : [ 4, 5, 6 ]
name : {
  abc : 1
  abc : 2
}

will result in the parameter abc having a value of 2, def having a value of [4,5,6] and name.abc having a value of 2. This works the same way for parameters whose value type is a table.

Because FHiCL is not strongly typed, the following is legal, but it is not very likely to be interesting to Mu2e (because the C++ API is strongly typed):

  abc : 1
  abc : { def : [ x, y, z ] }


The second form of redefinition is using the fully qualified name:

 a : {
    b : {
      c : 1
    }
 }
a.b.c : 2

This will be used frequently by Mu2e to make small changes to standard parameter sets.
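A minimal C++ sketch of both forms of redefinition, assuming a simplified model (the hypothetical class FlatDocument, not part of fhiclcpp) in which every atomic value is stored under its fully qualified name. In this model a repeated definition and a fully qualified redefinition both overwrite the same entry, which is why the later one wins. The model ignores tables as values and would also accept redefinitions at intermediate scope, whose behaviour FHiCL does not define.

```cpp
#include <map>
#include <string>

// Hypothetical model, not fhiclcpp: each atomic definition is stored under its
// fully qualified name ("a.b.c"); a later definition overwrites an earlier one.
class FlatDocument {
public:
    // Define `name` inside `scope`; an empty scope means the top level.
    void define(const std::string& scope, const std::string& name,
                const std::string& value) {
        const std::string key = scope.empty() ? name : scope + "." + name;
        values_[key] = value;  // a repeated definition simply replaces the old value
    }

    // Look up a value by its fully qualified name; throws std::out_of_range if absent.
    std::string get(const std::string& qualifiedName) const {
        return values_.at(qualifiedName);
    }

private:
    std::map<std::string, std::string> values_;
};
```

In this model, defining c inside a.b and then writing a.b.c : 2 at top level both target the key "a.b.c", so get("a.b.c") returns "2".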

The following is an example of a redefinition at intermediate scope, which is not supported by FHiCL,

 a : {
    b : {
      c : 1
    }
    b.c : 2
 }

While this sometimes works, it should be avoided since it is not formally defined inside FHiCL.

Numeric Values

There are two senses in which a value might be a valid numeric value. The first sense is what the FHiCL parser thinks it is; the second sense is whether or not FHiCL's C++ API can successfully convert that value to a C++ numeric type.

When FHiCL parses a document it looks at each atom and decides whether it represents a numeric value. If an unquoted atom begins with a digit or with one of the characters +, - or . (plus sign, minus sign, decimal point), the atom is presumed to represent a numeric value. If, upon further inspection, the atom is not a well-formed numeric value, FHiCL throws an exception. This exception is thrown by FHiCL, not by art. At this stage FHiCL never tests quoted strings to see whether they are well-formed or mal-formed numeric values; they are just strings. For example, when FHiCL parses the definition

a :  1a

it will decide that the value is a mal-formed numeric value and will throw an exception. On the other hand,

a :  "1a"

defines a value that is just a string. FHiCL requires the concepts of well-formed and mal-formed numeric values in order to reduce a document to its canonical form.
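The parse-time rule can be sketched in C++ as follows. This is a simplified, hypothetical model (classifyAtom and AtomKind are illustrative names, not fhiclcpp code). Its regular expression for a well-formed number is an assumption that covers the examples on this page rather than FHiCL's full grammar. The reserved identifiers are checked first, because +infinity and -infinity also begin with a sign.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

enum class AtomKind { Number, String, Reserved };

// Hypothetical, simplified model of the parse-time rule: an unquoted atom that
// begins with a digit, '+', '-' or '.' is presumed numeric and must then be
// well formed; quoted strings are never tested.
AtomKind classifyAtom(const std::string& atom) {
    if (atom.empty()) throw std::invalid_argument("empty atom");
    if (atom.front() == '"' || atom.front() == '\'') return AtomKind::String;
    if (atom == "true" || atom == "false" || atom == "@nil" ||
        atom == "infinity" || atom == "+infinity" || atom == "-infinity")
        return AtomKind::Reserved;
    const char c = atom.front();
    const bool looksNumeric = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
    if (!looksNumeric) return AtomKind::String;
    // Assumed notion of "well formed": optional sign, digits with an optional
    // decimal point (or a leading point), optional exponent.
    static const std::regex number(R"([+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?)");
    if (!std::regex_match(atom, number))
        throw std::invalid_argument("mal-formed numeric value: " + atom);
    return AtomKind::Number;
}
```

With this sketch, 1a throws while "1a" (quoted) is classified as a string, matching the two definitions above.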

The C++ API for accessing FHiCL information is strongly typed; that is, you must ask for a numeric value as one of the C++ numeric types (int, float, double and so on). The API will attempt to convert any atomic or string value to the requested integral or floating point type; if the conversion fails, the API will throw an exception. FHiCL will do the obvious type conversions for you; for example, you may ask for

name1 : 456.

as either an integral type or as a floating type; in both cases you will get the expected result.

There is one behaviour of the C++ API that must be called out clearly. Consider the definition,

a : 1.5

The C++ API will let a user ask for this value as an integral type, as a floating point type or as a std::string. If you ask for it as an integral type the result will be 1. This is consistent with automatic type conversion within C++. We have asked the art development team to expand the C++ API to allow Mu2e to select a different behaviour: if we try to convert a numeric value with a non-zero fractional part to an integral type, the API will either print a warning or throw an exception.
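The difference between the current behaviour and the behaviour Mu2e requested can be sketched with two hypothetical helpers, toIntLenient and toIntStrict. Neither is part of the fhiclcpp API, and the strict version throws where a real implementation might instead print a warning.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical helper (not fhiclcpp): models the behaviour described above,
// where the fractional part is silently dropped, as in an ordinary C++
// double-to-int conversion.
int toIntLenient(const std::string& atom) {
    return static_cast<int>(std::stod(atom));
}

// Hypothetical helper (not fhiclcpp): models the requested behaviour, refusing
// to drop a non-zero fractional part. This sketch throws rather than warns.
int toIntStrict(const std::string& atom) {
    const double value = std::stod(atom);
    if (value != std::trunc(value))
        throw std::domain_error("non-integral value for an integral type: " + atom);
    return static_cast<int>(value);
}
```

Both helpers accept 456. and return 456; only the lenient one turns 1.5 into 1.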

If the value of a definition is @nil, an attempt to convert it to any numeric type will cause the API to throw an exception.

If the value of a definition is one of the reserved identifiers, infinity, +infinity, -infinity, the value may be converted to a floating point type but not to an integral type. The value of the floating point type will be the architecture dependent representation of signed infinity. If the architecture does not support such a representation, the code will throw an exception.
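Here is a C++ sketch of the rules for @nil and the infinities. It uses std::numeric_limits to obtain the platform's representation of infinity. The helpers toFloating and toIntegral are hypothetical stand-ins for the fhiclcpp accessors, which work on fhicl::ParameterSet objects rather than raw strings.

```cpp
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical helper (not fhiclcpp): convert an atom, possibly a reserved
// identifier, to a floating point type.
template <typename Float>
Float toFloating(const std::string& atom) {
    static_assert(std::is_floating_point<Float>::value, "floating point types only");
    if (atom == "@nil")
        throw std::invalid_argument("value is @nil (undefined)");
    if (atom == "infinity" || atom == "+infinity" || atom == "-infinity") {
        if (!std::numeric_limits<Float>::has_infinity)
            throw std::runtime_error("no representation of infinity on this architecture");
        const Float inf = std::numeric_limits<Float>::infinity();
        return atom == "-infinity" ? -inf : inf;
    }
    return static_cast<Float>(std::stold(atom));
}

// Hypothetical helper (not fhiclcpp): integral types never accept @nil or any
// of the infinities; any fractional part is dropped, as described above.
long long toIntegral(const std::string& atom) {
    if (atom == "@nil" || atom == "infinity" || atom == "+infinity" || atom == "-infinity")
        throw std::invalid_argument("cannot convert '" + atom + "' to an integral type");
    return static_cast<long long>(std::stod(atom));
}
```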

FHiCL only recognizes the decimal point, not the European-style comma, as the delimiter between the integer and fractional parts of a floating point number. FHiCL also does not support the use of a comma (or the European-style decimal point) to delimit thousands, millions and so on.


Additional Information

Except for being a delimiter between definitions, whitespace is unimportant. Therefore the FHiCL fragments

 name :
 {
   name0 : 123
   name1 : 456.
 }
 namea : [ abc, def, ghi, 123 ]

could have been written as,

 name:{name0:123 name1:456.} namea:[abc,def,ghi,123]


The namec: line in the big example shows that sequences and tables can be nested inside each other; they can be nested to arbitrary depth. The next two lines show that it is legal to define empty sequences and tables. Finally, the last line defines namef as a name that is present but has an undefined value.


FHiCL supports facilities to pre-define values so that they can be used later in multiple places. FHiCL also supports the ability to modify values, or subsets of values, after they have been defined. These will be discussed later.

One can see from the above that, after a FHiCL document has been parsed, the result is just a FHiCL table.


Configuration of a Module

From the point of view of a Mu2e physicist, art is the tool that drives the event loop and calls user code at the appropriate places in the event loop. User code is found in art modules, and the following FHiCL fragment illustrates how to specify the run-time configuration of an art module:

moduleLabel :{
  module_type : ClassName
  pname0 : 1234.
  pname1 :  [ abc, def]
  pname2 : {
       name0: {}
  }
}

A valid configuration for an art module is expressed as a FHiCL table. Within art, FHiCL tables are visible to the user as objects of type fhicl::ParameterSet, declared in the header fhiclcpp/ParameterSet.h. From here forward, this document will usually refer to the run-time configuration of a module as its parameter set, even when discussing the FHiCL table representation of that parameter set.


The moduleLabel is a FHiCL name chosen by the Mu2e physicist. It must obey the rules for FHiCL names, be unique within the configuration of an art job and not contain the underscore character. In this context, the FHiCL name module_type is an identifier reserved to art and must be present in the configuration of a module; if it is absent, art will throw an exception. The FHiCL value ClassName is the name of the C++ class that holds the code that the user wishes to execute. By convention, this code is found in a file, somewhere in the Mu2e Offline hierarchy, named ClassName_module.cc; the Mu2e build system will compile this file into a dynamic library named Offline/lib/libClassName_module.so. At run-time, art searches the directories listed in the environment variable LD_LIBRARY_PATH for a file named libClassName_module.so; it loads this dynamic library and finds the code for the module inside.
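The naming and search convention can be sketched as below. This is a simplified, hypothetical illustration (moduleLibraryName and candidatePaths are not art functions), and art's real plugin loader is more involved. The sketch only lists candidate paths, in LD_LIBRARY_PATH order, and skips empty directory entries for simplicity.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not art): module_type ClassName -> libClassName_module.so
std::string moduleLibraryName(const std::string& moduleType) {
    return "lib" + moduleType + "_module.so";
}

// Hypothetical helper (not art): one candidate path per directory in a
// colon-separated search path, kept in search order.
std::vector<std::string> candidatePaths(const std::string& ldLibraryPath,
                                        const std::string& moduleType) {
    std::vector<std::string> paths;
    std::istringstream dirs(ldLibraryPath);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) continue;  // this sketch ignores empty entries
        paths.push_back(dir + "/" + moduleLibraryName(moduleType));
    }
    return paths;
}
```

In a real program the first argument would come from std::getenv("LD_LIBRARY_PATH"). If two directories both contain a library with the same name, this sketch simply lists both candidates.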

At present there is one problem with this convention: if two different subdirectories contain two modules with the same filename, both will produce libraries named lib/libClassName_module.so. A plan is in place to ensure that they make distinctly named libraries and that art will either unambiguously load the correct library or issue a run-time diagnostic.

The remaining lines of the parameter set are just FHiCL definitions that will be formed into a fhicl::ParameterSet object and passed to the module as an argument in its constructor. The names in this parameter set are meaningful only to the module, not to art itself. While most of the early Mu2e examples only use parameters with atomic values, it is legal to use the full power of FHiCL within a parameter set; that is, the parameter set used to configure a module may include sequences and parameter sets nested to arbitrary depth.

The minimum legal configuration of a module is,

moduleLabel : { module_type : ClassName }


It is meaningful within one art configuration to define two modules that have the same ClassName that differ by some elements in the remainder of their configuration. These two instances of the module are distinguished by having different moduleLabels. This capability might be used if one wished to run the same algorithm twice in one job, perhaps once with loose cuts and once with tight cuts. Any data products produced by these two module instances will automatically be labeled in a way that distinguishes which module instance produced them. Any histograms or ntuples produced by these two module instances will automatically be put into separate ROOT directories; these directories are named using the moduleLabels.



Configuration of a Service

Art services behave like agents that manage a resource and allow other code to access that resource. Some services are native to art and others are written by Mu2e; Mu2e uses services to manage, among other things, geometry and conditions information. More information about services is available in the art documentation.

The following FHiCL fragment illustrates the run-time configuration of a service:

ClassName :{
  pname0 : 1234.
  pname1 :  [ abc, def]
}

As for a module, a service is configured with a FHiCL table that is seen by the C++ code as a fhicl::ParameterSet, passed as an argument to the constructor of the service class.

The FHiCL name ClassName is the name of the C++ class that implements the service. This class must live somewhere in either the art or the Mu2e code base, as two files named ClassName_service.cc and ClassName.hh. The Mu2e build system will compile these files into a shared library named ClassName_service.so; the art build system will compile them into a shared library named dir_subdir_ClassName_service.so, where the string dir_subdir is the file system path from the root of art to ClassName_service.cc.


By definition there may be at most one instance of any service within an art job. Therefore there is no analog of a moduleLabel for services: the ClassName alone is sufficient to specify which service is requested. The parameter sets of all services are collected in a single top-level FHiCL table named services:


services : {
  // ParameterSets for zero or more services,
  // both services defined by art and those defined by Mu2e
}

In this context, the FHiCL name services is an identifier reserved to art.

A valid run-time configuration must include a parameter set in the FHiCL table services for each Mu2e-written service that will be used in the course of the art job. This is true even if the service has no run-time configurable parameters; in that case an empty parameter set must be supplied in the .fcl file, because art will not provide a default.

In earlier versions of art there was a convention that the service block should have the structure:

services : {
  // ParameterSets for zero or more art defined services.
  user : {
     // ParameterSets for zero or more Mu2e defined services
  }
}

This experiment failed and this style is deprecated.



Overall Structure of an art Run-time Configuration

The example below illustrates the top level view of an art run-time configuration; some details have been omitted for clarity. An art run-time configuration is just a FHiCL document. At present they live in simple files but we expect that, at some future date, the configurations will be kept in databases; this will allow a more robust audit trail of how each data file was processed.

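In the following, the top-level identifiers process_name, source, services, physics and outputs, and, within physics, the identifiers producers, analyzers, filters, trigger_paths and end_paths are reserved to art. These are in addition to the identifier module_type, discussed earlier, which is also reserved to art, and the identifier @local::, discussed later, which is reserved to FHiCL.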

process_name : helloWorld      # The process name must NOT contain any underscores
 
source : {
   # Parameters for exactly one source module
}
 
services : {
   # ParameterSets for zero or more services.
}
 
physics: {
 
  producers : {
     # ParameterSets for zero or more producer modules
  }
  analyzers: {
     # ParameterSets for zero or more analyzer modules
  }
 
  filters : {
     # ParameterSets for zero or more filter modules
  }
 
  path0 : [  comma separated list of module labels of producer or filter modules  ]
  path1 : [  comma separated list of module labels of producer or filter modules  ]
 
  path2 : [  comma separated list of module labels of analyzer or output modules  ]
  path3 : [  comma separated list of module labels of analyzer or output modules  ]
 
  trigger_paths: [ path0, path1 ]
  end_paths:     [ path2, path3 ]
}
 
outputs: {
  # ParameterSets for zero or more output modules
}


The parameter process_name identifies this art job. It is used as part of the identifier for data products produced in this job. For this reason, the process name may not contain underscore characters. If the process_name is absent, art substitutes a default value of "DUMMY".


The source parameter set describes where events come from. There may be at most one source module declared in an art configuration. At present there are two options for choosing a source module:

  • module_type : RootInput
    art::Events will be read from an input file or from a list of input files; files are specified by giving their pathname within the file system. In the future Mu2e will support a file catalog but, at present, there is no such system.
  • module_type : EmptyEvent
    Internally art will start the processing of each event by incrementing the event number and creating an empty art::Event. Subsequent modules then populate the art::Event. This is the normal procedure for generating simulated events.

See the web page about configuring input and output modules (IOModules) for details about what other parameters may be supplied to these parameter sets. If no source parameter set is present, art substitutes a default parameter set of:

source : {
  module_type : EmptyEvent
  maxEvents : 1
}


The configuration of art services was discussed above. If the services parameter set is missing entirely, art will supply a default that configures only the message logger.

If an art-supplied service is requested by the code and there is no corresponding parameter set in the .fcl file, art will supply a default parameter set. If a Mu2e-defined service is requested by the code and there is no corresponding parameter set in the .fcl file, art will throw an exception.
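The lookup rules for service parameter sets can be sketched as follows. The function serviceConfig and the string stand-in for fhicl::ParameterSet are hypothetical, and the placeholder returned for art-supplied services stands for whatever default art actually provides.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using ParameterSetText = std::string;  // hypothetical stand-in for fhicl::ParameterSet

// Hypothetical sketch (not art's service manager): an explicit parameter set
// always wins; art-supplied services fall back to a default; Mu2e-defined
// services without a parameter set are an error.
ParameterSetText serviceConfig(const std::string& className,
                               const std::map<std::string, ParameterSetText>& services,
                               const std::set<std::string>& artSupplied) {
    const auto it = services.find(className);
    if (it != services.end()) return it->second;
    if (artSupplied.count(className) != 0) return "<art default>";  // placeholder
    throw std::runtime_error("no parameter set for service " + className);
}
```

This is why a Mu2e service with no parameters still needs an empty entry, for example MyMu2eService : {}, in the services table.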

Some of the art-supplied services can be turned on from command line switches; these include a service to trace all of the module and service calls made by art, a service to present timing information and a service to profile memory usage. When the command line switch is present, the service need not be included in the run-time configuration; art will supply defaults if needed. Additional information is available about the behaviour of the art-supplied services, including their default parameter sets and how to request them from the command line.


The physics parameter set has five reserved identifiers: filters, analyzers, producers, trigger_paths and end_paths. The first three must have values that are FHiCL tables of parameter sets; the last two must have values that are FHiCL sequences of art path names, where an art path is a FHiCL sequence of module labels. Any other top level parameter within the physics parameter set will be interpreted as an art path; that is, its value must be a FHiCL sequence of module labels. Another web page (Paths.shtml) discusses paths in more detail.

The physics.producers parameter set should contain parameter sets, of the form shown above, that are used to configure EDProducer modules. Similarly the physics.analyzers and physics.filters parameter sets should hold the configuration information for the EDAnalyzer and EDFilter modules, respectively. At present these rules are not rigorously enforced but they will be soon.

If the physics parameter set, or any of its components, is missing, art will substitute a default value of an empty parameter set or an empty sequence, as appropriate.

The final element in a run-time configuration is the outputs parameter set. It contains parameter sets that configure zero or more output modules. The rules to send one subset of the events to one output file and a different subset of the events to a different output file are described on the web page that discusses paths in more detail.

If the outputs parameter set is missing, art will supply a default value of an empty parameter set.

This leaves a short discussion of art paths and the identifiers trigger_paths and end_paths. An art path is just a FHiCL sequence of module labels; there are four such paths defined in this example, path0 through path3. Any first level name in the physics parameter set, except for the five identifiers reserved to art, will be interpreted as the name of an art path. The trigger_paths definition is a FHiCL sequence of art path names; if an art path is an element of the trigger_paths sequence, then the module labels in that path

  1. must be labels of EDProducer and EDFilter modules.
  2. will be executed in the specified order.

The end_paths definition is also a FHiCL sequence of art path names; if an art path is an element of the end_paths sequence, then the module labels in that path

  1. must be labels of EDAnalyzer and output modules.
  2. may be executed in any order.

A full discussion of paths is available elsewhere.
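The path rules above can be expressed as a small, hypothetical checker (checkPaths, isPathName and ModuleKind are illustrative names, not part of art). It assumes the kind of each module label is already known from the producers, filters, analyzers and outputs tables.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

enum class ModuleKind { Producer, Filter, Analyzer, Output };

// Hypothetical: any top-level physics name other than the five reserved
// identifiers is the name of a path.
bool isPathName(const std::string& name) {
    static const std::set<std::string> reserved{
        "producers", "filters", "analyzers", "trigger_paths", "end_paths"};
    return reserved.count(name) == 0;
}

// Hypothetical checker: paths in trigger_paths may hold only producers and
// filters; paths in end_paths only analyzers and output modules.
void checkPaths(const std::map<std::string, std::vector<std::string>>& paths,
                const std::vector<std::string>& triggerPaths,
                const std::vector<std::string>& endPaths,
                const std::map<std::string, ModuleKind>& kindOfLabel) {
    auto check = [&](const std::string& pathName, bool isTrigger) {
        for (const std::string& label : paths.at(pathName)) {
            const ModuleKind k = kindOfLabel.at(label);
            const bool ok = isTrigger
                ? (k == ModuleKind::Producer || k == ModuleKind::Filter)
                : (k == ModuleKind::Analyzer || k == ModuleKind::Output);
            if (!ok)
                throw std::invalid_argument("module " + label + " not allowed in path " + pathName);
        }
    };
    for (const auto& p : triggerPaths) check(p, true);
    for (const auto& p : endPaths) check(p, false);
}
```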

If either trigger_paths or end_paths is absent from a configuration, art will substitute a default value of an empty sequence.
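The top-level defaults described in this section can be collected in one C++17 sketch using std::optional. The types and the function applyDefaults are hypothetical and cover only process_name, source, trigger_paths and end_paths.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical types (not art): the parts of a configuration that may be absent.
struct SourceConfig {
    std::string moduleType;
    int maxEvents;
};

struct TopLevelConfig {
    std::optional<std::string> processName;
    std::optional<SourceConfig> source;
    std::optional<std::vector<std::string>> triggerPaths;
    std::optional<std::vector<std::string>> endPaths;
};

struct ResolvedConfig {
    std::string processName;
    SourceConfig source;
    std::vector<std::string> triggerPaths;
    std::vector<std::string> endPaths;
};

// Fill every missing item with the default quoted on this page.
ResolvedConfig applyDefaults(const TopLevelConfig& in) {
    return ResolvedConfig{
        in.processName.value_or("DUMMY"),
        in.source.value_or(SourceConfig{"EmptyEvent", 1}),
        in.triggerPaths.value_or(std::vector<std::string>{}),
        in.endPaths.value_or(std::vector<std::string>{}),
    };
}
```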


Unimportance of Ordering

Once any redefinitions have been processed, neither art nor FHiCL care about the order of items within the resulting FHiCL document. Both do care about how definitions nest inside each other. FHiCL always cares about the order of items within a sequence but art only cares about the order of items within those paths that are part of the trigger_paths sequence.

In summary, in the above example art only cares about the order of the elements inside the sequences path0 and path1. Any other reordering that preserves the nesting structure is equivalent to that shown.


Command Line Arguments

Some elements of an art run-time configuration may be overridden by parameters that appear on the art command line. To see what the options are:

mu2e --help

At this writing (Sept 2011), the allowed command line parameters are:

mu2e <-c <config-file>> <other-options> [<source-file>]+:
  -T [ --TFileName ] arg   File name for TFileService.
  -c [ --config ] arg      Configuration file.
  -e [ --estart ] arg      Event # of first event to process.
  -h [ --help ]            produce help message
  -n [ --nevts ] arg       Number of events to process.
  --nskip arg              Number of events to skip.
  -o [ --output ] arg      Event output stream file.
  -s [ --source ] arg      Source data file (multiple OK).
  -S [ --source-list ] arg file containing a list of source files to read, one
                           per line.
  --trace                  Activate tracing.
  --notrace                Deactivate tracing.
  --memcheck               Activate monitoring of memory use.
  --nomemcheck             Deactivate monitoring of memory use.

All command line parameters are optional and many have both a short and a long form. In general the command line parameters can modify the names of files, the flow of the event loop and whether or not some monitoring services are enabled. If a parameter is specified both within the .fcl file and on the command line, the command line value takes precedence.
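The precedence rule can be sketched as follows. effectiveValue is a hypothetical helper, not art's option-handling code, and the art default is passed in explicitly because it differs from parameter to parameter.

```cpp
#include <optional>

// Hypothetical helper (not art): a value given on the command line overrides
// the value in the .fcl file, which in turn overrides art's own default.
template <typename T>
T effectiveValue(const std::optional<T>& fromFcl,
                 const std::optional<T>& fromCommandLine,
                 const T& artDefault) {
    if (fromCommandLine) return *fromCommandLine;  // command line takes precedence
    if (fromFcl) return *fromFcl;
    return artDefault;
}
```

For example, if the .fcl file sets the number of events to 10 and -n 3 is given on the command line, this sketch returns 3.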

In the original design of art, no parameters that control the physics behaviour would be exposed on the command line; they would only be modifiable by editing the .fcl file. File names, debug levels and other parameters that do not change the physics behaviour would be modifiable from the command line. This restriction was chosen because of the then-existing ideas about how to maintain an audit trail for run-time configurations. Recently the NOvA experiment, which also uses art, asked for this restriction to be removed; to retain a strict audit trail, the plan is to store the final state of the .fcl file, after all command line substitutions, in each of the event-data output files.

It is not clear yet if Mu2e will stick with the original plan or if we will choose to follow the route chosen by NOvA. Another option is to define the idea of an art "production mode". The idea behind production mode is that there are many convenience functions that are valuable for development and debugging but which make it difficult to maintain an audit trail. Among these convenience functions is the ability to override an arbitrary .fcl parameter from the command line. One could imagine that, when production mode is not set, all of these convenience functions would be available but, when production mode is set, they would be disabled.


The Canonical Form and the Hash Code

When FHiCL prints a document to the screen or to a file the output appears in what's known as the canonical form, in which the source formatting is entirely lost. In this form all comments are stripped, there is standardized indentation, and all strings, even atomic strings, are double quoted. Because the order of definitions is not important, they will appear in the canonical form in some well defined order meaningful to FHiCL; in general this order is not that in which they appeared in the input file.

In a FHiCL document, if a name is redefined, only the final definition will be present in the canonical form. All earlier definitions will be lost.

The other aspect of the canonical form is that every atom identified by FHiCL as a well-formed numeric value will be written in a FHiCL-defined form. This form may differ from the form that appeared in the input file, but it will produce the same bit pattern when converted to a numeric type. For example, if the source file contains the definition,

a : 123.45

The canonical form will contain:

a : 1.2345e2

The canonical form of a floating point number is scientific notation with enough digits to guarantee no loss of precision.

Other examples of source formats are,

   b: 1
   c: 1.
   d: 1
   e: +1
   f: 1.23e2
   g: 1.23e3

These will have the canonical forms:

   b: 1
   c: 1
   d: 1
   e: 1
   f: 123
   g: 1230

All of b through e have the same canonical form. One rule is that if a meaningless fractional part or sign is present, it will not appear in the canonical form. A second rule is that if a number is represented in the source format in scientific notation, and if that number is representable, without loss of precision, as an integer less than 999999, then the number will be represented as an integer; this rule is illustrated by items f and g.
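The numeric rules can be illustrated with a hypothetical C++ function that reproduces the examples on this page. It is not FHiCL's canonicalisation code. As a simplifying assumption, it writes every integral value below 999999 in magnitude as an integer, whatever its source format, and otherwise uses the shortest scientific notation that converts back to the same double.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical sketch (not FHiCL's implementation) of a canonical numeric form.
std::string canonicalNumber(const std::string& atom) {
    const double v = std::strtod(atom.c_str(), nullptr);
    if (!std::isfinite(v))
        throw std::out_of_range("not representable as a finite double: " + atom);
    // Integral values below 999999 in magnitude become plain integers, which
    // drops a meaningless '.', fractional zeros or '+' sign.
    if (std::floor(v) == v && std::fabs(v) < 999999.0)
        return std::to_string(static_cast<long long>(v));
    // Otherwise: scientific notation with the fewest mantissa digits that
    // still convert back to exactly the same double.
    char buf[64];
    for (int precision = 0; precision <= 17; ++precision) {
        std::snprintf(buf, sizeof buf, "%.*e", precision, v);
        if (std::strtod(buf, nullptr) == v) break;
    }
    // Rewrite printf's exponent, e.g. "1.2345e+02" -> "1.2345e2".
    const std::string s(buf);
    const auto e = s.find('e');
    const int exponent = std::atoi(s.c_str() + e + 1);
    return s.substr(0, e) + "e" + std::to_string(exponent);
}
```

With this sketch, canonicalNumber("1") and canonicalNumber("1.0") both return "1". So the two spellings of a in the example further down would no longer change a hash computed from the canonical text.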

Once a configuration has been reduced to its canonical form, one can hash that form to compute an almost-certainly-unique key. See the FHiCL documentation for details. One can use the hash codes as a short cut to ask if two configurations are the same or different.

Why bother with a canonical form? In particular, why bother with a canonical form for numeric values?

Many previous experiments have discovered that configuration files get handed around, and in the process trivial changes accumulate. For example, a definition that starts life as

a: 1

may end up, a few generations later, as

a: 1.0

This has the consequence that two configurations that really do the same physics produce different hash codes. If the configurations, and/or their hash codes, are tracked as part of the meta-data, this makes it tedious to identify files that were produced using the same configuration.

While having a canonical form for numeric values will certainly raise new issues, we expect it to be a net reduction in the complexity of tracking meta-data.

One issue currently under study is this: consider the task of running 1000 Monte Carlo jobs that differ only in their random number seeds and in the names of their output files. Can we arrange things such that the properties of the ensemble of jobs are represented in a single FHiCL document while the details of the different jobs are represented in some other way? If we can do this, then all files produced by this set of jobs will have a configuration that has the same hash code as every other job in the set. At present this will not work because of the way that random number seeds are distributed.


Printing the Canonical Form

It is possible, but weird, to use art to print the canonical form of the run-time configuration:

export ART_DEBUG_CONFIG=1
mu2e -c file.fcl
unset ART_DEBUG_CONFIG

When this environment variable is set, art will get its parameter set from FHiCL, if needed insert some of its own defaults, insert any command line arguments into the parameter set, print the canonical form to the screen, and then exit.


@local

FHiCL has the ability to define a value as a reference to a previously defined value. Consider the following FHiCL fragment,

foo : 5
source : {
  module_type : EmptyEvent
  maxEvents : @local::foo
}

This fragment tells art to process a maximum of 5 events. If you print the canonical form of this parameter set, @local::foo will have disappeared entirely and been replaced by 5; at this stage FHiCL no longer knows where the 5 came from.

The identifier @local:: is reserved by FHiCL. When FHiCL encounters a value written as @local::foo, it looks in the document it is currently processing (including any #included files) for a previously defined top-level object named foo, and it replaces @local::foo with that value. This works for all kinds of values: atoms, sequences and tables.
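A toy model may help show what @local:: substitution does. In the sketch below, Definitions and resolve_local are hypothetical names. Values are kept as plain text, whereas real FHiCL substitutes atoms, sequences and tables. The sketch only illustrates two things: the reference is replaced by the earlier value, and afterwards nothing records where that value came from.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy model, NOT fhiclcpp: earlier top-level definitions,
// stored as name -> value text.
using Definitions = std::map<std::string, std::string>;

// If value has the form "@local::name", return the earlier definition of name;
// otherwise return value unchanged.
std::string resolve_local(const Definitions& earlier, const std::string& value) {
  const std::string prefix = "@local::";
  if (value.compare(0, prefix.size(), prefix) != 0) {
    return value;  // an ordinary value, not a reference
  }
  auto it = earlier.find(value.substr(prefix.size()));
  if (it == earlier.end()) {
    throw std::runtime_error("nothing defined earlier for " + value);
  }
  return it->second;  // the reference is gone; only the value remains
}
```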

FHiCL also has the ability to give a new value to a previously existing name:

source : {
  module_type : EmptyEvent
  maxEvents : 5
}
source.maxEvents : 2

This fragment tells art to process a maximum of 2 events. When you print the canonical form of this parameter set, the 5 will be replaced by a 2 and the redefinition line will have vanished.

FHiCL places no restrictions on what the type of the value may be. For example, FHiCL would have been perfectly happy if the last line in the previous fragment were:

source.maxEvents : [ foo, { bar: 123 abc : [ 456 ] } ]

However art will throw an exception when it tries to convert this value to an integer!
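The redefinition behaviour, and the late failure just described, can be mimicked with a toy model. Here the document is assumed to be flattened into fully qualified names mapped to value text, applied in file order. FlatDocument, apply_in_order and as_int are hypothetical names, not fhiclcpp or art APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT fhiclcpp: fully qualified name -> value text.
using FlatDocument = std::map<std::string, std::string>;
using Definition = std::pair<std::string, std::string>;

// Apply definitions in file order; a later definition of the same
// fully qualified name replaces the earlier value, whatever its type.
FlatDocument apply_in_order(const std::vector<Definition>& defs) {
  FlatDocument doc;
  for (const Definition& def : defs) {
    doc[def.first] = def.second;
  }
  return doc;
}

// What a consumer expecting an int does with the final value. Nothing checked
// the type when the value was redefined, so a bad value only fails here.
int as_int(const std::string& value) {
  std::size_t used = 0;
  int result = std::stoi(value, &used);  // throws std::invalid_argument if no digits
  if (used != value.size()) {
    throw std::invalid_argument("trailing text in integer: " + value);
  }
  return result;
}
```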

Replacement also works for individual elements of a sequence, using the notation:

foo :{
  odd : [ 1, 3, 5, 7 ]
}
foo.odd[0] : 9       // Replace 1 with 9
foo.odd[4] : 11      // Extend the sequence with 11
foo.bar    : 42      // Add a new definition to the table foo.

After these substitutions, the sequence foo.odd has the value [ 9, 3, 5, 7, 11 ] and the table foo contains two definitions, foo.odd and foo.bar.

FHiCL will let you do the following,

foo :{
  odd : [ 1, 3, 5, 7 ]
}
foo.odd[5] : 11   // Extend the sequence with two new members, @nil and 11.

After this substitution, the sequence foo.odd has the value [ 1, 3, 5, 7, @nil, 11 ]. This is well defined to FHiCL but it will cause problems for art, which will try to convert the @nil in element [4] to a numeric type, which will fail.
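The index-based replacement and the @nil padding can be sketched with a vector of std::optional, where an empty optional stands in for @nil. The names Sequence, set_element and to_ints are hypothetical. to_ints plays the role of a consumer, such as art, that needs every element to be a number.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model of a FHiCL sequence of integers;
// std::nullopt stands in for @nil.
using Sequence = std::vector<std::optional<int>>;

// Mimic "name[i] : value": replace element i, or extend the sequence,
// padding any skipped positions with nil.
void set_element(Sequence& seq, std::size_t i, int value) {
  if (i >= seq.size()) seq.resize(i + 1);  // new slots are std::nullopt (nil)
  seq[i] = value;
}

// Mimic a consumer that needs plain ints: a nil element is an error.
std::vector<int> to_ints(const Sequence& seq) {
  std::vector<int> out;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    if (!seq[i]) throw std::runtime_error("element " + std::to_string(i) + " is @nil");
    out.push_back(*seq[i]);
  }
  return out;
}
```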

The redefinition mechanism only works with fully qualified names, i.e. names that start from document scope. If you attempt to use the redefinition mechanism within an interior scope, it sometimes works, but only by accident; so do not do it.



PROLOGs and #includes

FHiCL supports an include mechanism that behaves much like that of the C Preprocessor. Suppose that the file defaults.fcl contains:

BEGIN_PROLOG
g4run_default :
{
  module_type          : G4
  generatorModuleLabel : generate
  seed                 : [9877]
}
END_PROLOG

For the moment, ignore the BEGIN_PROLOG and END_PROLOG lines. After creating this file, one may write the following fragment as part of a run-time configuration,

#include "defaults.fcl"

physics : {
  producers : {
     g4run : @local::g4run_default
  }
}

FHiCL is forgiving about superfluous white space within an include statement. Using includes provides a mechanism to distribute a standard configuration for g4run that can be used by many people.

The purpose of the PROLOG markers is to tell FHiCL that the material inside the PROLOG is not part of the final FHiCL document; it is merely a collection of some useful definitions that may or may not be used. Therefore FHiCL excludes the prolog from the document that it sends to art. If one prints the canonical form of the final document, the PROLOG is absent.

One could have chosen to move the BEGIN/END_PROLOG from the included file to surrounding the #include statement in the top level file. FHiCL will be happy with either; the current recommendation is to put them inside the included file. This makes clear that the purpose of the included file is to be a PROLOG.

There may be many BEGIN/END_PROLOG sections within one .fcl file, but they may not be nested. Includes may be nested so long as they do not cause PROLOGs to become nested.
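The nesting rule can be expressed as a simple scan. The sketch below is a hypothetical checker, not part of FHiCL. It assumes the #include statements have already been expanded, so that nesting caused by includes shows up in the scanned text. It also assumes each marker sits on a line of its own.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical checker, NOT part of FHiCL: scans an already include-expanded
// document and reports whether BEGIN_PROLOG/END_PROLOG markers are properly paired.
bool prologs_well_formed(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  bool inside = false;
  while (std::getline(in, line)) {
    // Trim surrounding whitespace so indented markers are still recognized.
    auto first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;  // blank line
    auto last = line.find_last_not_of(" \t\r");
    std::string token = line.substr(first, last - first + 1);
    if (token == "BEGIN_PROLOG") {
      if (inside) return false;   // a PROLOG nested inside another
      inside = true;
    } else if (token == "END_PROLOG") {
      if (!inside) return false;  // END_PROLOG without a matching BEGIN_PROLOG
      inside = false;
    }
  }
  return !inside;  // an unterminated PROLOG is also an error
}
```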

Together, the features @local::, redefinition, PROLOGs and #include allow Mu2e to create standard configurations and to express most actual jobs as one of the standard configurations plus a small collection of deltas. We hope that this will make it easier to understand what any given job actually did.


The C++ API

The following fragment abstracted from Analyses/src/ReadBack_module.cc shows how to read FHiCL parameters into a C++ program.

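#include "art/Framework/Core/EDAnalyzer.h"
#include "fhiclcpp/ParameterSet.h"
#include <string>

namespace mu2e {

  class ReadBack : public art::EDAnalyzer {

  public:
    explicit ReadBack(fhicl::ParameterSet const& pset);
    // analyze() and the remaining members are omitted from this excerpt.

  private:
    int         _diagLevel;             // diagnostics only: has a default
    std::string _g4ModuleLabel;         // required: no default
    std::string _generatorModuleLabel;  // required: no default
  };

  ReadBack::ReadBack(fhicl::ParameterSet const& pset) :
    _diagLevel           (pset.get<int>("diagLevel",0)),            // 0 if diagLevel is absent
    _g4ModuleLabel       (pset.get<std::string>("g4ModuleLabel")),  // throws if absent
    _generatorModuleLabel(pset.get<std::string>("generatorModuleLabel")){
  }
}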

In this example, art has extracted the parameter set for this module from the .fcl file and passed it to the constructor as the variable pset. The line that initializes the member datum _diagLevel asks pset to find a parameter named diagLevel, convert it to an int and return that int; if pset cannot find a parameter named diagLevel, it returns the second argument of the get call, in this case 0. If the parameter exists but cannot be converted to an int, the get call throws an exception. This can happen, for example, if the FHiCL parameter diagLevel is a string that is not a valid number, or if its value is @nil. There is no requirement that the name of the member datum, _diagLevel, match the name of the item in the parameter set, but using matching names avoids confusion.

Similarly, the next line looks for a FHiCL name "g4ModuleLabel" and attempts to convert it to a string. In this case there is no second argument to the get call; therefore, if "g4ModuleLabel" is not found in the parameter set, art will throw an exception.
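The difference between the two forms of get can be modelled with a small stand-in class. ToyParameterSet is hypothetical and much simpler than fhicl::ParameterSet: it stores text and converts it with a string stream. Its exception types are chosen for the sketch only. The behaviour it mimics is the one described above:

- A missing required name throws.
- A missing optional name yields the default.
- A value that is present but cannot be converted throws in either form.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for fhicl::ParameterSet, NOT the real class.
class ToyParameterSet {
 public:
  explicit ToyParameterSet(std::map<std::string, std::string> values)
      : values_(std::move(values)) {}

  // Required parameter: a missing name is an error.
  template <typename T>
  T get(const std::string& name) const {
    auto it = values_.find(name);
    if (it == values_.end()) throw std::out_of_range("missing parameter: " + name);
    return convert<T>(it->second, name);
  }

  // Optional parameter: a missing name yields the default,
  // but a present value that fails to convert is still an error.
  template <typename T>
  T get(const std::string& name, const T& fallback) const {
    auto it = values_.find(name);
    return it == values_.end() ? fallback : convert<T>(it->second, name);
  }

 private:
  template <typename T>
  static T convert(const std::string& text, const std::string& name) {
    std::istringstream in(text);
    T value{};
    // Fail if nothing could be read, or if unread text remains.
    if (!(in >> value) || !(in >> std::ws).eof()) {
      throw std::invalid_argument("cannot convert parameter " + name + " from: " + text);
    }
    return value;
  }

  std::map<std::string, std::string> values_;
};
```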

The decision whether or not to provide a default value is very important. We recommend that if a parameter influences physics behaviour and is likely to be set by normal users, it should NOT have a default. If a parameter influences only diagnostics, it should have a default. The harder case is a parameter that influences physics behaviour but should be modified only by experts; here a case can be made for providing the default in the code:

  • If we expose the parameter, non-experts might be tempted to play with it.
  • If the configurations become too big they become too hard to manage.


The following presents a few more examples. The parameters

  intArray    : [ 1,  2,  3  ]
  doubleArray : [ 1., 2., 3. ]
  stringArray : [ "foo", "bar" ]
  paramSet    : { foo : 1 bar : 2 }

can be read into a C++ program with the following code fragments:

  std::vector<int>          iArray( pset.get<std::vector<int> >("intArray"));
  std::vector<double>       dArray( pset.get<std::vector<double> >("doubleArray"));
  std::vector<std::string>  sArray( pset.get<std::vector<std::string> >("stringArray"));
  fhicl::ParameterSet       params( pset.get<fhicl::ParameterSet>("paramSet"));

In all cases it is possible to provide default values as a second argument to the get method. At present there is no accessor method to let you get one element of a sequence; you must get the entire sequence.

It is the intention of the art team that an arbitrary class T can be initialized by,

  T t( pset.get<T>("Name"));

The precise details of how to make this work remain to be specified.


The argument pset passed to the constructor of a module or a service is a temporary constructed by art that will go out of scope soon after the return from the constructor. If one wishes to retain pset as member data in a module, it must be held by making a copy, not by holding a pointer or reference to the argument pset.
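The ownership rule can be illustrated without art. In the sketch below, Params, ToyModule and make_module are hypothetical stand-ins, not art or fhiclcpp types. The module stores a copy, so it stays valid after the temporary passed to its constructor is destroyed. A const reference member in its place would dangle once make_module returns.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a parameter set; NOT fhicl::ParameterSet.
using Params = std::map<std::string, std::string>;

class ToyModule {
 public:
  // The argument may be a temporary that dies right after construction,
  // so keep a copy rather than a pointer or reference to it.
  explicit ToyModule(const Params& pset) : pset_(pset) {}

  const Params& config() const { return pset_; }

 private:
  Params pset_;  // owned copy; safe to use after the constructor returns
};

// Mimics the framework: the parameter set passed in is a temporary.
ToyModule make_module() {
  return ToyModule(Params{{"diagLevel", "1"}});
}
```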

Additional Details

At present there is no tool to validate an art run-time configuration. One can check that a configuration is valid FHiCL by using the mu2e executable to print the canonical form. This will identify FHiCL errors, but not errors in a configuration that is valid FHiCL yet invalid for art. The only way to check that a configuration is valid according to art is to run art with that configuration. There are many well-known ways to build a configuration checking tool, but the best methods require a lot of work and are a low priority.

Utilities

A summary of which modules ran, and of the filter results:

services.scheduler.wantSummary: true

A summary of the time that each module used:

services.TimeTracker.printSummary: true

It is also possible to record the time used by each module for each event in a database. To do so, give a non-empty value to the dbOutput.filename field:

services : {
  TimeTracker : {
    printSummary : true
    dbOutput : {
      filename  : "mydbfile.sqlite"
      overwrite : false
    }
  }
}

art will then create a new file containing an SQLite database with the time taken by each call to each module. You can use it, for example, to plot histograms of execution time per module or to identify events that took a long time.

There are some instructions on how to query the database:

There is a similar facility for tracking memory usage, a service named MemoryTracker.