Revision as of 04:49, 28 September 2023

Introduction

This page will explain the system used by art to provide run-time configuration; that is, it will explain what you see inside the .fcl files.

Run-time configuration for art is written in the Fermilab Hierarchical Configuration Language (FHiCL, pronounced "fickle"), a language that was developed at Fermilab to support run-time configuration for several projects, including art. By convention, the names of FHiCL files end in .fcl. The FHiCL documentation is still under development but draft copies of two documents are available; in addition, the source code and development notes are available via redmine.

FHICL-CPP is the C++ toolkit that we use to read FHiCL documents within art.

This page will discuss both features that are part of the FHiCL language itself and features that are defined by art. A few features of FHiCL that are not important for Mu2e will be skipped over.

At present, FHiCL documents are just .fcl files in the file system. In the future there will be an option to store FHiCL documents in a database and for art programs to access them from that database; this will aid in maintenance of the audit trail of which file was produced using which code with which configuration.

FHiCL Basics

An art run-time configuration is just a FHiCL document. A FHiCL document is a file that contains a collection of definitions of the form

 name : value

where many types of values are possible, from simple atomic values to highly structured values; a value may also be a reference to a previously defined value. The white space on either side of the : is optional; whitespace is defined to be any of the space, tab, newline or carriage return characters. Two definitions are separated by any whitespace. FHiCL provides a C++ API (see below) to allow users to ask for values by name.

The fragment below will be used to illustrate some of the basics of FHiCL:

# A comment.
// Also a comment.

 name0  : 123                    # A numeric value.  Trailing comments work too.

 name00 : "A quoted comment prefix, # or //, is just part of a quoted string, not a comment"

 name1:456.                      # Another numeric value; whitespace is not important within a definition
 name2 : -1.e-6
 name3 : true                    # A boolean value
 NAME3 : false                   # The other boolean value; names are case sensitive.
 name4 : red                     # Simple strings need not be quoted
 name5 : "a quoted string"
 name6 : 'another quoted string'

 name7 : 1 name8 : 2            # Two definitions on one line, separated by whitespace.

 name9                          # Same as name9:3 ; newlines are just whitespace, which is not important.
 :
 3

 namea : [ abc, def, ghi, 123 ]  # A sequence of atomic values. FHiCL allows heterogeneous sequences.
                                 # Heterogeneous sequences can be represented in c++ as std::tuple
                                 # See: https://en.cppreference.com/w/cpp/utility/tuple 

 nameb :                         # A table of definitions; tables may nest.
 {
    name0: 456                   # Names are scoped; so the many different name0 variables in this example are distinct.
    name1: [7, 8, 9, 10 ]
    name2:
    {
      name0: 789
    }
 }

 namec : [ { a:1 b:2 }, { a:3 c:4 } ]              # A sequence of tables.

 named : [] # An empty sequence
 namee : {} # An empty table

 namef : @nil   # An atomic value that is undefined.

Comments are introduced either by the shell style pound/hash character (#) or by the C++ style //. In both cases, all text to the end of the line is considered a comment. Comments may start in any column, including after valid document text on the same line (trailing comments). If a comment prefix is found within a quoted string, it is just part of that string, not a comment prefix. The following characters, including the 2-character sequence ::, are reserved to FHiCL:

 , : :: @ [ ] { } ( )

In addition the following strings have special meaning to FHiCL:

  true, false, @nil, infinity, +infinity, -infinity, BEGIN_PROLOG, END_PROLOG

The first six strings are reserved identifiers when they are in lower case and unquoted; the last two are only reserved identifiers when they are in upper case, unquoted and at the start of a line. Otherwise they are just strings. One may include the above reserved characters and identifiers in a string by quoting the string.

In the above, the choice of example names was deliberately boring; a FHiCL name may be any unique string that begins with a letter or an underscore and contains only letters, numbers and underscores; it may not contain whitespace. Names are case sensitive. FHiCL names are hierarchical and understand scope; so the top level name0 is distinguished from nameb.name0 and both are distinguished from nameb.name2.name0.

While FHiCL supports the underscore character as just another letter, art explicitly forbids the use of the underscore character in process names, module labels and data product instance names.

A value may be one of

an atom, a sequence of values, a table of definitions, a reference to a previously defined value

Atomic Values

An atom may be one of:

a number; a string; a boolean value; one of the reserved identifiers: true, false, @nil, infinity, +infinity, -infinity

By definition, atoms may not contain internal whitespace. Atoms are separated by whitespace or by one of the delimiters reserved to FHiCL, discussed above. The six identifiers reserved to FHiCL must be in lower case and unquoted.

Numbers may be represented in the usual ways. For example,

  a : 100
  b : 100.
  c : 1.E2
  d : +1000.E-1

all mean the same thing. FHiCL has an internal sense of an atom that represents a mal-formed numeric value; if it detects such an atom it will throw an exception as soon as it encounters the atom. See below for some additional details about numeric values, including a discussion of signed and unsigned infinity.

The two boolean values are represented by the identifiers true and false; only lower case is recognized as a boolean value. There are no other valid representations of boolean values; while some languages permit numerical representations, FHiCL does not.

If a string contains no embedded whitespace and begins with a letter or underscore, the string may be unquoted; or it may be quoted as one prefers. If a string does contain embedded whitespace, or if it begins with some other type of character, it must be quoted. If an atom would trigger FHiCL's internal sense of a mal-formed numeric value, then it must also be quoted. A string may be quoted with single or double quotes; these differ in their treatment of escaped characters but that will not be important for Mu2e; you may read about it in the FHiCL documentation.

The case sensitive atomic value @nil means that the FHiCL name is present in the FHiCL document but its value is undefined. In the C++ API, if one attempts to get the value of a parameter whose value is @nil, FHiCL will throw an exception. Mu2e uses @nil in the following use case. It often happens that a parameter set for some module has many values that are normally set by experts and also has one value that must be set by the end user. In this case it is often convenient to provide a default value for the parameter set in a PROLOG; the issue is how to deal with the parameter that must be set by the end user. One could simply leave it out but this does a bad job of documenting the parameter set. One could define it and comment it out. The recommended alternative is to provide the parameter and set its value to @nil; if the end user fails to properly set the parameter, the code that reads the parameter set will throw an exception.
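A minimal sketch of this use case follows. The names hitFinderDefaults, hitFinder, HitFinder, minEnergy, maxHits and calibrationTag are hypothetical, chosen only for illustration; the sketch assumes the usual FHiCL facilities BEGIN_PROLOG, END_PROLOG and @local:: for pre-defining a value and referring to it.

```fhicl
BEGIN_PROLOG
# Expert-maintained defaults (all names in this table are hypothetical).
hitFinderDefaults : {
  module_type    : HitFinder
  minEnergy      : 0.5
  maxHits        : 1000
  calibrationTag : @nil      # Documented here, but must be set by the end user.
}
END_PROLOG

physics : {
  producers : {
    hitFinder : @local::hitFinderDefaults   # Start from the expert defaults.
  }
}

# The end user supplies the one required value. If this line is omitted,
# the code that reads calibrationTag is expected to throw an exception.
physics.producers.hitFinder.calibrationTag : "run1"
```

The design choice is that the parameter stays visible in the default parameter set, so a reader can see that it exists, while a forgotten override fails loudly instead of silently using a meaningless default.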

Sequences

A FHiCL sequence is a list of values surrounded by square brackets; it uses a comma as an internal delimiter,

 namea : [ abc, def, ghi, 123 ]

Because FHiCL is not typed, sequences may mix numbers and strings (and more, see below). In C++ heterogeneous sequences are implemented as std::tuple (https://en.cppreference.com/w/cpp/utility/tuple).

Tables

A FHiCL table is a collection of definitions that is surrounded by braces and internally delimited by whitespace:

 name :
 {
   name0 : 123
   name1 : 456.
 }

Redefinitions

There are two forms of redefinition supported by FHiCL, a repeated definition within the same scope and a redefinition using the fully qualified name of a parameter.

Redefinitions at intermediate scope are not supported; they do not cause a parse error but they do produce undefined results.

If a definition is repeated within the same scope, the last definition wins. For example,

abc : 1
abc : 2
def : [ 1, 2, 3 ]
def : [ 4, 5, 6 ]
name : {
  abc : 1
  abc : 2
}

will result in the parameter abc having a value of 2 and def having a value of [4,5,6]. This works the same way for parameters whose value type is a table.

Because FHiCL is not strongly typed, the following is legal, but it is not very likely to be interesting to Mu2e (because the C++ API is strongly typed):

  abc : 1
  abc : { def : [ x, y, z ] }

The second form of redefinition is using the fully qualified name:

 a : {
    b : {
      c : 1
    }
 }
a.b.c : 2

This will be used frequently by Mu2e to make small changes to standard parameter sets.

The following is an example of a redefinition at intermediate scope, which is not supported by FHiCL,

 a : {
    b : {
      c : 1
    }
    b.c : 2  # Error!
 }

While this sometimes works, it should be avoided since it is not formally defined inside FHiCL.

Replacement also works for individual elements of a sequence, using the notation:

foo :{
  odd : [ 1, 3, 5, 7 ]
}
foo.odd[0] : 9       // Replace 1 with 9
foo.odd[4] : 11      // Extend the sequence with 11
foo.bar    : 42      // Add a new definition to the table foo.

After this substitution, the sequence foo.odd has the value [ 9, 3, 5, 7, 11 ] and the table foo contains two definitions, foo.odd and foo.bar.

FHiCL will let you do the following,

foo :{
  odd : [ 1, 3, 5, 7 ]
}
foo.odd[5] : 11   // Extend the sequence with two new members, @nil and 11.

After this substitution, the sequence foo.odd has the value [ 1, 3, 5, 7, @nil, 11 ]. This is well defined to FHiCL but it will cause problems for art, which will try to convert the @nil in element [4] to a numeric type, which will fail.

Numeric Values

There are two senses in which a value might be a valid numeric value. The first sense is what the FHiCL parser thinks it is; the second sense is whether or not FHiCL's C++ API can successfully convert that value to a C++ numeric type.

When FHiCL parses a document it looks at each atom and decides if it represents a numeric value. If an atom begins with a numeral or with one of the characters + - . (plus sign, minus sign, decimal point), then the atom is presumed to represent a numeric value. If, upon subsequent inspection, the atom is not a well-formed numeric value, then FHiCL will throw an exception. This exception is thrown by FHiCL, not by art. At this stage, FHiCL never tests quoted strings to see if they are well-formed or mal-formed numeric values; they are just strings. For example, when FHiCL parses the definition

a :  1a

it will decide that the value is a mal-formed numeric value and will throw an exception. On the other hand,

a :  "1a"

defines a value that is just a string type. FHiCL requires the concepts of well-formed and mal-formed numeric values in order to reduce a document to its canonical form.
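The fragment below is a small sketch of how these parsing rules play out in practice. All of the parameter names are hypothetical; the comments restate the rules given above and do not describe any additional behaviour.

```fhicl
version : v2          # Begins with a letter: a valid unquoted string.
runTag  : "2023a"     # Begins with a numeral but is not a number, so it must be quoted.
offset  : -1.5e3      # Begins with a minus sign and is a well-formed numeric value.
label   : "60510"     # Quoted: at parse time this is just a string.

# The next definition is commented out on purpose. Unquoted, the atom begins
# with a numeral, is not a well-formed number, and the parser would throw.
# badTag : 2023a
```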

The C++ API for accessing FHiCL information is strongly typed; that is you must ask for a numeric value as one of the C++ numeric types, int, float, double and so on. The API will attempt to convert any atomic or string value to the requested integral or floating point type; if the conversion fails, the API will throw an exception. FHiCL will do the obvious type conversions for you; for example, you may ask for

name1 : 456.

as either an integral type or as a floating type; in both cases you will get the expected result.

There is one behaviour of the C++ API that must be called out clearly. Consider the definition,

a : 1.5

The C++ API will let a user ask for this value either as an integral type, as a floating point type or as a std::string. If you ask for it as an integral type the result will be 1. This is consistent with automatic type conversion within C++. We have asked the art development team to expand the C++ API to allow Mu2e to select different behaviour: that, if we try to convert a numeric value with a non-zero fractional part to an integral type, it will either print a warning or throw an exception.

If the value of a definition is @nil, an attempt to convert it to any numeric type will cause the API to throw an exception.

If the value of a definition is one of the reserved identifiers, infinity, +infinity, -infinity, the value may be converted to a floating point type but not to an integral type. The value of the floating point type will be the architecture dependent representation of signed infinity. If the architecture does not support such a representation, the code will throw an exception.

FHiCL only recognizes the decimal point, not the European style comma, as a delimiter between the integer and fractional parts of a floating point number; FHiCL does not support the use of a comma (or the European style decimal point) to delimit thousands, millions and so on.

Additional Information

Except for being a delimiter between definitions, whitespace is unimportant. Therefore the FHiCL fragments

 name :
 {
   name0 : 123
   name1 : 456.
 }
 namea : [ abc, def, ghi, 123 ]

could have been written as,

 name:{name0:123 name1:456.} namea:[abc,def,ghi,123]

The namec: line in the big example at the top of the FHiCL Basics section shows that sequences and tables can be nested inside each other. They can be nested to arbitrary depth. The lines named and namee show that it is legal to define empty sequences and tables. Finally, the last line defines namef as a name that is present but has an undefined value.

FHiCL supports facilities to pre-define values so that they can be defined in one place and used later in multiple places. FHiCL also supports the ability to modify values, or subsets of values, after they have been defined. These will be discussed later.

One can see from the above that, after a FHiCL document has been parsed, the result is just a FHiCL table.

Configuration of a Module

From the point of view of a Mu2e physicist, art is the tool that drives the event loop and calls user code at the appropriate places in the event loop. User code is found in art modules and the following FHiCL fragment illustrates how to specify the run-time configuration of an art module:

moduleLabel :{
  module_type : ClassName
  pname0 : 1234.
  pname1 :  [ abc, def]
  pname2 : {
       name0: {}
  }
}

A valid configuration for an art module is expressed as a FHiCL table. Within art, FHiCL tables are visible to the user as objects of type fhicl::ParameterSet, declared in the header fhiclcpp/ParameterSet.h. From here forward, this document will usually refer to the run-time configuration of a module as its parameter set, even when discussing the FHiCL table representation of that parameter set.

The moduleLabel is a FHiCL name, chosen by the Mu2e physicist. It must obey the rules for FHiCL names, be unique within the configuration of an art job and not contain the underscore character. In this context, the FHiCL name module_type is an identifier reserved to art and must be present in the configuration of a module; if it is absent, art will throw an exception. The FHiCL value ClassName is the name of the C++ class that holds the code that the user wishes to execute. By convention, this code is found in a file, somewhere in the Mu2e Offline hierarchy, with the name ClassName_module.cc; the Mu2e build system will compile this file into a dynamic library named Offline/lib/libClassName_module.so. At run-time, art looks in the directories listed in the environment variable CET_PLUGIN_PATH to find a file named libClassName_module.so; it will load this dynamic library and find the code for the module inside.

At present there is one problem with this convention: if, in two different subdirectories, there are two modules with the same filename, both will produce libraries named lib/libClassName_module.so. A plan is in place to ensure that they make distinctly named libraries and for art to either unambiguously load the correct library or issue a run-time diagnostic.

The remaining lines of the parameter set are just FHiCL definitions that will be formed into a fhicl::ParameterSet object and passed to the module as an argument in its constructor. The names in this parameter set are meaningful only to the module, not to art itself. While most of the early Mu2e examples only use parameters with atomic values, it is legal to use the full power of FHiCL within a parameter set; that is, the parameter set used to configure a module may include sequences and parameter sets nested to arbitrary depth.

The minimum legal configuration of a module is,

moduleLabel : { module_type : ClassName }

It is meaningful within one art configuration to define two modules that have the same ClassName that differ by some elements in the remainder of their configuration. These two instances of the module are distinguished by having different moduleLabels. This capability might be used if one wished to run the same algorithm twice in one job, perhaps once with loose cuts and once with tight cuts. Any data products produced by these two module instances will automatically be labeled in a way that distinguishes which module instance produced them. Any histograms or ntuples produced by these two module instances will automatically be put into separate ROOT directories; these directories are named using the moduleLabels.
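As a sketch of this pattern, the fragment below configures two instances of one module class. The class name TrackSelector, the parameter minHits and both module labels are hypothetical.

```fhicl
# Two instances of the same (hypothetical) module class, distinguished
# only by their module labels and by the value of one parameter.
looseSelection : {
  module_type : TrackSelector
  minHits     : 10
}

tightSelection : {
  module_type : TrackSelector
  minHits     : 20
}
```

Note that neither module label contains an underscore, as required by art.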

Configuration of a Service

Art services behave like agents that manage a resource and allow other code to access that resource. There are some services that are native to art and others that are written by Mu2e; Mu2e uses services to manage, among other things, geometry and conditions information. There is more information available about services in art.

The following FHiCL fragment illustrates the run-time configuration of a service:

ClassName :{
  pname0 : 1234.
  pname1 :  [ abc, def]
}

As for a module, a service is configured with a FHiCL table that is seen by the C++ code as a fhicl::ParameterSet, passed as an argument to the constructor of the service class.

The FHiCL name ClassName is the name of the C++ class that implements the service. This class must live somewhere in either the art or Mu2e code bases as two files with filenames ClassName_service.cc and ClassName.hh. The Mu2e build system will compile these files into a shared library with the name ClassName_service.so; the art build system will compile these files into a shared library with the name dir_subdir_ClassName_service.so, where the string dir_subdir is the file system path from the root of art to ClassName_service.cc.

By definition there may be at most one instance of any service within an art job. Therefore the analog of a moduleLabel does not exist for services: the ClassName alone is sufficient to specify which service is requested. The parameter sets for all services are collected inside a single FHiCL table named services:

services : {
  // ParameterSets for zero or more services,
  // both services defined by art and those defined by Mu2e
}

In this context, the FHiCL name services is an identifier reserved to art.

A valid run-time configuration must include a parameter set in the FHiCL table services for each Mu2e written service that will be used in the course of the art job. This is true even if the service has no run-time configurable parameters; in that case an empty parameter set must be supplied in the .fcl file; art will not provide a default.
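A sketch of a services table that follows this rule is shown below. The service names ExampleGeometryService and ExampleBookkeepingService and the parameter inputFile are placeholders invented for illustration; they are not the names of real services.

```fhicl
services : {
  # A hypothetical Mu2e-written service with one configurable parameter.
  ExampleGeometryService : {
    inputFile : "geometry.txt"
  }

  # A hypothetical Mu2e-written service with no configurable parameters.
  # The empty parameter set must still be present; art will not supply it.
  ExampleBookkeepingService : {}
}
```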

In earlier versions of art there was a convention that the service block should have the structure:

services : {
  // ParameterSets for zero or more art defined services.
  user : {
     // ParameterSets for zero or more Mu2e defined services
  }
}

This experiment failed and this style is deprecated.

Overall Structure of an art Run-time Configuration

The example below illustrates the top level view of an art run-time configuration; some details have been omitted for clarity. An art run-time configuration is just a FHiCL document. At present they live in simple files but a future option is that they might live in databases.

In the following, the identifiers process_name, source, services, physics, producers, analyzers, filters, trigger_paths, end_paths and outputs are reserved to art; these are in addition to the identifier module_type discussed earlier and the identifier @local:: discussed later.

process_name : helloWorld      # The process name must NOT contain any underscores
 
source : {
   # Parameters for exactly one source module
}
 
services : {
   # ParameterSets for zero or more services.
}
 
physics: {
 
  producers : {
     # ParameterSets for zero or more producer modules
  }
  analyzers: {
     # ParameterSets for zero or more analyzer modules
  }
 
  filters : {
     # ParameterSets for zero or more filter modules
  }
 
  # "trigger" paths:
  path0 : [  comma separated list of module labels of producer or filter modules  ]
  path1 : [  comma separated list of module labels of producer or filter modules  ]
 
  # "end" paths:
  path2 : [  comma separated list of module labels of analyzer or output modules  ]
  path3 : [  comma separated list of module labels of analyzer or output modules  ]
 
  trigger_paths: [ path0, path1 ]  # Optional!
  end_paths:     [ path2, path3 ]  # Optional!
}
 
outputs: {
  # ParameterSets for zero or more output modules
}

The parameter process_name identifies this art job. It is used as part of the identifier for data products produced in this job. For this reason, the process name may not contain underscore characters. If the process_name is absent, art substitutes a default value of "DUMMY".

The source parameter set describes where events come from. There may be at most one source module declared in an art configuration. At present there are four options for choosing a source module:

module_type : RootInput
art::Events will be read from an input file or from a list of input files; files are specified by giving their pathname within the file system. In the future Mu2e will support a file catalog but, at present, there is no such system.
module_type : EmptyEvent
Internally art will start the processing of each event by incrementing the event number and creating an empty art::Event. Subsequent modules then populate the art::Event. This is the normal procedure for generating simulated events.
One of the Mu2e written source modules found in Mu2e Offline in Sources/src/*_source.cc.
A custom source module that is used in the trigger; it copies events from the live Data Acquisition (DAQ) system and makes them available inside the art process that runs the trigger code.

See the web page about configuring input and output modules for details about what other parameters may be supplied to these parameter sets. If no source parameter set is present, art substitutes a default parameter set of:

source : {
  module_type : EmptyEvent
  maxEvents : 1
}

The configuration of art services was discussed above. If the services parameter set is missing entirely, art will supply a default that provides a default configuration for the message logger.

If an art-supplied service is requested by the code, and if there is no corresponding parameter set in the .fcl file, then art will supply a default parameter set. If a Mu2e-defined service is requested by the code, and if there is no corresponding parameter set in the .fcl file, then art will throw an exception.

Some of the art-supplied services can be turned on from command line switches; these include a service to trace all of the module and service calls made by art, a service to present timing information and a service to profile memory usage. When the command line switch is present, the service need not be included in the run-time configuration; art will supply defaults if needed. There is additional information available about the behaviour of the art-supplied services; this discusses their default parameter sets and how to request them from the command line.

The physics parameter set in the example has nine identifiers: producers, analyzers, filters, path0, path1, path2, path3, trigger_paths and end_paths. Five of these (producers, analyzers, filters, trigger_paths and end_paths) are special names defined by art. The first three must have values that are FHiCL tables of parameter sets; they define how each module is configured. The names path0 through path3 are user defined: they are called "paths" and they must be sequences of module labels; they define what modules will be executed by this job and they provide constraints on the order in which modules are executed. The last two special names, trigger_paths and end_paths, must have values that are FHiCL sequences of path names. More details about paths, trigger_paths and end_paths are in the section on paths.

The physics.producers parameter set should contain parameter sets that are used to configure EDProducer modules. Similarly the physics.analyzers and physics.filters parameter sets should hold the configuration information for the EDAnalyzer and EDFilter modules, respectively. If you cross-stitch the configuration, for example, putting an analyzer module into the physics.producers parameter set, art will catch the mistake at job startup and will throw an exception.

If the physics parameter set, or any of its components are missing, art will substitute a default value of an empty parameter set or an empty sequence, as appropriate.

The final element in a run-time configuration is the outputs parameter set. It contains parameter sets that configure zero or more output modules. The rules to send one subset of the events to one output file and a different subset of the events to a different output file are described on the web page that discusses paths in more detail.

If the outputs parameter set is missing, art will supply a default value of an empty parameter set.
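Putting the pieces together, a small but complete configuration might look like the sketch below. The module label hello, the class name HelloWorld and the path name e1 are hypothetical; everything else uses only the identifiers described above.

```fhicl
process_name : helloWorld         # No underscores allowed here.

source : {
  module_type : EmptyEvent        # Start each event from an empty art::Event.
  maxEvents   : 10
}

services : {}                     # No Mu2e-written services are used in this job.

physics : {
  analyzers : {
    hello : {
      module_type : HelloWorld    # Hypothetical analyzer class.
    }
  }

  e1        : [ hello ]           # An end path holding the one analyzer.
  end_paths : [ e1 ]
}

# The producers, filters and outputs tables are omitted; according to the
# text above, art substitutes empty defaults for them.
```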

Paths: Defining What Modules will be Run

The key to understanding what a given art job will do is understanding paths. There are two kinds of paths: "trigger" paths and "end" paths, both of which are sequences of module labels.

When art parses the physics parameter set it finds all top level identifiers, sets aside the 5 reserved identifiers, and tries to interpret each remaining identifier as the definition of a path. If any of these identifiers cannot be successfully interpreted as a path, art will throw an exception. To be interpreted as a path the following must be true:

The value of the definition must be a sequence.
All identifiers in the sequence must be module labels defined in one of the producers, analyzers, filters or outputs parameter sets.
A valid trigger path may only contain the module labels for producer and filter modules
A valid end path may only contain the module labels of analyzer and output modules.

A sequence of module labels that is neither a trigger path nor an end path is illegal.

The key distinction between trigger paths and end paths is this: modules found in trigger paths may add information to the art::Event but modules found in end paths may not.
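The rules above can be illustrated with the following sketch. All module labels, class names and path names are hypothetical, except for RootOutput with its fileName parameter, which is assumed here to be the standard art output module.

```fhicl
physics : {
  producers : { makeHits   : { module_type : MakeHits   } }
  filters   : { selectGood : { module_type : SelectGood } }
  analyzers : { plotHits   : { module_type : PlotHits   } }

  # A valid trigger path: only producer and filter labels.
  trigPath : [ makeHits, selectGood ]

  # A valid end path: only analyzer and output labels.
  endPath  : [ plotHits, out ]

  # Commented out because it is illegal: it mixes a producer with an
  # analyzer, so it is neither a trigger path nor an end path.
  # badPath : [ makeHits, plotHits ]
}

outputs : {
  out : {
    module_type : RootOutput
    fileName    : "example.art"
  }
}
```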

Behaviour Without trigger_paths or end_paths

To understand the order of execution, first consider the case that neither the trigger_paths nor the end_paths parameter is defined. In that case art will execute all of the modules in all of the trigger paths, subject to the following rules:

Within each trigger path art guarantees that the module labels will be executed in the order specified by the trigger path.
If the same module label appears in more than one trigger path, art knows that it only needs to be executed once
If art detects that one path requires two module labels to be executed in the order (a,b) but another path requires the order (b,a) then it is a configuration error; art will detect this at job startup and throw an exception.
art makes no guarantees about the order in which trigger paths will be executed.

Once the above work is completed, art will execute all of the modules found in all of the end paths, subject to these rules:

art is free to execute these modules in any order
If the same module label appears in more than one end path, art knows that it only needs to be executed once

A corollary is that a module label that is defined in the fcl file but is not present in any path is pruned from the configuration and is silently ignored. Such modules are never instantiated and never run. If you look at the output of "art --help" you will find command line options to modify this behaviour.
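The sketch below illustrates these rules, including pruning. All module labels, class names and path names are hypothetical.

```fhicl
physics : {
  producers : {
    a      : { module_type : MakerA }
    b      : { module_type : MakerB }
    unused : { module_type : MakerC }   # In no path: pruned and never instantiated.
  }
  analyzers : {
    plots  : { module_type : PlotMaker }
  }

  p1 : [ a, b ]      # Within this path, a is guaranteed to run before b.
  p2 : [ a ]         # a appears in two trigger paths but runs once per event.
  e1 : [ plots ]     # End path: runs after the trigger path modules.

  # Commented out because it would be a configuration error: it requires
  # the order (b,a) while p1 requires (a,b).
  # p3 : [ b, a ]
}
```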

A cartoon picture of art internals is that it creates an internal object called a "schedule" which is an ordered list of module labels, from all paths, that satisfies all of the above constraints. For each event, art executes the modules in the order given by the schedule. Note that the schedule may mix together modules from different paths so long as the final result satisfies the above constraints. It is a mistake to think of art as executing the first trigger path, then the second and so on; that may be what art does today but it may do something else tomorrow.

Behaviour with trigger_paths or end_paths

If the trigger_paths parameter is defined, its value must be a sequence in which each item is the name of a valid trigger path. If a name in the sequence is not the name of a valid trigger path, then it is a configuration error and, at job start time, art will throw an exception. When trigger_paths is defined art will do much the same as described above with the exception that it will only execute the modules specified by the trigger paths requested by trigger_paths. Any trigger path defined in the fcl file but not referenced by trigger_paths will be pruned from the configuration. The order of trigger paths specified in trigger_paths has no meaning.

If the end_paths parameter is present, the same story holds for end paths.

The summary of this behaviour is that neither trigger_paths nor end_paths are required but, if present, they are respected.
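As a sketch of this behaviour, the fragment below defines two trigger paths but requests only one of them. All module labels, class names and path names are hypothetical.

```fhicl
physics : {
  producers : {
    makeHits   : { module_type : MakeHits   }
    makeTracks : { module_type : MakeTracks }
  }
  analyzers : {
    plots      : { module_type : PlotMaker  }
  }

  hitsOnly : [ makeHits ]
  fullReco : [ makeHits, makeTracks ]
  e1       : [ plots ]

  # Only hitsOnly is requested. The path fullReco is pruned, and because
  # makeTracks then appears in no remaining path it is not run.
  trigger_paths : [ hitsOnly ]
  end_paths     : [ e1 ]
}
```

Switching the job to full reconstruction then requires changing only the value of trigger_paths, not the path definitions themselves.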

Command Line Arguments

Some elements of an art run-time configuration may be overridden by parameters that appear on the art command line. To see what the options are:

mu2e --help

At this writing (Sept 2011), the allowed command line parameters are:

mu2e <-c <config-file>> <other-options> [<source-file>]+:
  -T [ --TFileName ] arg   File name for TFileService.
  -c [ --config ] arg      Configuration file.
  -e [ --estart ] arg      Event # of first event to process.
  -h [ --help ]            produce help message
  -n [ --nevts ] arg       Number of events to process.
  --nskip arg              Number of events to skip.
  -o [ --output ] arg      Event output stream file.
  -s [ --source ] arg      Source data file (multiple OK).
  -S [ --source-list ] arg file containing a list of source files to read, one
                           per line.
  --trace                  Activate tracing.
  --notrace                Deactivate tracing.
  --memcheck               Activate monitoring of memory use.
  --nomemcheck             Deactivate monitoring of memory use.

Current versions of art support many more command line options. To learn what's available for the version of art that you are using, type the following command:

mu2e --help

which may be abbreviated to:

mu2e --h

All command line parameters are optional and many have both a short and a long form. In general the command line parameters can modify the names of files, the flow of the event loop and whether or not some monitoring services are enabled. If a parameter is specified both within the .fcl file and on the command line, the command line value takes precedence.

In the original design of art, it was planned that no parameters that control the physics behaviour would be exposed on the command line; they would only be modifiable by editing the .fcl file. It was planned that file names, debug levels and other parameters that do not change the physics behaviour would be modifiable from the command line. This restriction was chosen because of the then existing ideas about how to maintain an audit trail for run-time configurations. Recently the NOvA experiment, which also uses art, asked to remove this restriction; the plan to retain a strict audit trail is that the final state of the .fcl file, after all command line substitutions, will be stored in each of the event-data output files.

It is not clear yet if Mu2e will stick with the original plan or if we will choose to follow the route chosen by NOvA. Another option is to define the idea of an art "production mode". The idea behind production mode is that there are many convenience functions that are valuable for development and debugging but which make it difficult to maintain an audit trail. Among these convenience functions is the ability to override an arbitrary .fcl parameter from the command line. One could imagine that, when production mode is not set, all of these convenience functions would be available but, when production mode is set, they would be disabled.

The Canonical Form and the Hash Code

When FHiCL prints a document to the screen or to a file the output appears in what's known as the canonical form, in which the source formatting is entirely lost. In this form all comments are stripped, there is standardized indentation, and all strings, even atomic strings, are double quoted. Because the order of definitions is not important, they will appear in the canonical form in some well defined order meaningful to FHiCL; in general this order is not that in which they appeared in the input file.

In a FHiCL document, if a name is redefined, only the final definition will be present in the canonical form. All earlier definitions will be lost.

The other aspect of the canonical form is that all atoms that are identified by FHiCL as well-formed numeric types will be represented in a FHiCL-defined form. This form may be different than the form that appeared in the input file; but it will convert to the same bit pattern when converted to a numeric type. For example if the source file contains the definition,

a : 123.45

the canonical form will contain:

a : 1.2345e2

The canonical form of a floating point number is scientific notation with enough digits to guarantee no loss of precision.

Other examples of source formats are,

   b: 1
   c: 1.
   d: 1
   e: +1
   f: 1.23e2
   g: 1.23e3

These will have the canonical forms:

   b: 1
   c: 1
   d: 1
   e: 1
   f: 123
   g: 1230

All of b through e have the same canonical form. One rule is that if a meaningless fractional part or sign is present, it will not appear in the canonical form. A second rule is that if a number is written in the source in scientific notation, and if that number is representable, without loss of precision, as an integer less than 999999, then the number will be represented as an integer; this rule is illustrated in items f and g.
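The two rules can be sketched in a few lines of C++. This is an illustration under stated assumptions, not FHiCL's actual algorithm: the function name `canonicalInteger` is hypothetical, the limit is applied to the magnitude of the number, and atoms that are not whole numbers (such as 123.45) are returned unchanged rather than being rewritten in scientific notation.

```cpp
#include <cmath>
#include <cstdlib>
#include <string>

// If 'atom' is a number whose value is a whole number of magnitude below
// 999999, return it written as a plain integer; otherwise return 'atom'
// unchanged. Non-numeric atoms are also returned unchanged.
std::string canonicalInteger(const std::string& atom) {
  char* end = nullptr;
  const double value = std::strtod(atom.c_str(), &end);
  const bool isNumber = !atom.empty() && end == atom.c_str() + atom.size();
  if (!isNumber) return atom;
  if (value != std::floor(value) || std::fabs(value) >= 999999.0) return atom;
  return std::to_string(static_cast<long long>(value));
}
```

Applied to the source formats of items b through g, this sketch returns 1, 1, 1, 1, 123 and 1230.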

Once a configuration has been reduced to its canonical form, one can hash that form to compute an almost-certainly-unique key. See the FHiCL documentation for details. One can use the hash codes as a short cut to ask if two configurations are the same or different.
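The idea can be sketched as follows. This is only an illustration: the names `Definition`, `canonicalForm` and `configKey` are hypothetical, alphabetical order stands in for whatever order FHiCL really uses, and std::hash stands in for FHiCL's own hash function.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Definition = std::pair<std::string, std::string>;  // name, value

// Reduce a flat list of definitions to one canonical string: a later
// definition of a name replaces an earlier one, and the surviving
// definitions are written in a fixed (here alphabetical) order.
std::string canonicalForm(const std::vector<Definition>& definitions) {
  std::map<std::string, std::string> table;
  for (const auto& def : definitions) {
    table[def.first] = def.second;
  }
  std::string out;
  for (const auto& entry : table) {  // std::map iterates in sorted key order
    out += entry.first + ": " + entry.second + "\n";
  }
  return out;
}

// Hash the canonical string to obtain a short key for the configuration.
std::size_t configKey(const std::vector<Definition>& definitions) {
  return std::hash<std::string>()(canonicalForm(definitions));
}
```

Two inputs that differ only in the order of their definitions, or in definitions that are later overridden, give the same canonical string and therefore the same key.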

Why bother with a canonical form? In particular, why bother with a canonical form for numeric values?

Many previous experiments have discovered that configuration files get handed around. In the process, trivial changes accumulate. For example a definition that starts life as

a: 1

may end up a few generations later as

a: 1.0

This has the consequence that two configurations that really do the same physics produce different hash codes. If the configurations, and/or their hash codes, are tracked as part of the meta-data, this makes it tedious to identify files that were produced using the same configuration.

While having a canonical form for numeric values will certainly raise new issues, we expect it to be a net reduction in the complexity of tracking meta-data.

One issue currently under study is this: consider the task of running 1000 Monte Carlo jobs that differ only in their random number seeds and in the names of their output files. Can we arrange things such that the properties of the ensemble of jobs are represented in a single FHiCL document while the details of the different jobs are represented in some other way? If we can do this, then all files produced by this set of jobs will have a configuration that has the same hash code as every other job in the set. At present this will not work because of the way that random number seeds are distributed.

Printing the Canonical Form

You can tell art to process the fcl file, print the canonical form of the resulting configuration and stop execution:

mu2e -c file.fcl --debug-config file_expanded.fcl

This will process the file named with the -c argument and print the canonical form of the resulting configuration to the file named as the value of the --debug-config argument. The output will include the effect of any command line arguments and will include any default values that are supplied by art.

In the early versions of art this feature was invoked using the environment variable ART_DEBUG_CONFIG; that method is no longer supported.

@local and @table

FHiCL has the ability to define a value as a reference to a previously defined value. Consider the following FHiCL fragment,

foo : 5
source : {
  module_type : EmptyEvent
  maxEvents : @local::foo
}

This fragment tells art to process a maximum of 5 events. If you print the canonical form of this parameter set, the @local::foo has entirely disappeared and has been replaced by 5. At this stage FHiCL no longer has knowledge of where the 5 came from; the fragment is equivalent to:

source : {
  module_type : EmptyEvent
  maxEvents : 5
}

The identifier @local:: is reserved to FHiCL. When FHiCL encounters a value defined by the syntax @local::foo it will look in its current file for a top level object named foo and it will replace @local::foo with the value found in the earlier definition. This works for all kinds of values: atomic values, sequences or tables.
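The lookup can be sketched as below. This is a simplified model, not the FHiCL parser: values are plain strings, `Definitions` and `resolveLocal` are hypothetical names, and throwing an exception for an undefined name is an assumption about how such an error would be reported.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical record of the top level definitions seen so far.
using Definitions = std::map<std::string, std::string>;

// If 'value' has the form @local::name, return the value already stored
// under 'name'; any other value is returned unchanged.
std::string resolveLocal(const Definitions& earlier, const std::string& value) {
  const std::string prefix = "@local::";
  if (value.compare(0, prefix.size(), prefix) != 0) return value;
  const std::string name = value.substr(prefix.size());
  const auto found = earlier.find(name);
  if (found == earlier.end()) {
    throw std::runtime_error("@local:: refers to undefined name: " + name);
  }
  return found->second;
}
```

Once the substitution has been made, only the value 5 remains; nothing in the result records that it came from foo.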

@table:: is very similar, except that when the item is a table the substitution removes the enclosing braces, so that the members of the table are inserted directly into the surrounding table:

BEGIN_PROLOG
locvarname : 5
locname : {
  a : avalue
}
tabname : {
  b : bvalue
}

END_PROLOG

stanza : {
    locvar    : @local::locvarname
    locstanza : @local::locname
    tabstanza : {
        @table::tabname
    }
}

resolves to :

stanza: {
   locvar: 5
   locstanza: {
      a: "avalue"
   }
   tabstanza: {
      b: "bvalue"
   }
}
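The effect of @table:: can be modelled as copying the members of one table into the table that encloses the @table:: line. The sketch below assumes a flat table of strings; `Table` and `spliceTable` are hypothetical names, and letting a spliced member replace an existing member of the same name is an assumption that follows the redefinition rule described earlier.

```cpp
#include <map>
#include <string>

// Hypothetical flat table: member name -> value.
using Table = std::map<std::string, std::string>;

// Copy every member of 'source' into 'enclosing', as if the members had been
// typed directly inside the braces of 'enclosing'.
void spliceTable(Table& enclosing, const Table& source) {
  for (const auto& member : source) {
    enclosing[member.first] = member.second;
  }
}
```

In the example above, tabstanza starts empty and, after tabname is spliced in, contains the single member b.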

PROLOGs and #includes

FHiCL supports an include mechanism that behaves much like that of the C Preprocessor. Suppose that the file defaults.fcl contains:

BEGIN_PROLOG
g4run_default :
{
  module_type          : G4
  generatorModuleLabel : generate
  seed                 : [9877]
}
END_PROLOG

For the moment, ignore the BEGIN_PROLOG and END_PROLOG lines. After creating this file, one may write the following fragment as part of a run-time configuration,

#include "defaults.fcl"

physics : {
  producers : {
     g4run : @local::g4run_default
  }
}

FHiCL is forgiving about superfluous white space within an include statement. Using includes provides a mechanism to distribute a standard configuration for g4run that can be used by many people.

The purpose of the PROLOG markers is to tell FHiCL that the material inside the PROLOG is not part of the final FHiCL document; it is merely a collection of some useful definitions that may or may not be used. Therefore FHiCL excludes the prolog from the document that it sends to art. If one prints the canonical form of the final document, the PROLOG is absent.

One could have chosen to move the BEGIN/END_PROLOG from the included file to surrounding the #include statement in the top level file. FHiCL will be happy with either; the current recommendation is to put them inside the included file. This makes clear that the purpose of the included file is to be a PROLOG.

There may be many BEGIN/END_PROLOG sections within one .fcl file but they may not be nested. Includes may be nested so long as they do not cause PROLOGS to become nested.
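The nesting rule can be checked with a single pass over the text that results after all includes have been expanded. The following sketch is not part of FHiCL: `prologsWellFormed` is a hypothetical helper, and it assumes that each marker stands alone on its line.

```cpp
#include <sstream>
#include <string>

// Return true if every BEGIN_PROLOG is closed by an END_PROLOG before the
// next BEGIN_PROLOG appears, and no END_PROLOG appears without a BEGIN_PROLOG.
bool prologsWellFormed(const std::string& text) {
  std::istringstream lines(text);
  std::string line;
  bool inside = false;
  while (std::getline(lines, line)) {
    // Strip leading and trailing white space from the line.
    const std::string::size_type first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;
    const std::string::size_type last = line.find_last_not_of(" \t\r");
    const std::string word = line.substr(first, last - first + 1);
    if (word == "BEGIN_PROLOG") {
      if (inside) return false;  // a PROLOG inside a PROLOG
      inside = true;
    } else if (word == "END_PROLOG") {
      if (!inside) return false;  // END without a matching BEGIN
      inside = false;
    }
  }
  return !inside;  // false if the last PROLOG was never closed
}
```

Several PROLOGs in sequence pass this check; an included file that opens a PROLOG while its parent has one open does not.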

Together, the features @local::, redefinition, PROLOGs and #include will allow Mu2e to create standard configurations and to express most actual jobs as one of the standard configurations plus a small collection of deltas. We hope that this will make it easier to understand what any given job actually did.

The C++ API (Validated FHiCL)

This section shows how to read FHiCL parameters into a C++ program. For a long time there was no automated validation of fcl, and it was easy for fcl to fail silently. For example, if we intend to set a parameter "abc" but accidentally write "adc", the mistake will not be caught: the fcl will be interpreted, "abc" will not be set, and the line that was written ("adc") will be ignored. To close this gap, fcl validation was introduced in 2018. Mu2e will write all new fcl interfaces using this validation, which is described in this section; the older system is still common and is described in the following section.

The documentation on validated fhicl from the art team is available at: https://cdcvs.fnal.gov/redmine/projects/fhicl-cpp/wiki/Configuration_validation_and_fhiclcpp_types.

The following is an example of the C++ interface.

  class MyModule : public art::EDAnalyzer {
  public:

    struct Config {
      using Name=fhicl::Name;
      using Comment=fhicl::Comment;
      fhicl::Atom<int> verbose{Name("verbose"), Comment("verbosity level (0-10)"), 0};
      fhicl::Atom<double> tmin{Name("tmin"), Comment("min time cut"), 500.};
      fhicl::Atom<art::InputTag> input{ Name("input"), Comment("Tag of the product to analyze.")};
      // sequence of any length (no default)
      fhicl::Sequence<int> ilist{Name("ilist"),Comment("list of ints")};
      // sequence of fixed length, any other length is an error (default is all zero)
      fhicl::Sequence<double,3u> point{Name("point"),Comment("[x,y,z] as doubles"),{0.0,0.0,0.0}};
      // optional parameter: it takes no default and it is not an error if it is missing
      fhicl::OptionalAtom<std::string> message{Name("message"), Comment("print message")};
    };

    // the following line is needed to enable art --print-description
    typedef art::EDAnalyzer::Table<Config> Parameters;

    explicit MyModule(const Parameters& conf);
    void analyze(const art::Event& evt) override;
  private:
    Config _conf;
    int _verbose;
  };

  //================================================================
  MyModule::MyModule(const Parameters& conf):
     art::EDAnalyzer(conf),
     _conf(conf()),_verbose(conf().verbose()) {
    // ...
  }

  //================================================================
  void MyModule::analyze(const art::Event& event) {
    auto ih = event.getValidHandle<SimParticle>(_conf.input());
    if(_conf.verbose()>1) std::cout << "MyModule: event " << event.id() << std::endl;
    CLHEP::Hep3Vector loc( _conf.point(0), _conf.point(1), _conf.point(2));
  }

In this example, art has extracted the parameter set for this module from the .fcl file and passed it to the constructor as the variable conf. The configuration is saved as the member _conf, since it is a reasonable container for the fcl parameters. The example also shows that you can access the parameters immediately, in the constructor, as is done when the _verbose member is set.

The first argument to the Atom constructors is the name as it will appear in the fcl. The second is a description of the parameter which will be printed with the art --print-description command line option. Some of the parameters here have defaults, which are the third argument to the Atom constructor.

Parameters are required by default, so it is an error if one is missing. If a parameter should be optional, it can be declared as an OptionalAtom, and then it is not an error if it is missing. Sequences may be of fixed length, in which case any other length is an error, or of arbitrary length.
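The kind of check that validation performs can be sketched without any FHiCL code. The names `ValidationReport` and `validate` below are hypothetical and the logic is only a model of the behaviour described above: a name that is required but not supplied is reported, and so is a name that is supplied but was never declared. In this model, parameters that have a default value belong in the optional set.

```cpp
#include <set>
#include <string>
#include <vector>

struct ValidationReport {
  std::vector<std::string> missing;  // required names that were not supplied
  std::vector<std::string> unknown;  // supplied names that were never declared
  bool ok() const { return missing.empty() && unknown.empty(); }
};

// Compare the names supplied in the fcl with the names declared by the code.
ValidationReport validate(const std::set<std::string>& required,
                          const std::set<std::string>& optional,
                          const std::set<std::string>& supplied) {
  ValidationReport report;
  for (const auto& name : required) {
    if (supplied.count(name) == 0) report.missing.push_back(name);
  }
  for (const auto& name : supplied) {
    if (required.count(name) == 0 && optional.count(name) == 0) {
      report.unknown.push_back(name);
    }
  }
  return report;
}
```

With "abc" required and "adc" supplied by mistake, the report lists "abc" as missing and "adc" as unknown, so the typo no longer passes silently.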

There are other more complex options in the validated fcl system. For example, some parameters can be excluded from validation. A tuple type is available.

Here are pointers to the documentation:

art wiki
FHICL wiki
page about converting FHICL into "nearly arbitrary C++ type"
An example in mu2e code: Analyses/src/SimParticleTimeMapAnalyzer_module.cc

If your code, which you would like to validate, is part of a larger fcl stanza which is not validated, then there are additional tricks that can be employed. For an example, see the retrieveConfiguration() function in EventMixing/src/ResamplingMixer_module.cc. See also the function of the same name in this example.

The Deprecated C++ API

All new Mu2e code should be written with fcl validation, but as long as the older unvalidated system is still common, we include this section for reference.

The following fragment abstracted from Analyses/src/ReadBack_module.cc shows how to read FHiCL parameters into a C++ program.

namespace mu2e {
 
  class ReadBack : public art::EDAnalyzer {
 
  public:
    explicit ReadBack(fhicl::ParameterSet const& pset);
 
  private:
    int         _diagLevel;
    std::string _g4ModuleLabel;
    std::string _generatorModuleLabel;
  };
 
  ReadBack::ReadBack(fhicl::ParameterSet const& pset) :
    art::EDAnalyzer(pset),
    _diagLevel           (pset.get<int>("diagLevel",0)),
    _g4ModuleLabel       (pset.get<std::string>("g4ModuleLabel")),
    _generatorModuleLabel(pset.get<std::string>("generatorModuleLabel")){
  }
}

In this example, art has extracted the parameter set for this module from the .fcl file and it is passed to the constructor as the variable pset. The line that initializes the member datum _diagLevel asks pset to find a parameter whose name is diagLevel, to convert it to an int and to return the value of that int; if pset cannot find a parameter named diagLevel, pset will return the value of the second argument to the get call, in this case 0. If the parameter exists and conversion to an int fails, then art will throw an exception. This can happen if the FHiCL parameter diagLevel has a value that is a string that is not a valid numeric value or if its value is @nil. There is no requirement that the name of the member datum, _diagLevel, match the name of the item in the parameter set; but it seems silly if they do not match.

Similarly, the next line looks for a FHiCL name "g4ModuleLabel" and attempts to convert it to a string. In this case there is no second argument to the get call; therefore, if "g4ModuleLabel" is not found in the parameter set, art will throw an exception.
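The behaviour of the two forms of get can be modelled with a small stand-in class. `MiniPSet` below is hypothetical and is not fhicl::ParameterSet: it stores every value as a single whitespace-free token of text and uses stream extraction for the conversion, which is an assumption made only to keep the sketch short.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parameter set: name -> value stored as text.
class MiniPSet {
public:
  void put(const std::string& name, const std::string& value) { values_[name] = value; }

  // No default: a missing name is an error.
  template <typename T>
  T get(const std::string& name) const {
    const auto found = values_.find(name);
    if (found == values_.end()) throw std::runtime_error("missing parameter: " + name);
    return convert<T>(name, found->second);
  }

  // With a default: a missing name yields the default, but a value that is
  // present and cannot be converted is still an error.
  template <typename T>
  T get(const std::string& name, const T& fallback) const {
    const auto found = values_.find(name);
    if (found == values_.end()) return fallback;
    return convert<T>(name, found->second);
  }

private:
  template <typename T>
  static T convert(const std::string& name, const std::string& text) {
    std::istringstream in(text);
    T result;
    if (!(in >> result) || !(in >> std::ws).eof()) {
      throw std::runtime_error("cannot convert parameter: " + name);
    }
    return result;
  }

  std::map<std::string, std::string> values_;
};
```

In this model, get<int>("diagLevel", 0) returns 0 when diagLevel is absent, get<std::string>("g4ModuleLabel") throws when g4ModuleLabel is absent, and either form throws when the stored text cannot be converted to the requested type.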

The decision of whether or not to provide a default value is very important. We recommend that if a parameter influences physics behaviour and it is likely to be set by normal users, then that parameter should NOT have a default. If a parameter influences only diagnostics, it should have a default. The more difficult question is that some parameters do influence physics behaviour but should only be modified by experts; here a case can be made for providing defaults in the code:

If we expose the parameter, non-experts might be tempted to play with it.
If the configurations become too big they become too hard to manage.

The following presents a few more examples. The parameters

  intArray    : [ 1,  2,  3  ]
  doubleArray : [ 1., 2., 3. ]
  stringArray : [ "foo", "bar" ]
  paramSet    : { foo : 1 bar : 2 }

can be read into a C++ program with the following code fragments:

  std::vector<int>          iArray( pset.get<std::vector<int> >("intArray"));
  std::vector<double>       dArray( pset.get<std::vector<double> >("doubleArray"));
  std::vector<std::string>  sArray( pset.get<std::vector<std::string> >("stringArray"));
  fhicl::ParameterSet       params( pset.get<fhicl::ParameterSet>("paramSet"));

In all cases it is possible to provide default values as a second argument to the get method. At present there is no accessor method to let you get one element of a sequence; you must get the entire sequence.

It is the intention of the art team that an arbitrary class T can be initialized by,

  T t( pset.get<T>("Name"));

The precise details of how to make this work remain to be specified.

The argument pset passed to the constructor of a module or a service is a temporary constructed by art that will go out of scope soon after the return from the constructor. If one wishes to retain pset as member data in a module, it must be held by making a copy, not by holding a pointer or reference to the argument pset.
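The safe pattern is to store the parameter set by value. In the sketch below, `MiniPSet` is a hypothetical stand-in for fhicl::ParameterSet and `KeepsConfig` is a hypothetical module-like class; only the ownership pattern is the point.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for fhicl::ParameterSet.
using MiniPSet = std::map<std::string, std::string>;

class KeepsConfig {
public:
  // The argument may be destroyed soon after the constructor returns, so the
  // member is initialized as a copy of it, not as a pointer or reference.
  explicit KeepsConfig(const MiniPSet& pset) : pset_(pset) {}

  std::string label() const { return pset_.at("g4ModuleLabel"); }

private:
  MiniPSet pset_;  // owned copy; remains valid for the life of the object
};
```

Because the object owns its copy, label() can be called safely long after the original argument has gone out of scope.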

Utilities

HELP! What command line options can I give to the mu2e program?

The following command will print all of the command line options that art knows about:

mu2e --help

Note that different versions of art may print different information.

HELP! How do I learn what fcl parameters are required by a module or service?

For most modules and services you can learn about the required and optional fcl parameters by giving the command:

mu2e --print-description module_class_name
mu2e --print-description service_class_name

For example:

mu2e --print-description EmptyEvent
mu2e --print-description TFileService

This information is available for all modules and services that are provided by art and by some modules and services that are part of Mu2e Offline. Why only some? The information is only available if the module or service uses the Validated FHiCL C++ API; some Mu2e modules, and most services, were originally written using the deprecated version of the C++ API and have not yet been updated to use the Validated FHiCL C++ API. If you find a module or service that is missing this information, and you want to use it, please let us know; even better, update the source code yourself and submit a pull request.

HELP! What fcl am I actually running and where did the parameters come from?

You can get art to write the final, fully interpreted fcl to a file. This is useful to see what is happening with all the includes and "@local" and "@table" substitutions.

mu2e --debug-config fhicl_debug.txt -c myconfig.fcl --annotate

or

 fhicl-dump --lookup-policy after1 --annotate myconfig.fcl > fhicl_debug.txt

If the source annotation is too distracting and you just want to see the final fcl, remove the --annotate option. There is an option to display the annotation in a slightly different format, --prefix-annotate .

HELP! What modules and services are available for me to use?

You can get a list of all modules, services etc known to art by using the following commands:

mu2e --print-available source 
mu2e --print-available module --status-bar
mu2e --print-available service
mu2e --print-available tool

See the output of mu2e --help for other options. It takes a long time to find all modules so I added the --status-bar to the example to let you follow the progress.

The --print-available command will traverse the directories specified by the environment variable CET_PLUGIN_PATH to find all files that have names ending in _source.so, _module.so and so on.
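The two steps of such a search, splitting the search path into directories and recognizing plugin files by the ending of their names, can be sketched as follows. This is not art's code: `splitSearchPath` and `pluginKind` are hypothetical helpers, the colon separator is an assumption about the format of CET_PLUGIN_PATH, and the endings _service.so and _tool.so are assumed by analogy with the two endings named above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated search path into its directories.
std::vector<std::string> splitSearchPath(const std::string& path) {
  std::vector<std::string> dirs;
  std::istringstream in(path);
  std::string dir;
  while (std::getline(in, dir, ':')) {
    if (!dir.empty()) dirs.push_back(dir);
  }
  return dirs;
}

// Return the plugin kind implied by the ending of a file name, or an empty
// string if the name does not look like a plugin library.
std::string pluginKind(const std::string& fileName) {
  const char* kinds[] = {"source", "module", "service", "tool"};
  for (const char* kind : kinds) {
    const std::string suffix = std::string("_") + kind + ".so";
    if (fileName.size() > suffix.size() &&
        fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) == 0) {
      return kind;
    }
  }
  return "";
}
```

A full search would list the files in each directory returned by splitSearchPath and keep those for which pluginKind returns the requested kind.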

For historical reasons the commands --print-available-modules and --print-available-services are synonyms for two of the above commands.

HELP! What does this fcl actually do?

Given a fcl file that is made from deeply nested includes and expands to more than 10,000 lines, it can be difficult to understand what the file actually does. An example of such a file is Validation/fcl/reco.fcl which runs many of the paths that make up the trigger plus the main conversion electron reconstruction path.

The rest of this section presumes a basic familiarity with the concept of art paths and of the parameters physics.trigger_paths and physics.end_paths. These are described in the section on Overall Structure of an art Run-time Configuration.

The first thing to try is to use --debug-config to expand the file. Look at the file to see if it contains definitions of trigger_paths and end_paths in the physics parameter set. If these are present, you can read off the paths that will be run by the job and you can look for the definition of each path to find out which modules will be run.

However trigger_paths and end_paths are optional elements and if they are omitted then the above solution fails. In that case you can use the command

mu2e --config-summary=detailed -c file.fcl

This will print summary information about the fcl file, including the names of all trigger paths. You can then use --debug-config to expand the original file and find the definitions of each path.

The --config-summary option has 3 levels of verbosity: brief, detailed and full. You can experiment with these on the file Validation/fcl/reco.fcl, which, at this writing, has 23 trigger paths and no trigger_paths parameter.

HELP! Which module is being called?

Running mu2e with the switch --trace will cause a line to be printed as each module is called. This can be a quick way to see which module is printing an unfamiliar line, or to confirm which modules are being called.

HELP! Which modules ran and how much time / memory did each use?

A summary of what modules ran and filter results:

services.scheduler.wantSummary: true

A summary of the time that each module used:

services.TimeTracker.printSummary: true

It is also possible to record time and memory usage, for each event and module, in a database by adding the following command line options:

--timing-db timing.db --memcheck-db memory.db

Then art will create a file that contains an sqlite database containing information about the time or memory taken by each call to each module so you can do things like plot histograms of execution time per module, identify events that took a long time etc. Memory values are in MB.

There are some instructions on how to query the database, examples (may not be working) in $ART_DIR/tools/sqlite, and a cheat sheet below.

sqlite3 memory.db
sqlite> .help
sqlite> .tables
sqlite> .schema <tablename>;
sqlite> select * from <tablename>;
sqlite> .quit

sqlite3 memory.db
sqlite> .output dump.txt
sqlite> select * from ModuleInfo;
sqlite> .output stdout
sqlite> .quit
