FclIntro
Revision as of 18:53, 7 October 2019
Introduction
This page will explain the system used by art to provide run-time configuration; that is, it will explain what you see inside the .fcl files.
Run-time configuration for art is written in the Fermilab Hierarchical Configuration Language (FHiCL, pronounced "fickle"), a language that was developed at Fermilab to support run-time configuration for several projects, including art. By convention, the names of FHiCL files end in .fcl. The FHiCL documentation is still under development, but draft copies of two documents are available; the source code and development notes are also available via redmine.
FHICL-CPP is the C++ toolkit that we use to read FHiCL documents within art.
This page will discuss both features that are part of the FHiCL language itself and features that are defined by art. A few features of FHiCL that are not important for Mu2e will be skipped over.
At present, FHiCL documents are just .fcl files in the file system. In the future there will be an option to store FHiCL documents in a database and for art programs to access them from that database; this will aid in maintenance of the audit trail of which file was produced using which code with which configuration.
FHiCL Basics
An art run-time configuration is just a FHiCL document. A FHiCL document is a file that contains a collection of definitions of the form
name : value
where many types of values are possible, from simple atomic values to highly structured values; a value may also be a reference to a previously defined value. The whitespace on either side of the : is optional; whitespace is defined to be any of the space, tab, newline or carriage return characters. Two definitions are separated by any whitespace. FHiCL provides a C++ API (see below) to allow users to ask for values by name.
The fragment below will be used to illustrate some of the basics of FHiCL:
# A comment.
// Also a comment.
name0  : 123     # A numeric value. Trailing comments work too.
name00 : "A quoted comment prefix, # or //, is just part of a quoted string, not a comment"
name1:456.       # Another numeric value; whitespace is not important within a definition
name2  : -1.e-6
name3  : true    # A boolean value
NAME3  : false   # The other boolean value; names are case sensitive.
name4  : red     # Simple strings need not be quoted
name5  : "a quoted string"
name6  : 'another quoted string'
name7  : 1 name8 : 2   # Two definitions on one line, separated by whitespace.
name9            # Same as name9:3 ; newlines are just whitespace, which is not important.
: 3
namea  : [ abc, def, ghi, 123 ]  # A sequence of atomic values. FHiCL allows heterogeneous sequences.
                                 # But heterogeneous sequences are not usable via the C++ API.
nameb  :         # A table of definitions; tables may nest.
{
  name0: 456
  name1: [7, 8, 9, 10 ]
  name2:
  {
    name0: 789
  }
}
namec  : [ { a:1 b:2 }, { a:3 c:4 } ]  # A sequence of tables.
named  : []      # An empty sequence
namee  : {}      # An empty table
namef  : @nil    # An atomic value that is undefined.
Comments are delimited either using the shell style pound/hash character (#) or the C++ style //. In both cases, all text to the end of the line is considered a comment; comments may start in any column, including after valid document text on the same line (trailing comments). If a comment prefix is found within a quoted string, it is just part of that string, not a comment prefix. The following characters, including the 2-character sequence ::, are reserved to FHiCL:
, : :: @ [ ] { } ( )
In addition the following strings have special meaning to FHiCL:
true, false, @nil, infinity, +infinity, -infinity, BEGIN_PROLOG, END_PROLOG
The first six strings are reserved identifiers when they are in lower case and unquoted; the last two are reserved identifiers only when they are in upper case, unquoted and at the start of a line. Otherwise they are just strings. One may include the above reserved characters and identifiers in a string by quoting the string.
In the above, the choice of example names was deliberately boring; a FHiCL name may be any unique string that begins with a letter or an underscore and contains only letters, numbers and underscores; it may not contain whitespace. Names are case sensitive. FHiCL names are hierarchical and understand scope, so the top level name0 is distinguished from nameb.name0 and both are distinguished from nameb.name2.name0.
While FHiCL supports the underscore character as just another letter, art explicitly forbids the use of the underscore character in process names, module labels and data product instance names.
A value may be one of
- an atom
- a sequence of values
- a table of definitions
- a reference to a previously defined value
Atomic Values
An atom may be one of:
- a number
- a string
- a boolean value
- one of the reserved identifiers: true, false, @nil, infinity, +infinity, -infinity
By definition, atoms may not contain internal whitespace. Atoms are separated by whitespace or by one of the delimiters reserved to FHiCL discussed above. The six identifiers reserved to FHiCL must be in lower case and unquoted.
Numbers may be represented in the usual ways. For example,
a : 100 b : 100. c : 1.E2 d : +1000.E-1
all mean the same thing. FHiCL has an internal sense of an atom that represents a mal-formed numeric value; if it detects such an atom it will throw an exception as soon as it encounters the atom. See below for some additional details about numeric values, including a discussion of signed and unsigned infinity.
The two boolean values are represented by the identifiers true and false; only lower case is recognized as a boolean value. There are no other valid representations of boolean values; while some languages permit numerical representations, FHiCL does not.
If a string contains no embedded whitespace and begins with a letter or underscore, the string may be unquoted; or it may be quoted as one prefers. If a string does contain embedded whitespace, or if it begins with some other type of character, it must be quoted. If an atom would trigger FHiCL's internal sense of a mal-formed numeric value, then it must also be quoted.
A string may be quoted with single or double quotes; these differ in their treatment of escaped characters but that will not be important for Mu2e; you may read about it in the FHiCL documentation.
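The quoting rules above can be summarized in a short fragment. This is only an illustrative sketch; all of the names and values are invented for this example.

```fhicl
color : red            # unquoted: no whitespace and begins with a letter
label : _private       # unquoted: begins with an underscore
title : "two words"    # embedded whitespace, so quotes are required
tag   : "1a"           # unquoted, this would be a mal-formed numeric value
pair  : "a:b"          # contains the reserved character :
flag  : "true"         # quoted, so FHiCL sees a string, not the reserved identifier
```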
The case sensitive atomic value @nil means that the FHiCL name is present in the FHiCL document but its value is undefined. In the C++ API, if one attempts to get the value of a parameter whose value is @nil, FHiCL will throw an exception. Mu2e uses @nil in the following use case. It often happens that a parameter set has many values that are normally set by experts and one value that must be set by the end user. In this case it is often convenient to provide a default value for the parameter set in a PROLOG; the issue is how to deal with the parameter that must be set by the end user. One could simply leave it out, but this does a bad job of documenting the parameter set. One could define it and comment it out. The recommended alternative is to provide the parameter and set its value to @nil; if the end user fails to set the parameter, the code that reads the parameter set will throw an exception.
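A minimal sketch of this use case follows. The names standardAna, MyAnalyzer, myAna, tmin, input and makeSH are hypothetical and chosen only for illustration; the @local:: reference syntax and fully qualified redefinition are described elsewhere on this page.

```fhicl
BEGIN_PROLOG
standardAna : {
  module_type : MyAnalyzer   # hypothetical class name
  tmin        : 500.         # normally set by experts
  input       : @nil         # must be set by the end user
}
END_PROLOG

physics : {
  analyzers : {
    myAna : @local::standardAna
  }
}

# The end user supplies the missing value with a fully qualified redefinition.
# Without this line, reading "input" in the module code would throw.
physics.analyzers.myAna.input : makeSH
```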
Sequences
A FHiCL sequence is a list of values surrounded by square brackets; it uses a comma as an internal delimiter,
namea : [ abc, def, ghi, 123 ]
Because FHiCL is not typed, sequences may mix numbers and strings (and more, see below). Heterogeneous sequences are natural in many scripting languages but they are not natural in C++. It is not possible to access FHiCL heterogeneous sequences via FHiCL's C++ API, so Mu2e will not use heterogeneous sequences.
Tables
A FHiCL table is a collection of definitions that is surrounded by braces and internally delimited by whitespace:
name : { name0 : 123 name1 : 456. }
Redefinitions
There are two forms of redefinition supported by FHiCL, a repeated definition within the same scope and a redefinition using the fully qualified name of a parameter.
Redefinitions at intermediate scope are not supported; they do not cause a parse error but they do produce undefined results.
If a definition is repeated within the same scope, the last definition wins. For example,
abc : 1 abc : 2 def : [ 1, 2, 3 ] def : [ 4, 5, 6 ] name : { abc : 1 abc : 2 }
will result in the parameter abc having a value of 2 and def having a value of [4,5,6]. This works the same way for parameters whose value type is a table.
Because FHiCL is not strongly typed, the following is legal, but it is not very likely to be of interest to Mu2e (because the C++ API is strongly typed):
abc : 1 abc : { def : [ x, y, z ] }
The second form of redefinition is using the fully qualified name:
a : { b : { c : 1 } } a.b.c : 2
This will be used frequently by Mu2e to make small changes to standard parameter sets.
The following is an example of a redefinition at intermediate scope, which is not supported by FHiCL,
a : { b : { c : 1 } b.c : 2 }
While this sometimes works, it should be avoided since it is not formally defined inside FHiCL.
Replacement also works for individual elements of a sequence, using the notation:
foo :{ odd : [ 1, 3, 5, 7 ] }
foo.odd[0] : 9    // Replace 1 with 9
foo.odd[4] : 11   // Extend the sequence with 11
foo.bar    : 42   // Add a new definition to the table foo.
After this substitution, the sequence foo.odd has the value [ 9, 3, 5, 7, 11 ] and the table foo contains two definitions, foo.odd and foo.bar.
FHiCL will let you do the following,
foo :{ odd : [ 1, 3, 5, 7 ] }
foo.odd[5] : 11   // Extend the sequence with two new members, @nil and 11.
After this substitution, the sequence foo.odd has the value [ 1, 3, 5, 7, @nil, 11 ]. This is well defined to FHiCL but it will cause problems for art, which will try to convert the @nil in element [4] to a numeric type, which will fail.
Numeric Values
There are two senses in which a value might be a valid numeric value. The first sense is what the FHiCL parser thinks it is; the second sense is whether or not FHiCL's C++ API can successfully convert that value to a C++ numeric type.
When FHiCL parses a document it looks at each atom and decides if it represents a numeric value. If an atom begins with a numeral or with one of the characters + - . (plus sign, minus sign, decimal point), then the atom is presumed to represent a numeric value. If, upon subsequent inspection, the atom is not a well-formed numeric value, then FHiCL will throw an exception. This exception is thrown by FHiCL, not by art. At this stage, FHiCL never tests quoted strings to see if they are well-formed or mal-formed numeric values; they are just strings. For example, when FHiCL parses the definition
a : 1a
it will decide that the value is a mal-formed numeric value and will throw an exception. On the other hand,
a : "1a"
defines a value that is just a string type. FHiCL requires the concepts of well-formed and mal-formed numeric values in order to reduce a document to its canonical form.
The C++ API for accessing FHiCL information is strongly typed; that is, you must ask for a numeric value as one of the C++ numeric types, int, float, double and so on. The API will attempt to convert any atomic or string value to the requested integral or floating point type; if the conversion fails, the API will throw an exception. FHiCL will do the obvious type conversions for you; for example, you may ask for
name1 : 456.
as either an integral type or as a floating type; in both cases you will get the expected result.
There is one behaviour of the C++ API that must be called out clearly. Consider the definition,
a : 1.5
The C++ API will let a user ask for this value either as an integral type, as a floating point type or as a std::string. If you ask for it as an integral type the result will be 1. This is consistent with automatic type conversion within C++. We have asked the art development team to expand the C++ API to allow Mu2e to select different behaviour: that, if we try to convert a numeric value with a non-zero fractional part to an integral type, it will either print a warning or throw an exception.
If the value of a definition is @nil, an attempt to convert it to any numeric type will cause the API to throw an exception.
If the value of a definition is one of the reserved identifiers, infinity, +infinity, -infinity, the value may be converted to a floating point type but not to an integral type. The value of the floating point type will be the architecture dependent representation of signed infinity. If the architecture does not support such a representation, the code will throw an exception.
FHiCL only recognizes the decimal point, not the European style comma, as a delimiter between the integer and fractional parts of a floating point number; FHiCL does not support the use of a comma (or the European style decimal point) to delimit thousands, millions and so on.
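The conversion rules in this section can be collected into one fragment. This is a sketch only; the parameter names are invented, and the comments simply restate the behaviour described above.

```fhicl
ratio : 1.5         # as a floating point type: 1.5; as an integral type: 1
tmax  : infinity    # converts to a floating point type, but not to an integral type
tmin  : -infinity   # likewise, as negative infinity
seed  : @nil        # any attempt to convert this to a numeric type throws
code  : "1a"        # a string; the parser does not test it, but asking for it as a number throws
```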
Additional Information
Except for being a delimiter between definitions, whitespace is unimportant. Therefore the FHiCL fragments
name : { name0 : 123 name1 : 456. } namea : [ abc, def, ghi, 123 ]
could have been written as,
name:{name0:123 name1:456.} namea:[abc,def,ghi,123]
The namec line in the big example shows that sequences and tables can be nested inside each other; they can be nested to arbitrary depth. The next two lines show that it is legal to define empty sequences and tables. Finally, the last line defines namef as a name that is present but has an undefined value.
FHiCL supports facilities to pre-define values so that they can be used later in multiple places.
FHiCL also supports the ability to modify values, or subsets of values, after they have been defined. These will be discussed later.
One can see from the above that, after a FHiCL document has been parsed, the result is just a FHiCL table.
Configuration of a Module
From the point of view of a Mu2e physicist, art is the tool that drives the event loop and calls user code at the appropriate places in the event loop. User code is found in art modules and the following FHiCL fragment illustrates how to specify the run-time configuration of an art module:
moduleLabel :{ module_type : ClassName pname0 : 1234. pname1 : [ abc, def] pname2 : { name0: {} } }
A valid configuration for an art module is expressed as a FHiCL table. Within art, FHiCL tables are visible to the user as objects of type fhicl::ParameterSet, declared in fhiclcpp/ParameterSet.h. From here forward, this document will usually refer to the run-time configuration of a module as its parameter set, even when discussing the FHiCL table representation of that parameter set.
The moduleLabel is a FHiCL name, chosen by the Mu2e physicist. It must obey the rules for FHiCL names, be unique within the configuration of an art job and not contain the underscore character. In this context, the FHiCL name module_type is an identifier reserved to art and must be present in the configuration of a module; if it is absent, art will throw an exception. The FHiCL value ClassName is the name of the C++ class that holds the code that the user wishes to execute. By convention, this code is found in a file, somewhere in the Mu2e Offline hierarchy, with the name ClassName_module.cc; the Mu2e build system will compile this file into a dynamic library named Offline/lib/libClassName_module.so. At run-time, art searches the directories listed in the environment variable LD_LIBRARY_PATH for a file named libClassName_module.so; it will load this dynamic library and find the code for the module inside.
At present there is one problem with this convention: if, in two different subdirectories, there are two modules with the same filename, both will produce libraries named lib/libClassName_module.so. A plan is in place to ensure that they make distinctly named libraries and for art to either unambiguously load the correct library or issue a run-time diagnostic.
The remaining lines of the parameter set are just FHiCL definitions that will be formed into a fhicl::ParameterSet object and passed to the module as an argument in its constructor. The names in this parameter set are meaningful only to the module, not to art itself. While most of the early Mu2e examples only use parameters with atomic values, it is legal to use the full power of FHiCL within a parameter set; that is, the parameter set used to configure a module may include sequences and parameter sets nested to arbitrary depth.
The minimum legal configuration of a module is,
moduleLabel : { module_type : ClassName }
It is meaningful within one art configuration to define two modules that have the same ClassName but that differ by some elements in the remainder of their configuration. These two instances of the module are distinguished by having different moduleLabels. This capability might be used if one wished to run the same algorithm twice in one job, perhaps once with loose cuts and once with tight cuts. Any data products produced by these two module instances will automatically be labeled in a way that distinguishes which module instance produced them. Any histograms or ntuples produced by these two module instances will automatically be put into separate ROOT directories; these directories are named using the moduleLabels.
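As a sketch of the loose/tight use case, the fragment below configures two instances of one class. The class name HitSelector, the labels looseHits and tightHits, and the parameter tmin are hypothetical.

```fhicl
looseHits : {
  module_type : HitSelector   # hypothetical class name
  tmin        : 400.          # loose cut
}

tightHits : {
  module_type : HitSelector   # same class, second instance
  tmin        : 600.          # tight cut
}
```

The two labels differ and neither contains an underscore, so data products and ROOT directories from the two instances remain distinguishable.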
Configuration of a Service
Art services behave like agents that manage a resource and allow other code to access that resource. There are some services that are native to art and others that are written by Mu2e; Mu2e uses services to manage, among other things, geometry and conditions information. There is more information available about services in art.
The following FHiCL fragment illustrates the run-time configuration of a service:
ClassName :{ pname0 : 1234. pname1 : [ abc, def] }
As for a module, a service is configured with a FHiCL table that is seen by the C++ code as a fhicl::ParameterSet, passed as an argument to the constructor of the service class.
The FHiCL name ClassName is the name of the C++ class that implements the service. This class must live somewhere in either the art or Mu2e code bases as two files with filenames ClassName_service.cc and ClassName.hh. The Mu2e build system will compile these files into a shared library with the name ClassName_service.so; the art build system will compile these files into a shared library with the name dir_subdir_ClassName_service.so, where the string dir_subdir is the file system path from the root of art to ClassName_service.cc.
By definition there may be at most one instance of any service within an art job. Therefore the analog of a moduleLabel does not exist for services: the ClassName alone is sufficient to specify which service is requested.
The parameter sets of all services are collected inside a single FHiCL table named services:
services : {
  // ParameterSets for zero or more services,
  // both services defined by art and those defined by Mu2e
}
In this context, the FHiCL name services is an identifier reserved to art.
A valid run-time configuration must include a parameter set in the FHiCL table services for each Mu2e-written service that will be used in the course of the art job. This is true even if the service has no run-time configurable parameters; in that case an empty parameter set must be supplied in the .fcl file; art will not provide a default.
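A minimal sketch of a services table that follows these rules is shown below. Both service names and the parameter inputFile are hypothetical, chosen only to show one service with parameters and one with none.

```fhicl
services : {
  ExampleGeometry : {
    inputFile : "geom.txt"   # hypothetical parameter
  }
  ExampleBookkeeper : {}     # no configurable parameters, but the empty table is still required
}
```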
In earlier versions of art there was a convention that the service block should have the structure:
services : {
  // ParameterSets for zero or more art defined services.
  user : {
    // ParameterSets for zero or more Mu2e defined services
  }
}
This experiment failed and this style is deprecated.
Overall Structure of an art Run-time Configuration
The example below illustrates the top level view of an art run-time configuration; some details have been omitted for clarity. An art run-time configuration is just a FHiCL document. At present they live in simple files but we expect that, at some future date, the configurations will be kept in databases; this will allow a more robust audit trail of how each data file was processed.
In the following, the identifiers process_name, source, services, physics, producers, analyzers, filters, trigger_paths, end_paths and outputs are reserved to art; these are in addition to the identifier module_type discussed earlier and the identifier @local:: discussed later.
process_name : helloWorld   # The process name must NOT contain any underscores

source : {
  # Parameters for exactly one source module
}

services : {
  # ParameterSets for zero or more services.
}

physics: {
  producers : {
    # ParameterSets for zero or more producer modules
  }
  analyzers: {
    # ParameterSets for zero or more analyzer modules
  }
  filters : {
    # ParameterSets for zero or more filter modules
  }

  path0 : [ comma separated list of module labels of producer or filter modules ]
  path1 : [ comma separated list of module labels of producer or filter modules ]
  path2 : [ comma separated list of module labels of analyzer or output modules ]
  path3 : [ comma separated list of module labels of analyzer or output modules ]

  trigger_paths: [ path0, path1 ]
  end_paths:     [ path2, path3 ]
}

outputs: {
  # ParameterSets for zero or more output modules
}
The parameter process_name identifies this art job. It is used as part of the identifier for data products produced in this job. For this reason, the process name may not contain underscore characters. If the process_name is absent, art substitutes a default value of "DUMMY".
The source parameter set describes where events come from. There may be at most one source module declared in an art configuration. At present there are two options for choosing a source module:
- module_type : RootInput
  art::Events will be read from an input file or from a list of input files; files are specified by giving their pathname within the file system. In the future Mu2e will support a file catalog but, at present, there is no such system.
- module_type : EmptyEvent
  Internally art will start the processing of each event by incrementing the event number and creating an empty art::Event. Subsequent modules then populate the art::Event. This is the normal procedure for generating simulated events.
See the web page about configuring input and output modules for details about what other parameters may be supplied to these parameter sets. If no source parameter set is present, art substitutes a default parameter set of:
source : { module_type : EmptyEvent maxEvents : 1 }
The configuration of art services was discussed above. If the services parameter set is missing entirely, art will supply a default that configures only the message logger.
If an art-supplied service is requested by the code, and if there is no corresponding parameter set in the .fcl file, then art will supply a default parameter set. If a Mu2e-defined service is requested by the code, and if there is no corresponding parameter set in the .fcl file, then art will throw an exception.
Some of the art-supplied services can be turned on from command line switches; these include a service to trace all of the module and service calls made by art, a service to present timing information and a service to profile memory usage. When the command line switch is present, the service need not be included in the run-time configuration; art will supply defaults if needed. There is additional information available about the behaviour of the art-supplied services; this discusses their default parameter sets and how to request them from the command line.
The physics parameter set has five reserved identifiers: filters, analyzers, producers, trigger_paths and end_paths. The first three must have values that are FHiCL tables of parameter sets and the last two must have values that are FHiCL sequences of art path names; an art path name is the name of a FHiCL sequence of module labels. Any other top level parameter within the physics parameter set will be interpreted as an art path; that is, it must be a FHiCL sequence of module labels. There is another web page that discusses paths in more detail.
The physics.producers parameter set should contain parameter sets, of the form shown above, that are used to configure EDProducer modules. Similarly the physics.analyzers and physics.filters parameter sets should hold the configuration information for the EDAnalyzer and EDFilter modules, respectively. At present these rules are not rigorously enforced but they will be soon.
If the physics parameter set, or any of its components are missing, art will substitute a default value of an empty parameter set or an empty sequence, as appropriate.
The final element in a run-time configuration is the outputs parameter set. It contains parameter sets that configure zero or more output modules. The rules to send one subset of the events to one output file and a different subset of the events to a different output file are described on the web page that discusses paths in more detail.
If the outputs parameter set is missing, art will supply a default value of an empty parameter set.
This leaves a short discussion of art paths and the identifiers trigger_paths and end_paths. An art path is just a FHiCL sequence of module labels; there are four such paths defined in this example, path0 through path3. Any first level name in the physics parameter set, except for the five identifiers reserved to art, will be interpreted as the name of an art path. The trigger_paths definition is a FHiCL sequence of art path names; if an art path is an element of the trigger_paths sequence, then the module labels in that path
- must be labels of EDProducer and EDFilter modules.
- will be executed in the specified order.
The end_paths definition is also a FHiCL sequence of art path names; if an art path is an element of the end_paths sequence, then the module labels in that path
- must be labels of EDAnalyzer and output modules.
- may be executed in any order.
A full discussion of paths is available elsewhere.
If either trigger_paths or end_paths is absent from a configuration, art will substitute a default value of an empty sequence.
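The rules for paths can be illustrated with a small but complete configuration. This is a sketch under stated assumptions: the class names HitMaker, GoodEventFilter and HitChecker and all module labels are hypothetical, and the output module RootOutput with its fileName parameter is not described on this page and is assumed here.

```fhicl
process_name : pathDemo     # no underscores

source : {
  module_type : EmptyEvent
  maxEvents   : 10
}

physics : {
  producers : {
    makeHits   : { module_type : HitMaker }          # hypothetical producer
  }
  filters : {
    selectGood : { module_type : GoodEventFilter }   # hypothetical filter
  }
  analyzers : {
    checkHits  : { module_type : HitChecker }        # hypothetical analyzer
  }

  # Producer and filter labels only; executed in the order given.
  p1 : [ makeHits, selectGood ]

  # Analyzer and output labels only; order is not significant.
  e1 : [ checkHits, out1 ]

  trigger_paths : [ p1 ]
  end_paths     : [ e1 ]
}

outputs : {
  out1 : {
    module_type : RootOutput    # assumed output module
    fileName    : "demo.art"    # assumed parameter name
  }
}
```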
Unimportance of Ordering
Once any redefinitions have been processed, neither art nor FHiCL cares about the order of items within the resulting FHiCL document. Both do care about how definitions nest inside each other. FHiCL always cares about the order of items within a sequence but art only cares about the order of items within those paths that are part of the trigger_paths sequence.
In summary, in the above example art only cares about the order of the elements inside the sequences path0 and path1. Any other reordering that preserves the nesting structure is equivalent to that shown.
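To make this concrete, the two alternative documents sketched below are equivalent: the definitions appear in a different order but the nesting is unchanged. The label checkHits, the class name HitChecker and the parameter tmin are hypothetical.

```fhicl
# ----- Document A -----
source : { module_type : EmptyEvent  maxEvents : 10 }
physics : {
  analyzers : { checkHits : { module_type : HitChecker  tmin : 500. } }
  e1        : [ checkHits ]
  end_paths : [ e1 ]
}

# ----- Document B: same content, reordered -----
physics : {
  end_paths : [ e1 ]
  e1        : [ checkHits ]
  analyzers : { checkHits : { tmin : 500.  module_type : HitChecker } }
}
source : { maxEvents : 10  module_type : EmptyEvent }
```

By contrast, moving tmin out of the checkHits table would change the nesting and therefore the meaning.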
Command Line Arguments
Some elements of an art run-time configuration may be overridden by parameters that appear on the art command line. To see what the options are:
mu2e --help
At this writing (Sept 2011), the allowed command line parameters are:
mu2e <-c <config-file>> <other-options> [<source-file>]+:
  -T [ --TFileName ] arg     File name for TFileService.
  -c [ --config ] arg        Configuration file.
  -e [ --estart ] arg        Event # of first event to process.
  -h [ --help ]              produce help message
  -n [ --nevts ] arg         Number of events to process.
  --nskip arg                Number of events to skip.
  -o [ --output ] arg        Event output stream file.
  -s [ --source ] arg        Source data file (multiple OK).
  -S [ --source-list ] arg   file containing a list of source files to read, one per line.
  --trace                    Activate tracing.
  --notrace                  Deactivate tracing.
  --memcheck                 Activate monitoring of memory use.
  --nomemcheck               Deactivate monitoring of memory use.
All command line parameters are optional and many have both a short and a long form. In general the command line parameters can modify the names of files, the flow of the event loop and whether or not some monitoring services are enabled. If a parameter is specified both within the .fcl file and on the command line, the command line value takes precedence.
In the original design of art, it was planned that no parameters that control the physics behaviour would be exposed on the command line; they would only be modifiable by editing the .fcl file. It was planned that file names, debug levels and other parameters that do not change the physics behaviour would be modifiable from the command line. This restriction was chosen because of the then existing ideas about how to maintain an audit trail for run-time configurations. Recently the NOvA experiment, which also uses art, asked to remove this restriction; the plan to retain a strict audit trail is that the final state of the .fcl file, after all command line substitutions, will be stored in each of the event-data output files.
It is not clear yet if Mu2e will stick with the original plan or if we will choose to follow the route chosen by NOvA. Another option is to define the idea of an art "production mode". The idea behind production mode is that there are many convenience functions that are valuable for development and debugging but which make it difficult to maintain an audit trail. Among these convenience functions is the ability to override an arbitrary .fcl parameter from the command line. One could imagine that, when production mode is not set, all of these convenience functions would be available but, when production mode is set, they would be disabled.
The Canonical Form and the Hash Code
When FHiCL prints a document to the screen or to a file the output appears in what's known as the canonical form, in which the source formatting is entirely lost. In this form all comments are stripped, there is standardized indentation, and all strings, even atomic strings, are double quoted. Because the order of definitions is not important, they will appear in the canonical form in some well defined order meaningful to FHiCL; in general this order is not that in which they appeared in the input file.
In a FHiCL document, if a name is redefined, only the final definition will be present in the canonical form. All earlier definitions will be lost.
The other aspect of the canonical form is that all atoms that are identified by FHiCL as well-formed numeric types will be represented in a FHiCL-defined form. This form may be different than the form that appeared in the input file; but it will convert to the same bit pattern when converted to a numeric type. For example if the source file contains the definition,
a : 123.45
the canonical form will contain:
a : 1.2345e2
The canonical form of a floating point number is scientific notation with enough digits to guarantee no loss of precision.
Other examples of source formats are,
b: 1 c: 1. d: 1 e: +1 f: 1.23e2 g: 1.23e3
These will have the canonical forms:
b: 1 c: 1 d: 1 e: 1 f: 123 g: 1230
All of b through e have the same canonical form. One rule is that if a meaningless fractional part or sign is present, it will not appear in the canonical form. A second rule is that if a number is represented in the source format in scientific notation, and if that number is representable, without loss of precision, as an integer less than 999999, then the number will be represented as an integer; this rule is illustrated in items f and g.
Once a configuration has been reduced to its canonical form, one can hash that form to compute an almost-certainly-unique key. See the FHiCL documentation for details. One can use the hash codes as a short cut to ask if two configurations are the same or different.
Why bother with a canonical form? In particular, why bother with a canonical form for numeric values?
Many previous experiments have discovered that configuration files get handed around. In the process, trivial changes accumulate. For example a definition that starts life as
a: 1
may end up a few generations later as
a: 1.0
This has the consequence that two configurations that really do the same physics produce different hash codes. If the configurations, and/or their hash codes, are tracked as part of the meta-data, this makes it tedious to identify files that were produced using the same configuration.
While having a canonical form for numeric values will certainly raise new issues, we expect it to be a net reduction in the complexity of tracking meta-data.
One issue currently under study is this: consider the task of running 1000 Monte Carlo jobs that differ only in their random number seeds and in the names of their output files. Can we arrange things such that the properties of the ensemble of jobs are represented in a single FHiCL document while the details of the different jobs are represented in some other way? If we can do this, then all files produced by this set of jobs will have a configuration that has the same hash code as every other job in the set. At present this will not work because of the way that random number seeds are distributed.
Printing the Canonical Form
It is possible, but weird, to use art to print the canonical form of the run-time configuration:
export ART_DEBUG_CONFIG=1
mu2e -c file.fcl
unset ART_DEBUG_CONFIG
When this environment variable is set, art will get its parameter set from FHiCL, if needed insert some of its own defaults, insert any command line arguments into the parameter set, print the canonical form to the screen, and then exit.
@local and @table
FHiCL has the ability to define a value as a reference to a previously defined value. Consider the following FHiCL fragment,
foo : 5
source : {
  module_type : EmptyEvent
  maxEvents : @local::foo
}
This fragment tells art to process a maximum of 5 events. If you print the canonical form of this parameter set, the @local::foo has entirely disappeared and has been replaced by 5. At this stage FHiCL no longer has knowledge of where the 5 came from, and the fragment is equivalent to:
source : {
  module_type : EmptyEvent
  maxEvents : 5
}
The identifier @local:: is reserved to FHiCL. When FHiCL encounters a value defined by the syntax @local::foo, it will look in its current file for a top level object named foo and it will replace @local::foo with the value found in the earlier definition. This works for all kinds of values: atoms, sequences or tables.
@table:: is very similar, except that when the item is a table the substitution removes the enclosing braces, so that the members of the table are inserted directly into the surrounding table:
BEGIN_PROLOG
locvarname : 5
locname : { a : avalue }
tabname : { b : bvalue }
END_PROLOG
stanza : {
  locvar : @local::locvarname
  locstanza: @local::locname
  tabstanza : { @table::tabname }
}
resolves to:
stanza: {
  locvar: 5
  locstanza: { a: "avalue" }
  tabstanza: { b: "bvalue" }
}
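The difference between the two substitutions can be modelled in standard C++. This is a deliberately simplified sketch, not how FHiCL is implemented: tables are modelled as maps from names to atomic string values only, and resolveLocal and spliceTable are hypothetical names.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Simplified model: a table maps names to atomic values held as strings.
using Table = std::map<std::string, std::string>;

// Model of "@local::name": the reference is replaced by the value of the
// earlier definition; any other value is passed through unchanged.
std::string resolveLocal(const Table& earlier, const std::string& value) {
  const std::string prefix = "@local::";
  if (value.compare(0, prefix.size(), prefix) != 0) return value;
  const auto found = earlier.find(value.substr(prefix.size()));
  if (found == earlier.end()) {
    throw std::runtime_error("undefined reference: " + value);
  }
  return found->second;
}

// Model of "@table::name": the members of the source table are inserted
// directly into the destination table, without an extra level of braces.
void spliceTable(Table& destination, const Table& source) {
  for (const auto& member : source) {
    destination[member.first] = member.second;
  }
}
```

In terms of the example above, resolveLocal corresponds to the line locvar : @local::locvarname, while spliceTable corresponds to writing @table::tabname inside the braces of tabstanza.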
PROLOGs and #includes
FHiCL supports an include mechanism that behaves much like that of the C Preprocessor. Suppose that the file defaults.fcl contains:
BEGIN_PROLOG
g4run_default : {
  module_type : G4
  generatorModuleLabel : generate
  seed : [9877]
}
END_PROLOG
For the moment, ignore the BEGIN_PROLOG and END_PROLOG lines. After creating this file, one may write the following fragment as part of a run-time configuration,
#include "defaults.fcl"
physics : {
  producers : {
    g4run : @local::g4run_default
  }
}
FHiCL is forgiving about superfluous white space within an include statement. Using includes provides a mechanism to distribute a standard configuration for g4run that can be used by many people.
The purpose of the PROLOG markers is to tell FHiCL that the material inside the PROLOG is not part of the final FHiCL document; it is merely a collection of some useful definitions that may or may not be used. Therefore FHiCL excludes the prolog from the document that it sends to art. If one prints the canonical form of the final document, the PROLOG is absent.
One could have chosen to move the BEGIN/END_PROLOG from the included file to surrounding the #include statement in the top level file. FHiCL will be happy with either; the current recommendation is to put them inside the included file. This makes clear that the purpose of the included file is to be a PROLOG.
There may be many BEGIN/END_PROLOG sections within one .fcl file but they may not be nested. Includes may be nested so long as they do not cause PROLOGS to become nested.
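The two PROLOG rules (the material inside a PROLOG is excluded from the final document, and PROLOGs may not be nested) can be illustrated with a small line-based filter in standard C++. This is a sketch under strong assumptions: each marker is taken to sit alone on its own line, dropPrologs is a hypothetical name, and the real FHiCL parser also keeps the PROLOG definitions available for @local:: lookups, which this sketch does not model.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: return the lines of a document with all PROLOG sections removed.
// Throws if PROLOG sections are nested or not properly closed.
std::vector<std::string> dropPrologs(const std::vector<std::string>& lines) {
  std::vector<std::string> kept;
  bool inProlog = false;
  for (const auto& line : lines) {
    if (line == "BEGIN_PROLOG") {
      if (inProlog) throw std::runtime_error("nested BEGIN_PROLOG");
      inProlog = true;
    } else if (line == "END_PROLOG") {
      if (!inProlog) throw std::runtime_error("END_PROLOG without BEGIN_PROLOG");
      inProlog = false;
    } else if (!inProlog) {
      kept.push_back(line);  // only material outside a PROLOG survives
    }
  }
  if (inProlog) throw std::runtime_error("unterminated PROLOG");
  return kept;
}
```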
Together, the features @local::, redefinition, PROLOGs and #include will allow Mu2e to create standard configurations and to express most actual jobs as one of the standard configurations plus a small collection of deltas. We hope that this will make it easier to understand what any given job actually did.
The C++ API (Validated FHiCL)
This section shows how to read FHiCL parameters into a C++ program. For a long time there was no automated validation of fcl and it was easy for fcl to fail silently. For example, if we try to set a parameter "abc" and accidentally write "adc", the mistake will not be caught: the fcl will be interpreted without setting "abc", and the line that was written ("adc") will be ignored. To close this hole, fcl validation was introduced in 2018. Mu2e will write all new fcl interfaces using this validation, which is described in this section; the older system is still common and is described in the following section.
The following is an example of the C++ interface.
class MyModule : public art::EDAnalyzer {
public:
  struct Config {
    using Name=fhicl::Name;
    using Comment=fhicl::Comment;
    fhicl::Atom<int> verbose{Name("verbose"), Comment("verbosity level (0-10)"), 0};
    fhicl::Atom<double> tmin{Name("tmin"), Comment("min time cut"), 500.};
    fhicl::Atom<art::InputTag> input{ Name("input"), Comment("Tag of the product to analyze.")};
    // sequence of any length (no default)
    fhicl::Sequence<int> ilist{Name("ilist"),Comment("list of ints")};
    // sequence of fixed length, any other length is an error (default is all zero)
    fhicl::Sequence<double,3u> point{Name("point"),Comment("[x,y,z] as doubles"),{0.0,0.0,0.0}};
    // optional parameter: it may be absent, so it takes no default value
    fhicl::OptionalAtom<std::string> message { Name("message"), Comment("print message") };
  };
  // the following line is needed to enable art --print-description
  typedef art::EDAnalyzer::Table<Config> Parameters;
  explicit MyModule(const Parameters& conf);
  void analyze(const art::Event& evt) override;
private:
  Config _conf;
  int _verbose;
};
//================================================================
MyModule::MyModule(const Parameters& conf):
  art::EDAnalyzer(conf), _conf(conf()), _verbose(conf().verbose()) { ... }
//================================================================
void MyModule::analyze(const art::Event& event) {
  auto ih = event.getValidHandle<SimParticle>(_conf.input());
  if(_conf.verbose()>1) print ..
  CLHEP::Hep3Vector loc( _conf.point(0), _conf.point(1), _conf.point(2));
}
In this example, art has extracted the parameter set for this module from the .fcl file and it is passed to the constructor as the variable conf. In this example, conf is saved as a member since it is a reasonable container for the fcl parameters. At the same time, it shows that you can access the parameters immediately, in the constructor, as the "verbose" member is set.
The first argument to the Atom constructors is the name as it will appear in the fcl. The second is a description of the parameter, which will be printed with the art --print-description command line option. Some of the parameters here have defaults, which are the third argument to the Atom constructor. Parameters are required by default, so it is an error if one is missing; a parameter that should be optional can be declared as an OptionalAtom, and then it is not an error if it is missing.
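The checks that validation adds (a misspelled name such as "adc" is reported, a missing required parameter is an error, a missing optional parameter is not) can be modelled in a few lines of standard C++. This is only a sketch of the idea and not the fhicl-cpp implementation; ParameterSpec and validate are hypothetical names, and the supplied configuration is modelled as a flat map of strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one allowed parameter.
struct ParameterSpec {
  std::string name;
  bool required;  // true: it is an error if the parameter is missing
};

// Sketch: compare the supplied names against the allowed ones and return a
// list of problems; an empty list means that the configuration is accepted.
std::vector<std::string> validate(
    const std::vector<ParameterSpec>& allowed,
    const std::map<std::string, std::string>& supplied) {
  std::vector<std::string> errors;
  for (const auto& spec : allowed) {
    if (spec.required && supplied.count(spec.name) == 0) {
      errors.push_back("missing required parameter: " + spec.name);
    }
  }
  for (const auto& entry : supplied) {
    bool known = false;
    for (const auto& spec : allowed) {
      if (spec.name == entry.first) known = true;
    }
    if (!known) errors.push_back("unsupported parameter: " + entry.first);
  }
  return errors;
}
```

In this model, supplying "adc" when only "abc" is allowed produces an error instead of being silently ignored, which is the failure mode that validation was introduced to catch.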
Sequences may be of fixed length, in which case any other length is an error, or of arbitrary length.
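The fixed-length rule can be pictured with a small standard C++ helper that accepts a list of values only if it has exactly the declared length. This is a sketch of the behaviour described above, not the fhicl-cpp code; toFixedLength is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: convert a list of values to a fixed-length array of N doubles;
// any other length is reported as an error.
template <std::size_t N>
std::array<double, N> toFixedLength(const std::vector<double>& values) {
  if (values.size() != N) {
    throw std::length_error("expected " + std::to_string(N) +
                            " elements, got " + std::to_string(values.size()));
  }
  std::array<double, N> result{};
  for (std::size_t i = 0; i < N; ++i) result[i] = values[i];
  return result;
}
```

For a parameter such as point in the example above, a value with three elements would be accepted, while a value with two or four elements would be rejected.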
The decision of whether or not to provide a default value is very important. We recommend that if a parameter influences physics behaviour and it is likely to be set by normal users, then that parameter should NOT have a default. If a parameter influences only diagnostics, it should have a default. The more difficult question is that some parameters do influence physics behaviour but should only be modified by experts; here a case can be made for providing defaults in the code:
- If we expose the parameter, non-experts might be tempted to play with it.
- If the configurations become too big they become too hard to manage.
There are other more complex options in the validated fcl system. For example, some parameters can be excluded from validation. A tuple type is available.
Here are pointers to the documentation:
- art wiki
- FHICL wiki
- page about converting FHICL into "nearly arbitrary C++ type"
- An example in mu2e code: Analyses/src/SimParticleTimeMapAnalyzer_module.cc
If the code that you would like to validate is part of a larger fcl stanza that is not validated, then there are additional tricks that can be employed. For an example, see the retrieveConfiguration() function in EventMixing/src/ResamplingMixer_module.cc. See also the function of the same name in this example.
The Deprecated C++ API
All new Mu2e code should be written with fcl validation, but as long as the older unvalidated system is still common, we include this section for reference.
The following fragment, abstracted from Analyses/src/ReadBack_module.cc, shows how to read FHiCL parameters into a C++ program.
namespace mu2e {
  class ReadBack : public art::EDAnalyzer {
  public:
    explicit ReadBack(fhicl::ParameterSet const& pset);
  private:
    int _diagLevel;
    std::string _g4ModuleLabel;
    std::string _generatorModuleLabel;
  };
  ReadBack::ReadBack(fhicl::ParameterSet const& pset) :
    _diagLevel (pset.get<int>("diagLevel",0)),
    _g4ModuleLabel (pset.get<std::string>("g4ModuleLabel")),
    _generatorModuleLabel(pset.get<std::string>("generatorModuleLabel")){
  }
}
In this example, art has extracted the parameter set for this module from the .fcl file and it is passed to the constructor as the variable pset. The line that initializes the member datum _diagLevel asks pset to find a parameter whose name is diagLevel, to convert it to an int and to return the value of that int; if pset cannot find a parameter named diagLevel, pset will return the value of the second argument to the get call, in this case 0. If the parameter exists and conversion to an int fails, then art will throw an exception. This can happen if the FHiCL parameter diagLevel has a value that is a string that is not a valid numeric value or if its value is @nil. There is no requirement that the name of the member datum, _diagLevel, match the name of the item in the parameter set; but it seems silly if they do not match.
Similarly, the next line looks for a FHiCL name "g4ModuleLabel" and attempts to convert it to a string. In this case there is no second argument to the get call; therefore, if "g4ModuleLabel" is not found in the parameter set, art will throw an exception.
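The behaviour of the two forms of the get call can be imitated with a small stand-in class in standard C++. This is a sketch for illustration and is not fhicl::ParameterSet: MiniParameterSet is a hypothetical name, values are held as plain strings, and conversion is done with a string stream.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in that imitates the two forms of get described above.
class MiniParameterSet {
public:
  explicit MiniParameterSet(std::map<std::string, std::string> values)
    : values_(std::move(values)) {}

  // No default: a missing parameter is an error.
  template <typename T>
  T get(const std::string& name) const {
    const auto found = values_.find(name);
    if (found == values_.end()) {
      throw std::runtime_error("missing parameter: " + name);
    }
    return convert<T>(name, found->second);
  }

  // With default: a missing parameter yields the fallback value, but a
  // parameter that is present and cannot be converted is still an error.
  template <typename T>
  T get(const std::string& name, const T& fallback) const {
    const auto found = values_.find(name);
    if (found == values_.end()) return fallback;
    return convert<T>(name, found->second);
  }

private:
  template <typename T>
  static T convert(const std::string& name, const std::string& text) {
    std::istringstream in(text);
    T value;
    // The whole text must be consumed by the conversion.
    if (!(in >> value) || !in.eof()) {
      throw std::runtime_error("cannot convert parameter: " + name);
    }
    return value;
  }

  std::map<std::string, std::string> values_;
};
```

With this stand-in, get<int>("diagLevel", 0) returns 0 when diagLevel is absent, while get<std::string>("g4ModuleLabel") throws when g4ModuleLabel is absent.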
The decision of whether or not to provide a default value is very important. We recommend that if a parameter influences physics behaviour and it is likely to be set by normal users, then that parameter should NOT have a default. If a parameter influences only diagnostics, it should have a default. The more difficult question is that some parameters do influence physics behaviour but should only be modified by experts; here a case can be made for providing defaults in the code:
- If we expose the parameter, non-experts might be tempted to play with it.
- If the configurations become too big they become too hard to manage.
The following presents a few more examples. The parameters
intArray : [ 1, 2, 3 ]
doubleArray : [ 1., 2., 3. ]
stringArray : [ "foo", "bar" ]
paramSet : { foo : 1 bar : 2 }
can be read into a C++ program with the following code fragments:
std::vector<int> iArray( pset.get<std::vector<int> >("intArray"));
std::vector<double> dArray( pset.get<std::vector<double> >("doubleArray"));
std::vector<std::string> sArray( pset.get<std::vector<std::string> >("stringArray"));
fhicl::ParameterSet params( pset.get<fhicl::ParameterSet>("paramSet"));
In all cases it is possible to provide default values as a second argument to the get method. At present there is no accessor method to let you get one element of a sequence; you must get the entire sequence.
It is the intention of the art team that an arbitrary class T can be initialized by,
T t( pset.get<T>("Name"));
The precise details of how to make this work remain to be specified.
The argument pset passed to the constructor of a module or a service is a temporary constructed by art that will go out of scope soon after the return from the constructor. If one wishes to retain pset as member data in a module, it must be held by making a copy, not by holding a pointer or reference to the argument pset.
Utilities
You can get art to write the final, fully interpreted fcl to a file. This is useful for seeing the effect of all the includes and of the "@local" and "@table" substitutions.
mu2e --debug-config fhicl_debug.txt -c myconfig.fcl
or
fhicl-dump myconfig.fcl > fhicl_debug.txt
Running with the switch mu2e --trace will cause a line to be printed as each module is called. This can be a quick way to see which module is printing an unfamiliar line, or to confirm which modules are being called.
A summary of what modules ran and filter results:
services.scheduler.wantSummary: true
A summary of the time that each module used:
services.TimeTracker.printSummary: true
It is also possible to record the time and memory usage for each event and module in a database, by adding the following command line options:
--timing-db timing.db --memcheck-db memory.db
Then art will create a file that contains an sqlite database containing information about the time or memory taken by each call to each module so you can do things like plot histograms of execution time per module, identify events that took a long time etc. Memory values are in MB.
There are some instructions on how to query the database, examples (may not be working) in $ART_DIR/tools/sqlite, and a cheat sheet below.
sqlite3 memory.db
sqlite> .help
sqlite> .tables
sqlite> .schema <tablename>;
sqlite> select * from <tablename>;
sqlite> .quit

sqlite3 memory.db
sqlite> .output dump.txt
sqlite> select * from ModuleInfo;
sqlite> .output stdout
sqlite> .quit