SearchPaths: Difference between revisions

From Mu2eWiki
Latest revision as of 22:40, 22 October 2024


Introduction

Mu2e Offline supports the concept of a search path for most files that configure either the behaviour of art itself or that of the Mu2e Offline code. This means that you can specify a partial path to such a file and the code will search for that file in an ordered list of places. The code will traverse the list of places and look for the file in each place. As soon as it successfully finds the file it will open that file and continue.

What happens if the file is present at more than one place on the list? The list is ordered and the code declares success on the first match; it never looks to see if there is more than one match.

If the code cannot find a match, it throws an exception.

A search path is specified in an environment variable as a colon-separated list of directories. This is in the same spirit as the well-known environment variables PATH, LD_LIBRARY_PATH and PRODUCTS.

Mu2e has chosen to configure the search algorithm so that, with one exception, absolute paths are forbidden. That exception is for the -c command line argument of the mu2e command. If an absolute path is specified in any other context the code will throw an exception. Mu2e has also chosen that the search algorithm will not look for files relative to the current working directory unless that directory is included in the search path. These are safety features to ensure that production campaigns can only reference configuration files that are source-code controlled.
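
The lookup rules described above can be sketched as a small shell function. This is a minimal illustration only, not the Mu2e or cetlib implementation: the function name find_on_path, the demo directories and the exit codes are hypothetical, and a non-zero exit status stands in for the exception. The special case for the -c argument is not modelled.

```shell
# find_on_path REQUEST SEARCH_PATH   (hypothetical helper, for illustration)
# Prints the first file found by appending REQUEST to each element of the
# colon-separated SEARCH_PATH. The body runs in a subshell, so the IFS and
# "set -f" changes do not leak into the caller.
find_on_path() (
    request=$1
    case $request in
        /*) echo "absolute path not allowed: $request" >&2; exit 2 ;;
    esac
    set -f      # no glob expansion while splitting the path on ':'
    IFS=:
    for dir in $2; do
        [ -n "$dir" ] || continue          # skip empty elements
        candidate=${dir%/}/$request
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"     # first match wins; stop looking
            exit 0
        fi
    done
    echo "not found on search path: $request" >&2
    exit 1
)

# Demonstration with two throw-away directories (hypothetical names).
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/local" "$demo_root/release"
echo local   > "$demo_root/local/geom.txt"
echo release > "$demo_root/release/geom.txt"
echo release > "$demo_root/release/only_here.txt"
demo_path="$demo_root/local:$demo_root/release"

find_on_path geom.txt "$demo_path"        # the copy under local/ is reported
find_on_path only_here.txt "$demo_path"   # found under release/
find_on_path missing.txt "$demo_path" || echo "lookup failed with status $?"
```

Because the loop stops at the first match, the copy of geom.txt under release/ is never examined, which mirrors the behaviour described in the text.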


For those not familiar with the concept of a search path, the following example explains it. Suppose that we wish to specify a file:

/A/B/C/D/E/F.txt

And the setup scripts have defined:

export SEARCH_PATH=/A/B/I/:/A/G/I:/A/B/C

If we search for the file "F.txt" it will not be resolved because the system only looks for the following files: "/A/B/I/F.txt", "/A/G/I/F.txt", "/A/B/C/F.txt". To find the file of interest with the path above, one must ask for "D/E/F.txt", which will find a match on the last element. Alternatively, if the search path were

export SEARCH_PATH=/A/B/I/:/A/G/I:/A/B/C/D/E

then "F.txt" would match with the last element.
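
The example can be checked without touching the file system by printing the candidate file names that a lookup would try, in order. This is a sketch under the assumption that candidates are formed by joining each path element and the requested name with a single slash; the function name list_candidates is hypothetical.

```shell
# list_candidates REQUEST SEARCH_PATH   (hypothetical helper, for illustration)
# Prints, in search order, every file name that would be tried for REQUEST.
# Pure string manipulation: nothing is looked up on disk.
list_candidates() (
    set -f      # no glob expansion while splitting the path on ':'
    IFS=:
    for dir in $2; do
        [ -n "$dir" ] || continue
        printf '%s\n' "${dir%/}/$1"    # drop a trailing '/' before joining
    done
)

echo "F.txt against the first path:"
list_candidates F.txt /A/B/I/:/A/G/I:/A/B/C

echo "D/E/F.txt against the first path:"
list_candidates D/E/F.txt /A/B/I/:/A/G/I:/A/B/C

echo "F.txt against the second path:"
list_candidates F.txt /A/B/I/:/A/G/I:/A/B/C/D/E
```

Only the second and third calls produce /A/B/C/D/E/F.txt as a candidate, in both cases from the last element of the path.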


Three important classes of files are not covered by this policy: event-data input files, event-data output files and the root output file managed by the TFileService.

FHICL_FILE_PATH

This environment variable is used by art to find .fcl files.

When you run art, it looks for the .fcl file named with the -c argument in the current working directory; if that fails it looks for the file relative to the search path defined by the environment variable FHICL_FILE_PATH. When art looks for files referenced by #include directives it only looks for files relative to FHICL_FILE_PATH; it does NOT look for files relative to the current working directory unless that directory is included in FHICL_FILE_PATH (which it normally is).

FHICL_FILE_PATH is defined when you issue the command "muse setup". When you issue that command in a Muse working directory that has no backing release and has Offline cloned in it, FHICL_FILE_PATH is set to

export FHICL_FILE_PATH=${MUSE_BUILD_DIR}:${MUSE_WORK_DIR}

Why is MUSE_BUILD_DIR included? This allows for complex fcl files to be built by scripts that are run during "muse build". Such files are located in the build area. A Muse working directory without a backing release and without Offline is uncommon; if you have such a working area, FHICL_FILE_PATH contains only ${MUSE_WORK_DIR}.


When you do "muse setup" in a working area that contains a backing release, muse defines FHICL_FILE_PATH by the following rule:

  1. The first element is the Muse working directory.
  2. The remaining elements of the path are those that would be present if you did "muse setup" in the backing directory.
  3. For nested backing releases, the previous rule is applied recursively. The spirit is, the higher a directory is in the backing hierarchy, the earlier it is in FHICL_FILE_PATH.

FHiCL is configured to allow an absolute path for the .fcl file specified on the command line but, for paths to included files, FHiCL only allows paths relative to FHICL_FILE_PATH. This is a safety feature to ensure that production campaigns only use .fcl files that are source code controlled.

There are two FHiCL prolog files that are included in many .fcl files:

#include "fcl/minimalMessageService.fcl"
#include "fcl/standardProducers.fcl"

As time goes on other such files may be defined.
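
Because the lookup stops at the first match, a copy of an included file that sits earlier in FHICL_FILE_PATH silently takes precedence over copies later in the path. The sketch below lists every match in search order, so the first line printed is the file that would be used. The function name list_all_matches, the demo directories and the file name prolog.fcl are hypothetical; this is a diagnostic aid written for illustration, not a tool provided by art or Muse.

```shell
# list_all_matches REQUEST SEARCH_PATH   (hypothetical helper, for illustration)
# Prints every existing file REQUEST relative to the elements of the
# colon-separated SEARCH_PATH, in search order. Exit status is 1 if there
# is no match at all.
list_all_matches() (
    set -f      # no glob expansion while splitting the path on ':'
    IFS=:
    status=1
    for dir in $2; do
        [ -n "$dir" ] || continue
        if [ -f "${dir%/}/$1" ]; then
            printf '%s\n' "${dir%/}/$1"
            status=0
        fi
    done
    exit "$status"
)

# Demonstration: a working area that shadows a file from a release area.
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/work/fcl" "$demo_root/release/fcl"
: > "$demo_root/work/fcl/prolog.fcl"
: > "$demo_root/release/fcl/prolog.fcl"
demo_path="$demo_root/work:$demo_root/release"

list_all_matches fcl/prolog.fcl "$demo_path"   # first line is the one used

# In a real session one might run, for example:
#   list_all_matches fcl/standardProducers.fcl "$FHICL_FILE_PATH"
```

If more than one line is printed, the later files are shadowed by the first.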


As we get experience with FHiCL we will consider adjustments to these policies. The policies will be as open as possible during our development phase, with the constraint that we understand how to ensure a strict audit trail when the time comes for large scale production.

MU2E_SEARCH_PATH

Mu2e code uses this environment variable to search for auxiliary files. The environment variable is defined when you run "mu2e setup". The variable is defined to be

export MU2E_SEARCH_PATH=${FHICL_FILE_PATH}:${MU2E_DATA_PATH};

where MU2E_DATA_PATH normally points to /cvmfs/mu2e.opensciencegrid.org/DataFiles/ but can be configured to other values if /cvmfs is not visible.

Auxiliary files include:

  1. Any file that is parsed by SimpleConfig. This includes:
    • The geometry file, read by the GeometryService.
    • Some of the old style event generator run-time configuration files; newer event generator modules use fcl to get their configuration.
    • Old style conditions data that is still read with the ConditionsService. It will eventually be migrated to the ProditionsService and these files will be removed.
    • Any files included into the above three files using #include.
  2. The magnetic field maps, read by the BFieldManagerMaker.
  3. The particle data table files, read by the ParticleDataList class.
  4. The G4 macro file optionally read by G4_plugin.cc
  5. Some probability distribution functions represented as binned data.
  6. Configuration data used by AI/ML inference code.
  7. Input particles in G4beamline format, read using EventGenerator/inc/FromG4BLFile.hh


Most run-time configuration files should be found in one of the Mu2e repositories so that they are under source code management and can evolve along with the code that reads them. Large files that are not tied to a particular code version, such as magnetic field maps, should be found under $MU2E_DATA_PATH.

CET_PLUGIN_PATH

art uses CET_PLUGIN_PATH to look for libraries that contain either art plugins or root dictionaries. art plugins include modules, services and tools. On startup art scans CET_PLUGIN_PATH to find all such files. It loads the dictionaries and it records where to find plugin libraries for use when the job fcl requests to load a plugin.

Very early versions of art used LD_LIBRARY_PATH for this purpose. The art team changed this to use a new environment variable, CET_PLUGIN_PATH, for two reasons:

  1. To reduce startup time by reducing the number of directories that had to be searched
  2. In preparation for builds using RPATHS, for which LD_LIBRARY_PATH is not necessary.
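
To get a feel for what such a scan sees, the sketch below lists candidate libraries in each directory of a plugin path. It assumes the library naming convention in which plugin and dictionary libraries end in _module.so, _service.so, _tool.so or _dict.so; that convention is not stated on this page, so treat the patterns as an assumption and adjust them to your installation. The function name list_plugin_libs and the demo file names are hypothetical, and the sketch does not reproduce what art actually does at startup.

```shell
# list_plugin_libs PLUGIN_PATH   (hypothetical helper, for illustration)
# Prints libraries whose names match the ASSUMED plugin/dictionary suffixes
# in each directory of the colon-separated PLUGIN_PATH.
list_plugin_libs() (
    set -f      # no glob expansion while splitting the path on ':'
    IFS=:
    for dir in $1; do
        [ -d "$dir" ] || continue
        set +f  # re-enable globbing to match library names
        for lib in "$dir"/*_module.so "$dir"/*_service.so \
                   "$dir"/*_tool.so "$dir"/*_dict.so; do
            # An unmatched pattern is left unexpanded, so test for existence.
            if [ -e "$lib" ]; then printf '%s\n' "$lib"; fi
        done
        set -f
    done
)

# Demonstration with empty stand-in files (hypothetical names).
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/lib"
: > "$demo_root/lib/libFoo_module.so"
: > "$demo_root/lib/libBar_service.so"
: > "$demo_root/lib/libBaz_dict.so"
: > "$demo_root/lib/libPlain.so"      # not matched by any pattern

list_plugin_libs "$demo_root/lib:$demo_root/no_such_dir"

# In a real session one might run: list_plugin_libs "$CET_PLUGIN_PATH"
```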

Hint for Looking at Search Paths

If you want to look at the definition of a search path you can just echo the environment variable. However this prints everything in one line, which is often difficult to read. You can make a more readable view using, for example:

echo $FHICL_FILE_PATH | tr : \\n

which replaces the colons in the path with newline characters. See man tr.
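
A slightly more elaborate variant numbers the elements and flags directories that do not exist, which can help when a lookup fails unexpectedly. This is a sketch for illustration; the function name show_path and the output layout are hypothetical.

```shell
# show_path PATH_VALUE   (hypothetical helper, for illustration)
# Prints one numbered line per element of a colon-separated path and marks
# each element as "ok" if it is an existing directory, "MISSING" otherwise.
show_path() (
    set -f      # no glob expansion while splitting the path on ':'
    IFS=:
    n=0
    for dir in $1; do
        n=$((n + 1))
        if [ -d "$dir" ]; then state=ok; else state=MISSING; fi
        printf '%2d  %-7s  %s\n' "$n" "$state" "$dir"
    done
)

# Prints nothing if the variable is unset or empty.
show_path "${FHICL_FILE_PATH:-}"
```

The same function works for any of the variables discussed on this page, for example show_path "$MU2E_SEARCH_PATH".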