SearchPaths

From Mu2eWiki

Latest revision as of 22:40, 22 October 2024


Introduction

Mu2e Offline supports the concept of a search path for most files that configure either the behaviour of art itself or that of the Mu2e Offline code. This means that you can specify a partial path to such a file and the code will search for that file in an ordered list of places. The code will traverse the list of places and look for the file in each place. As soon as it successfully finds the file it will open that file and continue.

What happens if the file is present at more than one place on the list? The list is ordered and the code declares success on the first match; it never looks to see if there is more than one match.

If the code cannot find a match, it throws an exception.

A search path is specified in an environment variable as a colon-separated list of directories. This is in the same spirit as the well-known environment variables PATH, LD_LIBRARY_PATH and PRODUCTS.

Mu2e has chosen to configure the search algorithm so that, with one exception, absolute paths are forbidden. That exception is for the -c command line argument of the mu2e command. If an absolute path is specified in any other context the code will throw an exception. Mu2e has also chosen that the search algorithm will not look for files relative to the current working directory unless that directory is included in the search path. These are safety features to ensure that production campaigns can only reference configuration files that are source-code controlled.


For those not familiar with the concept of a search path, the following example explains it. Suppose that we wish to specify a file:

/A/B/C/D/E/F.txt

And the setup scripts have defined:

export SEARCH_PATH=/A/B/I/:/A/G/I:/A/B/C

If we search for the file "F.txt" it will not be resolved because the system only looks for the following files: "/A/B/I/F.txt", "/A/G/I/F.txt", "/A/B/C/F.txt". To find the file of interest with the search path above, one must ask for "D/E/F.txt", which will find a match on the last element. Alternatively, if the search path were

export SEARCH_PATH=/A/B/I/:/A/G/I:/A/B/C/D/E

then "F.txt" would match with the last element.
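The lookup rule in this example can be sketched as a small shell function. This is only an illustration of the first-match rule and the ban on absolute paths described above; it is not the code that art or Mu2e Offline actually run, and the function name resolve_in_search_path is hypothetical.

```shell
# Hypothetical helper, not part of Mu2e Offline or art.
# Resolve a relative file name against a colon-separated search path.
# Prints the first match and succeeds; fails if there is no match.
# The body runs in a subshell so the IFS change does not leak to the caller.
resolve_in_search_path() (
    relpath=$1
    # Mimic the policy described above: absolute paths are rejected.
    case "$relpath" in
        /*) echo "absolute path not allowed: $relpath" >&2; exit 2 ;;
    esac
    IFS=:      # split the search path on colons
    set -f     # do not expand wildcards while splitting
    for dir in $2; do
        [ -n "$dir" ] || continue          # skip empty elements
        candidate="${dir%/}/$relpath"      # tolerate a trailing slash
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"     # first match wins
            exit 0
        fi
    done
    echo "not found in search path: $relpath" >&2
    exit 1
)
```

With the first definition of SEARCH_PATH above, resolve_in_search_path D/E/F.txt "$SEARCH_PATH" would succeed on the last element, while resolve_in_search_path F.txt "$SEARCH_PATH" would fail.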


Three important classes of files are not covered by this policy: event-data input files, event-data output files and the root output file managed by the TFileService.

FHICL_FILE_PATH

This environment variable is used by art to find .fcl files.

When you run art, it looks for the .fcl file named with the -c argument in the current working directory; if that fails it looks for the file relative to the search path defined by the environment variable FHICL_FILE_PATH. When art looks for files referenced by #include directives it only looks for files relative to FHICL_FILE_PATH; it does NOT look for files relative to the current working directory unless that directory is included in FHICL_FILE_PATH (which it normally is). FHICL_FILE_PATH is defined when you issue the command "muse setup". When you issue that command in a Mu2e working directory without a backing release and with Offline cloned in your Muse working area, FHICL_FILE_PATH is set to

export FHICL_FILE_PATH=${MUSE_BUILD_DIR}:${MUSE_WORK_DIR}

Why is MUSE_BUILD_DIR included? This allows for complex fcl files to be built by scripts that are run during "muse build". Such files are located in the build area. A Muse working directory without a backing release and without Offline is uncommon; if you have such a working area, FHICL_FILE_PATH contains only ${MUSE_WORK_DIR}.


When you do "muse setup" in a working area that contains a backing release, muse defines FHICL_FILE_PATH by the following rule:

  1. The first element is the muse working directory
  2. The remaining elements of the path are those that would be present if you did "muse setup" in the backing directory.
  3. For nested backing releases, the previous rule is applied recursively. The spirit is that the higher a directory is in the backing hierarchy, the earlier it appears in FHICL_FILE_PATH.
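Because the working directory comes before the backing release in the path, a file in your working area can silently shadow a file of the same relative name in the backing release. The following is a minimal sketch of a way to check for this by listing every match in search order; the function name list_all_matches is hypothetical and is not a Muse or art command.

```shell
# Hypothetical helper, not part of Muse or art.
# Print every file that matches a relative path, in search-path order.
# The first line printed is the file that a first-match lookup would use;
# any further lines are shadowed copies.
# The body runs in a subshell so the IFS change does not leak to the caller.
list_all_matches() (
    relpath=$1
    IFS=:      # split the search path on colons
    set -f     # do not expand wildcards while splitting
    status=1   # becomes 0 once at least one match is found
    for dir in $2; do
        [ -n "$dir" ] || continue          # skip empty elements
        candidate="${dir%/}/$relpath"      # tolerate a trailing slash
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            status=0
        fi
    done
    exit "$status"
)
```

For example, list_all_matches fcl/standardProducers.fcl "$FHICL_FILE_PATH" prints one line per copy of that file; more than one line of output means the earlier copy hides the later ones.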

FHiCL is configured to allow an absolute path for the .fcl file specified on the command line but, for paths to included files, FHiCL only allows paths relative to FHICL_FILE_PATH. This is a safety feature to ensure that production campaigns only use .fcl files that are source code controlled.

There are two FHiCL prolog files that are included in many .fcl files:

#include "fcl/minimalMessageService.fcl"
#include "fcl/standardProducers.fcl"

As time goes on other such files may be defined.


As we get experience with FHiCL we will consider adjustments to these policies. The policies will be as open as possible during our development phase, with the constraint that we understand how to ensure a strict audit trail when the time comes for large scale production.

MU2E_SEARCH_PATH

Mu2e code uses this environment variable to search for auxiliary files. The environment variable is defined when you run "mu2e setup". The variable is defined to be

export MU2E_SEARCH_PATH=${FHICL_FILE_PATH}:${MU2E_DATA_PATH};

where MU2E_DATA_PATH normally points to /cvmfs/mu2e.opensciencegrid.org/DataFiles/ but can be configured to other values if /cvmfs is not visible.

Auxiliary files include:

  1. Any file that is parsed by SimpleConfig. This includes:
    • The geometry file, read by the GeometryService.
    • Some of the old style event generator run-time configuration files; newer event generator modules use fcl to get their configuration.
    • Old style conditions data that is still read with the ConditionsService. It will eventually be migrated to the ProductionsService and these files will be removed.
    • Any files included into the above three files using #include.
  2. The magnetic field maps, read by the BFieldManagerMaker.
  3. The particle data table files, read by the ParticleDataList class.
  4. The G4 macro file optionally read by G4_plugin.cc
  5. Some probability distribution functions represented as binned data.
  6. Configuration data used by AI/ML inference code.
  7. Input particles in G4beamline format, read using EventGenerator/inc/FromG4BLFile.hh


Most run-time configuration files should be found in one of the Mu2e repositories so that they are under source code management and can evolve along with the code that reads them. Large files that are not tied to a particular code version, such as magnetic field maps, should be found under $MU2E_DATA_PATH.

CET_PLUGIN_PATH

art uses CET_PLUGIN_PATH to look for libraries that contain either art plugins or root dictionaries. art plugins include modules, services and tools. On startup art scans CET_PLUGIN_PATH to find all such files. It loads the dictionaries and it records where to find plugin libraries for use when the job fcl requests to load a plugin.

Very early versions of art used LD_LIBRARY_PATH for this purpose. The art team changed this to use a new environment variable, CET_PLUGIN_PATH, for two reasons:

  1. To reduce startup time by reducing the number of directories that had to be searched
  2. In preparation for builds using RPATHS, for which LD_LIBRARY_PATH is not necessary.

Hint for Looking at Search Paths

If you want to look at the definition of a search path you can just echo the environment variable. However, this prints everything on one line, which is often difficult to read. You can make a more readable view using, for example:

echo $FHICL_FILE_PATH | tr : \\n

which replaces the colons in the path with newline characters. See man tr.
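If you inspect search paths often, the same idea can be wrapped in a small function that also numbers the entries and flags any that are not existing directories. This is a sketch only; the function name show_search_path is hypothetical and is not provided by Muse or art.

```shell
# Hypothetical helper, not part of Muse or art.
# Print one search-path element per line, numbered in search order,
# and mark elements that are not existing directories.
# The body runs in a subshell so the IFS change does not leak to the caller.
show_search_path() (
    IFS=:      # split the search path on colons
    set -f     # do not expand wildcards while splitting
    n=0
    for dir in $1; do
        n=$((n + 1))
        if [ -d "$dir" ]; then
            printf '%2d  %s\n' "$n" "$dir"
        else
            printf '%2d  %s  (missing)\n' "$n" "$dir"
        fi
    done
)
```

Typical use would be show_search_path "$FHICL_FILE_PATH" or show_search_path "$MU2E_SEARCH_PATH".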