RandomNumbersBasic
Latest revision as of 22:59, 31 January 2020
Introduction
A typical Mu2e simulation job uses multiple independent sequences of pseudo-random numbers. The Mu2e Offline software provides tools to create these sequences, to seed them, to save their state, to restore their state and to ensure that each job in a long chain of jobs produces unique events. It also provides a way to ensure that the sequences are exactly repeatable when that is appropriate.
It is your responsibility to know when your job must use the same sequences of pseudo-random numbers as a previous job and when it must use different sequences. You need to understand which behaviour you require and to use the tools provided to implement that behaviour.
This page presents the minimum information needed to manage the repeatability or uniqueness
of pseudo-random number sequences when running Mu2e jobs. More advanced readers may also wish to read the
complete documentation
for management of pseudo-random numbers in the Mu2e simulation software.
If you are writing modules that use random numbers, you must consult the complete documentation.
Exclusions
The information on this page applies only to Mu2e Offline jobs, which use the art framework. It does not apply to G4beamline or MARS. At last report, standard Mu2e G4beamline jobs are configured so that a single pseudo-random number engine is seeded by the event number; to produce unique events you need only ensure that each separate run of G4beamline produces a unique range of event numbers. For additional information, consult a G4beamline or MARS expert.
Use Cases
There are two classes of use cases that are important:
Code Development
In a typical code development use case you will run your code, look at its output, modify your code, rerun it and compare the new output to the previous output. You will likely repeat this many times. If your job uses pseudo-random numbers, you would like it to use exactly the same pseudo-random numbers every time; if the pseudo-random numbers change each time you run your code, debugging your code will be very difficult because the symptoms will change on every run.
Most of the Mu2e example fcl files, in particular the Mu2eG4/test/g4test*.fcl
files,
are configured to do this by default.
Grid Jobs
Suppose that you have run one art job to generate some simulated events. After looking at these events, you decide that you want to generate more events in order to reduce the statistical errors on your results. If you reuse exactly the same fcl file, you will generate identical events, which is simply a waste of time. To get statistically independent events you need to change the seeds used by the pseudo-random number engines.
It is also critically important that each event in the two files has a unique event ID; an event ID consists
of a run number, a subrun number and an event number. This is discussed below.
A more general example is this: you wish to run 10,000 unique grid processes by
submitting 10 grid jobs of 1000 processes each; in grid-speak, one job of 1000 processes is called a "cluster".
You must ensure that events created by each grid process are independent of those created by all other grid
processes. To understand how to do this you need to know two things: how to write a fcl file to do this and how
the mu2egrid scripts will automatically do this for you.
When you run a large ensemble of grid jobs, you must ensure that each event has a unique event ID. The mu2egrid scripts also look after this requirement automatically.
MixMaxRng and Seeding
As of January 2020, all Mu2e Offline code that consumes pseudo-random numbers uses an engine of the type MixMaxRng. This includes our event generators, our use of Geant4 and all of our hit-making codes (see Mu2e-doc-30006 for more information on MixMaxRng).
In the implementation we use, and given the limits set by art, MixMaxRng can be seeded by supplying a single integer in the range [1, 2 147 483 647]. Streams created from seeds that differ by at least one bit are guaranteed to be independent and non-colliding for at least the next 10^100 random numbers.
Because the degree of randomness provided by the seeding mechanism is sufficient for Mu2e, the problem of managing randomness is reduced to managing the uniqueness or repeatability of seeds.
Basic Instructions
Instructions for Code Development
Mu2e uses two art services to manage pseudo-random numbers:
- RandomNumberGenerator, a service supplied by art
- SeedService, a service now supplied by Mu2e but soon to be supplied by art
When a module wishes to use a random engine it must take two steps:
- ask the SeedService for a seed that is guaranteed unique within this art job
- pass that seed to the RandomNumberGenerator service and ask the service to instantiate a new pseudo-random engine on behalf of the module.
The module can then use the engine.
The RandomNumberGenerator service must be present in the art configuration but it has no parameters. The SeedService service must also be present in the art configuration; its normal configuration is illustrated in the following fcl fragment.
#include "standardServices.fcl"
services : @local::Services.Sim
# or
# services : @local::Services.SimAndReco
// ...
services.SeedService.baseSeed : 8
services.SeedService.maxUniqueEngines : 20
The line beginning SeedService : configures the SeedService with one of its known standard configurations, named automaticSeeds. That configuration is found in the file Offline/fcl/standardServices.fcl:
automaticSeeds : {
  policy : "autoIncrement"
  baseSeed : nil
  maxUniqueEngines : nil
  # verbosity : 1
  # endOfJobSummary : true
}
This tells the seed service that two important parameters are left undefined, baseSeed and maxUniqueEngines; if the end-user does not give values to these parameters, FHiCL will issue an error when any code tries to read them. The last two lines in the above fcl fragment give values to these parameters.
Taken all together, this configuration tells SeedService that it may supply seeds for up to 20 different engines in this job; the seeds will be the integers 8 through 27 (baseSeed through baseSeed+maxUniqueEngines-1) and they will be given out in the order in which the code asks for them.
If the job tries to seed more than 20 engines, the SeedService
will print an error message and
tell art to perform a graceful shutdown. If this happens, you should increase the value of maxUniqueEngines
and rerun the job; please also send email to the Mu2e software team describing the situation in which this happened.
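The autoIncrement behaviour described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the SeedService implementation: the class name `AutoIncrementSeeds` and its method `next_seed` are hypothetical, and the real service asks art for a graceful shutdown rather than raising a Python exception.

```python
class AutoIncrementSeeds:
    """Toy model of the autoIncrement policy (hypothetical, not Mu2e code)."""

    def __init__(self, base_seed, max_unique_engines):
        self.base_seed = base_seed
        self.max_unique_engines = max_unique_engines
        self.issued = 0  # number of seeds handed out so far

    def next_seed(self):
        # Refuse to seed more engines than the configuration allows.
        if self.issued >= self.max_unique_engines:
            raise RuntimeError("more engines requested than maxUniqueEngines")
        seed = self.base_seed + self.issued
        self.issued += 1
        return seed


# Values from the fcl fragment above: baseSeed 8, maxUniqueEngines 20.
service = AutoIncrementSeeds(base_seed=8, max_unique_engines=20)
seeds = [service.next_seed() for _ in range(20)]
print(seeds[0], seeds[-1])  # first and last seed handed out
```

In this model a 21st call to `next_seed` raises an error, which mirrors the shutdown described above.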
Current Mu2e simulation jobs use, at most, about 10 unique engines; so a value of 20 should
be safe for a while.
If you uncomment the two lines verbosity and endOfJobSummary
you will get some
informational printout, including which seed was assigned to which engine. The SeedService has many other features; for further details, see the complete instructions.
If you run the g4test_03.fcl
job once and then rerun it, the output will be identical because the
SeedService
will compute the same seeds each time.
If you wish to generate additional events, change the value of baseSeed to baseSeed+maxUniqueEngines, in this case 28. You can repeat this pattern until baseSeed+maxUniqueEngines exceeds 900,000,000. Also remember to change the range of event IDs that are created.
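As a rough illustration of this pattern, the sketch below computes the baseSeed for a series of consecutive jobs. The function name `base_seed_for_job` is hypothetical; the starting values are taken from the fcl fragment above and the 900,000,000 limit from the text.

```python
SEED_LIMIT = 900_000_000  # upper limit quoted in the text


def base_seed_for_job(job_index, first_base_seed=8, max_unique_engines=20):
    """Return the baseSeed for the job_index-th job (0-based)."""
    base_seed = first_base_seed + job_index * max_unique_engines
    # Stop once the block of seeds for this job would pass the limit.
    if base_seed + max_unique_engines > SEED_LIMIT:
        raise ValueError("seed range exhausted")
    return base_seed


for job in range(3):
    print(job, base_seed_for_job(job))
```

Each job then owns a block of maxUniqueEngines consecutive seeds that does not overlap the block of any other job.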
This pattern guarantees that every engine in an ensemble of art jobs will have a unique seed. This is
actually a stronger requirement than we usually need. Normally it is sufficient that
baseSeed
be unique in each of an ensemble of art jobs; for example, it is OK if the Straw hit making
code in one job has the same seed as the event generator in another job. The point of maxUniqueEngines
is that we can enforce the stricter definition of uniqueness should it be important to do so.
Instructions for Grid Jobs
If you write your own grid workflow scripts you must ensure that
- Every event that you generate will have a unique event ID
- Every grid process that you run has a unique
baseSeed
.
If you use the mu2egrid scripts, as described in the next section, this is taken care of for you.
Mu2e Grid Scripts
At this writing (February 2015), the mu2egrid scripts presume that the grid cluster number and process number are guaranteed to be unique. These scripts take a base fcl file supplied by the user and append six additional lines:
services.user.SeedService.policy : "autoIncrement"
services.user.SeedService.maxUniqueEngines : 20
services.user.SeedService.baseSeed : ${RANDOMSEED}
source.firstRun : ${CLUSTER}
source.firstSubRun : ${PROCESS}
source.firstEvent : 1
where ${CLUSTER} and ${PROCESS} are the grid cluster and process numbers and where ${RANDOMSEED} is a random number chosen from the domain allowed for MixMaxRng. The value of RANDOMSEED is chosen by the workflow script using its own random number generator.
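The substitution can be sketched as follows. This is not the mu2egrid implementation: the function name `appended_lines` is hypothetical, and Python's `random` module stands in for whatever generator the workflow script actually uses. The seed range is the one quoted above for MixMaxRng.

```python
import random

MAX_SEED = 2_147_483_647  # upper end of the seed range quoted for MixMaxRng


def appended_lines(cluster, process, rng=random):
    """Return the six fcl lines for one grid process, plus the seed chosen."""
    seed = rng.randint(1, MAX_SEED)  # randint is inclusive on both ends
    lines = [
        'services.user.SeedService.policy : "autoIncrement"',
        "services.user.SeedService.maxUniqueEngines : 20",
        f"services.user.SeedService.baseSeed : {seed}",
        f"source.firstRun : {cluster}",
        f"source.firstSubRun : {process}",
        "source.firstEvent : 1",
    ]
    return lines, seed


# Example cluster and process numbers, chosen only for illustration.
lines, seed = appended_lines(cluster=1234567, process=42)
print("\n".join(lines))
```

Because the run and subrun numbers come from the cluster and process numbers, two processes never share a (run, subrun) pair as long as those numbers are unique.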
This guarantees unique event IDs. While it does not guarantee a unique baseSeed, a repeated baseSeed will be extremely rare. The script checkAndMove, which is part of the mu2egrid package, will scan the output of a grid cluster to ensure that the seeds chosen for each process are unique.
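To put a number on how rare a repeated baseSeed is, the sketch below estimates the chance that at least two processes in one cluster draw the same value. It assumes that seeds are drawn uniformly and independently from [1, 2 147 483 647], which the text does not state explicitly; the function name `collision_probability` is hypothetical.

```python
import math

N_SEEDS = 2_147_483_647  # number of allowed seed values


def collision_probability(n_processes, n_seeds=N_SEEDS):
    """Probability that at least two of n_processes draw the same seed."""
    # Sum the logs of the per-draw "no collision so far" probabilities.
    log_p_distinct = sum(math.log1p(-k / n_seeds) for k in range(n_processes))
    return -math.expm1(log_p_distinct)


print(f"{collision_probability(1000):.2e}")
```

For a cluster of 1000 processes, this estimate comes out at a little over 2 in 10,000 under these assumptions.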
If you need to ensure uniqueness across multiple clusters, you will need to script that yourself. In the future we may provide such a script.
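A minimal sketch of such a cross-cluster check is given below. It assumes you have already collected a (cluster, process, baseSeed) record for every process; how to extract those values from the job output is not covered here, and the function name `find_duplicate_seeds` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_seeds(records):
    """records: iterable of (cluster, process, base_seed) tuples.

    Returns a dict mapping each repeated base_seed to the list of
    (cluster, process) pairs that used it.
    """
    users = defaultdict(list)
    for cluster, process, base_seed in records:
        users[base_seed].append((cluster, process))
    return {seed: procs for seed, procs in users.items() if len(procs) > 1}


# Made-up records: the first and last entries share a baseSeed.
records = [(101, 0, 5551212), (101, 1, 99), (102, 0, 5551212)]
print(find_duplicate_seeds(records))
```

An empty result means that every process in the ensemble used a distinct baseSeed.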