ConditionsData
Latest revision as of 23:33, 14 May 2024
Introduction
In Mu2e it will be necessary to maintain a database of calibration constants, also known as conditions data. This will include information like the alignment of the detector, the gas gain in the straws, the time-space relationship for the straws, the gain of SiPMs, the calibration curves for ADCs, and so on.
The current system is a set of two postgres databases: mu2e_conditions_dev, for development by experts, and mu2e_conditions_prd, for users. The database infrastructure is maintained by the database group of the computing division. A user can access data by configuring a service in their art job, which reads the database using an http protocol. The system is intended to support access from high-volume grid jobs. In addition, there is a Mu2e command-line tool called dbTool which can be used to dump or maintain database contents.
The user configures an art executable by pointing it to a particular set of calibrations in the database. In code, the user can access database tables directly through the DbService, but most access will be through a high-level, user-friendly layer called the Proditions service. Proditions allows deriving and caching quantities computed from many database tables and other Proditions quantities, while triggering quantity updates as the underlying tables change with the run number.
This page is the right place to start to understand the system; then you can also see:
- CalibrationSets - what calibration sets are available
- ConditionsMaintenance - for calibrators and the calibration manager
- ConditionsDbCode, ConditionsDbSchema and ProditionsCode - for developers
Using the conditions database
selecting a calibration set
When accessing the conditions database, you select a purpose, such as "PRODUCTION", and a version, such as "v1_1". These are entered in the DbService services stanza.
services : {
   DbService : {
      purpose : PRODUCTION
      version : v1_1
      dbName : "mu2e_conditions_prd"
      textFile : ["table.txt"]
      verbose : 1
   }
}
The version numbers may have up to three fields like v1_2_3:
- 1=major version number, this changes when there is a major change in the content of the calibration set, that a user should probably be aware of, such as going from unaligned data to aligned data. The modules and services that will run and their configuration are likely to change. Your physics results will probably change.
- 2=minor version number. This changes when the list of table types changes, or if there was a repair to the calibration set. The modules and services that will run and their configuration might change. Your physics results might change.
- 3=extension number. This changes when new runs are added to the calibration set - physics results do not change, but runs which previously failed to run because they had no calibrations will now run.
You always want to explicitly provide a purpose; if you do not, the code will assume "PRODUCTION". If you have no interest in the version number, then you can leave it blank and the code will take the highest version available. If you specify only the major version number, then the highest available minor version number will be used. Your results might change between runs of your executable. If you provide the major and minor version numbers, then any run numbers that successfully ran before will always run the same way, but runs that previously failed because no calibrations were available might succeed at a later time due to an extension of the calibration set. This (specifying major and minor) is probably the right approach for most user work. Finally, if you specify the major, minor and extension number, then you will get the exact same result every time.
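As an illustration of these selection rules, here is a minimal sketch (the `Version` struct and `resolve` function are hypothetical names, not DbService code) of how any field left unspecified resolves to the highest available value:

```cpp
#include <cassert>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical sketch of the version-selection rules described above.
// A field of -1 means "unspecified" and resolves to the highest available.
struct Version { int maj = -1; int min = -1; int ext = -1; };

std::optional<Version> resolve(Version req, const std::vector<Version>& available) {
  std::optional<Version> best;
  for (const auto& v : available) {
    if (req.maj >= 0 && v.maj != req.maj) continue;  // major must match if given
    if (req.min >= 0 && v.min != req.min) continue;  // minor must match if given
    if (req.ext >= 0 && v.ext != req.ext) continue;  // extension must match if given
    if (!best || std::tie(v.maj, v.min, v.ext) >
                 std::tie(best->maj, best->min, best->ext)) {
      best = v;                                      // keep the highest match
    }
  }
  return best;
}
```

For example, requesting v1 against {v1_0_2, v1_1_0, v1_1_3} would resolve to v1_1_3, while requesting v1_0 would resolve to v1_0_2.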
The database parameter should usually be mu2e_conditions_prd, and this is the default.
The table parameter allows you to add a table to the calibration set for testing, see below.
The verbose parameter set to 0 gives no output; set to 1 it gives brief, interesting reports; it can be set as high as 10.
You can see the available purposes with
dbTool print-purpose
and you can see what versions are available with
dbTool print-version
overriding a table with a local file
The conditions code allows a user to provide a file (or several files) in the DbService fcl configuration, with the effect that the file content overrides or extends whatever is in the database. This is intended to make it easy to test new or updated tables. The table must be defined in the code in order to be included in the text file, but no database entries are needed before it can be used. The text file must be in a specific format. The file content includes table data and intervals of validity. When a handle in the user code asks for a certain table for a certain run/subrun, the code will check if the text file entries satisfy the request. If so, that text file entry is provided to the caller with no check on the actual database content. If the text file does not satisfy the request, the code will look into the database using the IOV lookup rules.
If the jobs will only use tables from the text file, and the user does not want any information to be read from the database, then the jobs should turn on the database access in the relevant Proditions entities and set the database purpose to "EMPTY". This will cause the service to be fully functional, but not try to read any IOV information or table content from the database. The user will have to provide everything requested in the text files.
services : {
   DbService : {
      purpose : EMPTY
      version : v0                      # ignored
      dbName : "mu2e_conditions_prd"    # ignored
      textFile : ["table.txt"]          # provide everything needed
      verbose : 1
   }
}
access by dbTool
All table data can be dumped by the command line tool dbTool. For example, listing the available tables:
> dbTool print-table
TID  name             dbname            user  time
  1  TstCalib1        tst.calib1        rlc   2018-10-12 08:58:26.762519-05:00
  2  TstCalib2        tst.calib2        rlc   2018-10-12 08:58:26.763687-05:00
  3  TrkDelayPanel    trk.delaypanel    rlc   2018-10-12 08:59:20.519840-05:00
  4  TrkPreampRStraw  trk.preamprstraw  rlc   2018-10-12 08:59:20.765956-05:00
...
printing the calibration entries available for a table:
> dbTool print-calibration --name TrkPreampStraw
CID  Table           create_user  create_date
CID   7  TrkPreampStraw  rlc       2018-10-12 08:59:51.789062-05:00
CID  21  TrkPreampStraw  rlc       2019-03-11 11:59:38.338682-05:00
CID  32  TrkPreampStraw  rbonvent  2021-01-12 18:21:14.912449-06:00
CID  42  TrkPreampStraw  rbonvent  2021-07-08 15:51:35.822251-05:00
...
dumping a calibration entry in canonical format
> dbTool print-content --cid 32
TABLE TrkPreampStraw
# cid 32
# index,delay_hv,delay_cal,threshold_hv,threshold_cal,gain
0,0.0,0.0,12.0,12.0,1.0
1,0.0,0.0,12.0,12.0,1.0
2,0.0,0.0,12.0,12.0,1.0
...
It can also tell you about what sets of calibrations are available
> dbTool print-version
VID  PID  LID  maj  min  create_user  create_date                       comment
VID 25  14  14  1  0  rlc  2022-04-30 18:01:29.389773-05:00  initial version
VID 30  14  14  1  1  rlc  2023-05-04 11:36:51.331483-05:00  "updated align, straw"
or drill down to what will be applied to a run:
> dbTool print-run --purpose MDC2020_perfect --version v1_0 --table TrkAlignTracker --run 1201 --content
0, 0_0_0,0.0,0.0,0.0,0.0,0.0,0.0
This tool has built-in help.
dbTool is also the primary method to upload new data to the database.
Access in Modules
Once the DbService is configured with a calibration set, all the data contained in that set is available to any module. Since the interval of validity is not known ahead of time, the service will not return any values until a file is being read and run and event numbers are defined. The intent is that only a few use cases will access the database tables directly. Most uses will access data through user-friendly containers provided by the high-level ProditionsService. Proditions allows the creation and caching of entities (collections of numbers, classes) derived from multiple database tables and other Proditions entities. One example is a Proditions entity holding the aligned geometry, which might be made from several tables of alignment values. Proditions will create and cache the aligned geometry, and knows when to update it if any of the underlying tables change as you process new run numbers. Proditions hides the low-level dependencies from the user and optimizes the creation, updating, and caching. Another example is a straw model entity made from both database tables and other Proditions entities representing straw calibration or straw electronics conditions.
Accessing ProditionsService Contents
The recommended access pattern is to make the handle a module class member. You must call "get" on the handle every event to make sure it is up to date. This method will return a handle to the correct version of the entity (here "Tracker", the tracker geometry class).
// in the module cc file
#include "TrackerGeom/inc/Tracker.hh"
#include "ProditionsService/inc/ProditionsHandle.hh"

// in the class definition
ProditionsHandle<Tracker> _alignedTracker_h;

// at the top of produce(art::Event& event) or analyze(art::Event& event)
Tracker const& tracker = _alignedTracker_h.get(event.id());
// use tracker here...
// pass tracker to utility routines:
auto v = strawPosition(tracker);
Important Notes:
- The recommended practice is to hold ProditionsHandles as member data of your module class and to default construct them in the c'tor. Classes and functions that are not modules should never create or hold a ProditionsHandle. (The Proditions service can be accessed in other services, but the creation of the services must follow a set pattern, and the actual Proditions handle creation may need to be delayed.)
- When a module member function needs a Proditions entity, it should get a const& to the entity from the handle at the start of the module member function (e.g. produce, analyze, filter, beginSubRun, ...). Hold the const& to the entity as a function-local variable or as member data of a function-local struct; never hold the const& to the entity in a way that persists to the next event, since this will prevent the entity from being updated when appropriate.
- When a non-module function needs an entity, that function should receive the entity as a const& argument from the calling module.
- A side effect of the previous rule is that some classes/functions called from module member functions will receive the entity as an argument for the sole purpose of passing it on to lower-level classes/functions. This is OK. If you have many entities that need to be passed this way, the recommended practice is to collect them into a small struct and pass the struct by const reference; this is slightly more efficient than passing them individually and enhances readability. The small struct should be function-local data of the module member function.
- Some Proditions quantities (such as alignment) will only be uploaded to the database at run or subrun boundaries, so it would seem to make sense to access the Proditions entity in the beginRun or beginSubRun methods. While this is possible, we recommend against it because the entity update frequency may be increased without the coder knowing, and, secondly, in highly-skimmed datasets, you will be updating the entity for subruns where there are no events. If you only update in the event, it is optimal in all cases: if no update is needed, the call is very low cost, and by only updating in the event call, you never update unnecessarily.
Accessing DbService tables
Most uses will not employ this pattern, where the database tables are accessed directly. Most uses will employ the Proditions service explained above.
The code pattern is to create a handle to the table (TstCalib1 in this example) as a data member in the module
#include "DbService/inc/DbHandle.hh"
#include "DbTables/inc/TstCalib1.hh"

namespace mu2e {
  class DbServiceTest : public art::EDAnalyzer {
  private:
    mu2e::DbHandle<mu2e::TstCalib1> _testCalib1;
  };
}
In the event method, the user must update the handle with the run number:
void DbServiceTest::analyze(const art::Event& event) {
  auto const& myTable = _testCalib1.get(event.id());
}
Once the user has the table filled with the correct content for this event, access can proceed by several methods. At this point it is important to check with experts on how the table is intended to be used. There may be a channel or index column which has a certain meaning by convention. There may or may not be random access by maps. The number of rows may be guaranteed fixed or variable. Dead channels might have flag values, etc. Some examples of access are:
int n = 0;
for (auto const& r : myTable.rows()) {
  std::cout << "row " << n << " is channel " << r.channel()
            << " and has DtoE " << r.dtoe() << std::endl;
  n++;
}

int channel = 1;
std::cout << "DtoE for channel " << channel << " is "
          << myTable.row(channel).dtoe() << std::endl;

int index = 1;
std::cout << "DtoE for row " << index << " is "
          << myTable.rowAt(index).dtoe() << std::endl;
Conventions
Intervals of validity
Intervals are inclusive: the stated end points are in the interval. You can't create an interval where the end is before the beginning, so all intervals contain at least one subrun.
String | Interpreted |
---|---|
EMPTY | 0:0-0:0 |
MAX | 0:0-999999:999999 |
ALL | 0:0-999999:999999 |
1000 | 1000:0-1000:999999 |
1000-1000 | 1000:0-1000:999999 |
1000-MAX | 1000:0-999999:999999 |
MIN-1000 | 0:0-1000:999999 |
MIN-MAX | 0:0-999999:999999 |
1000-2000 | 1000:0-2000:999999 |
1000:10-2000 | 1000:10-2000:999999 |
1000:11-1001:23 | 1000:11-1001:23 |
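The grammar in this table can be sketched as a small parser (an illustration only; `Iov` and `parseIov` are hypothetical names, not the production code). Missing subruns default to 0 on the start side and 999999 on the end side, and a bare run N means N:0-N:999999:

```cpp
#include <cassert>
#include <string>
#include <utility>

constexpr unsigned kMax = 999999;

struct Iov { unsigned r1, s1, r2, s2; };  // startRun:startSubrun-endRun:endSubrun

// Sketch of the interval grammar in the table above (not the production parser).
Iov parseIov(const std::string& s) {
  if (s == "EMPTY") return {0, 0, 0, 0};
  if (s == "MAX" || s == "ALL") return {0, 0, kMax, kMax};
  auto dash = s.find('-');
  std::string lhs = (dash == std::string::npos) ? s : s.substr(0, dash);
  std::string rhs = (dash == std::string::npos) ? s : s.substr(dash + 1);
  auto side = [](const std::string& t, bool isEnd) -> std::pair<unsigned, unsigned> {
    if (t == "MIN") return {0, 0};
    if (t == "MAX") return {kMax, kMax};
    auto colon = t.find(':');
    unsigned run = static_cast<unsigned>(std::stoul(t.substr(0, colon)));
    if (colon == std::string::npos) return {run, isEnd ? kMax : 0u};  // default subrun
    return {run, static_cast<unsigned>(std::stoul(t.substr(colon + 1)))};
  };
  auto [r1, s1] = side(lhs, false);
  auto [r2, s2] = side(rhs, true);
  return {r1, s1, r2, s2};
}
```

With this sketch, "1000:10-2000" yields 1000:10-2000:999999, matching the table above.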
text file format
The text file must have the following format:
TABLE <tableName> <IOV>
row1-col1, row1-col2, row1-col3
row2-col1, row2-col2, row2-col3
...
For example:
# my comment
TABLE TstCalib1 1001:2-1002
1,20,20.21
2,21,20.22
3,22,20.23
The table name must be the same as the c++ class name. The IOV text may be missing, in which case the table data applies to all runs. The allowed format of the IOV text is described in the intervals of validity section above. These items must be together on one line. When a text file is supplied as content to the DbService, the IOV is automatically active. When a text file is used to commit calibration data to the database, the IOV is ignored by default (IOVs are declared in a separate command) but, optionally, IOVs may also be committed at the same time.
The data is one line for each row in the table, typically a channel. The rows may need to be in a specific order, depending on how the table is coded and accessed. The columns are separated by commas, and in the order defined by the c++ representation of the table. See the strings section below for details on string columns. There may be several tables in one file; if so, each table entry must start with the "TABLE" keyword.
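As a sketch of this layout (an illustration only; `readTables` is a hypothetical helper, not the actual DbService reader), a minimal pass over such a file might look like:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Illustrative sketch of walking a calibration text file:
// "TABLE <name> [<IOV>]" starts a new table, a line whose first
// non-blank character is '#' is a comment, and every other
// non-empty line is one data row of the current table.
std::map<std::string, std::vector<std::string>> readTables(std::istream& in) {
  std::map<std::string, std::vector<std::string>> tables;
  std::string line, current;
  while (std::getline(in, line)) {
    auto p = line.find_first_not_of(" \t");
    if (p == std::string::npos || line[p] == '#') continue;  // blank or comment
    if (line.compare(p, 5, "TABLE") == 0) {                  // table header
      std::istringstream hdr(line);
      std::string kw;
      hdr >> kw >> current;                                  // any IOV text ignored here
      continue;
    }
    if (!current.empty()) tables[current].push_back(line);   // data row
  }
  return tables;
}
```

Feeding it the TstCalib1 example above would produce one entry with three data rows.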
strings
Arbitrary string input occurs in two places:
- when adding comments to a dbTool create action, such as creating a new purpose or version
- when uploading a calibration table that has a string column
In general, there are three special characters to watch out for: the double quote, used for quoting ("), the comma, used for column separation (,), and the hash, used for comments (#).
When inputting a string to dbTool to include a comment, there are only two rules:
- if more than one word, use double quotes
--comment "two words"
- when using a double quote in a string, escape it
--comment "two \"words\""
When writing a file that contains calibration data, the following rules apply:
- comments may be included by writing the hash (#) as the first non-whitespace character of a line. Comments may not be embedded in lines with data. The hash may be used in a string column.
# this is a legal comment
TABLE tableName1
1, 1.2
# legal comment
   # legal comment - first non-whitespace char is the hash
2, 1.1 # illegal comment - will crash on parse (part of number column)
TABLE tableName2
1, 1.2, GOOD
2, 1.1, BAD # malformed comment - will appear in string column
3, 1.1, failed check #3          OK, legal to use hash in string
- strings containing commas must be quoted
TABLE tableName2
1, 1.2, GOOD
2, 1.1, BAD, or not          will crash on parse
3, 1.1, "BAD, or not"        OK
- embedded double quotes must be escaped or doubled
TABLE tableName2
1, 1.2, GOOD                       OK
1, 1.2, "GOOD"                     OK
3, 1.1, really BAD                 OK, multiple words OK (as long as no commas or quotes)
3, 1.1, ain't really BAD           OK, single quotes OK
2, 1.1, Joe says "BAD"             OK
2, 1.1, Joe says "BAD, or not"     OK, comma requires quotes
3, 1.1, "Joe says \"BAD\""         OK
4, 1.1, "Joe says ""BAD"""         OK
4, 1.1, "Joe says "BAD""           will crash on parse
5, 1.1, "Joe says, ""BAD"""        OK, comma requires quotes, embedded quotes must be escaped or doubled
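The quoting rules above can be illustrated with a simplified column splitter (a sketch only, not the dbTool parser; `splitRow` is a hypothetical name). It splits on commas outside quotes, honors both \" and "" escapes, and strips the surrounding quotes and leading blanks:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the string-column rules above: split one data line into
// columns, treating commas inside quotes as literal characters and
// turning \" or "" into an embedded double quote.
std::vector<std::string> splitRow(const std::string& line) {
  std::vector<std::string> cols;
  std::string cur;
  bool inQuotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (inQuotes && c == '\\' && i + 1 < line.size() && line[i + 1] == '"') {
      cur += '"'; ++i;                        // \" -> literal quote
    } else if (inQuotes && c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
      cur += '"'; ++i;                        // "" -> literal quote
    } else if (c == '"') {
      inQuotes = !inQuotes;                   // toggle quoting; quote not kept
    } else if (c == ',' && !inQuotes) {
      cols.push_back(cur); cur.clear();       // column boundary
    } else if (c == ' ' && cur.empty() && !inQuotes) {
      // skip blanks at the start of a column
    } else {
      cur += c;
    }
  }
  cols.push_back(cur);
  return cols;
}
```

For instance, the line `5, 1.1, "Joe says, ""BAD"""` splits into three columns, the last being `Joe says, "BAD"`.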
time standards
There are two uses of time in the database. The first is to record the time that a database action is taken, for example when an IOV entry is created. Columns to record these times are declared as:
TIMESTAMP WITH TIME ZONE NOT NULL
and are filled with the built-in postgres time
CURRENT_TIMESTAMP
Internally, these times are stored as UTC times in postgres internal binary format. Postgres displays these values in the ISO 8601 time format, but with the "T" that usually separates the date and time replaced by a space - postgres says: "This is for readability and for consistency with RFC 3339 as well as some other database systems."
2018-10-22 08:58:26.762519-05:00
For your convenience, a reminder that the "-05:00" means the displayed time is a local time 5 hours behind UTC (08:58 was displayed on local clocks), so the UTC time is therefore 13:58. Note: in our database tools it might be useful to restore pure ISO 8601 by restoring the "T".
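Restoring the "T" is a one-character fix; a minimal sketch (`toIso8601` is a hypothetical helper, not an existing Mu2e function):

```cpp
#include <cassert>
#include <string>

// Restore strict ISO 8601 by putting back the 'T' that postgres
// replaces with a space when displaying a timestamp.
std::string toIso8601(std::string ts) {
  auto pos = ts.find(' ');
  if (pos != std::string::npos) ts[pos] = 'T';
  return ts;
}
```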
The second use is to store times that are unrelated to the database itself or the time that the database entry is made, for example, the time a run was started. In this case, we declare the column as:
start_time TEXT NOT NULL
and fill it with a string representation of UTC time in ISO 8601 format:
2018-10-22T08:58:26.762519-05:00
After this string is retrieved from the database, it may be converted to a binary format, if needed.
Verbosity levels
- 0 print nothing (default, since the db is also off by default)
- 1 minimal
- print purpose/version for log file
- print when fetches go into retries (indicates system is struggling)
- 2 good default for users and production (often set with purpose/version)
- report startup time and duration
- print interpreted version string
- print IoV summary
- print engine endjob time and cache statistics
- 3 add some detail
- confirm reading text file, and which tables
- confirm engine start and early exit
- report IoV cache details
- 5 start reporting on every table fetch
- report engine initialization start and end
- print IoV chain
- report each request for table update from handle
- report if table found in text file
- report creation of temporary table ID's, and each use
- 6 more info about each fetch
- report start time and duration for each url fetch
- report each retry for a url
- report fetch status for a url
- 9 start printing some table content
- print first and last few lines from each table for each fetch
- 10 everything
- print full table content for each fetch