ConditionsData

==Introduction==

In Mu2e it will be necessary to maintain a database of calibration constants, also known as conditions data.  This will include information like the alignment of the detector, the gas gain in straws, the time-space relationship for the straws, the gain of SiPMs, the calibration curve for ADC's and so on.

The current system is a set of two postgres databases: '''mu2e_conditions_dev''', for development by experts, and '''mu2e_conditions_prd''' for users.  The database infrastructure is maintained by the database group of the Computing Division.  A user can access the data by configuring a service in their art job, which reads the database using an http protocol.  The system is intended to support access from high-volume grid jobs.  In addition, there is a mu2e command-line tool called '''dbTool''' which can be used to dump or maintain database contents.


The user configures an art executable by pointing it to a particular set of calibrations in the database.  In code, the user can access database tables directly via the DbService, but most access will be through a high-level, user-friendly layer called the Proditions service.  Proditions allows deriving and caching quantities computed from many database tables and other Proditions quantities, while triggering updates of those quantities as the underlying tables change with the run number.
 
This page is the right place to start to understand the system; you can also see:
* [[CalibrationSets]] what calibration sets are available
* [[ConditionsMaintenance]] for calibrators and calibration manager
* [[ConditionsDbCode]], [[ConditionsDbSchema]] and [[ProditionsCode]] for developers


==Using the conditions database==

===selecting a calibration set===

When accessing the conditions database, you select a purpose, such as "PRODUCTION", and a version, such as "v1_1".  These are entered in the DbService services stanza.
<pre>
services : {
   DbService : {
      purpose :  PRODUCTION
      version :  v1_1
      dbName : "mu2e_conditions_prd"
      textFile : ["table.txt"]
      verbose : 1
   }
}
</pre>
The version numbers may have up to three fields, like v1_2_3:
* 1 = major version number.  This changes when there is a major change in the content of the calibration set that a user should probably be aware of, such as going from unaligned data to aligned data.  The modules and services that will run and their configuration are likely to change.  Your physics results will probably change.
* 2 = minor version number.  This changes when the list of table types changes, or if there was a repair to the calibration set.  The modules and services that will run and their configuration might change.  Your physics results might change.
* 3 = extension number.  This changes when new runs are added to the calibration set - physics results do not change, but runs which previously failed to run because they had no calibrations will now run.

You always want to explicitly provide a purpose, but if you do not, the code will assume "PRODUCTION".  If you have no interest in the version number, then you can leave it blank and the code will take the highest version available.  If you specify only the major version number, then the highest available minor version number will be used; your results might change between runs of your exe.  If you provide the major and minor version numbers, then any run numbers that successfully ran before will always run the same way, but runs that previously failed because no calibrations were available might succeed at a later time due to an extension of the calibration set.  This (specifying major and minor) is probably the right approach for most user work.  Finally, if you specify the major, minor and extension number, then you will get the exact same result every time.

The dbName parameter should usually be mu2e_conditions_prd, and this is the default.

The textFile parameter allows you to add a table to the calibration set for testing, see below.

verbose set to 0 will give no output, set to 1 is intended to give interesting brief reports, and can be set up to 10.


You can see the available purposes with 
  dbTool print-purpose
and you can see what versions are available with 
  dbTool print-version


===overriding a table with a local file===


The conditions code allows a user to provide a file (or several files) in the DbService fcl configuration, with the effect that the file content overrides or extends whatever is in the database. This is intended to make it easy to test new or updated tables. The table must be defined in the code in order to be included in the text file, but no database entries are needed before it can be used.  The text file must be in a [[ConditionsData#text_file_format|specific format]]. The file content includes table data and intervals of validity. When a handle in the user code asks for a certain table for a certain run/subrun, the code will check if the text file entries satisfy the request.  If so, that text file entry is provided to the caller with no check on the actual database content.  If the text file does not satisfy the request, the code will look into the database using the IOV lookup rules.
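For example, a hypothetical override file (here named table.txt, matching the fcl examples on this page) could contain a single table together with an interval of validity; the row values are purely illustrative, and the full rules are given in the [[ConditionsData#text_file_format|text file format]] section.
<pre>
TABLE TstCalib1 1001:10-1002
0,12,1.11
1,13,2.11
2,11,3.11
</pre>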
 
If the jobs will only use tables from the text file, and the user does not want any information to be read from the database, then the jobs should turn on the database access in the relevant Proditions entities and set the database purpose to "EMPTY".  This will cause the service to be fully functional, but it will not try to read any IOV information or table content from the database.  The user will have to provide everything that is requested in the text files.
 
services : {
  DbService : {
      purpose :  EMPTY
      version : v0  # ignored
      dbName : "mu2e_conditions_prd"  # ignored
      textFile : ["table.txt"]  # provide everything needed
      verbose : 1
  }
}


===access by dbTool===


All table data can be dumped by the command line tool dbTool.  For example, listing the available tables:
<pre>
> dbTool print-table
TID            name                dbname              user              time
  1          TstCalib1            tst.calib1              rlc  2018-10-12 08:58:26.762519-05:00
  2          TstCalib2            tst.calib2              rlc  2018-10-12 08:58:26.763687-05:00
  3      TrkDelayPanel        trk.delaypanel              rlc  2018-10-12 08:59:20.519840-05:00
  4    TrkPreampRStraw      trk.preamprstraw              rlc  2018-10-12 08:59:20.765956-05:00
...
</pre>
printing the calibration entries available for a table:
<pre>
> dbTool print-calibration --name TrkPreampStraw
       CID          Table      create_user        create_date
CID     7      TrkPreampStraw         rlc   2018-10-12 08:59:51.789062-05:00
CID    21      TrkPreampStraw         rlc   2019-03-11 11:59:38.338682-05:00
CID    32      TrkPreampStraw    rbonvent   2021-01-12 18:21:14.912449-06:00
CID    42      TrkPreampStraw    rbonvent   2021-07-08 15:51:35.822251-05:00
...
</pre>
dumping a calibration entry in [[ConditionsData#text_file_format|canonical format]]
<pre>
> dbTool print-content --cid 32
TABLE TrkPreampStraw
#  cid 32
# index,delay_hv,delay_cal,threshold_hv,threshold_cal,gain
0,0.0,0.0,12.0,12.0,1.0
1,0.0,0.0,12.0,12.0,1.0
2,0.0,0.0,12.0,12.0,1.0
...
</pre>
It can also tell you about what sets of calibrations are available
<pre>
 > dbTool print-version
      VID  PID  LID  maj  min  create_user        create_date                     comment
VID    25   14   14    1    0         rlc   2022-04-30 18:01:29.389773-05:00                     initial version
VID    30   14   14    1    1         rlc   2023-05-04 11:36:51.331483-05:00              "updated align, straw"
</pre>
 
or drill down to what will be applied to a run:
<pre>
> dbTool print-run --purpose MDC2020_perfect --version v1_0 --table TrkAlignTracker --run 1201 --content
0, 0_0_0,0.0,0.0,0.0,0.0,0.0,0.0
</pre>


This tool has built-in help.

dbTool is also the primary method to upload new data to the database.


==Access in Modules==
Once the DbService is configured with a calibration set, all the data contained in that set is available to any module.  Since the interval of validity is not known ahead of time, the services will not return any values until the data file is being read and run and event numbers are defined.  The intent is that only a few use cases will access the database tables directly.  Most uses will access data in user-friendly containers provided by the high-level Proditions service.  Proditions allows the creation and caching of entities (collections of numbers, classes) derived from multiple database tables and other Proditions entities.  An example is a Proditions entity holding the aligned geometry, which might be made from several tables of alignment values.  Proditions will create and cache the aligned geometry, and knows when to update it if any of the underlying tables change as you process new run numbers.  Proditions hides the low-level dependencies from the user and optimizes the creation, updating and caching.  Another example is a straw model entity made from both database tables and other Proditions entities representing straw calibration or straw electronics conditions.
 
=== Accessing ProditionsService Contents===
 
The recommended access pattern is to make the handle a module class member.  You must call "get" on the handle every event to make sure it is up to date.  This method returns a reference to the correct version of the entity (here "Tracker", the tracker geometry class).
 
<pre>
// in the module cc file
#include "TrackerGeom/inc/Tracker.hh"
#include "ProditionsService/inc/ProditionsHandle.hh"
 
// in the class definition
 
ProditionsHandle<Tracker> _alignedTracker_h;
 
// at the top of produce(art::Event& event) or analyze(art::Event& event)
Tracker const& tracker = _alignedTracker_h.get(event.id());
// use tracker here...
 
// pass tracker to utility routines:
  auto v = strawPosition(tracker);
</pre>
 
Important Notes:
 
# The recommended practice is to hold ProditionsHandles as member data of your module class and to default construct them in the c'tor.  Classes and functions that are not modules should never create or hold a ProditionsHandle.  (The Proditions service can be accessed in other services, but the creation of the services must follow a set pattern, and the actual Proditions handle creation may need to be delayed.)
# When a module member function needs a Proditions entity, it should get a const& to the entity from the handle at the start of the module member function (e.g. produce, analyze, filter, beginSubRun ...). Hold the const& to the entity as a function-local variable or as member data of a function-local struct; '''never hold the const& to the entity in a way that persists to the next event, since this will prevent the entity from being updated when appropriate.'''
# When a non-module function needs an entity, that function should receive the entity as a const& argument from the calling module.
# A side effect of 3) is that some classes/functions called from module member functions will receive the entity as an argument for the sole purpose of passing it on to lower level classes/functions. This is OK.  If you have many entities that need to be passed this way the recommended practice is to collect them into a small struct and pass the struct by const reference; this is slightly more  efficient than passing them individually and enhances readability.  The small struct should be function-local data of the module member function.
# Some Proditions quantities (such as alignment) will only be uploaded to the database at run or subrun boundaries, so it would seem to make sense to access the Proditions entity in the beginRun or beginSubrun methods.  While this is possible, we recommend against it because the entity update frequency may be increased without the coder knowing, and, secondly, in highly-skimmed datasets, you will be updating the entity for subruns where there are no events.  If you only update in the event, it is optimal in all cases.  (If no update is needed, the call is very low cost, and by only updating in the event call, you never update unnecessarily.)
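As an illustration of note 4 above, here is a minimal sketch of the struct pattern, using the Tracker handle from the example; <code>OtherEntity</code>, <code>_other_h</code>, <code>MyModule</code> and <code>fitHelper</code> are placeholders, not existing Mu2e names.
<pre>
// function-local struct collecting the conditions entities needed by helper code
struct EventConditions {
  Tracker const& tracker;
  OtherEntity const& other;   // placeholder for any second Proditions entity
};

void MyModule::produce(art::Event& event) {
  Tracker const& tracker = _alignedTracker_h.get(event.id());
  OtherEntity const& other = _other_h.get(event.id());
  EventConditions cond{tracker, other};   // lives only for this event
  fitHelper(cond);                        // helpers take the struct by const&
}
</pre>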
 
=== Accessing DbService tables===
Most uses will not employ this pattern, where the database tables are accessed directly; most code will use the Proditions service explained above.


The code pattern is to create a handle to the table (TstCalib1 in this example) as a data member in the module
<pre>
#include "DbService/inc/DbHandle.hh"
#include "DbTables/inc/TstCalib1.hh"
namespace mu2e {
  class DbServiceTest : public art::EDAnalyzer {
  private:
    mu2e::DbHandle<mu2e::TstCalib1> _testCalib1;
  };
};
</pre>
in the event method, the user must update the handle with the run number:
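A minimal sketch of that update step follows; the <code>get(event.id())</code> call mirrors the ProditionsHandle example above, and the row accessors are assumptions based on the TstCalib1 columns (channel, flag, dtoe), so check <code>DbTables/inc/TstCalib1.hh</code> for the actual interface.
<pre>
void DbServiceTest::analyze(const art::Event& event) {
  // fetch the table valid for this event's run/subrun; the handle caches
  // the table and only refreshes it when the interval of validity changes
  auto const& calib = _testCalib1.get(event.id());
  for (auto const& row : calib.rows()) {
    // use row.channel(), row.flag(), row.dtoe() ...
  }
}
</pre>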
==Conditions data maintenance==
During data-taking the detector experts responsible for calibrations will need to continually update the database with calibrations for the new data.  This section explains their procedures.
The calibrator should be familiar with the rules of a table, established when that table was created.  The most important rules are those concerning the column, or columns, which make the rows unique.  This is typically a channel number.  The table code and database constraints may have assumptions built in; for example, it may be required that the channel number is sequential.  This channel number may be dense or sparse, or have a custom meaning.  The calibrator should also know what precision is required in the floating point numbers, since recording more significant digits than necessary wastes space and time.  And, of course, the calibrator must know their detector, such as when and how to produce a new calibration.
All commit functions have a dry-run option, in which the commit will be done and then rolled back, to fully check what will happen.  Also, if a mistake is committed, it can always be ignored by simply performing a second commit correctly and carrying the new correct result forward to the next step.  Typically, nothing is ever deleted.
The calibrator will need the [[ConditionsData#permissions_and_roles|permission]] (or role in database parlance) appropriate to their detector, such as trk_role or cal_role.  They will also need val_role.
===Committing a calibration===
The calibrator would typically produce the calibration by writing a text file where each row represents a row in the database.  The contents should form one logical "calibration", that is, a set of data that will be retrieved all at once for use in an event.  The table name, columns, and rows must all be correctly parsed, so the file must be in the [[ConditionsData#text_file_format|required format]].
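For example, a hypothetical input file for TstCalib1 (columns channel, flag, dtoe), using the values that appear in the dbTool output below:
<pre>
TABLE TstCalib1
0,32,1.3177
1,33,2.3166
2,31,3.3134
</pre>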
> dbTool commit-calibration --file FILENAME
If the file contained data for table TstCalib1, the response might be
created calibration for TstCalib1 with 3 rows, new cid is 3
The number labeled cid uniquely identifies this data forever.  You can use this to refer to the data.
<pre>
> dbTool print-content --cid 3
TABLE TstCalib1
#  cid 3
# channel,flag,dtoe
0,32,1.3177
1,33,2.3166
2,31,3.3134
</pre>
The file format allows for an interval of validity (IOV) to be included for the calibration data.  This will be ignored by commit-calibration.
===Committing an interval of validity===
After the calibrator has committed the data to the database, they can declare what runs the data can be used with; this is called an interval of validity, or IOV.  The IOV is represented in a [[ConditionsData#Intervals_of_validity|well-defined format]].  The IOV can be defined at the run level or at the subrun level, but not lower.  A typical IOV, starting at run 1001, subrun 10 and ending on run 1002, can be represented as 1001:10-1002.  The run ranges are inclusive, so run 1001, subrun 10 is included, as well as all subruns of run 1002.
An IOV is attached to calibration data, which is indicated by its CID number.
> dbTool commit-iov --cid 3 --iov 1001:10-1002
with reply
new IID is 10
The number labeled IID uniquely and permanently refers to this commit. 
If calibration data is valid for run X and you have committed an IOV declaring that, and it is later determined that the same data is also good for run Y, then the proper step is to commit a second IOV declaring that the same CID is valid for run Y.  In this way, many IOVs may point to the same calibration data, the same CID.
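For example (with hypothetical run numbers), if the data committed above as CID 3 is later found to also be good for part of run 1005, commit a second IOV pointing at the same CID:
  > dbTool commit-iov --cid 3 --iov 1005:1-1005:200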
Logically, it will be important to make sure that, putting together all your relevant IOVs, all good runs have calibrations, and that there are no overlaps, where one IOV says data A is good for run X but a second IOV says data B is good for run X.  Once we have a good-run system in place, code will be provided to make these checks.
===Committing a group===
The third and final step to committing calibration data is to combine all your IOVs into one group.  A group is just a collection of IOVs.  The purpose of a group is to make it easier to refer to large sets of IOVs.  For example, if the same calibration data is good for both PRODUCTION and ANALYSIS calibration sets, it is easier to put the groups into both sets rather than recreate a new, and potentially much larger, set of IOVs.  This layer also makes repairs easier.
You provide a set of IID numbers, saved from the IOV commits.  The IIDs may refer to the same table or different tables.  It can even be tables in different detectors.  The IOVs don't need to be adjacent in time or any other restriction.  We expect that, typically, a detector expert will gather everything they need for a run, or a small set of runs, commit all those data, make IOVs, and then collect them into one group.
  > dbTool commit-group --iid 10,12,13
with reply
  new GID is 17
Note that the list of IIDs may be uploaded as a pointer to a file full of numbers (--iid mylist_of_iids.txt).
The number labeled GID uniquely and permanently refers to this group of IOVs.
It is this GID which you pass to the database or production manager, who will include it in the calibration set, along with the GID from other detector groups.
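Putting the three steps together, a hypothetical calibrator session (reusing the example numbers from above) looks like:
<pre>
> dbTool commit-calibration --file myTstCalib1.txt
created calibration for TstCalib1 with 3 rows, new cid is 3
> dbTool commit-iov --cid 3 --iov 1001:10-1002
new IID is 10
> dbTool commit-group --iid 10
new GID is 17
</pre>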
==Calibration set maintenance==
A calibration set is
* a purpose (entry in ValPurposes, with a PID)
* a list (list of table types in ValLists, with a LID)
* a version (entry in ValVersions, with a VID)
* a set of extensions (entries in ValExtensions, with an EID)
* the groups of calibration data associated with the extensions (entries in ValExtensionLists)
When an executable starts the DbService, set to a particular purpose and version, the DbService makes the data in the calibration set available on demand.
During data-taking, the production managers will need to continually update the calibration set contents as calibrators enter new data and send their lists of GID's.  This section explains these procedures.  This section assumes the reader is familiar with the [[ConditionsDbSchema]].
===Extend a calibration set===
This is the most common procedure.  The calibrators have sent new GIDs and it is time to extend a calibration set.  The extension may be to add tables for more runs, or to complete the needed tables for a run.  (The tracker tables for run X were entered yesterday and now you want to enter the calorimeter tables for run X.)  At this point, there exists a calibration set - a PURPOSE and VERSION.  PURPOSE refers to the purpose of the set, such as "PRODUCTION", and VERSION refers to the major and minor numbers, like v1_1.  This procedure takes a new set of tables, represented by a GROUP number (or numbers) given to you by calibrators, and extends the calibration set.  For example, an extension might take calibration set PRODUCTION v1_1_10, adding some groups of tables, to create PRODUCTION v1_1_11.
> dbTool commit-extension --purpose PRODUCTION --version v1_1 --gid 4
the result shows the new full version number with the new extension.  --gid can take a list of numbers, or a file containing numbers (see the dbTool help). You can verify the calibration set with
> dbTool print-set --purpose PRODUCTION --version v1_1 --details
Once an extension is created it is permanent and can't be deleted or undone.  If there is a mistake, the procedure is to create a new VERSION, copy over the correct part, then create a new correct extension.
Note that in this step, the calibration set completeness is not checked.  For example, if this set requires 10 tables, you can commit an extension for Run X with only 7 of the tables, and later commit an extension with the other 3.  The calibration set completeness would be difficult to enforce at every commit, and would also be annoying.  Eventually the completeness can only be meaningful when checked against a good run list.  You have to ask if all the tables have IOV for a given set of runs.  The other problem you can have is overlaps, where two IOVs define two different calibrations for the same run.  You could check for this at the point of commit.  This functionality will be developed in the future.
===Create a new calibration set===
A calibration set refers to a purpose and a version number (major and minor) and its extensions. A new calibration set will be needed if
* the list of table types needs to change
* a repair needs to be made to an existing calibration set
The first decision to be made is whether a new purpose needs to be made.  This should usually be a very rare need, mostly there will be a few purposes (PRODUCTION, CALIBRATION,..) and as needs evolve, they will gain new version numbers.  A need for a new purpose, for example, might be if the CRV group wants to run a calibration job on new data and needs the nominal calibration as input to the job.  They don't care about any other detector calibrations so no existing calibration sets are appropriate.
> dbTool commit-purpose --name CRV_CALIBRATION --comment "input to the CRV calibration job"
If a new purpose is not needed, one of the existing purposes can be chosen
> dbTool print-purposes
You should end up with a purpose identifier number, or PID.
The second decision to be made is whether a new list of table types is needed.  You can see the existing lists
> dbTool print-lists
You will need a list ID (LID). You can also see what lists are associated with current versions.
> dbTool print-versions
If a new list of tables is needed, first find the tables, including their unique numeric identifiers, their TIDs.
> dbTool print-tables
To create a new list:
> dbTool commit-list --name CRV_CALIBRATION_LIST --comment "for CRV calibration job" --tids 1,2,3
which will result in a new list id, or LID.
Finally, once you have the purpose and list, you will need to decide the version numbers.  If this is a new purpose, the typical major and minor numbers will be v1_0.  Increment the major number only if there was a major milestone passed (such as the second round of production) or a major philosophical change (non-aligned to aligned detectors).  Otherwise, increment the minor number if there is an adjustment to the table list (for example, a switch from CrvGains to CrvGains2 tables), or if a repair needs to be made to an existing extension.  The principle is that if a physics result can change, then at least the minor number must change.  This is enforced because a PURPOSE/VERSION has a fixed table list and also can't be modified (only extended).
To create a new version, once you have the PID, the LID, the major and minor version numbers selected:
> dbTool commit-version --purpose PRODUCTION --list 3 --major 1 --minor 3 --comment "fix run 1234 of version 1_2"
The purpose and list switches accept either the text name or the numerical index.
At this point the calibration set needs to be populated with new extensions.  If the calibrators are producing new tables, the process will be creating extensions. 
If the new version is a small correction or change from another existing calibration set, then some hand work, possibly automated in the future, may be necessary.  For example, if the new version was a matter of adding a table to an old version, you would first create an extension that contains all the groups from the old set.  Then create a group that contains the new table, and add that with an extension.  If the new version is a correction to an old set, then you might dump the groups associated with the old set ("print-set"), delete the bad group number, add the repaired group number, and commit that as the first extension.
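As a hypothetical summary, standing up the CRV_CALIBRATION purpose discussed above, from new purpose through first extension (the GID is supplied by the calibrators), might look like:
<pre>
> dbTool commit-purpose --name CRV_CALIBRATION --comment "input to the CRV calibration job"
> dbTool commit-list --name CRV_CALIBRATION_LIST --comment "for CRV calibration job" --tids 1,2,3
> dbTool commit-version --purpose CRV_CALIBRATION --list CRV_CALIBRATION_LIST --major 1 --minor 0 --comment "initial version"
> dbTool commit-extension --purpose CRV_CALIBRATION --version v1_0 --gid 17
</pre>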
===Create a new calibration table===
Any user can create a new table in their private code and put data into it using the text file mechanism.  Tables that will be created in the database itself should be designed in consultation with an expert.  Only experts with "admin" privilege can create the table in the database.
====design considerations====
There are many considerations that go into designing a calibration table.  First, what data can go into a table?  A database table is a set of rows and columns.  The rows are usually connected to physical or conceptual units, such as detector components or channels.  The columns are usually the data you want associated with each detector component.  For example, there might be a table where each row represents a calorimeter crystal and columns are gains and pedestals.  An alignment table might have a row for each tracker panel and position offsets for columns.  Here are some considerations, starting with the most important.
# Do the columns have the same row indexing?  This is a requirement.
# Are the values going to be used at the same time?  You don't want to have to fetch columns A and B, if you only want A.
# Are the values going to be updated on a similar schedule?  If you have a table with a column that is updated much less often than the other data, then you will end up repeating those values, which is inefficient.
# The largest reading-time overhead is probably per-table latency, so you want to have as much data in one table as you can, subject to the above constraints.
# Have you thought about what else might be needed in the near future? It will add non-trivial work to the database maintenance if you need changes to your table design.
The next major decision is how to index the rows.  There must be a column which is unique to the row.  Typically this is a channel number, but it can be any integer.  For example it might be the sparse index StrawId.  There may be 3 major methods:
# vector access. Access all rows of your table as a vector, loop over the vector, see what the index is for each row, and apply the data to your object based on the index.  For example, if there is a row of alignment data for each tracker panel, you could loop over the rows and read the index X and apply the values to the panel X object. In this case you don't need to care about the order of your rows.
# sequential access.  You commit to always keeping your rows in a fixed order 0 to N-1.  The index has these values and you always upload them in this order.  When you access the table, and you need the row for index X, you simply take position X of the vector, knowing that this corresponds to index X.  Once you have the row, you can double-check that the index agrees with the position.  Note that if you have a sparse index, such as StrawId, you can keep in your code, apart from the database table code, a conversion between the sparse index and the dense index used to label table rows.
# map access. You can design your table to keep a map of the rows where the key to the map is the index of your row, and the value is the vector position, then you can do random access for index X.  This map could be fixed or created on the fly as the table is filled.  A map like this may have some overhead cost if it is large.
Please consider how many significant digits you will need to represent your data and use the minimum.  For reproducibility, floats are essentially stored as text, so more digits require more space and time.
Name your table.  The first three characters must correspond to the schema in the database.  The current schemas are "Trk", "Cal", and "Crv", but more can be added.  This is important since permissions key off this text.  Then there is a string which must be unique within the schema.  So if your table contains CRV SiPM gains, the table class might be called CrvSiPMGain.  Or if it contained lots of columns concerning the SiPM, maybe call it CrvSiPM or CrvSiPMPars.
Decide if the number of rows in one commit of calibration data is fixed.  This would be recommended for any table indexed to a detector with a fixed number of channels. The code can enforce this length as a check. Channels without good data are included with nominal or flag values. An example where the length could not be fixed is variable length list of bad channels or bad runs.
====table class code====
# Copy the example code from TstCalib1.hh to the new class name and rename the class inside.
# change the nested Row class to correspond to the needed row. The unique row index should be the first column.
# column ordering should be the same everywhere
# The constructor has three arguments
## the code class name (TstCalib1)
## the database table name as schema.shortname (tst.calib1).  This must be identical to the table name in the SQL creation process.
## the column names, separated by commas ("channel,flag,dtoe"); this will serve as the basis of an SQL query. The names here must be identical to the column names in the SQL table creation
# update the size() method so it is an estimate of the memory space used by this class
# if the table is to have a fixed length, implement it in nrowFix(), otherwise delete this method
# change addRow() so when it receives a vector of strings, it converts the strings to int and floats in the order you established in the Row class
# change rowToCsv so that it can convert binary rows to text. Please put in the correct precision.
# if you chose not to have a map, remove the map from the member data and addRow(), and remove the row(channel) accessor.
# add the class to <code>DbTables/src/DbTableFactory.cc</code>
Once the table class has been created, you can use the table, providing data by the text table interface.
You do not need any actual database entries.
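As an illustration of the steps above, here is a schematic sketch for a hypothetical CrvSiPMGain table with columns "channel,gain".  It shows only the nested Row class and the string-to-binary conversion in addRow(); the actual base class, constructor arguments and the other required methods should be copied from TstCalib1.hh.
<pre>
// Schematic only - not the real Mu2e DbTable interface.
#include <string>
#include <vector>

class CrvSiPMGain {
public:
  class Row {
  public:
    Row(int channel, float gain) : _channel(channel), _gain(gain) {}
    int   channel() const { return _channel; }   // unique row index, first column
    float gain()    const { return _gain; }
  private:
    int   _channel;
    float _gain;
  };

  std::vector<Row> const& rows() const { return _rows; }

  // convert one text row (already split into column strings) into a binary Row,
  // in the column order established above
  void addRow(std::vector<std::string> const& columns) {
    _rows.emplace_back(std::stoi(columns[0]), std::stof(columns[1]));
  }

private:
  std::vector<Row> _rows;
};
</pre>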
====create in the database====
To create the table in the database, you will need an expert to run an SQL snippet.
<pre>
CREATE TABLE tst.calib1
  (cid INTEGER, channel INTEGER, flag INTEGER , dtoe NUMERIC,
  CONSTRAINT tst_calib1_pk PRIMARY KEY (cid,channel) );
GRANT SELECT ON tst.calib1 TO PUBLIC;
GRANT INSERT ON tst.calib1 TO val_role;
</pre>
# the table name must be the same as the database name in the class definition
# you must add the CID column to the first position
# floats should be represented as NUMERIC, for perfect reproducibility
# there should be a constraint that the CID and channel number are unique.  If two columns form the unique row label, then the constraint would contain both those columns, and the CID.
# GRANT SELECT to PUBLIC
# GRANT INSERT (and INSERT only) to the appropriate role.  Tables named trk.* are assigned to role trk_role.  The tables in the tst (test) schema are assigned the val role to avoid creating another role just for tst.
Finally you will need to add the table to the list of tables in the database:
dbTool commit-table --name TstCalib1 --dbname tst.calib1
This is how the system knows that this table can have calibration data attached to it.
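Continuing the hypothetical CrvSiPMGain sketch from the previous subsection, the corresponding SQL and registration would follow the same pattern (tables named crv.* are assigned to crv_role):
<pre>
CREATE TABLE crv.sipmgain
  (cid INTEGER, channel INTEGER, gain NUMERIC,
  CONSTRAINT crv_sipmgain_pk PRIMARY KEY (cid,channel) );
GRANT SELECT ON crv.sipmgain TO PUBLIC;
GRANT INSERT ON crv.sipmgain TO crv_role;
</pre>
followed by
  dbTool commit-table --name CrvSiPMGain --dbname crv.sipmgain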


==Conventions==
===permissions and roles===
The current (10/2018) permissions.  Access to write to the database is allowed by a kerberos ticket. When a username is recorded in the database, it is the account name, which is the same as the kerberos name.  Anonymous users can browse the database using SQL and the '''mu2e_reader''' read-only account (ask for password).
{|style="width: 50%;text-align:right;"
|-
!style="width:30%;text-align:left;"|'''Permissions'''
!style="width:20%;text-align:left;"|'''users'''
|-
|style="text-align:left;" | ADMIN_ROLE ||style="text-align:left;" | gandr,kutschke,rlc
|-
| style="text-align:left;" | VAL_ROLE, MANAGER_ROLE || style="text-align:left;" | gandr,kutschke,rlc,brownd
|-
| style="text-align:left;" | TRK_ROLE, VAL_ROLE ||style="text-align:left;" | brownd,rbonvent,edmonds
|-
| style="text-align:left;" | CAL_ROLE, VAL_ROLE ||style="text-align:left;" | echenard,fcp
|-
| style="text-align:left;" | CRV_ROLE, VAL_ROLE ||style="text-align:left;" | ehrlich,oksuzian
|}
* ADMIN_ROLE owns all tables and has complete control. It can add or drop tables, and everything in between.  Only a few experts will have this role. The only regular job will be to create new calibration tables.
* VAL_ROLE can make intervals of validity and make groups of IOVs.  All detector calibrators will have this role.
* MANAGER_ROLE can commit to the higher-level interval of validity tables.  This includes declaring a new calibration table, creating a new list of calibration tables, creating purposes or versions of a purpose, and extending a calibration set.  It is expected that only a few active offline production managers will have this role at any one time.
* TRK_ROLE can commit calibration data to tables with names Trk*, and similarly for the other detector roles.  Only a few experts in each detector, with the responsibility to maintain calibrations, will have this role.


===Intervals of validity===
| 1000:11-1001:23 || 1000:11-1001:23
|}


===text file format===
3,22,20.23
</pre>
The table name must be the same as the c++ class name.  The IOV text may be missing, in which case the table data applies to all runs.  You can see the allowed format of the IOV text [[ConditionsData#Intervals_of_validity|here]].  These items must be together on one line.  When a text file is supplied as content to the DbService, the IOV is automatically active.  When a text file is used to commit calibration data to the database, the IOV is ignored by default (IOVs are declared in a separate command) but, optionally, IOVs may also be committed at the same time.
 
The data is one line for each row in the table, typically a channel.  The rows may need to be in a specific order, depending on how the table is coded and accessed.  The columns are separated by commas, and in the order defined by the c++ representation of the table. Please see [[ConditionsData#strings|here]] for details on string columns. There may be several tables in one file and if so, each table entry must start with "TABLE" as a keyword flag.


===strings===


When inputting a string to dbTool to include a comment, there are only two rules:
* if more than one word, use double quotes
  --comment "two words" 
* when using a double quote in a string, escape it
  --comment "two \"words\"" 
 


When writing a file that contains calibration data, the following rules apply:
* comments may be included by writing the hash (#) as the first character of a line.  Comments are not allowed embedded in lines with data.  The hash may be used in a string column.
  # this is a legal comment
  TABLE tableName1
  3, 1.1, failed check #3  <span style="color:green">OK, legal to use hash in string</span>


* commas must be quoted
  TABLE tableName2
  1, 1.2, GOOD
  3, 1.1, "BAD, or not" <span style="color:green">OK</span>
  3, 1.1, "BAD, or not" <span style="color:green">OK</span>


* embedded double quotes must be escaped or doubled
  TABLE tableName2
  5, 1.1, "Joe says, ""BAD""" <span style="color:green">OK, comma requires quotes, embedded quotes must be escaped or doubled</span>
  5, 1.1, "Joe says, ""BAD""" <span style="color:green">OK, comma requires quotes, embedded quotes must be escaped or doubled</span>


===time standards===

There are two uses of time in the database.  The first is to record the time that a database action is taken, for example when an IOV entry is created.  Columns to record these times are declared as:
  TIMESTAMP WITH TIME ZONE NOT NULL
and are filled with the built-in postgres time
  CURRENT_TIMESTAMP
Internally, these times are stored as UTC times in postgres internal binary format.  Postgres displays these values in the [https://en.wikipedia.org/wiki/ISO_8601 ISO 8601 time format], but with the "T" that usually separates the date and time replaced by a space - postgres says: "This is for readability and for consistency with RFC 3339 as well as some other database systems."
  2018-10-22 08:58:26.762519-05:00
For your convenience, a reminder that the "-05:00" means the displayed time is 5 hours behind UTC (08:58 was displayed on local clocks) and the UTC time is therefore 13:58.  '''Note: in our database tools it might be useful to restore pure ISO 8601 by restoring the "T".'''

The second use is to store times that are unrelated to the database itself or the time that the database entry is made, for example, the time a run was started.  In this case, we declare the column as
  start_time TEXT NOT NULL
and fill it with a string representation of the UTC time in ISO 8601 format:
  2018-10-22T08:58:26.762519-05:00
After this string is retrieved from the database, it may be converted to a binary format, if needed.
 
===URL format===
 
The url access is through the Scientific Computing Division database API group's [https://cdcvs.fnal.gov/redmine/projects/qengine QueryEngine] system.  This web server has a cache: any url with the same text as a previous url will be served from the cache that the earlier request loaded.  A cache entry expires in 26 hours.  This cache is our main guard against overloading the database with grid job access.
 
The url comes in two forms; the "no cache" version forces the database to be read directly, skipping the web cache.  It does not update the cache.  The base strings are:
<pre>
_url =        "http://dbdata0vm.fnal.gov:9091/QE/mu2e/prod/app/SQ/query?";
_urlNoCache = "http://dbdata0vm.fnal.gov:9090/QE/mu2e/prod/app/SQ/query?";
</pre>
You can add parameters to the query
* t=table name
* c=selection clause
* w=where clause
* o=order clause
For example
http://dbdata0vm.fnal.gov:9091/QE/mu2e/prod/app/SQ/query?t=tst.calib1&c=flag,dtoe&w=cid:eq:2&w=channel:le:1&o=flag
 
The code returns one line for each row.  It presents columns as comma-separated values.  If a column contains text with a comma or quotes, the code will return the cell in double quotes.
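For example, a query like the one above can be exercised from the command line with curl (hypothetical selection; the cached port 9091 is used here):
<pre>
curl "http://dbdata0vm.fnal.gov:9091/QE/mu2e/prod/app/SQ/query?t=tst.calib1&c=flag,dtoe&w=cid:eq:2"
</pre>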
 
See also <code>DbService/src/DbReader.cc</code>
 
===Services Order===
<pre>
-rw-r--r-- 1 rlc mu2e 2.9K Apr 27  2018 Alignment/src/AlignmentService_service.cc
-rw-r--r-- 1 rlc mu2e 2.4K Apr 27  2018 BTrkHelper/src/BTrkHelper_service.cc
-rw-r--r-- 1 rlc mu2e 4.7K Nov 30 11:56 ConditionsService/src/ConditionsService_service.cc
-rw-r--r-- 1 rlc mu2e 1.2K Dec 14 11:51 DbExample/src/ConditionsService2_service.cc
-rw-r--r-- 1 rlc mu2e 2.2K Dec 14 11:51 DbService/src/DbService_service.cc
-rw-r--r-- 1 rlc mu2e 2.0K Apr 27  2018 G4Helper/src/G4Helper_service.cc
-rw-r--r-- 1 rlc mu2e  14K Aug 24 12:35 GeometryService/src/GeometryService_service.cc
-rw-r--r-- 1 rlc mu2e 7.4K Apr 27  2018 GeometryService/src/RecoGeometryService_service.cc
-rw-r--r-- 1 rlc mu2e  259 Apr 27  2018 GlobalConstantsService/src/GlobalConstantsService_service.cc
-rw-r--r-- 1 rlc mu2e 1.2K Dec 27 17:31 ProditionsService/src/ProditionsService_service.cc
-rw-r--r-- 1 rlc mu2e 3.5K Apr 27  2018 Sandbox/src/BarService_service.cc
-rw-r--r-- 1 rlc mu2e  947 Apr 27  2018 Sandbox/src/Bug01Service_service.cc
-rw-r--r-- 1 rlc mu2e 1.1K Apr 27  2018 Sandbox/src/FooService_service.cc
-rw-r--r-- 1 rlc mu2e 1.8K Apr 27  2018 Sandbox/src/XBarService_service.cc
-rw-r--r-- 1 rlc mu2e 9.6K Apr 27  2018 SeedService/src/SeedService_service.cc
</pre>
 
==Notes==
Relating to contents
* calorimeter [https://mu2e-docdb.fnal.gov/cgi-bin/sso/ShowDocument?docid=29007 29007]
 
Relating to infrastructure:
*Art interface:
** [https://cdcvs.fnal.gov/redmine/projects/art/wiki/C++_database_interface_for_art_users art] talk on a db interface.
** [https://cdcvs.fnal.gov/redmine/projects/csv_qe_api/wiki csv_qe_api] query_engine wrapper and generic table (`upd list -aK+ csv_qe_api`)
* [https://cdcvs.fnal.gov/redmine/projects/condb/wiki ConDB] - urls access to REST server
** [https://cdcvs.fnal.gov/redmine/projects/condb/wiki/Web_Interface ConDB Web Interface] urls that return csv, assumes time IOV
* [https://cdcvs.fnal.gov/redmine/projects/qengine query_engine wiki] - a generic query db url/REST interface
* [https://cdcvs.fnal.gov/redmine/projects/rexdb-fife/wiki/WebDataAccessCAPI Web Data Access (wda)] [https://cdcvs.fnal.gov/redmine/projects/fife/wiki/DBAccessREST manual] (ups product) c function wrapper, takes urls, runs curl, returns pointer to memory, has utilities for converting csv to values.
** nu products
* nutools - contains nova generic table
* nucond -
* [http://www-numi.fnal.gov/offline_software/srt_public_context/doc/UserManual/node9.html MINOS DB manual]
* [https://cdcvs.fnal.gov/redmine/projects/ucondb UconDB] unstructured DB
* [https://curl.haxx.se curl] [https://ec.haxx.se/ curl free book]
 
* 5/9/18 created dev conditions
psql -h ifdb04  -p 5444  mu2e_conditions_dev
* query engine
non-cached: http://dbdata0vm.fnal.gov:9090/QE/mu2e/dev/app/SQ/query?t=test
cached: http://dbdata0vm.fnal.gov:9091/QE/mu2e/dev/app/SQ/query?t=test
* setup postgresql v9_3_9
* hypernews mu2e-hn-ConditionsDB@listserv.fnal.gov 
 
 
<pre>
CREATE ROLE admin_role;
GRANT ALL TO admin_role;
GRANT admin_role to gandr,kutschke,rlc;
 
CREATE ROLE trk_role;
GRANT trk_role TO brownd,rbonvent,edmonds;
 
CREATE ROLE cal_role;
GRANT cal_role TO echenard,fcp;
 
CREATE ROLE crv_role;
GRANT crv_role TO ehrlich,oksuzian;
</pre>
 
==Notes on conDB==
[https://cdcvs.fnal.gov/redmine/projects/condb/wiki ConDB] is a conditions database system provided to the neutrino experiments by the computing division.  It is
* a set of python scripts for creating data tables, in a fixed IOV schema, and uploading data
* a web server for Minerva and one for NOvA, which are customized for each service (there may be others)
* an http REST interface for delivering data at scale
These are combined with libwda (to hide the web urls and text data behind a c binary interface), and experiment code on top of that.
 
In Minerva style, the data for all the channels in a table are always uploaded at once.  In the Nova style, the data for all channels are uploaded once at the start, then subsets of channels are updated occasionally.  All data is retrieved based on two dates: the ''valid date'', the date of the data where this calibration starts to be valid, and the ''record'' date, when the data was inserted. 
 
Generally, data is retrieved by sending the event time and table name to the web server.  The code in the web server searches for the last valid date before the event time, and selects the associated data.  If there are two calibrations with the same valid date, the one with the newer record date is returned.  A user can request data for a time interval, causing all data relevant to the interval to be returned as lists of data for channels, with metadata of the valid times, which have to be interpreted in the user code.  As far as I can tell, Minerva retrieves data on each event based on event time.  The web server caches recent queries so it can often return without going to the database.  (Note this cache is an internal custom cache, and is on top of the generic commercial web server cache used by Query Engine and nginx.)  NOvA uses run numbers for time stamps, and assumes all data is constant over the run (if I understood).  Intervals of validity are open-ended - for the newest data, the latest valid data is returned until data with a newer valid date is uploaded.  This means the results can change unless there are human controls.
 
In the NOvA style, patches can be created which have a valid start date and a valid end date.  A patch overrides the standard result.  In the NOvA style, ''snapshots'' are created every few weeks.  This simply marks all the tables active at that time, as a way to limit how far back future searches for data have to go - it is an efficiency optimization.
 
A user can put a ''tag'' on a table. This is a text string that marks all entries in this table up to this date.  It is typically used at the end of a production pass to save what was done.  A tag cannot be extended. In the future, tables can be retrieved based on this tag, and only data with this tag will be used in the time search algorithm to find data.  Data may also be retrieved based on the record date.  This gives another way to broadly reproduce an old result - only data entered before this date will be used in the time search algorithm.
 
''tolerances'' are used by NOvA to trim data as it is uploaded, to remove channels where the new data is similar to the previous, within tolerances.
 
In the NOvA style, each upload of data may have a ''data type'' string attached.  This is used to write MC and beam data into the same table at the same time.  Since this is just a string, it could be used many ways.
 
===Thoughts for Mu2e===
 
 
1) the open-ended intervals, valid for all future data until it is overridden, can cause a user's physics result to change unexpectedly.

2) There is no way to restrict the list of tables that a user sees.  Imagine production pass 1 uses table A, but pass 2 uses table B.  A user could select the right set of calibrations for pass 2, but unknowingly run code that accesses obsolete table A.

3) there is no way to extend a calibration set that is also repaired or replaced.  Imagine production pass 1 runs for a while, then pass 2 is created and run on the same data.  New data comes in and people are analyzing both pass 1 data and pass 2 data, so we want to add the new data to both passes.  Another example is maintaining as current both the calibration pass and the production pass, or maintaining two different alignments for testing.  The conDB concepts only allow one right answer at a time.

4) there is no text interface, which is useful for prototyping and for avoiding database access altogether, for example, in very high-load situations.

Some of these points might be addressed by precise human controls, re-uploading data multiple times labelled differently, by creating identical tables with different names, by adding accounting columns to tables, or by flattening and re-organizing our IOV structures into types of metadata calibration tables. All of these are disfavored as counter-intuitive or inefficient.
===Verbosity levels===
* 0 print nothing (the default, since the default is also that the db is off)
* 1 minimal
**  print purpose/version for log file
**  print when fetches go into retries (indicates system is struggling)
* 2 good default for users and production (often set with purpose/version)
**  report startup time and duration
**  print interpreted version string
**  print IoV summary
**  print engine endjob time and cache statistics
* 3 add some detail
**confirm reading text file, and which tables
**  confirm engine start and early exit
**  report IoV cache details
* 5 start reporting on every table fetch
**report engine initialization start and end
**  print IoV chain
**  report each request for table update from handle
**  report if table found in text file
**  report creation of temporary table ID's, and each use
* 6 more info about each fetch
** report start time and duration for each url fetch
**  report each retry for a url
** report fetch status for a url
* 9 start printing some table content
** print first and last few lines from each table for each fetch
* 10 everything
** print full table content for each fetch




[[Category:Computing]]
[[Category:Code]]
[[Category:Conditions]]

Latest revision as of 23:33, 14 May 2024

Introduction

In Mu2e it will be necessary to maintain a database of calibration constants, also known as conditions data. This will include information like the alignment of the detector, the gas gain in straws, the time-space relationship for the straws, the gain of SiPMs, the calibration curve for ADC's and so on.

The current system is set of two postgres databases, mu2e_conditions_dev, for development by experts and mu2e_conditions_prd for users. The database infrastructure is maintained by the database group of the computing division. A user can access data by configuring a service in their art job, which reads the database using an http protocol. The system is intended to support access from high-volume grid jobs. In addition, there is a mu2e bin called dbTool which can be used for dump database contents, or maintaining database contents.

The user configures an art executable by pointing to a particular set of calibration in the database. In code, the user can access database tables directly by the DbService, but most access will be through a high-level use-friendly layer called the Proditions service. Proditions allows deriving and caching quantities computed from many database tables and other Proditions quantities, while triggering quantity updates as the underlying tables change with the run number.

This page is the right place to start to understand the system, then you can also see:

Using the conditions database

selecting a calibration set

When accessing the conditions database, you select a purpose, such as "PRODUCTION" and version, such a "V1_1". These are entered in the DbService services stanza.

services : {
   DbService : {
      purpose :  PRODUCTION
      version :  v1_1
      dbName : "mu2e_conditions_prd"
      textFile : ["table.txt"]
      verbose : 1
   }
}

The version numbers may have up to three fields like v1_2_3:

  • 1=major version number, this changes when there is a major change in the content of the calibration set, that a user should probably be aware of, such as going from unaligned data to aligned data. The modules and services that will run and their configuration are likely to change. Your physics results will probably change.
  • 2=minor version number. This changes when the list of table types changes, or if there was a repair to the calibration set. The modules and services that will run and their configuration might change. Your physics results might change.
  • 3=extension number. This changes when new runs are added to the calibration set - physics results do not change, but run which previously failed to run because they had no calibrations, will now run.

You always want to explicitly provide a purpose, but if you do not, the code will assume "PRODUCTION". If you have no interest in the version number, then you can leave it blank and the code will take the highest version available. If you specify only the major version number, then the highest available minor version number will be used. Your results might change between runs of your exe. If you provide the major and minor version numbers, then any run numbers that successfully ran before will always run the same way, but runs that previously failed because no calibrations were available might succeed at a later time due to an extension of the calibration set. This (specifying major and minor) is probably the right approach for most user work. Finally, if you specify the major, minor and extension number, then you will get the exact same result every time.

The database parameter should usually be mu2e_conditions_prd, and this is the default.

The table parameter allows you to add a table to the calibration set for testing, see below.

verbose set to 0 will give no output, set to 1 is intended to give interesting brief reports, and can be set up to 10.

You can see the available purposes with

dbTool print-purpose

and you can see what versions are available with

dbTool print-version

overriding a table with a local file

The conditions code allows a user to provide a file (or several files) in the DbService fcl configuration, with the effect that the file content overrides or extends whatever is in the database. This is intended to make it easy to test new or updated tables. The table must be defined in the code in order to be included in the text file, but no database entries are needed before it can be used. The text file must be in a specific format. The file content includes table data and intervals of validity. When a handle in the user code asks for a certain table for a certain run/subrun, the code will check if the text file entries satisfy the request. If so, that text file entry is provided to the caller with no check on the actual database content. If the text file does not satisfy the request, the code will look into the database using the IOV lookup rules.

If the jobs will only use tables from the text file, and the user does not want any information to be read from the database, then the jobs should turn on the database access in the relevant Proditions entities and set the database purpose to "EMPTY". This will cause the service to be full functional, but not try to read any IOV information or table content from the database. The user will have to provide all requests in the text files.

services : {
  DbService : {
     purpose :  EMPTY
     version : v0   # ignored
     dbName : "mu2e_conditions_prd"  # ignored
     textFile : ["table.txt"]   # provide everything needed
     verbose : 1
  }
}

access by dbTool

All table data can be dumped by the command line tool dbTool. For example, listing the available tables:

> dbTool print-table
TID            name                 dbname              user              time
  1           TstCalib1            tst.calib1              rlc  2018-10-12 08:58:26.762519-05:00
  2           TstCalib2            tst.calib2              rlc  2018-10-12 08:58:26.763687-05:00
  3       TrkDelayPanel        trk.delaypanel              rlc  2018-10-12 08:59:20.519840-05:00
  4     TrkPreampRStraw      trk.preamprstraw              rlc  2018-10-12 08:59:20.765956-05:00
...

printing the calibration entries available for a table:

> dbTool print-calibration --name TrkPreampStraw
       CID          Table      create_user        create_date 
CID     7      TrkPreampStraw         rlc   2018-10-12 08:59:51.789062-05:00
CID    21      TrkPreampStraw         rlc   2019-03-11 11:59:38.338682-05:00
CID    32      TrkPreampStraw    rbonvent   2021-01-12 18:21:14.912449-06:00
CID    42      TrkPreampStraw    rbonvent   2021-07-08 15:51:35.822251-05:00
...

dumping a calibration entry in canonical format

> dbTool print-content --cid 32
TABLE TrkPreampStraw
#  cid 32
# index,delay_hv,delay_cal,threshold_hv,threshold_cal,gain
0,0.0,0.0,12.0,12.0,1.0
1,0.0,0.0,12.0,12.0,1.0
2,0.0,0.0,12.0,12.0,1.0
...

It can also tell you about what sets of calibrations are available

 > dbTool print-version
      VID  PID  LID  maj  min  create_user        create_date                     comment
VID    25   14   14    1    0         rlc   2022-04-30 18:01:29.389773-05:00                     initial version
VID    30   14   14    1    1         rlc   2023-05-04 11:36:51.331483-05:00              "updated align, straw"

or drill down to what will be applied to a run:

> dbTool print-run  --purpose MDC2020_perfect --version v1_0 --table TrkAlignTracker --run 1201 --content
0, 0_0_0,0.0,0.0,0.0,0.0,0.0,0.0

this tool has built-in help.

dbTool is also the primary method to upload new data to the database.

Access in Modules

Once the DbService is configured with a calibration set, all the data contained that set is available to any module. Since the interval of validity is not known, the services will not return any values until the file is being read and run and events numbers are defined. The intent is that only a few use cases will access the database tables directly. Most uses will access data in user-friendly containers provided by the high-level ProditionsServices. Proditions allows the creation and caching of entities (collections of numbers, classes) derived from multiple database tables and other Proditions entities. An example is a Proditions entity holding the aligned geometry might be made from several tables of alignment values. Proditions will create and cache the aligned geometry, and know when to update it if any of the underlying tables changed, as you process new run numbers. Proditions hides the low-level dependencies from the user and optimizes the creation, updating and caching. Another example is a straw model entity made from both database tables and other Prodition entities representing straw calibration or straw electronics conditions.

===Accessing ProditionsService Contents===

The recommended access pattern is to make the handle a module class member. You must call "get" on the handle every event to make sure it is up to date. The get method returns a const reference to the correct version of the entity (here "Tracker", the tracker geometry class).

// in the module cc file
#include "TrackerGeom/inc/Tracker.hh"
#include "ProditionsService/inc/ProditionsHandle.hh"

// in the class definition

 ProditionsHandle<Tracker> _alignedTracker_h;

// at the top of produce(art::Event& event) or analyze(art::Event& event)
 Tracker const& tracker = _alignedTracker_h.get(event.id());
// use tracker here...

// pass tracker to utility routines:
   auto v = strawPosition(tracker);

Important Notes:

# The recommended practice is to hold ProditionsHandles as member data of your module class and to default construct them in the c'tor. Classes and functions that are not modules should never create or hold a ProditionsHandle. (The Proditions service can be accessed in other services, but the creation of services must follow a set pattern, and the creation of the Proditions handles may need to be delayed.)
# When a module member function needs a Proditions entity, it should get a const& to the entity from the handle at the start of the module member function (e.g. produce, analyze, filter, beginSubRun, ...). Hold the const& to the entity as a function-local variable or as member data of a function-local struct; never hold the const& to the entity in a way that persists to the next event, since this will prevent the entity from being updated when appropriate.
# When a non-module function needs an entity, that function should receive the entity as a const& argument from the calling module.
# A side effect of 3) is that some classes/functions called from module member functions will receive the entity as an argument for the sole purpose of passing it on to lower-level classes/functions. This is OK. If you have many entities that need to be passed this way, the recommended practice is to collect them into a small struct and pass the struct by const reference; this is slightly more efficient than passing them individually and enhances readability. The small struct should be function-local data of the module member function (see the sketch after this list).
# Some Proditions quantities (such as alignment) will only be uploaded to the database at run or subrun boundaries, so it would seem to make sense to access the Proditions entity in the beginRun or beginSubRun methods. While this is possible, we recommend against it because the entity update frequency may be increased without the coder knowing, and, in highly-skimmed datasets, you would be updating the entity for subruns that contain no events. Updating only in the event is optimal in all cases: if no update is needed, the call is very low cost, and by only updating in the event call, you never update unnecessarily.
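
To illustrate notes 2 and 4, here is a minimal sketch. It is not real Mu2e code: the struct TrkConds, the helper fitStraws, the module MyModule and the handle _strawResponse_h are invented for this example, and it assumes StrawResponse is another Proditions entity.

 // handles held as module member data, default constructed in the c'tor:
 //   ProditionsHandle<Tracker> _alignedTracker_h;
 //   ProditionsHandle<StrawResponse> _strawResponse_h;
 
 // a small struct bundling the entities needed downstream
 struct TrkConds {
   Tracker const& tracker;
   StrawResponse const& response;
 };
 
 // a non-module helper receives the entities by const&, never from a handle
 void fitStraws(TrkConds const& conds) {
   // ... use conds.tracker and conds.response here ...
 }
 
 void MyModule::produce(art::Event& event) {
   // fetch up-to-date entities at the start of the member function,
   // and hold them only as function-local data
   TrkConds conds{_alignedTracker_h.get(event.id()), _strawResponse_h.get(event.id())};
   fitStraws(conds);   // pass the bundle down by const reference
 }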

===Accessing DbService tables===

Most uses will not employ this pattern, where the database tables are accessed directly. Most uses will employ the Proditions service explained above.

The code pattern is to create a handle to the table (TstCalib1 in this example) as a data member in the module:

#include "DbService/inc/DbHandle.hh"
#include "DbTables/inc/TstCalib1.hh"
namespace mu2e {
  class DbServiceTest : public art::EDAnalyzer {
  private:
    mu2e::DbHandle<mu2e::TstCalib1> _testCalib1;
  };
} // namespace mu2e

in the event method, the user must update the handle with the current event ID:

  void DbServiceTest::analyze(const art::Event& event) {
    auto const& myTable = _testCalib1.get(event.id());
  }

once the user has the table filled with the correct content for this event, it can be accessed in several ways. At this point it is important to check with experts on how the table is intended to be used. There may be a channel or index column which has a certain meaning by convention. There may or may not be random access by maps. The number of rows may be guaranteed fixed or variable. Dead channels might have flag values, etc. Some examples of access are:

  int n = 0;
  for(auto const& r : myTable.rows()) {
     std::cout << "row " << n << " is channel "<<r.channel()<<" and has DtoE "<<r.dtoe()<<std::endl;
     n++;
  }

  int channel = 1;
  std::cout << "DtoE for channel "<<channel<<" is "<<myTable.row(channel).dtoe()<<std::endl;

  int index = 1;
  std::cout << "DtoE for row "<<index<<" is "<<myTable.rowAt(index).dtoe()<<std::endl;

==Conventions==

===Intervals of validity===

Intervals are inclusive: the stated end points are in the interval. You can't create an interval where the end is before the beginning, so all intervals contain at least one subrun.

{| class="wikitable"
! String !! Interpreted
|-
| EMPTY || 0:0-0:0
|-
| MAX || 0:0-999999:999999
|-
| ALL || 0:0-999999:999999
|-
| 1000 || 1000:0-1000:999999
|-
| 1000-1000 || 1000:0-1000:999999
|-
| 1000-MAX || 1000:0-999999:999999
|-
| MIN-1000 || 0:0-1000:999999
|-
| MIN-MAX || 0:0-999999:999999
|-
| 1000-2000 || 1000:0-2000:999999
|-
| 1000:10-2000 || 1000:10-2000:999999
|-
| 1000:11-1001:23 || 1000:11-1001:23
|}
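
As an illustration of the inclusive end points, here is a small standalone sketch (not the actual Mu2e IOV class) of how a run:subrun containment check behaves:

 // standalone sketch, not the actual Mu2e IOV class
 #include <cstdint>
 #include <iostream>
 
 struct RunSubrun { uint32_t run, subrun; };
 
 // lexicographic comparison of (run, subrun)
 bool operator<=(RunSubrun a, RunSubrun b) {
   return a.run < b.run || (a.run == b.run && a.subrun <= b.subrun);
 }
 
 struct Interval {
   RunSubrun begin, end;   // both end points are inside the interval
   bool contains(RunSubrun x) const { return begin <= x && x <= end; }
 };
 
 int main() {
   Interval iov{{1000, 10}, {2000, 999999}};          // the string "1000:10-2000"
   std::cout << iov.contains({1000, 9}) << "\n";      // 0: before the first subrun
   std::cout << iov.contains({1500, 0}) << "\n";      // 1: inside
   std::cout << iov.contains({2000, 999999}) << "\n"; // 1: the end point is included
 }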


===text file format===

The text file must have the following format:

TABLE <tableName> <IOV>
row1-col1, row1-col2, row1-col3
row2-col1, row2-col2, row2-col3
...

For example:

# my comment
TABLE TstCalib1 1001:2-1002
1,20,20.21
2,21,20.22
3,22,20.23

The table name must be the same as the C++ class name. The IOV text may be missing, in which case the table data applies to all runs. The allowed formats of the IOV text are shown in the Intervals of validity section above. These items must be together on one line. When a text file is supplied as content to the DbService, the IOV is automatically active. When a text file is used to commit calibration data to the database, the IOV is ignored by default (IOVs are declared in a separate command) but, optionally, IOVs may also be committed at the same time.

The data is one line for each row in the table, typically one line per channel. The rows may need to be in a specific order, depending on how the table is coded and accessed. The columns are separated by commas and appear in the order defined by the C++ representation of the table. Please see the strings section below for details on string columns. There may be several tables in one file; if so, each table entry must start with the "TABLE" keyword.
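
For example, a single file might carry two entries for the same test table with different IOVs; the numbers here are invented for illustration, following the TstCalib1 example above:

 # hypothetical file with two entries for the same table, each with its own IOV
 TABLE TstCalib1 1001:2-1002
 1,20,20.21
 2,21,20.22
 3,22,20.23
 TABLE TstCalib1 1003-MAX
 1,30,30.31
 2,31,30.32
 3,32,30.33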

===strings===

Arbitrary string input occurs in two places:

# when adding comments to a dbTool create action, such as creating a new purpose or version
# when uploading a calibration table that has a string column

In general, there are three special characters to watch out for: the double quote, used for quoting ("), the comma, used for column separation (,), and the hash, used for comments (#).

When inputting a comment string to dbTool, there are only two rules:

* if more than one word, use double quotes
 --comment "two words"
* when using a double quote in a string, escape it
 --comment "two \"words\""


When writing a file that contains calibration data, the following rules apply:

* comments may be included by writing the hash (#) as the first non-whitespace character of a line. Comments are not allowed embedded in lines with data. The hash may be used inside a string column.
 # this is a legal comment
 TABLE tableName1
 1, 1.2
 # legal comment
    # legal comment - first non-whitespace char is the hash
 2, 1.1 # illegal comment - will crash on parse (part of number column)
 TABLE tableName2
 1, 1.2, GOOD
 2, 1.1, BAD # malformed comment - will appear in string column
 3, 1.1, failed check #3  OK, legal to use hash in string
* strings containing commas must be quoted
 TABLE tableName2
 1, 1.2, GOOD
 2, 1.1, BAD, or not will crash on parse
 3, 1.1, "BAD, or not" OK
* embedded double quotes must be escaped or doubled
 TABLE tableName2
 1, 1.2, GOOD OK
 1, 1.2, "GOOD" OK
 3, 1.1, really BAD  OK, multiple words OK (as long as no commas or quotes)
 3, 1.1, ain't really BAD  OK, single quotes OK
 2, 1.1, Joe says "BAD" OK
 2, 1.1, Joe says "BAD, or not" OK, comma requires quotes
 3, 1.1, "Joe says \"BAD\"" OK
 4, 1.1, "Joe says ""BAD""" OK
 4, 1.1, "Joe says "BAD"" will crash on parse
 5, 1.1, "Joe says, ""BAD""" OK, comma requires quotes, embedded quotes must be escaped or doubled

===time standards===

There are two uses of time in the database. The first is to record the time that a database action is taken, for example when an IOV entry is created. Columns that record these times are declared as:

TIMESTAMP WITH TIME ZONE NOT NULL

and are filled with the built-in postgres time

 CURRENT_TIMESTAMP

internally, these times are stored as UTC times in postgres internal binary format. Postgres displays these values in the ISO 8601 time format, but with the "T" that usually separates the date and time replaced by a space - postgres says: "This is for readability and for consistency with RFC 3339 as well as some other database systems."

2018-10-22 08:58:26.762519-05:00

For your convenience, a reminder that the "-05:00" means the displayed time is 5 hours behind UTC (08:58 was displayed on local clocks), so the UTC time is 13:58. Note: in our database tools it might be useful to restore pure ISO 8601 by restoring the "T".

The second use is to store times that are unrelated to the database itself or to the time that the database entry is made, for example the time a run was started. In this case, we declare the column as:

start_time TEXT NOT NULL

and fill it with a string representation of the time in ISO 8601 format:

2018-10-22T08:58:26.762519-05:00

after this string is retrieved from the database, it may be converted to a binary format, if needed.
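
A minimal sketch of such a conversion follows. This is not Mu2e code; it just splits the example string into numeric fields with sscanf, and a real conversion would use a proper date/time library:

 #include <cstdio>
 #include <string>
 
 int main() {
   std::string s = "2018-10-22T08:58:26.762519-05:00";
   int year, month, day, hour, min, offh, offm;
   double sec;
   // the final %d:%d captures the UTC offset; the sign is carried by the hour field
   if (std::sscanf(s.c_str(), "%d-%d-%dT%d:%d:%lf%d:%d",
                   &year, &month, &day, &hour, &min, &sec, &offh, &offm) == 8) {
     std::printf("date %04d-%02d-%02d, local time %02d:%02d:%09.6f, UTC offset %+03d:%02d\n",
                 year, month, day, hour, min, sec, offh, offm);
   }
   return 0;
 }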

==Verbosity levels==

* 0 print nothing (the default, since by default the database is also off)
* 1 minimal
** print purpose/version for log file
** print when fetches go into retries (indicates the system is struggling)
* 2 good default for users and production (often set with purpose/version)
** report startup time and duration
** print interpreted version string
** print IoV summary
** print engine endjob time and cache statistics
* 3 add some detail
** confirm reading text file, and which tables
** confirm engine start and early exit
** report IoV cache details
* 5 start reporting on every table fetch
** report engine initialization start and end
** print IoV chain
** report each request for table update from handle
** report if table found in text file
** report creation of temporary table IDs, and each use
* 6 more info about each fetch
** report start time and duration for each url fetch
** report each retry for a url
** report fetch status for a url
* 9 start printing some table content
** print first and last few lines from each table for each fetch
* 10 everything
** print full table content for each fetch