ConditionsMaintenance
Introduction
During data-taking, the detector experts responsible for calibrations (calibrators) will need to continually update the database with calibrations for the new data. A calibration manager will need to collect these inputs from the detector calibrators. This page explains the procedures for both.
The calibrator should be familiar with the rules of a table from when that table was created. The most important rules are those concerning the column, or columns, which make the rows unique. This is typically a channel number. The table code and database constraints may have assumptions built in; for example, it may be required that the channel number is sequential. This channel number may be dense or sparse, or have a custom meaning. The calibrator should also know what precision is required in the floating point numbers, since recording more significant digits than necessary wastes space and time. And, of course, the calibrator must know their detector, such as when and how to produce a new calibration.
All commit functions have a dry-run mode in which the commit is performed and then rolled back, so you can fully check what will happen. Also, if a mistake is committed, it can always be ignored: simply perform a second, correct commit and carry the correct result forward to the next step. Typically, nothing is ever deleted.
The calibrator will need the permission (or role in database parlance) appropriate to their detector, such as trk_role or cal_role. They will also need val_role.
Calibrator Procedures
These are the steps that the detector expert would take.
Committing a calibration
The calibrator would typically produce the calibration by writing a text file where each row represents a row in the database. The contents should form one logical "calibration", that is, a set of data that will be retrieved all at once for use in an event. The table name, columns, and rows must all be parsed correctly, so the file must follow the required format.
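As a sketch, an input file for the TstCalib1 test table might look like the following. The exact header syntax is defined by the file format documented on the ConditionsData page; the layout here, a TABLE line followed by CSV data rows, is an assumption based on the print-table output shown below.
TABLE TstCalib1
0,32,1.3177
1,33,2.3166
2,31,3.3134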
> dbTool commit-calibration --file FILENAME
If the file contained data for table TstCalib1, the response might be
created calibration for TstCalib1 with 3 rows, new cid is 3
The number labeled cid uniquely identifies this data forever. You can use this to refer to the data.
> dbTool print-table --cid 3
TABLE TstCalib1
# cid 3
# channel,flag,dtoe
0,32,1.3177
1,33,2.3166
2,31,3.3134
The file format allows for an interval of validity (IOV) to be included for the calibration data. By default, the IOV will be ignored by commit-calibration. Optionally, the IOV in the file can also be committed, attached to the appropriate new calibration. Further, the new IOV can be put in a new group.
> dbTool commit-calibration --file FILENAME --addIOV --addGroup
created calibration for TstCalib1 with 3 rows, new cid is 58
created calibration for TstCalib1 with 3 rows, new cid is 59
new IID is 44
new IID is 45
new GID is 15
Committing an interval of validity
After the calibrator has committed the data to the database, they can declare what runs the data can be used with; this is called an interval of validity, or IOV. The IOV is represented in a well-defined format and can be defined at the run level or at the subrun level, but not lower. A typical IOV, starting at run 1001, subrun 10 and ending on run 1002, can be represented as 1001:10-1002. The run ranges are inclusive, so run 1001, subrun 10 is included, as well as all subruns of run 1002.
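A few example IOV strings, all drawn from elsewhere on this page:
1001:10-1002 (run 1001, subrun 10, through the end of run 1002)
1105-1107 (all of runs 1105 through 1107)
1108:20-1108:70 (subruns 20 through 70 of run 1108)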
An IOV is attached to calibration data, which is indicated by its CID number.
> dbTool commit-iov --cid 3 --iov 1001:10-1002
with reply
new IID is 10
The number labeled IID uniquely and permanently refers to this commit.
If calibration data is valid for run X and you have committed an IOV declaring that, and it is later determined that the data is also good for run Y, then the proper step is to commit a second IOV declaring that the same CID is valid for run Y. In this way, many IOVs may point to the same calibration data, the same CID.
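For example, if the data committed above as CID 3 turns out to also be valid for runs 1005-1006 (a made-up range), a second IOV simply reuses the same CID:
> dbTool commit-iov --cid 3 --iov 1005-1006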
Logically, it will be important to make sure, when putting together all your relevant IOVs, that all good runs have calibrations and that there are no overlaps, where one IOV says data A is good for run X but a second IOV says data B is good for run X. Once we have a good-run system in place, code will be provided to make these checks.
Committing a group
The third and final step to committing calibration data is to combine all your IOVs into one group. A group is just a collection of IOVs. The purpose of a group is to make it easier to refer to large sets of IOVs. For example, if the same calibration data is good for both the PRODUCTION and ANALYSIS calibration sets, it is easier to put the groups into both sets rather than recreate a new, and potentially much larger, set of IOVs. This layer also makes repairs easier.
You provide a set of IID numbers, saved from the IOV commits. The IIDs may refer to the same table or to different tables, even tables in different detectors. The IOVs don't need to be adjacent in time, and there are no other restrictions. We expect that, typically, a detector expert will gather everything they need for a run, or a small set of runs, commit all those data, make IOVs, and then collect them into one group.
> dbTool commit-group --iid 10,12,13
with reply
new GID is 17
Note that the list of IIDs may also be supplied as the name of a file containing the numbers (--iid mylist_of_iids.txt), as sketched at the end of this section. The number labeled GID uniquely and permanently refers to this group of IOVs.
It is this GID which you pass to the database or production manager, who will include it in the calibration set, along with the GID from other detector groups.
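As a sketch of the file option mentioned above, where the one-IID-per-line layout is an assumption (see the dbTool help for the accepted format):
> cat mylist_of_iids.txt
10
12
13
> dbTool commit-group --iid mylist_of_iids.txt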
Calibration Manager Procedures
A calibration set is
- a purpose (entry in ValPurposes, with a PID)
- a list (list of table types in ValLists, with an LID)
- a version (entry in ValVersions, with a VID)
- a set of extensions (entries in ValExtensions, with an EID)
- the groups of calibration data associated with the extensions (entries in ValExtensionLists)
When an executable starts the DbService, set to a particular purpose and version, the DbService makes the data in the calibration set available on demand.
During data-taking, the calibration managers will need to continually update the calibration set contents as calibrators enter new data and send their lists of GIDs. This section explains these procedures and assumes the reader is familiar with the ConditionsDbSchema.
Extend a calibration set
This is the most common procedure. The calibrators have sent new GIDs and it is time to extend a calibration set. The extension may add tables for more runs, or complete the set of needed tables for a run. (The tracker tables for run X were entered yesterday and now you want to enter the calorimeter tables for run X.) At this point, there exists a calibration set - a PURPOSE and VERSION. PURPOSE refers to the purpose of the set, such as "PRODUCTION", and VERSION refers to the major and minor numbers, like v1_1. This procedure takes a new set of tables, represented by a GROUP number (or numbers) given to you by the calibrators, and extends the calibration set. For example, an extension might take calibration set PRODUCTION v1_1_10 and add some groups of tables to create PRODUCTION v1_1_11.
> dbTool commit-extension --purpose PRODUCTION --version v1_1 --gid 4
the result shows the new full version number with the new extension. --gid can take a list of numbers, or a file containing numbers (see the dbTool help). You can verify the calibration set with
> dbTool print-set --purpose PRODUCTION --version v1_1 --details
Once an extension is created, it is permanent and can't be deleted or undone. If there is a mistake, the procedure is to create a new VERSION, copy over the correct part, then create a new correct extension.
Note that in this step, the calibration set completeness is not checked. For example, if this set requires 10 tables, you can commit an extension for run X with only 7 of the tables, and later commit an extension with the other 3; see also Checking a calibration set below. When the command to create an extension is given, but before the database is altered, there is a check that the extension does not overlap any existing IOVs, and the command exits if any overlap is detected.
Checking a calibration set
Before using a calibration set, you will want to ask whether all the tables have IOVs for a given set of runs. The following command is an example of this check:
dbTool verify-set --purpose PRODUCTION --version v1_1 --run 1101,1103,1105-1107,1108:20-1108:70
The runs are listed in canonical format. The command will report which tables are missing for which runs. Eventually, completeness should be checked against a good-run list; this capability will be developed.
Create a new calibration set
A calibration set refers to a purpose and a version number (major and minor) and its extensions. A new calibration set will be needed if
- the list of table types needs to change
- a repair needs to be made to an existing calibration set
The first decision to be made is whether a new purpose is needed. This should be a very rare need; mostly there will be a few purposes (PRODUCTION, CALIBRATION, ...) and, as needs evolve, they will gain new version numbers. A new purpose might be needed, for example, if the CRV group wants to run a calibration job on new data and needs the nominal calibration as input to the job. They don't care about any other detector calibrations, so no existing calibration set is appropriate.
> dbTool commit-purpose --name CRV_CALIBRATION --comment "input to the CRV calibration job"
If a new purpose is not needed, one of the existing purposes can be chosen:
> dbTool print-purposes
You should end up with a purpose identifier number, or PID.
The second decision to be made is whether a new list of table types is needed. You can see the existing lists
> dbTool print-lists
You will need a list ID (LID). You can also see what lists are associated with current versions.
> dbTool print-versions
If a new list of tables is needed, first find the tables, including their unique numeric identifiers, their TIDs.
> dbTool print-tables
To create a new list:
> dbTool commit-list --name CRV_CALIBRATION_LIST --comment "for CRV calibration job" --tids 1,2,3
which will result in a new list id, or LID.
Finally, once you have the purpose and list, you will need to decide the version numbers. If this is a new purpose, the typical major and minor numbers will be v1_0. Increment the major number only if a major milestone was passed (such as the second round of production) or there was a major philosophical change (non-aligned to aligned detectors). Otherwise, increment the minor number, for example when there is an adjustment to the table list (such as switching from CrvGains to CrvGains2 tables) or when a repair needs to be made to an existing extension. The principle is that if a physics result can change, then at least the minor number must change. This is enforced because a PURPOSE/VERSION has a fixed table list and also can't be modified (only extended).
To create a new version, once you have the PID, the LID, and the major and minor version numbers selected:
> dbTool commit-version --purpose PRODUCTION --list 3 --major 1 --minor 3 --comment "fix run 1234 of version 1_2"
The purpose and list switches accept either the text name or the numerical index.
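For example, assuming PRODUCTION has PID 1 and list 3 is named CRV_CALIBRATION_LIST (both made-up values), the previous command could equivalently be written as:
> dbTool commit-version --purpose 1 --list CRV_CALIBRATION_LIST --major 1 --minor 3 --comment "fix run 1234 of version 1_2"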
At this point the calibration set needs to be populated with new extensions. If the calibrators are producing new tables, the process is simply to create extensions, as described above.
If the new version is a small correction or change from another existing calibration set, then some hand work, possibly automated in the future, may be necessary. For example, if the new version was a matter of adding a table to an old version, you would first create an extension that contains all the groups from the old set. Then create a group that contains the new table, and add that with an extension. If the new version is a correction to an old set, then you might dump the groups associated with the old set ("print-set"), delete the bad group number, add the repaired group number, and commit that as the first extension.
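As a sketch of the correction case, using only commands shown on this page (the version numbers and the GID list file name are hypothetical):
> dbTool print-set --purpose PRODUCTION --version v1_1 --details
(edit the printed group list by hand: drop the bad GID, add the repaired GID, and save the result as gid_list.txt)
> dbTool commit-version --purpose PRODUCTION --list 3 --major 1 --minor 2 --comment "replace bad group in v1_1"
> dbTool commit-extension --purpose PRODUCTION --version v1_2 --gid gid_list.txt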
Patching
Some expected scenarios which require a new calibration set version are
- a new table, with content and IOVs, needs to be added to an existing set
- a mistake was made in extending a set, and now correct content needs to be patched in
- some improved content was developed for a table and it now needs to be patched in
A new set version is required because the content of the set will change; modifying an existing set is not allowed, so the only option is to create a new version of the set.
A dbTool function is available to patch an existing calibration set with a new set of GIDs. To run this tool, there needs to be the old version, the new version (created using a separate commit-version command), and a set of GIDs that represent the new content. If the GIDs contain new tables, they are simply added. If the old version contains tables which are not in the new version, those tables and all their content are not copied over. If the GIDs represent changes to the content, then each table and its IOVs are analyzed to see which parts need to be copied from the old version, which parts replace content from the old version, and which parts add new content. Since some new IOVs might partially overlap old IOVs, some new IOVs might be created automatically. For example, if the existing IOV is for runs 1000-1005 and the patch GID is for run 1003, then the result is a new IOV for 1000-1002 with the old content, the new IOV for 1003, and a new IOV for 1004-1005 with the old content.
dbTool commit-patch --old_purpose PRODUCTION --old_version v1_1 \
       --purpose PRODUCTION --version v1_2 \
       --gid 123,124
Conventions
Permissions and roles
The current (10/2018) permissions are listed below. Write access to the database requires a Kerberos ticket. When a username is recorded in the database, it is the account name, which is the same as the Kerberos name. Anonymous users can browse the database using SQL and the mu2e_reader read-only account (ask for the password); an example connection command is given after the role descriptions below.
| Permissions | Users |
|---|---|
| ADMIN_ROLE | gandr,kutschke,rlc |
| VAL_ROLE, MANAGER_ROLE | kutschke,rlc,rbonvent |
| TRK_ROLE, VAL_ROLE | brownd,rbonvent,edmonds |
| CAL_ROLE, VAL_ROLE | echenard,sophie,simona |
| CRV_ROLE, VAL_ROLE | ehrlich,oksuzian |
| SIM_ROLE, VAL_ROLE | brownd,edmonds,sophie,gandr,kutschke |
- ADMIN_ROLE owns all tables and has complete control. It can add or drop tables, and everything in between. Only a few experts will have this role. The only regular job will be to create new calibration tables.
- VAL_ROLE can make intervals of validity and make groups of IOVs. All detector calibrators will have this role.
- MANAGER_ROLE can commit to the higher-level interval of validity tables. This includes declaring a new calibration table, creating a new list of calibration tables, creating purposes or versions of a purpose, and extending a calibration set. It is expected that only a few active offline production managers will have this role at any one time.
- TRK_ROLE can commit calibration data to tables with names Trk*, and similarly for the other detector roles. Only a few experts in each detector, with the responsibility to maintain calibrations, will have this role.
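For the read-only SQL browsing mentioned above, a connection with the standard psql client might look like the following; the host, port, and database name are placeholders, so ask the database manager for the actual connection parameters along with the password.
> psql -h HOSTNAME -p PORT -U mu2e_reader DBNAME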
Hardware upgrade
On CTASK0022375, 7/2024, we demonstrated that the following procedure will work to allow continuous user reads of the conditions database while changing the database hardware, for a platform or postgres upgrade.
- block all accounts except the read-only Query Engine account mu2e_web
- export the running database
- bring up the new database and import the content
- turn on the new database for read and write
- move the Query Engine backend to the new database
- monitor the old database for 1 day, then turn it off
- to allow writes by dbTool, update cvmfs/Datafile/Database/connections.txt
- also update the dbMon.sh script