ConditionsDbSchema

From Mu2eWiki
Jump to navigation Jump to search

Introduction

A database schema is the design of the tables and functions, etc, and their interactions. Here we describe the conditions database in some detail.

We are using postgres databases. In postgres, the word "schema" is also used to refer to a what is essentially a folder, or a way to group database tables and assign permssions as a group. We have several of this type of schema, one for each detector subsystem (trk,cal,crv) plus one for the interval of validity structure (val) and one for test tables (tst). A table can be referred to as schema.name, for example, we have an example table in the tst schema, tst.calib1.

In this page we will use the word calibration to refer to one logical group of data, entered into one table at the same time. For example, if we had a set of gains for all crystals and committed those, with one value (equals one row) for each channel, that would comprise a "calibration". If the next week I committed another set of gains, that would be another "calibration".

Calibration Tables

Starting with tst.calib1, we have a conceptual content of a fictional detector with 3 channels and calibration which contains a integer flag and a float, "dtoe", calibration for each channel.

TstCalib1 concept
channel flag DtoE
0 12 1.11
1 13 2.11
2 11 3.11

This set of 3 rows are a logical group and were committed together, and so it is an example of a "calibration".

In the database, this table needs to hold many calibrations and we need a way to separate them, so we add a column called cid, which stands for calibration ID. One cid number is unique across the whole database, so it labels exactly three rows in exactly this table. The actual database table looks like the following, with three calibrations entered, with cids of 1, 2 and 3.

TstCalib1
cid channel flag DtoE
1 0 12 1.11
1 1 13 2.11
1 2 11 3.11
2 0 22 1.21
2 1 23 2.21
2 2 21 3.21
3 0 32 1.3177
3 1 33 2.3166
3 2 31 3.3134

Calibration Table Support

There are two tables required to support the calibration entries. The first is a table which defines what calibration tables exist and gives them a number.


ValTables
tid (seq) name dbname create_date create_user
1 TstCalib1 tst.calib1 2018-10-12 08:58:26 rlc
2 TstCalib2 tst.calib2 2018-10-12 08:58:26 rlc

Note that this table is in the "val" schema, as its name indicates. The name is the c++ class name and the dbname is the postgres table name. The number assigned to identify the calibration table is the Table ID or TID. This is unique across the whole database. The number is a assigned as a sequence. In postgres, if a column is defined as a sequence, then you do not insert a value for that column when you insert a new row, that column's value for the new row is assigned automatically as the next higher integer in a sequence. The sequence is stored in another auto-generated auxiliary table. It is not hard to increment this sequence incidentally while not actually creating a new row, such as in the dbTool dry-run methods, so the sequence numbers are not going to be densely packed, there will be gaps - the tables might be numbered 1, 2, 5, 6, 8, etc.

This table, as many tables do, as a record of who created the row and when.

The second support table, called ValCalibrations, lists the calibration entries and assigns the CID.

ValCalibrations
cid (seq) tid create_date create_user
1 1 2018-10-12 08:58:26 rlc
2 1 2018-10-12 08:58:26 rlc
3 1 2018-10-12 08:58:26 rlc

The Calibration or CID is assigned automatically as a sequence. A row assigns a CID and defines which table contains this calibration.

When a calibrator inserts a new calibration, the dbTool code does:

  1. the table name is identified from the input file
  2. the new calibration rows are read from the input file
  3. the TID is looked up in ValTables based on the table name
  4. The TID is inserted in the ValCalibrations tables and a new CID is returned
  5. the rows are added to the calibration table (TstCalib1 in this example) with the new rows labeled by the new CID

Intervals of Validity and Groups

Logically, an interval of validity (IOV) says that a specific cid (one calibration in a specific table) is appropriate for a specific interval of subruns. We do not expect that any calibration data will need to change faster than subruns. We even anticipate that most calibration data will change at run boundaries, and only change at subruns when something breaks, such as a HV trip, or possibly to record a value drifting during a very long run.

Many of these IOVs can point to the same calibration data. For example, if a calibrator uploads a calibration for run 1 and makes an IOV tying the data and the run together, then the next day confirms that the same calibration data is also good for run 2, the procedure would be to simply add a new IOV that connects the data to run 2.

ValIovs
iid (seq) cid start_run start_subrun end_run end_subrun create_time create_user
1 1 1001 1 1001 999999 2018-10-12 08:58:26 rlc
2 2 1002 1 1004 1 2018-10-12 08:58:26 rlc
3 3 1004 2 999999 999999 2018-10-12 08:58:26 rlc

These entries are identified by a postgres sequence number, the IOV ID or IID.

The run and subrun ranges are closed and fixed in order to enforce that once IOV's are collected in a calibration set, the user's result will never change. If intervals were open-ended, then results for later runs might change. If you want this IOV to apply to all runs permanently, then use 0 for the start_run and 999999 for the end_run.

There is another conceptual level which is not address yet. This level connects this IOV to a particular set of calibrations. For example, this IOV might belong in the first pass of production, but not in the second pass, which has a whole different set of calibrations. This logical connection is made in other tables explained below. In the IOV table, we only attach a subrun range to calibration. This can be re-used for many calibration sets.

There may be many, even tens of thousands of IOV created for a calibration set, particularly if the set it is updated frequently like every run. To help organize the entries and to refer to many entries with one handle, we create groups of IOV's. These groups are recorded in two tables, ValGroups, with one row for each new group, and ValGroupLists which shows the relationship between groups and IOVS.


ValGroups
gid (seq) create_time create_user
1 2018-10-12 08:58:26 rlc
2 2018-10-12 08:58:26 rlc

The purpose of ValGroups is to create GID and record who and when. Here we created two groups.

ValGroupLists
gid iid
1 1
1 2
2 3

In this table we can see the group with GID 1 contains IID 1 and 2, while the group with GID 2 contains only IID 3.

The logical input to creating a group is simply a list of IID's. Then

  1. the IID's are checked that they are valid entries in ValIovs
  2. a new row in ValGroups is inserted and the new GID is returned
  3. the rows are made in ValGroupLists, one for each IID in the group

Calibration Set Schema

The previous sections describe the schema involved as the calibrator inserts data in the database. The calibrator's job is done when they report their new data to the production manager. The logical content of that message is "I created groups (GID's) x and y and they are ready to be including in calibration set S". This section explains the part of the schema that is maintained by a production manager, though the manager_role.

Recall that calibration sets are referred to by a purpose and version number. You must understand the numbering, or this section will not make sense.

The core concept of this section is the idea of a calibration set, defined by

  • a purpose
  • a list of table types
  • a major and minor version number

As runs are added to this calibration set, it is extended, getting addition extension version numbers.

The first tables supporting the calibration set are those describing a list of table types.

ValLists
lid (seq) name comment create_time create_user
1 TEST only calib1 2018-10-12 08:58:26 rlc

In this example, we have one row, which defines a table list with LID=1 and name "TEST". The list contents are not here, this row only declares that there is such a list.

The next table provides which tables are in the table list.

ValTableLists
lid tid
1 1

The only columns are the LID, which defines the list, and a TID, which refers to a table type in ValTables. In this example, there is only one table type in the list (TID=1, which is type TstCalib1), but if there were a second table type in the list, there would be a second row with the same LID, but a different TID.


Next, as we build the structure, we need to define purposes, which is done in ValPurposes

ValPurposes
pid (seq) name comment create_time create_user
1 TEST testing 2018-10-12 08:58:26 rlc

The row in this table just declares that there is a purpose called "TEST" and it has purpose ID, or PID=1.

Now we are ready to declare a calibration set in table ValVersions.

ValVersions
vid (seq) pid lid major minor comment create_time create_user
1 1 1 1 0 initial version 2018-10-12 08:58:26.794692-05:00 rlc

The row here declares a calibration set with version ID, or VID=1. The VID is a sequence, so is assigned by the database when you insert the rest of the row. This set has PID of 1, so the purpose is "TEST" which can be seen in the ValPurposes table. The list of tables associated with the set has list ID, or LID=1. This can be looked up in ValLists, ValTableLists, and valTables to see it is a list of one table type, TstCalib1. We have assigned major version number of 1 and minor version number of 0, so it can be requested in the fcl stanza with purpose "TEST" and version "v1_0".

At this point the calibration set is simply defined, and it does not contain any calibration data. To add calibration data, we extend this set. This creates a new extension version number. The logical input to this action is the definition of the calibration set (purpose and version number) and the list of groups (GID's) to g ointo this extension. This requires the final two tables.


ValExtensions
eid (seq) vid extension create_time create_user
1 1 0 2018-10-12 08:58:26.795703-05:00 rlc

In this example, we have one row, or one extension with assigned extension ID, or EID=1. As in the other tables with sequences, this is assigned when you insert the rest of the row. The column for vid defines this extension as applying to to calibration set defined in ValVersion with VID=1. The extension number (3rd column, not first) appears as Z in version number "vX_Y_Z". Looking up the purpose, major and minor version numbers using the VID, we find we can refer to this extension with the fcl stanza where purpose is "TEST" and the version is "v1_0_0".

The next table provides which groups of calibrations are in the extension.

ValExtensionLists
eid gid
1 1

Here only one group with GID=1 is included in the extension. If there were more groups included in the extension, there would be more rows, with the same EID and different GID.