DQM
Introduction
Data Quality Monitoring is the process of running checks on new data as it comes in and recording the results. For example, as pass1 reconstruction is run, we plan to also create and save a set of histograms. To evaluate the status of the data quality, we expect two general approaches. The first is to extract a set of simple numbers (such as the mean number of hits on a track) that are sensitive to overall detector performance and data quality. These quantities can be saved in a database and plotted as a function of time or run number. The second is to compare the histograms to those from previous runs, perhaps including a quantitative comparison such as a chi2 test. Generally, the offline operations and online shift crews would be responsible for reviewing these monitors to spot unexpected changes.
File names
DQM file names will have a specific pattern:
ntd.mu2e.DQM_stream.process_aggregation_version.run_subrun.root
- the fields, such as the data_tier, owner, and sequencer, should follow the standard file name pattern
- the DQM string is always first in the description
- stream typically represents a stream of files written by the DAQ, expected to be named as a dataset and fed into the offline processing. This field is expected to be something like "ele" or "cosmic".
- process is the procedure which produced these DQM plots, say "pass1" or "pass2"
- aggregation allows for the fact that smaller DQM files are likely to be added together, so that, say, the 20 files that go into a stream during a run can be combined and a DQM result recorded for the whole run. Some aggregation keywords might be "file" for a single file, or "run" or "week".
- version is an integer and should reflect the version of the process which produced the file this DQM file represents. For example, we expect pass1 will be stopped, repaired or improved, and then re-run on some subset of data. This will advance the pass1 version number and DQM should follow.
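The fixed pattern above makes the source fields mechanically recoverable from a file name. A minimal sketch of such a parser, assuming the pattern exactly as written (the helper name parse_dqm_name is hypothetical, not part of any Mu2e tool):

```python
import re

# Matches ntd.mu2e.DQM_stream.process_aggregation_version.run_subrun.root
_DQM_NAME = re.compile(
    r"^(?P<tier>\w+)\.(?P<owner>\w+)"
    r"\.DQM_(?P<stream>\w+)"
    r"\.(?P<process>\w+)_(?P<aggregation>\w+)_(?P<version>\d+)"
    r"\.(?P<run>\d+)_(?P<subrun>\d+)\.root$"
)

def parse_dqm_name(name):
    """Split a standard DQM file name into its fields; raise if nonstandard."""
    m = _DQM_NAME.match(name)
    if m is None:
        raise ValueError(f"not a standard DQM file name: {name}")
    d = m.groupdict()
    # numeric fields become integers (leading zeros in the sequencer drop out)
    for key in ("version", "run", "subrun"):
        d[key] = int(d[key])
    return d

fields = parse_dqm_name("ntd.mu2e.DQM_ele.pass1_file_0.100000_00000100.root")
print(fields["stream"], fields["process"], fields["aggregation"], fields["version"])
# ele pass1 file 0
```

Keeping the name pattern strict is what makes this kind of automated bookkeeping possible; any file that deviates has to fall back to the explicit four-word source described below.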
Database
The database is used to record numerical metrics derived by the extractors from a DQM histogram file. These metrics can then be plotted in timelines. Each entry needs to know
- the source of the metrics. Logically this is the combination of "process, stream, aggregation, version".
- the relevant run period and/or time period for the metrics.
- the values of the metrics
The source values can be derived directly from the standard DQM file name, so it will be very useful to maintain this pattern. If the source is not a file with a standard name, it can be represented by the equivalent four words. The more these words are standardized, the more straightforward it will be to organize and search for the metrics. They should be treated as case-sensitive, and all should be lower-case.
We expect that timelines can usually be adequately plotted using only the run or time of the start of the period when the metric is relevant. For example, a DAQ file might contain run 100000, subrun 0 to subrun 100, and the following file might contain subruns 101 to 150. The metrics extracted from these files could be plotted at the points 100000:0 and 100000:101. However, the database also allows the start and stop of the relevant period to be recorded, so the relevant period of the first file can be represented by the run range 100000:0-100000:100, using the standard run range format. We can also enter the start and stop times of the relevant period. An entry requires at least a start to the run range OR a start to the time period; the end run range and end time are optional, and both a run range and a time period may be present.
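The run range convention above can be sketched as a small parser; this is illustrative only, assuming the "startRun:startSubrun-endRun:endSubrun" form shown in the example, with the end point optional:

```python
def parse_run_range(text):
    """Return ((run, subrun), (run, subrun)) or ((run, subrun), None)
    for a range like '100000:0-100000:100' or a bare point '100000:101'."""
    start, sep, end = text.partition("-")

    def point(p):
        run, subrun = p.split(":")
        return (int(run), int(subrun))

    # the end of the range is optional: a bare point has no '-'
    return point(start), (point(end) if sep else None)

print(parse_run_range("100000:0-100000:100"))  # ((100000, 0), (100000, 100))
print(parse_run_range("100000:101"))           # ((100000, 101), None)
```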
The numerical metric is labeled by three fields: "group, subgroup, name". For example, for the metrics derived from histograms made in pass1, the groups might be "cal", "trk", or "crv". The subgroups might be "digi", "track", or "cluster". The "name" might be "meanEnergy", "rmsEnergy", "zeroERate". While it is possible to write a metric name like "Average momentum (MeV) for tracks with p>10 MeV and 20 hits and cal cluster E>10.0 MeV", ultimately this will be more annoying than helpful. If short names need to be documented, it is probably best to do that in a parallel system, not in the database name. No commas are allowed (CSV format is used internally), and some other special characters might also fail.
The numerical metric is represented by a float for the value, a float for its uncertainty (0.0 if N/A), and an integer code. The values of the integer are determined by an enum in DQM/inc/DqmValue.hh. Zero is normal, success.
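Putting the labeling and value conventions together, one CSV line carries six fields. A hedged sketch of building such a line (the helper metric_csv is hypothetical; only code 0 = normal/success is documented here, the other codes live in DQM/inc/DqmValue.hh):

```python
def metric_csv(group, subgroup, name, value, sigma=0.0, code=0):
    """Format one metric as a CSV line: group,subgroup,name,value,sigma,code.
    sigma is the uncertainty (0.0 if N/A); code 0 means normal/success."""
    for field in (group, subgroup, name):
        if "," in field:
            # commas would corrupt the CSV format used internally
            raise ValueError(f"commas are not allowed in metric labels: {field!r}")
    return f"{group},{subgroup},{name},{value},{sigma},{code}"

print(metric_csv("cal", "digi", "meanE", 0.125, 0.001))
# cal,digi,meanE,0.125,0.001,0
```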
If one source and interval apply to many metrics being entered, the metrics can be listed, one per line, in a text file and committed in one command.
In this example, an extractor has produced a set of metrics in a text file. The extractor was run on a properly-named histogram file. The run and subrun of the start of the period are taken from the file name. The time range will be null.
cat myvalues.txt
cal,digi,meanE,0.125,0.001,0
cal,digi,fracZeroE,0.001,0.0,0
cal,cluster,meanE,25.2,0.12,0
cal,cluster,rmsE,2.2,0.25,0
dqmTool commit-value \
  --source ntd.mu2e.DQM_ele.pass1_file_0.100000_00000100.root \
  --value myvalues.txt
A single value can be committed with text:
dqmTool commit-value \
  --source ntd.mu2e.DQM_ele.pass1_file_0.100000_00000100.root \
  --value "cal,digi,meanE,0.125,0.001,0"
and the source can be explicit if it is not a file.
dqmTool commit-value \
  --source "valNightly,reco,day,0" \
  --value myvalues.txt
The commit can contain full period information. Times are in ISO 8601 format. If the time zone is missing, the local time zone of the computer running the executable will be assumed; times are saved in the database in UTC.
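The time handling described above (assume local zone when none is given, store in UTC) can be sketched with the standard library; this is illustrative of the convention, not the actual dqmTool internals:

```python
from datetime import datetime, timezone

def to_utc(text):
    """Parse an ISO 8601 timestamp and convert it to UTC.
    A timestamp with no zone is interpreted in the machine's local zone."""
    t = datetime.fromisoformat(text)
    if t.tzinfo is None:
        t = t.astimezone()             # attach the computer's local time zone
    return t.astimezone(timezone.utc)  # value as it would be saved: UTC

print(to_utc("2022-01-01T16:04:10.255-06:00").isoformat())
# 2022-01-01T22:04:10.255000+00:00
```

Note that datetime.fromisoformat requires a two-digit offset hour ("-06:00"); writing the offset that way in commit commands keeps the timestamps unambiguous.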
dqmTool commit-value \
  --source ntd.mu2e.DQM_ele.pass1_file_0.100000_00000100.root \
  --runs "100000:100-100000:999999" \
  --start "2022-01-01T16:04:10.255-06:00" \
  --end "2022-01-01T22:32:44.908-06:00" \
  --value myvalues.txt