Provenance
This page is a draft, please help complete it!
This page needs expert review!
Each data product has an associated provenance that describes how the data product was made:
- Which module made this data product?
- What parameter set was used to configure that module?
- What data products, if any, were read by that module?
Suppose that module MA has no input data products (for example, an event generator) and that it produces a data product DPA. Further suppose that module MB reads DPA and produces data product DPB. In this circumstance both DPA and DPB have an entry in the provenance registry. Now suppose that only DPB is written to the output file. When that file is read in again, DPA is not present but both provenances remain in the provenance registry; art does this because the provenance of DPA is part of the provenance of DPB and art always keeps complete provenances.
If we read this output file and write a new one in which we do NOT write data product DPB, then neither DPA nor DPB will be present in the output. In this case the provenances for both DPA and DPB will be removed from the registry that is written to the output file.
The general rule is that a provenance is retained in the registry so long as at least one of the following is true:
- The data product that it describes is present in the output file.
- Any of that data product's descendant data products is present in the output file.
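The retention rule can be illustrated with a small sketch. This is not art code: the function name `retained_provenances` and the dictionary layout are hypothetical, chosen only to model the rule. It assumes each data product records the data products it was made from, so keeping a provenance "if the product or any descendant is present" is equivalent to keeping the provenance of every product written to the file plus that of all of its ancestors.

```python
def retained_provenances(parents, written):
    """Return the names of data products whose provenance is retained.

    parents: dict mapping a product name to the list of products that the
             module which made it read as input (hypothetical layout).
    written: set of product names present in the output file.
    """
    retained = set()
    stack = list(written)
    while stack:
        product = stack.pop()
        if product in retained:
            continue
        # The product is present itself, or is an ancestor of a present one.
        retained.add(product)
        stack.extend(parents.get(product, []))
    return retained


# The example from the text: MA makes DPA from nothing, MB makes DPB from DPA.
parents = {"DPA": [], "DPB": ["DPA"]}

# Only DPB is written: both provenances are retained.
print(sorted(retained_provenances(parents, {"DPB"})))

# Neither product is written: both provenances are dropped.
print(sorted(retained_provenances(parents, set())))
```

With only DPB written, the sketch keeps the provenances of both DPA and DPB; with nothing written, it keeps none, matching the two cases described above.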
Tools
There is an art tool that dumps the list of processes that have run on a file, based on the provenance stored in the file:
file_info_dumper --process-history <file>
produces:
Chronological list of process names for processes that produced this file.
1. cosmics1
2. cosmics2
3. cosmics3
4. drap
and a config dumper:
config_dumper <file>
which prints the FHiCL (fcl) configuration of each module that was used in the art jobs that produced the file.