Provenance

From Mu2eWiki
Latest revision as of 15:19, 14 April 2017


This page is a draft, please help complete it!

This page needs expert review!

Each data product has an associated provenance that describes how the data product was made:

  • Which module made this data product?
  • What parameter set was used to configure that module?
  • What data products, if any, were read by that module?

Suppose that module MA has no input data products (for example, an event generator) and that it produces a data product DPA. Further suppose that module MB reads DPA and produces data product DPB. In this case, both DPA and DPB have an entry in the provenance registry. Now suppose that only DPB is written to the output file. When that file is read back in, DPA is not present, but both provenances remain in the provenance registry. art does this because the provenance of DPA is part of the provenance of DPB, and art always keeps complete provenances.

If we read this output file and write a new one in which we do NOT write data product DPB, then neither DPA nor DPB will be present in the output. In this case the provenances for both DPA and DPB will be removed from the registry that is written to the output file.

The general rule is that a provenance is retained in the registry so long as at least one of the following is true:

  • The data product that it describes is present in the output file.
  • At least one of that data product's descendant data products is present in the output file.
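Because a provenance is kept whenever its own product or any descendant product is written, the retained set is simply the written products plus everything reachable from them by following parent links. The Python sketch below models this rule under simplifying assumptions: the `Provenance` record (product, module, parameter set, parent products) and the `retained_provenances` function are hypothetical names for illustration, not art's actual C++ data structures or API. The two calls at the end reproduce the DPA/DPB scenarios described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical, simplified provenance record (not art's real class)."""
    product: str                   # data product this provenance describes
    module: str                    # module that made the product
    parameter_set: str             # stand-in for the module's configuration
    parents: tuple[str, ...] = ()  # data products read by the module


def retained_provenances(registry: dict[str, Provenance],
                         written: set[str]) -> set[str]:
    """Return the products whose provenance is kept in the output registry.

    A provenance is kept if its product is written, or if any descendant of
    that product is written. Equivalently: start from the written products
    and walk their parent links; everything reached is kept.
    """
    kept: set[str] = set()
    stack = list(written)
    while stack:
        name = stack.pop()
        if name in kept:
            continue  # already visited via another path
        kept.add(name)
        stack.extend(registry[name].parents)
    return kept


# MA (no inputs) makes DPA; MB reads DPA and makes DPB.
registry = {
    "DPA": Provenance("DPA", module="MA", parameter_set="psetA"),
    "DPB": Provenance("DPB", module="MB", parameter_set="psetB",
                      parents=("DPA",)),
}

# Only DPB is written: both provenances are kept.
print(sorted(retained_provenances(registry, {"DPB"})))  # ['DPA', 'DPB']

# A later job writes neither product: both provenances are dropped.
print(sorted(retained_provenances(registry, set())))  # []
```

Walking parent links from the written products avoids having to search every provenance for written descendants; each product is visited at most once.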

Tools

art provides a tool that dumps the list of processes that have run on a file, based on the file's provenance contents:

file_info_dumper --process-history <file>

produces:

 Chronological list of process names for processes that
 produced this file.

    1. cosmics1
    2. cosmics2
    3. cosmics3
    4. drap
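The process history can also be collected from a script, for example when checking many files. The Python sketch below assumes that `file_info_dumper` is on `PATH` (for example, in an environment where art has been set up) and that its output follows the format of the example above; `parse_process_history` and `process_history` are hypothetical helper names, not part of art.

```python
import re
import subprocess

# Matches numbered entries such as "    1. cosmics1" in the listing.
_ENTRY = re.compile(r"^\s*(\d+)\.\s+(\S+)\s*$")


def parse_process_history(text: str) -> list:
    """Return process names in listed (chronological) order."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            entries.append((int(m.group(1)), m.group(2)))
    return [name for _, name in sorted(entries)]


def process_history(path: str) -> list:
    """Run file_info_dumper on an art file and parse its process history.

    Assumes file_info_dumper is available on PATH.
    """
    result = subprocess.run(
        ["file_info_dumper", "--process-history", path],
        capture_output=True, text=True, check=True,
    )
    return parse_process_history(result.stdout)


sample = """Chronological list of process names for processes that
produced this file.

    1. cosmics1
    2. cosmics2
    3. cosmics3
    4. drap
"""
print(parse_process_history(sample))  # ['cosmics1', 'cosmics2', 'cosmics3', 'drap']
```

Keeping the parser separate from the subprocess call lets it be checked against saved output without art installed.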

There is also a configuration dumper:

config_dumper <file>

It prints the FHiCL (fcl) configuration of each module used in the art jobs that produced the file.