FileTools: Difference between revisions

From Mu2eWiki


<!********************************************************>
Previous revision

== jsonMaker==

The jsonMaker is a Python script which lives in the dhtools product
and should be available at the command line after "setup dhtools."
Please see the [uploadExample.shtml upload examples] page
for details.

All files to be uploaded should be processed by the jsonMaker,
which writes the final json file to be included with the
data file in the FTS input directory. Even if all
the final json could be written by hand, the jsonMaker
checks that certain required fields are present and that other rules are followed,
checks consistency, and writes in a known correct format.

Simply run the maker with all the data files and json fragment(s)
as input.  The help of the code is below. The most
useful practical reference is the
[uploadExample.shtml upload examples] page.

<pre>
jsonMaker  [OPTIONS] ... [FILES] ...

  Create json files which hold metadata information about the file
to be uploaded. The file list can contain data and other types
of files (foo.bar) to be uploaded. If foo.bar.json is in the list,
its contents will be added to the json for foo.bar.
If a generic json file is supplied, its contents will be
added to all output json files.  Output is a json file for each input
file, suitable for presenting to the upload FTS server together with
the data file.
  If the input file is an art file, jsonMaker must run
a module over the file in order to extract run and event
information, so a mu2e offline release that contains the module
must be set up.

  -h
      print help
  -v LEVEL
      verbose level, 0 to 10, default=1
  -x
      perform write/copy of files. Default is to evaluate the
      upload parameters, but not write or move anything.
  -c
      copy the data file to the upload area after processing
      Will move the json file too, unless overridden by an explicit -d.
  -m
      mv the data file to the upload area after processing.
      Useful if the data file is already in
      /pnfs/mu2e/scratch where the FTS is.
      Will move the json file too, unless overridden by an explicit -d.
  -e
      just rename the data file where it is
  -s FILE
      FILE contains a list of input files to operate on.
  -p METHOD
      How to match an input json file to a data file
      METHOD="none" for no json input file for each data file (default)
      METHOD="file" pair an input json file with a data file based on the
      fact that if the file is foo, the json is foo.json.
      METHOD="dir" pair a json file and a data file based on the fact that
      they are in the same directory, whatever their names are.
  -j FILE
      a json file fragment to add to the json for all files,
      typically used to supply MC parameters.
  -i PAR=VALUE
      a json file entry to add to the json for all files, like
        -i mc.primary_particle=neutron
        -i mc.primary_particle="neutron"
        -i mc.simulation_stage=2
      Can be repeated. Will supersede values given in -j
  -a FILE
      a text file with parent file sam names - usually would only
      be used if there was one data file to be processed.
  -t TAG
      text to prepend to the sequencer field of the output filename.
      This can be useful for non-art datasets which have different
      components uploaded at different times with different jsonMaker
      commands, but intended to be in the same dataset, such as a series
      of backup tarballs from different stages of processing.
  -d DIR
      directory to write the json files in. Default is ".".
      If DIR="same" then write the json in the same directory as
      the data file. If DIR="fts" then write it to the FTS directory.
      If -m or -c is set, then -d "fts" is implied unless overridden by
      an explicit -d.
  -f FILE_FAMILY
      the file_family for these files - required
  -r NAME
      this will trigger renaming the data files by the pattern in NAME
      example: -r mcs.batman.beam-2014.fcl-100..art
      The blank sequencer ".." will be replaced by a sequence number
      like ".0001." or first run and subrun for art files.
  -l DIR
      write a file of the data file name and json file name
      followed by the fts directory where they should go, suitable
      for driving a "ifdh cp -f" command to move all files in one lock.
      This file will be named for the dataset plus "command"
        plus a time string.
  -g
      the command file will be written (implies -l) and then
      when all files are evaluated and json files written, execute
      the command file with "ifdh cp -f commandfile". Useful
      to use one lock file to execute all ifdh commands.
      Nullifies -c and -m.

  Requires python 2.7 or greater for subprocess.check_output and
    2.6 or greater for json module.
  version 2.0

</pre>

Revision as of 22:31, 30 March 2017

jsonMaker

All files to be uploaded to tape need to have a SAM file record. (Some other semi-permanent files in other locations may also have SAM records.) We create a SAM record by supplying a json file (which looks a lot like a Python dictionary or a fcl table) that contains keyword/value pairs. We include the keyword/value pairs for the file metadata that we want to supply for the file.
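To illustrate the remark that a json file looks a lot like a Python dictionary, here is a minimal sketch that loads a small json fragment with Python's standard json module. The keys and values are copied from the example output on this page; the variable names are only for illustration, and this is not part of jsonMaker.

```python
import json

# A small json fragment; keys and values are taken from the example on this page.
text = """
{
    "file_type": "mc",
    "file_format": "art",
    "event_count": 3018,
    "runs": [[1002, 5, "mc"]]
}
"""

# json.loads turns the json object into an ordinary Python dict.
record = json.loads(text)
print(type(record).__name__)
print(record["event_count"])

# json.dumps writes the dict back out as json text.
print(json.dumps(record, indent=4, sort_keys=True))
```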

These json files could be written (or edited) by hand, but it is far easier to run jsonMaker on the file. This Python script is put in your path with the dhtools product:

setup dhtools

Running jsonMaker will produce all the mundane metadata like file size. For art files, it will run a fast art executable over the file to extract information like the number of events in the file. This means a version of offline must be set up to run jsonMaker. The code checks that certain required fields are present and that other rules are followed, checks consistency, and writes the json in a known correct format.
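As a rough illustration of what "mundane metadata" means, the sketch below computes a file's size and SHA-256 checksum with the Python standard library. It is not jsonMaker's code: the helper name basic_metadata is hypothetical, and the keys file_name, file_size and dh.sha256 are simply borrowed from the example output on this page. Please keep using jsonMaker for real uploads.

```python
import hashlib
import os
import tempfile


def basic_metadata(path, chunk_size=1 << 20):
    """Return size and checksum for one file (hypothetical helper, not jsonMaker)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large data files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "file_size": os.path.getsize(path),
        "dh.sha256": digest.hexdigest(),
    }


if __name__ == "__main__":
    # Demonstrate on a small temporary file instead of a real data file.
    with tempfile.NamedTemporaryFile(suffix=".art", delete=False) as tmp:
        tmp.write(b"abc")
    try:
        print(basic_metadata(tmp.name))
    finally:
        os.remove(tmp.name)
```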

jsonMaker has a help ("-h") option to show the optional switches. Much of the help text concerns moving or copying files to the upload area; this functionality is obsolete. Please see the upload examples for how to use everything.

Here is an example of the json file that jsonMaker writes. Please do not use this for upload; let jsonMaker produce the json for your own files.

{
    "dh.description": "cd3-beam-g4s1-dsregion", 
    "file_type": "mc", 
    "file_name": "sim.mu2e.cd3-beam-g4s1-dsregion.0506a.001002_00000005.art", 
    "dh.first_subrun": 5, 
    "file_size": 4290572, 
    "file_format": "art", 
    "dh.first_run_event": 1002, 
    "dh.last_event": 10000, 
    "dh.last_subrun": 5, 
    "dh.last_run_event": 1002, 
    "dh.last_run_subrun": 1002, 
    "dh.first_run_subrun": 1002, 
    "data_tier": "sim", 
    "dh.first_event": 5, 
    "dh.source_file": "/pnfs/mu2e/phy-sim/sim/mu2e/cd3-beam-g4s1-dsregion/0506a/001/307/sim.mu2e.cd3-beam-g4s1-dsregion.0506a.001002_00000005.art", 
    "runs": [
        [
            1002, 
            5, 
            "mc"
        ]
    ], 
    "dh.configuration": "0506a", 
    "event_count": 3018, 
    "dh.owner": "mu2e", 
    "content_status": "good", 
    "dh.dataset": "sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art", 
    "dh.sha256": "e3b5b426ce6c6d4dd2b9fcf2bccb4663205235d3e3fb6011a8dc49ef2ff66dbb", 
    "dh.sequencer": "001002_00000005"
}
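In the example above, several fields repeat pieces of the dot-separated file name. The sketch below splits the name and compares the pieces with the record, as one possible consistency check. The six-field layout is inferred only from this single example, the function names fields_from_name and mismatches are hypothetical, and jsonMaker's actual rules may differ.

```python
def fields_from_name(file_name):
    """Split a six-field, dot-separated file name (layout assumed from the example)."""
    parts = file_name.split(".")
    if len(parts) != 6:
        raise ValueError("expected 6 dot-separated fields, got %d" % len(parts))
    tier, owner, description, configuration, sequencer, file_format = parts
    return {
        "data_tier": tier,
        "dh.owner": owner,
        "dh.description": description,
        "dh.configuration": configuration,
        "dh.sequencer": sequencer,
        "file_format": file_format,
        # In the example, the dataset name is the file name without the sequencer.
        "dh.dataset": ".".join([tier, owner, description, configuration, file_format]),
    }


def mismatches(record):
    """Return {key: (from_name, in_record)} for fields that disagree with the name."""
    expected = fields_from_name(record["file_name"])
    return {k: (v, record.get(k)) for k, v in expected.items() if record.get(k) != v}


# A subset of the example record shown on this page.
example = {
    "file_name": "sim.mu2e.cd3-beam-g4s1-dsregion.0506a.001002_00000005.art",
    "data_tier": "sim",
    "dh.owner": "mu2e",
    "dh.description": "cd3-beam-g4s1-dsregion",
    "dh.configuration": "0506a",
    "dh.sequencer": "001002_00000005",
    "file_format": "art",
    "dh.dataset": "sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art",
}

# An empty dict means every name-derived field agrees with the record.
print(mismatches(example))
```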