FileTools

From Mu2eWiki
Revision as of 18:07, 3 April 2017 by Rlc

mu2etools

Tools to help set up fcl files for a grid project. Set up with:

setup mu2e
source an offline setup.sh script
setup mu2etools

generate_fcl

A project may consist of several grid submissions, and each of those submissions may have many jobs. Each job typically has its own unique fcl file to drive it. For simulation, this fcl file will include a unique random number seed, run numbers, and output file names. generate_fcl takes a template fcl file, adds these unique parts, and writes out the complete set of fcl files for the project.
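As a rough illustration of the idea only (this is not generate_fcl, whose options are not described here), the Python sketch below appends per-job values to a shared template. The fcl parameter paths (services.SeedService.baseSeed, source.firstRun, source.firstSubRun, outputs.out.fileName), the one-subrun-per-job numbering, and the seed scheme are assumptions chosen for illustration; only the run_subrun sequencer format is taken from the file names on this page.

```python
from typing import Dict


def make_job_fcls(template: str, n_jobs: int, run: int, base_seed: int,
                  dataset_stem: str) -> Dict[str, str]:
    """Return {fcl_file_name: fcl_text}, one entry per grid job.

    Hypothetical sketch, not generate_fcl. The parameter paths and the
    numbering conventions below are illustrative assumptions.
    """
    fcls = {}
    for subrun in range(n_jobs):  # assumption: one subrun per job, starting at 0
        seed = base_seed + subrun  # assumption: any scheme giving unique seeds works
        sequencer = f"{run:06d}_{subrun:08d}"  # run_subrun, as in the file names here
        out_name = f"{dataset_stem}.{sequencer}.art"
        # A later assignment in fcl overrides an earlier one, so appending
        # the per-job values after the shared template customizes each job.
        overrides = (
            f"services.SeedService.baseSeed : {seed}\n"
            f"source.firstRun : {run}\n"
            f"source.firstSubRun : {subrun}\n"
            f'outputs.out.fileName : "{out_name}"\n'
        )
        fcls[f"job_{sequencer}.fcl"] = template.rstrip("\n") + "\n" + overrides
    return fcls
```

Appending overrides rather than editing the template text keeps the template untouched and makes each job's unique values easy to read at the end of its fcl file.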

mu2efiletools

Tools for handling files during grid operations, including methods to list, check, count, move, and upload all the files of a dataset. Some of these operations can be done with Unix commands like find, but these scripts are recommended because they incorporate the most efficient methods, which are not always obvious.

Official production datasets, and user datasets manipulated by the file tools, will appear under the following designated dataset areas, one for each storage flavor (scratch, persistent disk, and tape):

  • /pnfs/mu2e/scratch/datasets
  • /pnfs/mu2e/persistent/datasets
  • /pnfs/mu2e/tape

Set up with:

setup mu2e
setup mu2efiletools

mu2eClusterArchive

mu2eClusterCheckAndMove

mu2eClusterFileList

mu2eDatasetDelete

mu2eDatasetFileList

mu2eDatasetLocation

mu2eFileDeclare

mu2eFileMoveToTape

mu2eFileUpload

mu2eMissingJobs

File path tools

The file path tools are:

  • mu2eabsname_tape
  • mu2eabsname_disk
  • mu2eabsname_scratch

Each of these takes a SAM (six-dot-field) file name and returns the full path for that file in the corresponding dCache area. All subdirectories in the path are derived from the file name, so the path is unique. Files may be stored anywhere temporarily, but when they go to their permanent (or, for scratch, semi-permanent) location, they should go to this path.

They read file names from stdin only, not from command-line arguments:

 > ls sim.mu2e.example-beam-g4s1.1812a.16638329_000016.art | mu2eabsname_scratch
/pnfs/mu2e/scratch/datasets/phy-sim/sim/mu2e/example-beam-g4s1/1812a/art/f8/29/sim.mu2e.example-beam-g4s1.1812a.16638329_000016.art
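To show how the path in the example lines up with the file name, the Python sketch below splits a six-dot-field name into its fields (tier, owner, description, configuration, sequencer, format) and rebuilds a path in the same layout. It is not the mu2eabsname implementation: the "phy-sim" category is simply copied from the sim example above, and taking the two hash subdirectories from the first four hex digits of a SHA-256 of the file name is an assumption, so compare its output with mu2eabsname_scratch before relying on it.

```python
import hashlib
from typing import Dict, Iterable, List

# Field order of a six-dot-field SAM file name, as seen in the examples on this page.
FIELDS = ("tier", "owner", "description", "configuration", "sequencer", "format")


def split_sam_name(file_name: str) -> Dict[str, str]:
    """Split tier.owner.description.configuration.sequencer.format into a dict."""
    parts = file_name.split(".")
    if len(parts) != len(FIELDS):
        raise ValueError(f"not a six-dot-field file name: {file_name!r}")
    return dict(zip(FIELDS, parts))


def sketch_absname(file_name: str,
                   base: str = "/pnfs/mu2e/scratch/datasets",
                   category: str = "phy-sim") -> str:
    """Hypothetical re-creation of the mu2eabsname_scratch path layout.

    ASSUMPTIONS: 'phy-sim' is copied from the sim example, and the two hash
    directories come from the first four hex digits of SHA-256(file name).
    """
    f = split_sam_name(file_name)
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    return "/".join([base, category, f["tier"], f["owner"], f["description"],
                     f["configuration"], f["format"],
                     digest[0:2], digest[2:4], file_name])


def absnames_from_lines(lines: Iterable[str]) -> List[str]:
    """Mimic the stdin-only interface: one file name per input line."""
    return [sketch_absname(line.strip()) for line in lines if line.strip()]
```

In a small script, absnames_from_lines(sys.stdin) would accept the same kind of piped ls output as the real tools.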

jsonMaker

All files to be uploaded to tape need a SAM file record. (Some semi-permanent files in other locations may also have SAM records.) We create a SAM record by supplying a json file (which looks a lot like a Python dictionary or a fcl table) containing keyword/value pairs for the file metadata we want in the file's SAM record.

These json files could be written (or edited) by hand, but it is far easier to run jsonMaker on the file. This Python script is put in your path by setting up the dhtools product:

setup dhtools

Running jsonMaker produces all the mundane metadata, such as file size. For art files, it runs a fast art executable over the file to extract information like the number of events in the file, so a version of offline must be set up before running jsonMaker. It also checks that required fields are present, enforces other rules, checks the metadata for consistency, and writes the output in a known correct format.
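For a sense of what the simplest metadata involves, the Python sketch below computes three of the fields that appear in the example output further down: file_name, file_size, and dh.sha256. It assumes dh.sha256 is the SHA-256 of the file contents. It does not count events or find run/subrun ranges (that is what jsonMaker's art pass is for), and it is not a substitute for jsonMaker.

```python
import hashlib
import os
from typing import Dict, Union


def mundane_metadata(path: str, chunk_size: int = 1 << 20) -> Dict[str, Union[str, int]]:
    """Compute a few simple SAM metadata fields for one file.

    ASSUMPTION: dh.sha256 is the SHA-256 of the file contents. Event counts
    and run/subrun ranges are not computed; jsonMaker needs art for those.
    """
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large art files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "file_size": os.path.getsize(path),
        "dh.sha256": sha.hexdigest(),
    }
```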

jsonMaker has a help option ("-h") that lists the optional switches. Many of these concern moving or copying files to the upload area; that functionality is obsolete. See the upload examples for how to use everything.

Here is an example of jsonMaker output. Do not reuse it for an upload; let jsonMaker generate the json for your own files.

{
    "dh.description": "cd3-beam-g4s1-dsregion", 
    "file_type": "mc", 
    "file_name": "sim.mu2e.cd3-beam-g4s1-dsregion.0506a.001002_00000005.art", 
    "dh.first_subrun": 5, 
    "file_size": 4290572, 
    "file_format": "art", 
    "dh.first_run_event": 1002, 
    "dh.last_event": 10000, 
    "dh.last_subrun": 5, 
    "dh.last_run_event": 1002, 
    "dh.last_run_subrun": 1002, 
    "dh.first_run_subrun": 1002, 
    "data_tier": "sim", 
    "dh.first_event": 5, 
    "dh.source_file": "/pnfs/mu2e/phy-sim/sim/mu2e/cd3-beam-g4s1-dsregion/0506a/001/307/sim.mu2e.cd3-beam-g4s1-dsregion.0506a.001002_00000005.art", 
    "runs": [
        [
            1002, 
            5, 
            "mc"
        ]
    ], 
    "dh.configuration": "0506a", 
    "event_count": 3018, 
    "dh.owner": "mu2e", 
    "content_status": "good", 
    "dh.dataset": "sim.mu2e.cd3-beam-g4s1-dsregion.0506a.art", 
    "dh.sha256": "e3b5b426ce6c6d4dd2b9fcf2bccb4663205235d3e3fb6011a8dc49ef2ff66dbb", 
    "dh.sequencer": "001002_00000005"
}
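To illustrate the kind of consistency check jsonMaker performs, the Python sketch below verifies that the fields of a json record agree with its six-dot-field file_name, as they do in the example above (data_tier, dh.owner, dh.description, dh.configuration, dh.sequencer, file_format, and dh.dataset). The REQUIRED list and these rules are inferred from this one example record, not taken from jsonMaker, which applies its own checks.

```python
import json
from typing import Any, Dict, List

# Illustrative only: inferred from the example record, not jsonMaker's real list.
REQUIRED = ("file_name", "file_size", "file_format", "data_tier", "dh.owner",
            "dh.description", "dh.configuration", "dh.sequencer", "dh.dataset")


def check_metadata(md: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    missing = [f"missing field {k}" for k in REQUIRED if k not in md]
    if missing:
        return missing
    parts = md["file_name"].split(".")
    if len(parts) != 6:
        return [f"file_name is not a six-dot-field name: {md['file_name']}"]
    tier, owner, description, configuration, sequencer, fmt = parts
    # Each metadata field should repeat the matching piece of the file name.
    expected = {
        "data_tier": tier,
        "dh.owner": owner,
        "dh.description": description,
        "dh.configuration": configuration,
        "dh.sequencer": sequencer,
        "file_format": fmt,
        # The dataset name is the file name with the sequencer field dropped.
        "dh.dataset": ".".join([tier, owner, description, configuration, fmt]),
    }
    return [f"{key} is {md[key]!r} but file_name implies {value!r}"
            for key, value in expected.items() if md[key] != value]


def check_json_text(text: str) -> List[str]:
    """Parse a json record and check it."""
    return check_metadata(json.loads(text))
```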