Difference between revisions of "AnalysisWorkflow"

From Mu2eWiki

Revision as of 16:59, 14 December 2018

Introduction

This workflow is used to access the data in existing art format data files. If you don't know what art files are, please review the basic information at ComputingTutorials. Some major collaboration efforts to make files for analysis were the "cd3" processing in 2015 and the MDC2018 processing in 2018. Individual users may also produce and upload datasets.

Finding data

In a typical scenario, you will be given or find a dataset name, such as dig.mu2e.CeEndpoint.MDC2018b.art. Usually this will arise out of discussions with your physics group or mentor. You can also discover these from the listing of MDC 2018 data or the full listing.

Once you have the dataset name, you can write the list of files in the dataset to a text file with

setup mu2e
setup mu2efiletools
mu2eDatasetFileList  <dataset name> > fcllist.txt

fcllist.txt will contain one file per line, with the full path to the file, usually in dCache (path starts with "/pnfs"). This list is what will drive your job.
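
Before using the list, it can help to confirm that it looks as described above. The following is a minimal sketch using only standard shell tools; check_filelist is a hypothetical helper name, not part of mu2efiletools. It counts the entries in the list and how many of them start with "/pnfs".

```shell
# check_filelist is a hypothetical helper, not part of mu2efiletools.
# Usage: check_filelist <file list>
# Prints the number of non-empty lines and how many begin with /pnfs.
check_filelist() {
    list="$1"
    total=$(grep -c . "$list")        # count non-empty lines
    pnfs=$(grep -c '^/pnfs' "$list")  # count lines that start with /pnfs
    echo "files: $total, in dCache: $pnfs"
}
```

For example, check_filelist fcllist.txt prints both counts on one line; if they differ, some files are stored outside dCache.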

Analysis Methods

You have a choice of how to convert the art files into histograms or other summary formats to get at the analysis quantities you need. We have a summary of the options at Ntuples. It is common to work with files that are not fully reconstructed. You have the option to run additional simulation and reconstruction after you read the files and before you write out an analysis ntuple. Designing these details requires working with an experienced collaborator who is familiar with your goals.

You will need an Offline build to run your jobs in. This can be one of the published releases:

ls /cvmfs/mu2e.opensciencegrid.org/Offline

or you can build, and modify, the code locally, and build a tarball of the code for grid submission.


Running jobs

To start, you can run on the first file interactively:

mu2e -S fcllist.txt -n 100 -c <your_job_fcl>

Be sure to limit the number of events, since the entire dataset is available as input.
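
The command above passes the full list and relies on -n to stop early. If you prefer that an interactive test can only ever touch one file, a shorter list can be made first. This is a sketch with standard shell tools; make_short_list and onefile.txt are hypothetical names chosen for illustration.

```shell
# make_short_list is a hypothetical helper, not a Mu2e tool.
# Usage: make_short_list <input list> <number of files> <output list>
# Copies the first N entries of a file list into a new, shorter list.
make_short_list() {
    head -n "$2" "$1" > "$3"
}

# Example (the mu2e step requires a Mu2e environment):
#   make_short_list fcllist.txt 1 onefile.txt
#   mu2e -S onefile.txt -n 100 -c <your_job_fcl>
```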

A quick way to see the contents of almost all products in an art file is Validation:

 mu2e -S fcllist.txt -n 100 -c Validation/fcl/val.fcl

which will write validation.root containing many histograms.

If you are new to grid jobs, please review Grids, Dcache, DataTransfer, and JobPlan. To submit a grid job you will need to make a set of fcl files, each of which takes your basic interactive fcl and customizes it for different input and output files. The resulting set of fcl files is submitted to the grid. This is likely to be "example 2" of the SubmitJobs page.
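
To illustrate what "different input files" means for each job, the sketch below divides a file list into chunks, one chunk per job. It is only an illustration of the idea, using the standard split command; it is not the Mu2e fcl generation tool, and split_filelist and the job_ prefix are hypothetical names. The actual procedure is described on the pages linked above.

```shell
# split_filelist is a hypothetical helper, not the Mu2e fcl generator.
# Usage: split_filelist <file list> <files per job> <output prefix>
# Writes chunks named <prefix>aa, <prefix>ab, ... and prints how many
# chunks (that is, how many jobs) were produced.
split_filelist() {
    list="$1"
    per_job="$2"
    prefix="$3"
    split -l "$per_job" "$list" "$prefix"
    ls "$prefix"* | wc -l | tr -d ' '
}
```

For example, split_filelist fcllist.txt 10 job_ would place at most 10 input files in each chunk, with the last chunk holding the remainder.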