AnalysisWorkflow: Difference between revisions

From Mu2eWiki
Latest revision as of 22:22, 19 July 2024

Introduction

This workflow is used to access the data in existing art format data files. If you don't know what art files are, please review the basic information at ComputingTutorials. Some major collaboration efforts to make files for analysis were the "cd3" processing in 2015 and the MDC2018 processing in 2018. Individual users may also produce and upload datasets.

Finding data

In a typical scenario, you will be given or find a dataset name, such as dig.mu2e.CeEndpoint.MDC2018b.art. Usually this will arise out of discussions with your physics group or mentor. You can also discover dataset names from the listing of MDC 2018 data or the full listing.

Once you have the dataset name, you can see the list of files in the dataset with

mu2einit
setup mu2efiletools
mu2eDatasetFileList  <dataset name> > fcllist.txt

fcllist.txt will contain one file per line, with the full path to the file, usually in dCache (path starts with "/pnfs"). This list is what will drive your job.
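
Before running on the list, it can be useful to check that it looks as described. The following is a minimal sketch that assumes only what is stated above: one full path per line, usually starting with "/pnfs". The function name summarize_filelist is hypothetical and is not part of mu2efiletools.

```shell
# summarize_filelist is a hypothetical helper, not part of mu2efiletools.
# It prints the number of non-empty lines in a file list and how many of
# them start with /pnfs (the dCache path prefix).
summarize_filelist() {
    local list="$1"
    local total pnfs
    total=$(grep -c . "$list")
    pnfs=$(grep -c '^/pnfs' "$list")
    echo "files: ${total}, in dCache: ${pnfs}"
}
```

It could be called as summarize_filelist fcllist.txt. If the two counts differ, some files are stored outside dCache.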

Analysis Methods

You have a choice of how to convert the art files into histograms or other summary formats to get at the analysis quantities you need. A summary of the options is at Ntuples. It is common to work with files that are not fully reconstructed. You have the option to run additional simulation and reconstruction after you read the files and before you write out an analysis ntuple. Designing these details requires working with an experienced collaborator who is familiar with your goals.

You will need an Offline build to run your jobs in. This can be one of the published releases:

ls /cvmfs/mu2e.opensciencegrid.org/Offline

or you can build and modify the code locally, then use muse tarball to make a code tarball for grid submission.


Running jobs

To start, you can run on the first file interactively:

mu2e -S fcllist.txt -n 100 -c <your_job_fcl>

Be sure to limit the number of events, since the entire dataset is available as input.
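
If you would rather make sure an interactive test cannot read beyond the first file, one option is to give the job a shortened list. This is only a sketch: make_short_list and the file name first_file.txt are hypothetical names chosen for illustration.

```shell
# make_short_list is a hypothetical helper: it copies the first N lines
# (default 1) of a file list into a new, shorter list.
make_short_list() {
    local full_list="$1"
    local short_list="$2"
    local nfiles="${3:-1}"
    head -n "$nfiles" "$full_list" > "$short_list"
}
# Example use (file names are placeholders):
#   make_short_list fcllist.txt first_file.txt
#   mu2e -S first_file.txt -n 100 -c <your_job_fcl>
```

The event limit given with -n still applies; the short list only restricts which files can be opened.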

A quick way to see the contents of almost all products in an art file is Validation:

 mu2e -S fcllist.txt -n 100 -c Validation/fcl/val.fcl

which will write validation.root containing many histograms. Or you can print the products:

mu2e -S fcllist.txt -n 10 -c Print/fcl/print.fcl > products.txt


If you are new to grid jobs, please review Grids, Dcache, DataTransfer, and JobPlan. To submit a grid job you will need to make a set of fcl files. Each one takes your basic interactive fcl and customizes it for different input and output files. The resulting set of fcl files is then submitted to the grid. This is likely to be "example 2" of the SubmitJobs page.
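
The idea of giving each grid job its own group of input files can be illustrated by partitioning the file list. This sketch is not the Mu2e fcl-generation tool, and it does not write any fcl. It only shows the partitioning step, using the standard split utility. The function name split_filelist and the chunk prefix are hypothetical.

```shell
# split_filelist is a hypothetical helper, not the Mu2e fcl-generation tool.
# It partitions a file list into chunks of at most N lines each, written as
# <prefix>aa, <prefix>ab, ... (the default suffixes used by split).
split_filelist() {
    local list="$1"
    local files_per_job="$2"
    local prefix="$3"
    split -l "$files_per_job" "$list" "$prefix"
}
```

For example, split_filelist fcllist.txt 10 chunk_ would write chunk_aa, chunk_ab, and so on, each holding up to 10 input files. For real submissions, use the procedure described on the pages linked above.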