AnalysisWorkflow
Latest revision as of 22:22, 19 July 2024
Introduction
This workflow is used to access the data in existing art format data files. If you don't know what art files are, please review the basic information at ComputingTutorials. Some major collaboration efforts to make files for analysis were the "cd3" processing in 2015 and the MDC2018 processing in 2018. Individual users may also produce and upload datasets.
Finding data
In a typical scenario, you will be given or find a dataset name, such as dig.mu2e.CeEndpoint.MDC2018b.art. Usually this will arise out of discussions with your physics group or mentor. You can also discover dataset names from the listing of MDC 2018 data or the full listing.
Once you have the dataset name, you can see the list of files in the dataset with
mu2einit
setup mu2efiletools
mu2eDatasetFileList <dataset name> > fcllist.txt
fcllist.txt will contain one file per line, with the full path to the file, usually in dCache (path starts with "/pnfs"). This list is what will drive your job.
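Before using the list, it can help to confirm that it looks as described: one path per line, normally under /pnfs. The following is a minimal sketch of such a check; check_filelist is a hypothetical helper name for illustration, not part of mu2efiletools.

```shell
#!/bin/sh
# check_filelist is a hypothetical helper, not a Mu2e tool.
# It counts the non-empty lines in a file list and reports how many
# of them do not start with /pnfs (that is, are not dCache paths).
check_filelist() {
    awk 'NF { total++; if ($0 !~ /^\/pnfs/) outside++ }
         END { printf "files: %d\nnot under /pnfs: %d\n", total, outside }' "$1"
}
```

Calling check_filelist fcllist.txt prints the two counts. An empty list, or many entries outside /pnfs, suggests the dataset name should be double-checked.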
Analysis Methods
You have a choice of how to convert the art files into histograms or other summary formats to get at the analysis quantities you need. A summary of the options is at Ntuples. It is common to work with files that are not fully reconstructed. You have the option to run additional simulation and reconstruction after you read the files and before you write out an analysis ntuple. Designing these details requires working with an experienced collaborator who is familiar with your goals.
You will need an Offline build to run your jobs in. This can be one of the published releases:
ls /cvmfs/mu2e.opensciencegrid.org/Offline
Alternatively, you can build and modify the code locally, then use muse tarball to create a code tarball for grid submission.
Running jobs
To start, you can run on the first file interactively:
mu2e -S fcllist.txt -n 100 -c <your_job_fcl>
Be sure to limit the number of events, since the entire dataset is available as input.
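If you prefer that an interactive test can only ever read the first file, one option is to make a short list from the top of fcllist.txt and pass that to -S instead. This is a sketch only; first_files and onefile.txt are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# first_files is a hypothetical helper name, not part of mu2efiletools.
# Print the first N entries (default 1) of a file list.
first_files() {
    list="$1"
    count="${2:-1}"
    head -n "$count" "$list"
}

# Example use (needs a Mu2e environment, so shown as comments only):
#   first_files fcllist.txt 1 > onefile.txt
#   mu2e -S onefile.txt -n 100 -c <your_job_fcl>
```

The event limit from -n still applies; the short list simply guarantees that the job cannot run on into later files.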
A quick way to see the contents of almost all products in an art file is Validation:
mu2e -S fcllist.txt -n 100 -c Validation/fcl/val.fcl
which will write validation.root containing many histograms. Or you can print the products:
mu2e -S fcllist.txt -n 10 -c Print/fcl/print.fcl > products.txt
If you are new to grid jobs, please review Grids, Dcache, DataTransfer, and JobPlan. To submit a grid job you will need to make a set of fcl files, each of which takes your basic interactive fcl and customizes it for different input and output files. The resulting set of fcl files is then submitted to the grid. This is likely to be "example 2" of the SubmitJobs page.
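The actual fcl files should be made with the tools described on the GenerateFcl page. The sketch below only illustrates the underlying idea of giving each job its own slice of the input list, using the standard split utility. The function name split_filelist, the output directory, and the "inputs." prefix are assumptions made for this example.

```shell
#!/bin/sh
# split_filelist is a hypothetical helper for illustration; it is not
# the Mu2e fcl generation tool.
# Usage: split_filelist <file list> <files per job> <output directory>
# Writes chunks named inputs.aa, inputs.ab, ... into the output directory,
# each holding at most <files per job> lines of the original list.
split_filelist() {
    list="$1"
    per_job="$2"
    outdir="$3"
    mkdir -p "$outdir"
    split -l "$per_job" "$list" "$outdir/inputs."
}
```

For example, split_filelist fcllist.txt 10 jobs would create one chunk per ten input files. Each chunk corresponds to the set of input files that one generated fcl file would name.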