Ntuples: Difference between revisions

From Mu2eWiki


==Introduction==
 
Our primary data is stored in the [[Code|art]] format.  This format uses root I/O, but embeds it in a framework with restrictive rules for data access.  These framework rules are important during primary processing in order to precisely track the provenance of the data.  Later in the analysis process, the dominant problem becomes accessing the data in a convenient way, rather than in a tightly controlled way.  The solution is to copy the highest-level parts of the data (such as the number of hits on a track and its momentum) into a smaller and faster format: an ntuple.  Once the data is in this format, usually a root tree, the user can make histograms with simple cuts.  Since only the high-level data values are stored, the dataset is much smaller than the art files and much faster to read.
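
To make the idea concrete, here is a minimal sketch of "histogram with a simple cut" on flat, ntuple-like data. It uses plain Python rather than root, so it does not represent any of the formats listed on this page. The field names (<code>nhits</code>, <code>momentum</code>), the values, the cut threshold, and the binning are all assumptions chosen for illustration.

```python
# Hypothetical flat ntuple: one row per track, holding only high-level values.
# Field names and values are made up for illustration; units are arbitrary.
tracks = [
    {"nhits": 12, "momentum": 98.7},
    {"nhits": 25, "momentum": 104.2},
    {"nhits": 31, "momentum": 104.9},
    {"nhits": 28, "momentum": 103.1},
    {"nhits": 40, "momentum": 105.3},
]


def histogram(values, low, high, nbins):
    """Count values into nbins equal-width bins over [low, high)."""
    width = (high - low) / nbins
    counts = [0] * nbins
    for v in values:
        if low <= v < high:
            counts[int((v - low) / width)] += 1
    return counts


# A simple cut: keep only tracks with at least 20 hits (threshold is arbitrary).
selected = [t["momentum"] for t in tracks if t["nhits"] >= 20]

# Histogram the momentum of the selected tracks in 6 unit-width bins from 100 to 106.
momentum_hist = histogram(selected, 100.0, 106.0, 6)
print(momentum_hist)
```

In a real analysis the rows would come from a root tree and the histogram from root itself; the point is only that a flat record per track makes the cut a one-line filter.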
 
Ideally the collaboration would choose a primary ntuple format and officially support it.  The official support would include code and documentation, and priority in support and processing.  With one central format, everyone's work in creating datasets and tools could be shared.  At this time (8/2018) a primary ntuple format has not been selected, but that is still the plan.  Meanwhile, several methods are in use, as listed in the sections that follow.
 
==Stntuple==
** [https://sites.google.com/view/stntuple/home Stntuple] [ssh://p-mu2eofflinesoftwarestntuple@cdcvs.fnal.gov/cvs/projects/mu2eofflinesoftwarestntuple/Stntuple.git git url] - one choice of user ntuple
 
==TrkAna==
 
TrkAna is documented in [https://mu2e-docdb.fnal.gov/cgi-bin/private/ShowDocument?docid=7775 docdb 7775].
 
==Custom root tree==
 
==gallery==
 
==Other formats==
 
Tools other than root have been explored at times, but there is no major effort on Mu2e at the time of writing.
** [http://jupyter.org/ Jupyter]
** [https://www.hdfgroup.org/ HDF5]
** [https://www.r-project.org/ R]

Revision as of 17:57, 2 August 2018
