SamExpert: Difference between revisions


Revision as of 19:03, 21 September 2022

Introduction

This page captures some documentation which will be useful in the future.

SAM Locator Process

Around 2020, someone turned on our SAM station "locator process". This process runs continuously in the background and adds locations to SAM records. Every night it receives a list of files which have moved from being only on tape-backed dCache disk to also being on tape. If the file already has a location entry of the form "enstore:/pnfs/path", with the correct path, then the locator will add the tape volume and file position strings to this SAM location string.
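
As an illustration, a script can tell whether the locator has already added tape information to a location string. The sketch below assumes the string has the form shown in the "Check if a file is on tape" example further down this page, that is "enstore:/pnfs/path(position@volume)". The function name and the TAPE_POSITION and TAPE_VOLUME variables are hypothetical and are not part of samweb or dhtools.

```shell
# Hypothetical helper, not part of samweb or dhtools.
# Splits a SAM location string of the assumed form
#   enstore:/pnfs/path(position@volume)
# into TAPE_POSITION and TAPE_VOLUME. Returns 1 if the string has no
# tape suffix, i.e. no tape information has been added yet.
parse_sam_tape_location() {
  local loc="$1" inner
  case "$loc" in
    *"("*"@"*")") ;;          # has a "(...@...)" suffix
    *) return 1 ;;
  esac
  inner="${loc##*\(}"         # drop everything up to the last "("
  inner="${inner%\)}"         # drop the trailing ")"
  TAPE_POSITION="${inner%%@*}"
  TAPE_VOLUME="${inner#*@}"
}
```

This assumes a single location string is passed in; a file with several locations would need to be split into lines first.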

DD vs snaps

It would be logical to specify data by a snapshot id when submitting grid jobs, since it is a fixed file list, but jobsub doesn't allow this yet - it only allows specifying data by a dataset definition. This is fine for most work since most datasets are fixed. If you need to submit a job to run on a snapshot, the work-around is to create a dataset definition based on a snapshot:

export SAM_DD_SNAP=${USER}_test_0_snap
samweb create-definition $SAM_DD_SNAP "snapshot_id=$SAM_SNAP_ID"

This dataset definition then represents a permanent, fixed file list.
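
If this is done often, the naming convention and the dimension string can be wrapped in a small function. This is only a sketch: the function name and its arguments are hypothetical, and it prints the samweb command rather than running it, so the result can be inspected first.

```shell
# Hypothetical helper, not part of samweb or dhtools.
# Prints the samweb command that would create a definition named
# <owner>_<tag>_snap pinned to snapshot id <snap_id>.
snapshot_definition_command() {
  local snap_id="$1" owner="$2" tag="$3"
  case "$snap_id" in
    ''|*[!0-9]*)
      echo "snapshot id must be numeric: '$snap_id'" >&2
      return 1
      ;;
  esac
  printf 'samweb create-definition %s_%s_snap "snapshot_id=%s"\n' \
    "$owner" "$tag" "$snap_id"
}
```

For example, "snapshot_definition_command $SAM_SNAP_ID $USER test_0" prints a command equivalent to the one above.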


<! *****************************************************************>

Reading Data with SAM

When a grid job is submitted to read the files in a SAM dataset, the file list is specified by a dataset definition.

There are two parts to running a SAM job. First there is a control process, one per job, which starts up a SAM project on a SAM station. This process will keep track of which files are going to which job sections. The second part is the consumer, a process which reads the files. There are typically many consumers, one for each job section on the grid.

The jobsub command, with the right switches set, can act as the control process, or you can run the commands explicitly yourself.

A consumer may be a script running SAM commands, or an art executable which is configured to ask a SAM project for input files. These different options for control and consumer process can be chosen independently.

The combination of using jobsub to handle the control process and art to handle the consumer process is by far the most common usage - start there first.

A consumer contacts the project on the SAM station to establish itself as a consumer and, when it is ready for data, asks for the next available file. The station responds with a filespec and the consumer uses the "ifdh cp" command to copy the file out of dCache and onto the worker node where it can be processed. The art input module will automatically delete files after they are closed, but a script-based consumer will have to delete the files itself. The consumers run until there are no more files left. You can set a maximum number of files to be delivered to a consumer. When the maximum number of files has been delivered to a consumer, or when the project sees that all files have been delivered, the project will return an empty string when asked for the next file.


<! ********************************************************* >

Running on the Grid with jobsub, SAM, and art

This will be the most common method for reading files through SAM. The control process will be performed by jobsub. An art exe will be configured to act as a consumer. At this time (5/2015) mu2egrid has not been updated to include this functionality, though that is part of the plan.

The jobsub_submit command only requires adding one switch to provide the dataset definition which defines the input data. You need a dataset definition string to give to this switch. It will look like:

export SAM_DD=sim.mu2e.example-beam-g4s1.1812a.art

See the discussion above for how to create a dataset definition, or more likely you will get these dataset definitions from an expert, the datasets page, or a TDR sample. If you set SAM_FILE_LIMIT=N and pass it to jobsub with -e, this will be the maximum number of files given to each copy of the mu2e exe run in your job (each consumer).

jobsub_submit --dataset_definition=$SAM_DD [...]

jobsub_client will start a SAM project for you. It will set several environmentals in the grid job processes, which we will use to set up the consumers.


The consumer in this example is an art executable. You will have to perform one step before running art - start the consumer.

setup mu2e
# utilities you will need
# -q grid defines the mgf functions which 
# you don't generally need interactively
setup dhtools -q grid
# setup an offline
source Offline/setup.sh
# set a limit on files for this exe, if you want..
export SAM_FILE_LIMIT=2
# this is a bash function supplied by dhtools
mgf_sam_start_consumer -v

Now start the art exe with

mu2e --sam-web-uri=$SAM_PROJECT_URL --sam-process-id=$SAM_CONSUMER_ID [...]

The environmentals were set by the mgf function. No other input or services configuration is needed. The exe will contact the SAM project on the SAM station to get file URLs and call ifdh to move the files to the grid node. It will also delete the files when it is done.

The art module will detect when the SAM station stops delivering files and will stop the exe. You should then shut down this consumer:

# this consumer is done reading files
mgf_sam_stop_consumer -v

The "-v" option just prints some checks and other info; you can leave it off for quiet operation.
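
The start, run, and stop steps can be combined so that the consumer is always stopped, even if the exe fails. The function below is a sketch of that pattern, not something dhtools provides: the name run_mu2e_as_consumer is hypothetical, and it assumes the mgf functions and SAM_PROJECT_URL are already available in the job environment as described above.

```shell
# Hypothetical wrapper, not part of dhtools.
# Starts a consumer, runs mu2e against the SAM project with any extra
# arguments, always stops the consumer, and returns mu2e's exit status.
run_mu2e_as_consumer() {
  # sets SAM_CONSUMER_ID; give up if the consumer cannot be started
  mgf_sam_start_consumer -v || return 1
  mu2e --sam-web-uri="$SAM_PROJECT_URL" \
       --sam-process-id="$SAM_CONSUMER_ID" "$@"
  local rc=$?
  # stop the consumer whether or not the exe succeeded
  mgf_sam_stop_consumer -v
  return "$rc"
}
```

The mu2e options are built inside the function because SAM_CONSUMER_ID does not exist until the consumer has been started.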

<!*********************************************************************>

Running SAM with Scripts

This section describes how to start a SAM project, create consumers, get files, and finish a project, all through script commands instead of jobsub and art. You might use this if you were operating on the input dataset in some other way than reading with art. We would expect that most users would not need this type of operation.

There are two parts. The first can be called the "control" process, which starts and stops the project. This is run once per job and would probably be done interactively. (When using jobsub_submit, the jobsub process performs this function.) The second part is the consumer. Typically the consumer script will be run many times, once per section in the grid job. (The art input module can also do this, except for establishing the consumer, which is done at the command level.)

The Control process - begin

This process first needs to create file selection criteria and use them to create a dataset definition. Usually this will just be handed to you as a dataset name, like "sim.mu2e.example-beam-g4s1.1812a.art", which is also a special dataset definition name. Doing the control process steps explicitly replaces the actions done by jobsub_submit when --dataset_definition is set.

export SAM_DD=sim.mu2e.example-beam-g4s1.1812a.art
mgf_start_project -v

The consumer process

This script would typically be part of the grid job. The consumer process first establishes itself as a consumer of the project. If jobsub_submit is used as the control process, then it will set SAM_PROJECT_URL in the grid environment. If you started the project yourself, then you will need to explicitly pass this to the grid environment with the -e SAM_PROJECT_URL switch to jobsub_submit.

setup mu2e
# setup an offline, or whatever you need
source Offline/setup.sh
# define mgf functions
setup dhtools -q grid

# prepare sam project to give us files
mgf_sam_start_consumer -v

# get a file url
mgf_sam_getnextfile -v

while [ "$SAM_FILE_URL" != "" ]; do
  # copy the file locally
  # could also "ifdh cp $SAM_FILE_URL $SAM_FILE"
  mgf_ifdh_with_backoff $SAM_FILE_URL $SAM_FILE
  # use local file $SAM_FILE here

  # tell sam you are done with this file
  mgf_sam_releasefile ok
  # you need to delete it
  rm -f $SAM_FILE
  # see if there is another one
  mgf_sam_getnextfile -v
done

# the SAM project stopped giving us files
mgf_sam_stop_consumer -v
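
The loop above always releases the file as "ok". If you want the release status to follow the outcome of the copy and the processing, the body of the loop could be written along these lines. This is a sketch only: consume_current_file is a hypothetical name, and the processing command passed to it stands in for whatever your job does with the file.

```shell
# Hypothetical helper, not part of dhtools.
# Copies the current SAM file, runs the given command on it, and
# releases the file as ok or notOk depending on the outcome.
# Expects SAM_FILE_URL and SAM_FILE as set by mgf_sam_getnextfile.
consume_current_file() {
  local status=notOk
  if mgf_ifdh_with_backoff "$SAM_FILE_URL" "$SAM_FILE" \
      && "$@" "$SAM_FILE"; then
    status=ok
  fi
  # tell sam how it went, then clean up the local copy
  mgf_sam_releasefile "$status"
  rm -f "$SAM_FILE"
  [ "$status" = ok ]
}
```

Inside the while loop this would be called as "consume_current_file my_command", followed by "mgf_sam_getnextfile -v" as before.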


The Control process - end

When the consumers are all done, the project should be ended:

# requires SAM_PROJECT to be set (would be set by mgf_start_project)
mgf_stop_project -v

Don't worry about projects and consumers that might be forgotten or stopped incorrectly; they will eventually be stopped by SAM.


<!*********************************************************************>

mgf functions

These functions are defined with

setup dhtools -q grid

They are designed to combine several samweb commands and checks into one command. In general they would only be used in a grid job.


# mgf_tee
#
# echo message to stdout and stderr
# useful for coordinating output in job's .out and .err
#

# mgf_date
#
# echo date and message to stdout and stderr
# useful for coordinating output in job's .out and .err
#

# mgf_section_name
#
# sets MGF_SECTION_NAME=cluster_process (formatted)
#

# mgf_system
#
# print info about the system
#  -l long version
#  -v longer version
#

# mgf_sam_start_consumer
#
# Starts a consumer for the sam project
# the job should have been submitted with sam settings
# you need to setup ifdh or setup a base release
# you need to setup sam_web_client
# requires SAM_PROJECT to be set (jobsub will do that)
# you may set SAM_FILE_LIMIT
#
# return environmental SAM_CONSUMER_ID
#
# -v verbose
#

# mgf_sam_getnextfile
#
# Get the next file for a consumer.
# This function is only used if mu2e executable is not used to read files.
# The job should have been submitted with sam settings
# You need to setup ifdh and sam_web_client
# Requires SAM_CONSUMER_ID to be set (mgf_sam_start_consumer will do that)
#
# returns environmental SAM_FILE_URL (probably a gridftp url suitable for ifdh)
# and SAM_FILE which is the basename.  These are empty if there are no more files.
#
# -v verbose
#


#  mgf_sam_releasefile
#
# If getnextfile was called to get a file, then the file should be 
# released with this method.  This function is not used if 
# an art executable is used with sam switches.
# Requires  SAM_PROJECT_URL  SAM_CONSUMER_ID  SAM_FILE_URL
# Ideally, you set SAM_FILE_STATUS=ok   (or notOk) according
# to whether the processing was successful.  You can also pass
# this as the first argument.
# You need to setup sam_web_client
#


# mgf_sam_stop_consumer
#
# Stops the consumer for the sam project.
# Requires SAM_PROJECT and SAM_CONSUMER_ID
# to be set.
#
# -v verbose
#


# mgf_start_project
#
# Start a SAM project. Not used in typical grid jobs.
# Requires SAM_DD to be set to a dataset definition.
# If SAM_PROJECT is set, it is used.
# Sets SAM_PROJECT and SAM_PROJECT_URL
#
# -v verbose
#


# mgf_stop_project
#
# Stop a SAM project. Not used in typical grid jobs.
# Requires SAM_PROJECT to be set.
#
# -v verbose
#


# mgf_ifdh_with_backoff
#
# Execute an ifdh command with retries and backoff
# Default command is "ifdh cp $1 $2", can be redefined with
# MGF_IFDH_COMMAND.  The retries have the following
# sleep pattern "600 1800 3600 3600" in seconds, which can be changed
# with MGF_IFDH_SLEEP_PLAN
#
#
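
The retry-with-backoff idea can be sketched in a few lines. This is not the dhtools implementation: the function name and argument order are hypothetical, and the sketch assumes one initial attempt followed by one retry per entry in the sleep plan, which may differ from how mgf_ifdh_with_backoff counts attempts.

```shell
# Illustration only, not the dhtools code.
# Usage: retry_with_sleep_plan "600 1800 3600 3600" command [args...]
# Runs the command once, then retries after each delay in the plan
# until it succeeds or the plan is exhausted.
retry_with_sleep_plan() {
  local plan="$1" delay
  shift
  "$@" && return 0
  for delay in $plan; do
    echo "command failed, sleeping ${delay}s before retry" >&2
    sleep "$delay"
    "$@" && return 0
  done
  return 1
}
```

For example, "retry_with_sleep_plan "600 1800" ifdh cp $SAM_FILE_URL $SAM_FILE" would try the copy up to three times.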


Touching a dataset

Example command to read a few bytes of each file and therefore reset their "last read" time and prevent them from being purged from disk.

 samweb -e gm2 run-project --defname=gm2pro_daq_preproduction_run3D_5205P --schema https 'echo %fileurl && curl -L --cert $X509_USER_PROXY --key $X509_USER_PROXY --cacert $X509_USER_PROXY --capath /etc/grid-security/certificates -H "Range: bytes=0-3" %fileurl && echo'

<!*********************************************************************>

Expert Procedures

look up a file history, including staging and purging

get pnfsid
cat '/pnfs/mu2e/tape/usr-sim/dig/oksuzian/CRY-cosmic-general/cry3-digi-hi/art/99/d1/.(id)(dig.oksuzian.CRY-cosmic-general.cry3-digi-hi.001002_00068805.art)'
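
The special ".(id)(name)" path is just the file's directory followed by the file name wrapped in ".(id)( )". A small function can build it from an ordinary full path; the function name here is hypothetical and the sketch only does string manipulation.

```shell
# Hypothetical helper: turn /pnfs/dir/file into /pnfs/dir/.(id)(file)
pnfs_dot_id_path() {
  local full="$1"
  printf '%s/.(id)(%s)\n' "$(dirname "$full")" "$(basename "$full")"
}
```

The pnfsid could then be read with: cat "$(pnfs_dot_id_path /pnfs/path/to/file)"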

go to kibana and enter

 pnfsid:<id>

or

file_name:dig.oksuzian.CRY-cosmic-general.cry3-digi-hi.001002_00035646.art


fife_utils check fraction on disk

sam_validate_dataset --stage_status --name dig.oksuzian.CRY-cosmic-general.cry3-digi-hi.art

These are not needed by users; they are included here for completeness.

Values

SAM has two levels of metadata entries: those intrinsic to SAM, which are called values, and those defined by the experiment, called parameters. A value can only be one of a list of allowed entries.

samweb list-values --help-categories
samweb list-values data_tiers

Mu2e collaborators with admin privileges can add to the list of allowed entries:

samweb add-value data_tiers "raw"

Parameters

Parameters are arbitrary metadata fields which can be added to the metadata, for all files, by Mu2e collaborators with admin privileges. They have the form

category.name

Currently, we have the categories: dh, mc, job. Information generated in data handling procedures is stored in dh, mc description is stored in mc, and job is used in processing. Parameters can be listed:

samweb list-parameters
 ...
 dh.first_subrun (true_int)
 ...
 mc.primary_particle (string)
 ...

Parameters may be of types:

string
true_float
true_int

and new ones may be created:

samweb add-parameter  dh.first_subrun_event true_int

Declare a file

export SAM_FILE=foo.bar
samweb declare-file ${SAM_FILE}.json

where the json file contains all required metadata for the file

Add a location to a file

samweb add-file-location ${SAM_FILE} /pnfs/dir/etc/$SAM_FILE

Dump a file

samweb get-metadata ${SAM_FILE}
samweb locate-file ${SAM_FILE}

Delete a file permanently

samweb retire-file $SAM_FILE
   samRm [OPTIONS] [-f FILE]  [-s FILEOFNAMES] [-d DATASET]
      -n interpret file lists, but don't actually do the delete
      -h print help

Check if a file is on tape

samweb locate-file $SAM_FILE
enstore:/pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090(48@vpe272)

"vpe272" is the tape volume label. "48" is the file position on the tape.

get more detailed info, such as enstore crc or enstore file ID

For every file in dCache, there is a local name starting like "/pnfs/mu2e/tape/.." which can be converted to the universal name like "/pnfs/fnal.gov/usr/mu2e/tape/..". The universal name often works better in these enstore commands. Sometimes the only form that works is doubling the first part of the path: "/pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/tape/..". Once you have a pnfsid or bfid, those always work to identify the file.
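
The conversion from the local name to the universal name is a simple prefix substitution. The sketch below shows it as a function; the name is hypothetical, and it does not attempt the "doubled" form, which has to be tried by hand when the others fail.

```shell
# Hypothetical helper: /pnfs/mu2e/... -> /pnfs/fnal.gov/usr/mu2e/...
# Paths already in universal form are printed unchanged.
pnfs_universal_name() {
  case "$1" in
    /pnfs/fnal.gov/usr/*) printf '%s\n' "$1" ;;
    /pnfs/*) printf '/pnfs/fnal.gov/usr/%s\n' "${1#/pnfs/}" ;;
    *) echo "not a /pnfs path: $1" >&2; return 1 ;;
  esac
}
```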

setup encp v3_11 -q stken

> BFID=`enstore pnfs --bfid /pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art`
> enstore info --file $BFID

volume: VPE272
location_cookie: 0000_000000000_0000048
size: 2335324404
file_family: phy-sim
original_name: /pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art
map_file: 
pnfsid_file: 0000E7C4A992E86E4AF88E957FEAE686F5E5
pnfsid_map: 
bfid: CDMS142775179900000
origdrive: enmvr083:/dev/rmt/tps4d0n:576004003683
crc: 3288144023


some other commands

> enstore pnfs --info /pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art

 > enstore info --file `cat /pnfs/fnal.gov/usr/mu2e/tape/phy-sim/sim/mu2e/cd3-cosmic-g4s2-target6/v621_v621/art/ce/80/".(id)(sim.mu2e.cd3-cosmic-g4s2-target6.v621_v621.000001_00003876.art)"`


 > enstore pnfs --layer  /pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art 4
VPE272
0000_000000000_0000048
2335324404
phy-sim
/pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art

0000E7C4A992E86E4AF88E957FEAE686F5E5

CDMS142775179900000
enmvr083:/dev/rmt/tps4d0n:576004003683
3288144023

[the "4" in the above command refers to the layer. Layer 0
is the file itself.  Layers 1-4 are various information.]

 > enstore info --file  CDMS142775179900000                        
or
 > enstore info --file 0000E7C4A992E86E4AF88E957FEAE686F5E5
{'active_package_files_count': None,
 'archive_mod_time': None,
 'archive_status': None,
 'bfid': 'CDMS142775179900000',
 'cache_location': None,
 'cache_mod_time': None,
 'cache_status': None,
 'complete_crc': 3288144023L,
 'deleted': 'no',
 'drive': 'enmvr083:/dev/rmt/tps4d0n:576004003683',
 'external_label': 'VPE272',
 'file_family': 'phy-sim',
 'file_family_width': 1,
 'gid': 0,
 'library': 'CD-10KCF1',
 'location_cookie': '0000_000000000_0000048',
 'original_library': 'CD-10KCF1',
 'package_files_count': None,
 'package_id': None,
 'pnfs_name0': '/pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art',
 'pnfsid': '0000E7C4A992E86E4AF88E957FEAE686F5E5',
 'r_a': (('131.225.240.49', 53163),
         1L,
         '131.225.240.49-53163-1457645694.555144-16594-140381988148992'),
 'sanity_cookie': (65536L, 1641907538L),
 'size': 2335324404L,
 'storage_group': 'mu2e',
 'tape_label': 'VPE272',
 'uid': 0,
 'update': '2015-03-30 16:43:19.966574',
 'wrapper': 'cpio_odc'}



 
 > enstore info --file  /pnfs/fnal.gov/usr/mu2e/phy-etc/cnf/mu2e/cd3-beam-g4s4-proton/0918a/037/622/cnf.mu2e.cd3-beam-g4s4-proton.0918a.004001_00000125.fcl
{'active_package_files_count': 3001,
 'archive_mod_time': '2015-09-18 23:34:04',
...

pnfsid to file name
> enstore sfs --info 00006C08CEBDFCB240A4A3FC665D0DEA219A

volume: VPN204
location_cookie: 0000_000000000_0000926
size: 406918969
file_family: nova_production
original_name:
/pnfs/fnal.gov/usr/nova/production/daq/R17-03-09-prod3genie.g/nd/genie/000116/11616/neardet_genie_nonswap_genierw_fhc_v08_2500_r00011616_s14_c000_R17-03-09-prod3genie.g_v2_20171003_044326_sim.daq.root
map_file:
pnfsid_file: 00006C08CEBDFCB240A4A3FC665D0DEA219A
pnfsid_map:
bfid: CDMS150892823800000
origdrive: enmvr087:/dev/rmt/tps1d0n:576004004047
crc: 3204489581


web interface: port 8480 for reading, 8483 and https for writing (snapshots, projects, etc)

http://samweb.fnal.gov:8480/sam/mu2e/api
http://samweb.fnal.gov:8480/sam/EXPERIMENT/api/files/list/dimensions
http://samweb.fnal.gov:8480/sam/mu2e/definition_editor/
http://samweb.fnal.gov:8480/sam/mu2e/api/files/list/?dims=dh.dataset=sim.mu2e.example-beam-g4s1.1812a.art
http://samweb.fnal.gov:8480/sam/mu2e/api/files/name/sim.mu2e.example-beam-g4s1.1812a.16638329_000018.art/metadata
http://samweb.fnal.gov:8480/sam/mu2e/api/files/name/sim.mu2e.example-beam-g4s1.1812a.16638329_000018.art/locations
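
The per-file URLs above follow a fixed pattern, so a script can build them from a file name. This sketch only constructs the URL string using the read-only port listed above; the function and variable names are hypothetical.

```shell
# Hypothetical helper: build the read-only samweb URL for a file's
# metadata or locations, following the URL pattern listed above.
SAMWEB_BASE="http://samweb.fnal.gov:8480/sam/mu2e/api"

sam_file_url() {
  local file="$1" what="$2"
  case "$what" in
    metadata|locations)
      printf '%s/files/name/%s/%s\n' "$SAMWEB_BASE" "$file" "$what"
      ;;
    *)
      echo "unknown query: $what (use metadata or locations)" >&2
      return 1
      ;;
  esac
}
```

The result can be passed to a web client, for example: curl -s "$(sam_file_url $SAM_FILE metadata)"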

edit the admin users with https://samweb.fnal.gov:8483/sam/mu2e/admin/users