SamExpert
Latest revision as of 17:58, 15 October 2024
Introduction
This page captures some documentation which will be useful in the future.
SAM Locator Process
Around 2020, someone turned on our SAM station "locator process". This process runs continuously in the background and adds locations to SAM records. Every night it receives a list of files which have moved from being only on tape-backed dCache disk to also being on tape. If the file already has a location entry of the form "enstore:/pnfs/path", with the correct path, then the locator will add the tape volume and file position strings to this SAM location string.
DD vs snaps
It would be logical to specify data by a snapshot id when submitting grid jobs, since it is a fixed file list, but jobsub doesn't allow this yet - it only allows specifying data by a dataset definition. This is fine for most work since most datasets are fixed. If you need to submit a job to run on a snapshot, the work-around is to create a dataset definition based on a snapshot:
export SAM_DD_SNAP=${USER}_test_0_snap
samweb create-definition $SAM_DD_SNAP "snapshot_id=$SAM_SNAP_ID"
This dataset definition then represents a permanent, fixed file list.
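As a small sketch of this work-around, the helper below assembles the definition name and the snapshot_id dimension and prints the samweb create-definition command instead of running it. The function name snap_defn_command and its optional tag argument are made up for illustration, and the check that a snapshot id is a plain integer is an assumption.

```shell
# Print (do not run) the samweb command that wraps a snapshot in a
# dataset definition. snap_defn_command is a hypothetical helper,
# not part of samweb or dhtools.
#   $1  snapshot id (assumed to be a plain integer)
#   $2  optional tag used in the definition name (default: test_0)
snap_defn_command() {
    _snap_id="$1"
    _tag="${2:-test_0}"
    case "$_snap_id" in
        ''|*[!0-9]*)
            echo "snapshot id must be numeric: '$_snap_id'" >&2
            return 1
            ;;
    esac
    printf 'samweb create-definition %s_%s_snap "snapshot_id=%s"\n' \
        "$USER" "$_tag" "$_snap_id"
}
```

Printing the command first makes it easy to check the definition name before anything is created in SAM.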
Reading Data with SAM
When a grid job is submitted to read the files in a SAM dataset, the file list is specified by a dataset definition.
There are two parts to running a SAM job. First there is a control process, one per job, which starts up a SAM project on a SAM station. This process will keep track of what files are going to what job sections. The second part is the consumer, a process which reads the files. There are typically many consumers, one for each job section on the grid.
The jobsub command, with the right switches set, can act as the control process, or you can run the commands explicitly yourself.
A consumer may be a script running SAM commands, or an art executable which is configured to ask a SAM project for input files. These different options for control and consumer process can be chosen independently.
The combination of using jobsub to handle the control process and art to handle the consumer process is by far the most common usage - start there first.
A consumer contacts the project on the SAM station to establish itself as a consumer and, when it is ready for data, asks for the next available file. The station responds with a filespec and the consumer uses the "ifdh cp" command to copy the file out of dCache and onto the worker node where it can be processed. The art input module will automatically delete files after they are closed, but a script-based consumer will have to delete the files itself. The consumers run until there are no more files left. You can set a maximum number of files to be delivered to a consumer. When the maximum number of files has been delivered to a consumer, or when the project sees that all files have been delivered, the project will return an empty string when asked for the next file.
Running on the Grid with jobsub, SAM, and art
This will be the most common method for reading files through SAM. The control process will be performed by jobsub. An art exe will be configured to act as a consumer. At this time (5/2015) mu2egrid has not been updated to include this functionality, though that is part of the plan.
The jobsub_submit command only requires adding one switch to provide the dataset definition which defines the input data. You need a dataset definition string to give to this switch. It will look like:
export SAM_DD=sim.mu2e.example-beam-g4s1.1812a.art
See the discussion above for how to create a dataset definition, or more likely you will get these dataset definitions from an expert, the datasets page, or the TDR sample page. If you set SAM_FILE_LIMIT=N and pass it to jobsub with -e, this will be the maximum number of files given to each copy of the mu2e exe run in your job (each consumer).
jobsub_submit --dataset_definition=$SAM_DD [...]
jobsub_client will start a SAM project for you. It will set several environment variables in the grid job processes, which we will use to set up the consumers.
The consumer in this example is an art executable. You must perform one step before running art: start the consumer.
mu2einit
# utilities you will need
# -q grid defines the mgf functions which
# you don't generally need interactively
setup dhtools -q grid
# setup an offline
source Offline/setup.sh
# set a limit on files for this exe, if you want..
export SAM_FILE_LIMIT=2
# this is a bash function supplied by dhtools
mgf_sam_start_consumer -v
Now start the art exe with
mu2e --sam-web-uri=$SAM_PROJECT_URL --sam-process-id=$SAM_CONSUMER_ID [...]
The environment variables were set by the mgf function. No other input or services configuration is needed. The exe will contact the SAM project on the SAM station to get file URLs and call ifdh to move the files to the grid node. It will also delete the files when it is done.
The art module will detect when the SAM station stops delivering files and will stop the exe. You should then shut down this consumer:
# this consumer is done reading files
mgf_sam_stop_consumer -v
The "-v" option just prints some checks and other info, you can leave it off for a quiet operation.
Running SAM with Scripts
This section describes how to start a SAM project, create consumers, get files, and finish a project, all through script commands instead of jobsub and art. You might use this if you were operating on the input dataset in some other way than reading with art. We expect that most users will not need this type of operation.
There are two parts. The first can be called the "control" process, which starts and stops the project. This is run once per job and would probably be done interactively. (When using jobsub_submit, the jobsub process performs this function.) The second part is the consumer. Typically the consumer script will be run many times, once per section in the grid job. (The art input module can also do this, except for establishing the consumer, which is done at the command level.)
The Control process - begin
This process first needs to create file selection criteria and use them to create a dataset definition. Usually this will just be handed to you as a dataset name, like "sim.mu2e.example-beam-g4s1.1812a.art", which is also a special dataset definition name. Doing the control process steps explicitly replaces the actions done by jobsub_submit when the --dataset_definition switch is set.
export SAM_DD=sim.mu2e.example-beam-g4s1.1812a.art
mgf_start_project -v
The consumer process
This script would typically be part of the grid job. The consumer process first establishes itself as a consumer of the project. If jobsub_submit is used as the control process, then it will set SAM_PROJECT_URL in the grid environment. If you started the project yourself, then you will need to explicitly pass this to the grid environment with the -e SAM_PROJECT_URL switch to jobsub_submit.
mu2einit
# setup an offline, or whatever you need
source Offline/setup.sh
# define mgf functions
setup dhtools -q grid
# prepare sam project to give us files
mgf_sam_start_consumer -v
# get a file url
mgf_sam_getnextfile -v
while [ "$SAM_FILE_URL" != "" ]; do
  # copy the file locally
  # could also "ifdh cp $SAM_FILE_URL $SAM_FILE"
  mgf_ifdh_with_backoff $SAM_FILE_URL $SAM_FILE
  # use local file $SAM_FILE here
  # tell sam you are done with this file
  mgf_sam_releasefile ok
  # you need to delete it
  rm -f $SAM_FILE
  # see if there is another one
  mgf_sam_getnextfile -v
done
# the SAM project stopped giving us files
mgf_sam_stop_consumer -v
The Control process - end
When the consumers are all done, the project should be ended:
# requires SAM_PROJECT to be set (would be set by mgf_start_project)
mgf_stop_project -v
Don't worry about projects and consumers that might be forgotten or stopped incorrectly; SAM will eventually stop them.
Notes on SAM projects and consumers
Projects:
- in POMS DAG, starts SAM_START job which just starts the project and exits
- DAG job will wait until all worker jobs are complete. Jobs in hold are not considered complete, since they might be released. If jobs are deleted from hold, then DAG can move on.
- if all DAG workers are done, DAG will submit SAM_END job which will end the project
- SAM project timeout, after no activity, is currently 72h, set in station configuration.
- if jobs are left in hold, the SAM project will timeout, then DAG will eventually timeout, submit SAM_END job which will find project already closed.
- the project will have "ended complete" status if all files were delivered to consumers by get-next-file calls, regardless of the file or process status
- sam project status is, I think, "running", "ended complete", or "ended incomplete"
Consumers:
- call start-process
- call get-next-file
  - if the process has not exceeded its file count limit and staged files are available, returns URL and RC=0
  - the first call after the process has received all files up to the consumer count limit returns no output and RC=0, and stops the consumer on the server
  - the second call after the process has received all files up to the consumer count limit returns RC=1 and a message to stderr. This is because the process no longer exists
  - if get-next-file is called before the consumer file count limit is reached, but the project has no more files, it will return no output and RC=0, and stop the consumer on the server
  - if no timeout is set but there are no staged files, then the call will block until a file is staged
  - if a timeout is set, and no files are staged before the timeout expires, then an error will be printed to stderr, nothing to stdout, and RC=1
- after using the file, call set-process-file-status with status "consumed" or "skipped". This is equivalent to calling release-file with --status=ok, or no status flag
- after the consumer is done, call set-process-status with status = "completed" or "bad". This can be done after the consumer is stopped on the server
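The get-next-file outcomes listed above can be folded into a small decision helper. This is only a sketch of the notes, not code from samweb or dhtools; the name classify_getnext and the three labels it prints are made up for illustration.

```shell
# Classify one get-next-file call from its return code and its stdout.
# classify_getnext is a hypothetical helper that mirrors the notes above.
#   $1  return code of the call
#   $2  whatever the call printed to stdout (a URL, or nothing)
classify_getnext() {
    _rc="$1"
    _out="$2"
    if [ "$_rc" -ne 0 ]; then
        # timeout expired, or the process no longer exists on the server
        echo error
    elif [ -n "$_out" ]; then
        # a file URL was delivered
        echo file
    else
        # file limit reached or project exhausted; consumer was stopped
        echo done
    fi
}
```

A script-based consumer would loop while the result is "file", leave the loop on "done", and treat "error" as a reason to set the process status to "bad".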
mgf functions
These functions are defined with
setup dhtools -q grid
They are designed to combine several samweb commands and checks into one command. In general they would only be used in a grid job.
# mgf_tee
#
# echo message to stdout and stderr
# useful for coordinating output in job's .out and .err
#
# mgf_date
#
# echo date and message to stdout and stderr
# useful for coordinating output in job's .out and .err
#
# mgf_section_name
#
# sets MGF_SECTION_NAME=cluster_process (formatted)
#
# mgf_system
#
# print info about the system
# -l long version
# -v longer version
#
# mgf_sam_start_consumer
#
# Starts a consumer for the sam project
# the job should have been submitted with sam settings
# you need to setup ifdh or setup a base release
# you need to setup sam_web_client
# requires SAM_PROJECT to be set (jobsub will do that)
# you may set SAM_FILE_LIMIT
#
# return environmental SAM_CONSUMER_ID
#
# -v verbose
#
# mgf_sam_getnextfile
#
# Get the next file for a consumer.
# This function is only used if mu2e executable is not used to read files.
# The job should have been submitted with sam settings
# You need to setup ifdh and sam_web_client
# Requires SAM_CONSUMER_ID to be set (mgf_sam_start_consumer will do that)
#
# returns environmental SAM_FILE_URL (probably a gridftp url suitable for ifdh)
# and SAM_FILE which is the basename. These are empty if there are no more files.
#
# -v verbose
#
# mgf_sam_releasefile
#
# If getnextfile was called to get a file, then the file should be
# released with this method. This function is not used if
# an art executable is used with sam switches.
# Requires SAM_PROJECT_URL SAM_CONSUMER_ID SAM_FILE_URL
# Ideally, you set SAM_FILE_STATUS=ok (or notOk) according
# to whether the processing was successful. You can also pass
# this as the first argument.
# You need to setup sam_web_client
#
# mgf_sam_stop_consumer
#
# Stops the consumer for the sam project.
# Requires SAM_PROJECT and SAM_CONSUMER_ID
# to be set.
#
# -v verbose
#
# mgf_start_project
#
# Start a SAM project. Not used in typical grid jobs.
# Requires SAM_DD to be set to a dataset definition.
# If SAM_PROJECT is set, it is used.
# Sets SAM_PROJECT and SAM_PROJECT_URL
#
# -v verbose
#
# mgf_stop_project
#
# Stop a SAM project. Not used in typical grid jobs.
# Requires SAM_PROJECT to be set.
#
# -v verbose
#
# mgf_ifdh_with_backoff
#
# Execute an ifdh command with retries and backoff
# Default command is "ifdh cp $1 $2", can be redefined with
# MGF_IFDH_COMMAND. The retries have the following
# sleep pattern "600 1800 3600 3600" in seconds, which can be changed
# with MGF_IFDH_SLEEP_PLAN
#
#
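The retry-with-backoff idea behind mgf_ifdh_with_backoff can be illustrated with a minimal sketch. This is not the dhtools implementation: the function name retry_with_plan is hypothetical, and the rule that the command is tried once immediately and then once more after each sleep in the plan is an assumption.

```shell
# Run a command, retrying after each delay in a sleep plan.
# retry_with_plan is a hypothetical helper, not part of dhtools.
#   $1     space-separated delays in seconds, e.g. "600 1800 3600 3600"
#   $2...  the command to run
retry_with_plan() {
    _plan="$1"
    shift
    # first attempt, with no delay
    "$@" && return 0
    for _delay in $_plan; do
        echo "command failed, retrying in ${_delay}s" >&2
        sleep "$_delay"
        "$@" && return 0
    done
    # every attempt failed
    return 1
}
```

Under these assumptions, retry_with_plan "600 1800 3600 3600" ifdh cp "$SAM_FILE_URL" "$SAM_FILE" would attempt the copy up to five times before giving up.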
Touching a dataset
Example command to read a few bytes of each file and therefore reset their "last read" time and prevent them from being purged from disk.
vomsCert
export X509_USER_PROXY=/tmp/x509up_u$UID
samweb run-project --defname=rec.mu2e.CRV_wideband_cosmics.CRVWB-000-004-000.root --schema https 'echo %fileurl && curl -L --cert $X509_USER_PROXY --key $X509_USER_PROXY --cacert $X509_USER_PROXY --capath /etc/grid-security/certificates -H "Range: bytes=0-3" %fileurl && echo'
Expert Procedures
look up a file history, including staging and purging
get pnfsid
cat '/pnfs/mu2e/tape/usr-sim/dig/oksuzian/CRY-cosmic-general/cry3-digi-hi/art/99/d1/.(id)(dig.oksuzian.CRY-cosmic-general.cry3-digi-hi.001002_00068805.art)'
go to kibana and enter
pnfsid:<id>
or
file_name:dig.oksuzian.CRY-cosmic-general.cry3-digi-hi.001002_00035646.art
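The special ".(id)(filename)" path used above to read a pnfsid can be built from an ordinary file path. A minimal sketch, where pnfsid_path is a made-up helper name and the only logic is splitting the path into directory and basename:

```shell
# Build the special dCache path whose content is the pnfsid of a file.
# pnfsid_path is a hypothetical helper; it only rearranges the string.
#   $1  full /pnfs path of the file
pnfsid_path() {
    _dir=$(dirname "$1")
    _name=$(basename "$1")
    printf '%s/.(id)(%s)\n' "$_dir" "$_name"
}
```

On a node with /pnfs mounted, cat "$(pnfsid_path /pnfs/mu2e/tape/...)" should then print the id to paste into the pnfsid: search.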
fife_utils check fraction on disk
sam_validate_dataset --stage_status --name dig.oksuzian.CRY-cosmic-general.cry3-digi-hi.art
These are not needed by users; they are included here for completeness.
Values
SAM has two levels of metadata entries: those intrinsic to SAM, which are called values, and those defined by the experiment, which are called parameters. A value can only be one of a list of allowed entries.
samweb list-values --help-categories
samweb list-values data_tiers
Mu2e collaborators with admin privileges can add to the list of allowed entries:
samweb add-value data_tiers "raw"
Parameters
Parameters are arbitrary metadata fields which can be added to the metadata, for all files, by Mu2e collaborators with admin privileges. They have the form
category.name
Currently, we have the categories: dh, mc, job. Information generated in data handling procedures is stored in dh, mc description is stored in mc, and job is used in processing. Parameters can be listed:
samweb list-parameters
...
dh.first_subrun (true_int)
...
mc.primary_particle (string)
...
Parameters may be of types:
string
true_float
true_int
and new ones may be created:
samweb add-parameter dh.first_subrun_event true_int
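Before adding a parameter it can help to check that the name follows the category.name form and uses one of the categories listed above. A minimal sketch, assuming only the three categories dh, mc, and job are in use; split_parameter is a made-up helper name and does not talk to SAM.

```shell
# Split a parameter of the form category.name and check the category.
# split_parameter is a hypothetical helper; the allowed categories
# (dh, mc, job) are the ones listed in the text above.
split_parameter() {
    case "$1" in
        ?*.?*) ;;
        *)
            echo "not of the form category.name: $1" >&2
            return 1
            ;;
    esac
    _category="${1%%.*}"   # text before the first dot
    _name="${1#*.}"        # text after the first dot
    case "$_category" in
        dh|mc|job)
            echo "$_category $_name"
            ;;
        *)
            echo "unknown category: $_category" >&2
            return 1
            ;;
    esac
}
```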
Declare a file
export SAM_FILE=foo.bar
samweb declare-file ${SAM_FILE}.json
where the json file contains all required metadata for the file
Add a location to a file
samweb add-file-location ${SAM_FILE} /pnfs/dir/etc/$SAM_FILE
Add recent missing locations
export SAM_EXPERIMENT=mu2e
samweb create-definition missing_tape_2021_07_22 "
(start_time > '2021-07-01T00:00:00' and
start_time < '2021-07-20T00:00:00' and
full_path like '/pnfs/$SAM_EXPERIMENT/tape/%'
) minus tape_label like '%'"
samweb count-files defname:missing_tape_2021_07_22
sam_validate_dataset --tapeloc --name missing_tape_2021_07_22
samweb list-files defname:missing_tape_2021_07_22 > files_not_on_tape
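The dimension string in the definition above follows a fixed pattern, so it can be generated for any experiment and date window. A minimal sketch that only builds the string; missing_tape_dims is a made-up helper name, and the result would still have to be passed to samweb create-definition by hand.

```shell
# Build the dimension string that selects files under the tape area
# with a start_time inside a window but no tape label yet.
# missing_tape_dims is a hypothetical helper; it does not contact SAM.
#   $1  experiment, e.g. mu2e
#   $2  window start, e.g. 2021-07-01T00:00:00
#   $3  window end,   e.g. 2021-07-20T00:00:00
missing_tape_dims() {
    _exp="$1"
    _start="$2"
    _end="$3"
    # %% in the printf format prints a literal % (the SQL-style wildcard)
    printf "(start_time > '%s' and start_time < '%s' and full_path like '/pnfs/%s/tape/%%') minus tape_label like '%%'\n" \
        "$_start" "$_end" "$_exp"
}
```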
Dump a file
samweb get-metadata ${SAM_FILE}
samweb locate-file ${SAM_FILE}
Delete a file permanently
samweb retire-file $SAM_FILE
Note that you can retire a file even if it has children. If you do that, then the child's metadata parent link looks like
Parents: (Retired file dts.mu2e.CosmicCRYCat.MDC2020n_10h.001205_00000006.art - 92000301)
samRm [OPTIONS] [-f FILE] [-s FILEOFNAMES] [-d DATASET]
  -n interpret file lists, but don't actually do the delete
  -h print help
Check if a file is on tape
samweb locate-file $SAM_FILE
enstore:/pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090(48@vpe272)
"vpe272" is the tape volume label. "48" is the file position on the tape.
To get more detailed information, such as the enstore CRC or enstore file ID: for every file in dCache, there is a local name starting like "/pnfs/mu2e/tape/.." which can be converted to the universal name like "/pnfs/fnal.gov/usr/mu2e/tape/..". The universal name often works better in these enstore commands. Sometimes the only form that works is doubling the first part of the path: "/pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/tape/..". Once you have a pnfsid or bfid, those always work to identify the file.
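Two string manipulations from this section can be sketched in plain shell: splitting the "(position@volume)" suffix of a SAM location, and converting a local /pnfs name to the universal name. Both helper names are made up for illustration, and both only rearrange strings; they do not query SAM or enstore.

```shell
# Print "volume position" from a location such as
#   enstore:/pnfs/mu2e/.../090(48@vpe272)
# parse_tape_location is a hypothetical helper.
parse_tape_location() {
    case "$1" in
        *'('*'@'*')') ;;
        *)
            echo "no tape information in: $1" >&2
            return 1
            ;;
    esac
    _tape="${1##*\(}"      # keep what follows the last "("
    _tape="${_tape%\)}"    # drop the closing ")"
    echo "${_tape#*@} ${_tape%%@*}"
}

# Convert a local name /pnfs/<rest> to /pnfs/fnal.gov/usr/<rest>.
# pnfs_universal_name is a hypothetical helper.
pnfs_universal_name() {
    case "$1" in
        /pnfs/fnal.gov/usr/*)
            echo "$1"      # already in universal form
            ;;
        /pnfs/*)
            echo "/pnfs/fnal.gov/usr/${1#/pnfs/}"
            ;;
        *)
            echo "not a /pnfs path: $1" >&2
            return 1
            ;;
    esac
}
```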
setup encp v3_11 -q stken
> BFID=`enstore pnfs --bfid /pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art`
> enstore info --file $BFID
volume: VPE272
location_cookie: 0000_000000000_0000048
size: 2335324404
file_family: phy-sim
original_name: /pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art
map_file:
pnfsid_file: 0000E7C4A992E86E4AF88E957FEAE686F5E5
pnfsid_map:
bfid: CDMS142775179900000
origdrive: enmvr083:/dev/rmt/tps4d0n:576004003683
crc: 3288144023

some other commands
> enstore pnfs --info /pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art
> enstore info --file `cat /pnfs/fnal.gov/usr/mu2e/tape/phy-sim/sim/mu2e/cd3-cosmic-g4s2-target6/v621_v621/art/ce/80/".(id)(sim.mu2e.cd3-cosmic-g4s2-target6.v621_v621.000001_00003876.art)"`
> enstore pnfs --layer /pnfs/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art 4
VPE272
0000_000000000_0000048
2335324404
phy-sim
/pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art
0000E7C4A992E86E4AF88E957FEAE686F5E5
CDMS142775179900000
enmvr083:/dev/rmt/tps4d0n:576004003683
3288144023
[the "4" in the above command refers to the layer. Layer 0 is the file itself. Layers 1-4 are various information.]

> enstore info --file CDMS142775179900000
or
> enstore info --file 0000E7C4A992E86E4AF88E957FEAE686F5E5
{'active_package_files_count': None,
 'archive_mod_time': None,
 'archive_status': None,
 'bfid': 'CDMS142775179900000',
 'cache_location': None,
 'cache_mod_time': None,
 'cache_status': None,
 'complete_crc': 3288144023L,
 'deleted': 'no',
 'drive': 'enmvr083:/dev/rmt/tps4d0n:576004003683',
 'external_label': 'VPE272',
 'file_family': 'phy-sim',
 'file_family_width': 1,
 'gid': 0,
 'library': 'CD-10KCF1',
 'location_cookie': '0000_000000000_0000048',
 'original_library': 'CD-10KCF1',
 'package_files_count': None,
 'package_id': None,
 'pnfs_name0': '/pnfs/fnal.gov/usr/mu2e/pnfs/fnal.gov/usr/mu2e/phy-sim/sim/mu2e/tdr-beam-mixp3-x050/1716a/001/090/sim.mu2e.tdr-beam-mixp3-x050.1716a.16417890_000024.art',
 'pnfsid': '0000E7C4A992E86E4AF88E957FEAE686F5E5',
 'r_a': (('131.225.240.49', 53163), 1L, '131.225.240.49-53163-1457645694.555144-16594-140381988148992'),
 'sanity_cookie': (65536L, 1641907538L),
 'size': 2335324404L,
 'storage_group': 'mu2e',
 'tape_label': 'VPE272',
 'uid': 0,
 'update': '2015-03-30 16:43:19.966574',
 'wrapper': 'cpio_odc'}

> enstore info --file /pnfs/fnal.gov/usr/mu2e/phy-etc/cnf/mu2e/cd3-beam-g4s4-proton/0918a/037/622/cnf.mu2e.cd3-beam-g4s4-proton.0918a.004001_00000125.fcl
{'active_package_files_count': 3001,
 'archive_mod_time': '2015-09-18 23:34:04',
...

>>> pnfsid to file name??
> enstore sfs --info 00006C08CEBDFCB240A4A3FC665D0DEA219A
volume: VPN204
location_cookie: 0000_000000000_0000926
size: 406918969
file_family: nova_production
original_name: /pnfs/fnal.gov/usr/nova/production/daq/R17-03-09-prod3genie.g/nd/genie/000116/11616/neardet_genie_nonswap_genierw_fhc_v08_2500_r00011616_s14_c000_R17-03-09-prod3genie.g_v2_20171003_044326_sim.daq.root
map_file:
pnfsid_file: 00006C08CEBDFCB240A4A3FC665D0DEA219A
pnfsid_map:
bfid: CDMS150892823800000
origdrive: enmvr087:/dev/rmt/tps1d0n:576004004047
crc: 3204489581
web interface:
http://sammu2e.fnal.gov:8483/sam/mu2e/api
http://sammu2e.fnal.gov:8483/sam/EXPERIMENT/api/files/list/dimensions
http://sammu2e.fnal.gov:8483/sam/mu2e/definition_editor/
http://sammu2e.fnal.gov:8483/sam/mu2e/api/files/list/?dims=dh.dataset=sim.mu2e.example-beam-g4s1.1812a.art
http://sammu2e.fnal.gov:8483/sam/mu2e/api/files/name/sim.mu2e.example-beam-g4s1.1812a.16638329_000018.art/metadata
http://sammu2e.fnal.gov:8483/sam/mu2e/api/files/name/sim.mu2e.example-beam-g4s1.1812a.16638329_000018.art/locations
edit the admin users with https://sammu2e.fnal.gov:8483/sam/mu2e/admin/users
Repair stuck SAM projects
An old, improperly completed project (probably due to dCache outages) may be locking files so that they can't be modified (for example, retired).
samweb list-projects --experiment=mu2e --state=reserved
Try to "start-project" then "stop-project". If that doesn't work, here is another method:
1) do a "samweb -e mu2e find-project kutschke_sim.mu2e.cd3-cosmic-g4s1-general.0715a.art_prestage_20160605115431"
2) do a "wget --no-check-certificate -O - " on the resulting URL to get the station to do a "get_project" internally so it would "know about" the project.
3) THEN I could do a "samweb -e mu2e stop-project kutschke_sim.mu2e.cd3-cosmic-g4s1-general.0715a.art_prestage_20160605115431" and actually end the project.
Metadata for a retired file
Get the file ID number from list-files, and use it in the get-metadata call:
samweb list-files --fileinfo "file_name=dts.mu2e.CosmicCORSIKACalibAll.MDC2020ae.001202_00004996.art and availability:retired,anylocation"
samweb get-metadata 99346436
SAM Station Notes
Station configuration 3/2023
Main file
experiment: mu2e
station: mu2e
tier: prd
sam_base_url: https://samwebgpvm06.fnal.gov:8483/sam/mu2e/api
concurrent_samweb_requests: 25
authentication:
x509-certificate: /home/sam/private/gsi/samcert.pem
x509-key: /home/sam/private/gsi/samkey.pem
logging:
level: DEBUG
log_dir: /home/sam/logs/station_mu2e
console: False
local_db_dir: /var/tmp/
schema_mapping: /home/sam/config/file_schema_mapping.json
web_server:
#listen_host: 127.0.0.1
#listen_port: 30009
listen_unix: /var/tmp/station_mu2e.sock
#listen_backlog: 256
external_url: https://samwebgpvm05.fnal.gov:8483/sam/mu2e/stations/mu2e
log_http_proxy: True
projects:
location_map_file: /home/sam/config/mu2e_location_map.yaml
project_idle_timeout: 259200
monitoring:
kafka_cluster_uris: PLAINTEXT lssrv03:9092,lssrv04:9092,lssrv05:9092
kafka_monitoring_topic: 'ingest.sam.events'
dcache:
handler: http
http_limit: 50
dcache_timeout: 10
dcache_uri: 'https://fndca.fnal.gov:3880'
poll_staging_file_interval: 300
location_refresh_period: 14400
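The project_idle_timeout value of 259200 matches the 72 h project timeout noted earlier if it is read as seconds. The one-line helper below does the conversion; seconds_to_hours is a made-up name, and reading location_refresh_period as seconds as well is an assumption.

```shell
# Convert a whole number of seconds to whole hours (integer division).
# seconds_to_hours is a hypothetical helper for reading the config.
seconds_to_hours() {
    echo $(( $1 / 3600 ))
}

seconds_to_hours 259200   # project_idle_timeout    -> prints 72
seconds_to_hours 14400    # location_refresh_period -> prints 4
```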
mu2e_locations_map
- node: .
  locations: [ dcache, enstore, dbdata0vm.fnal.gov ]
  schemas: [ gsiftp, root, srm, https ]
file_schema_mapping
{
  "parameters" : {
    "bluearc" : "%{experiment}data|minerva_bluearc",
    "dcache" : "fnal-dcache|dcache|enstore"
  },
  "schemas" : {
    "gsiftp" : {
      "%dcache" : [ "/pnfs/(.*)", "gsiftp://fndca1.fnal.gov:2811/pnfs/fnal.gov/usr/\\1/%filename" ],
      "%bluearc" : "gsiftp://fg-bestman1.fnal.gov:2811%path/%filename",
      "cern-eos" : "gsiftp://eospublicftp.cern.ch%path/%filename"
    },
    "xroot, root" : {
      "%dcache" : [ "/pnfs/([^/]+/.*)", "%{schema}://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/\\1/%filename" ],
      "cern-eos" : "root://eospublic.cern.ch/%path/%filename",
      "castor" : "root://castorpublic.cern.ch//castor/cern.ch%path/%filename",
      "ph.liv.ac.uk" : "root://hepgrid11.ph.liv.ac.uk/%path/%filename",
      "hep.manchester.ac.uk" : "root://bohr3226.tier2.hep.manchester.ac.uk/%path/%filename",
      "gridpp.ecdf.ed.ac.uk" : "root://srm-rdf.gridpp.ecdf.ed.ac.uk:/%path/%filename",
      "particle.cz" : "root://golias100.farm.particle.cz:/%path/%filename",
      "echo.stfc.ac.uk" : "root://xrootd.echo.stfc.ac.uk/dune:%path/%filename"
    },
    "http, https" : {
      "%dcache" : [ "/pnfs/(.*)", "https://fndca1.fnal.gov:2880/\\1/%filename" ],
      "ph.liv.ac.uk" : "https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home%path/%filename",
      "dbdata0vm.fnal.gov" : "https://dbdata0vm.fnal.gov:8444%path/%filename"
    },
    "srm" : {
      "%dcache" : [ "/pnfs/(.*)", "srm://fndca1.fnal.gov:8443/srm/managerv2?SFN=/pnfs/fnal.gov/usr/\\1/%filename" ],
      "%bluearc" : "srm://fg-bestman1.fnal.gov:10443/srm/v2/server?SFN=%{path}/%{filename}",
      "ph.liv.ac.uk" : "srm://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home%path/%filename"
    },
    "dcap" : {
      "%dcache" : [ "/pnfs/(.*)", "dcap://fndca1.fnal.gov:24125/pnfs/fnal.gov/usr/\\1/%filename" ]
    },
    "file" : {
      "." : "file://%path/%filename"
    },
    "s3" : {
      "s3" : "s3:/%{path}/%filename"
    }
  }
}
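To see what one of these rules produces, the sketch below applies the "%dcache" entry of the "http, https" schema by hand: the part of the path matched by "/pnfs/(.*)" replaces \1 and the file name replaces %filename. How the station evaluates these rules internally is an assumption, and dcache_https_url is a made-up helper name.

```shell
# Apply the "%dcache" rule of the "http, https" schema by hand:
#   [ "/pnfs/(.*)", "https://fndca1.fnal.gov:2880/\1/%filename" ]
# dcache_https_url is a hypothetical helper, not station code.
#   $1  directory path of the SAM location, starting with /pnfs/
#   $2  file name
dcache_https_url() {
    _path="$1"
    _filename="$2"
    case "$_path" in
        /pnfs/?*) ;;
        *)
            echo "rule does not match path: $_path" >&2
            return 1
            ;;
    esac
    # ${_path#/pnfs/} plays the role of the \1 capture group
    printf 'https://fndca1.fnal.gov:2880/%s/%s\n' "${_path#/pnfs/}" "$_filename"
}
```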