Rucio
Introduction
In spring 2024, Mu2e is planning to migrate from the SAM file catalog and tools to a new set of tools. The new system will consist of these parts:
- Metacat - a database of file metadata (docs GUI)
- Rucio - a database of file locations, and servers which can move and track data, responding to user rules (docs)
- Data Dispatcher - a replacement for SAM project file delivery (docs GUI)
- mdh - Mu2e data-handling commands added to supplement the above systems (see mdh -h)
We expect little to no interaction between users and Rucio or Data Dispatcher; almost all user work can be done with metacat and mdh.
A few overarching concepts to keep in mind
- metacat recognizes authentication with tokens, but Rucio uses x509 proxies. mdh will make either from a kerberos ticket, as needed.
- metacat requires you to be authenticated to write to the database (create files, datasets)
- all files belong to a namespace, also known as the scope. Namespaces can't be deleted.
- the combination namespace:filename uniquely identifies a file and is called a did (data identifier)
- if you create a namespace, it must start with your username, like ${USER}_myana (by policy)
- Rucio scopes and standard dataset names will follow the metacat conventions (by policy)
- files must be named by the naming convention (by policy)
- all files are readable by all users
- metacat has roles. Users can be members of a role, and then the role can create and own objects
- file records are never deleted; they are either retired (no new file of the same name can be created) or modified
- when declared, a file must belong to at least one existing dataset (which might not be in the same namespace)
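The did and namespace rules above can be illustrated with a few small shell helpers. This is only a sketch: the function names are hypothetical and are not part of metacat or mdh, and the namespace check encodes only the "must start with your username" policy stated above.

```shell
# Sketch only: these helper names are hypothetical, not metacat or mdh commands.

# Join a namespace and a file (or dataset) name into a did.
make_did() {
    printf '%s:%s\n' "$1" "$2"
}

# Print the namespace part of a did (everything before the first colon).
did_namespace() {
    printf '%s\n' "${1%%:*}"
}

# Print the name part of a did (everything after the first colon).
did_name() {
    printf '%s\n' "${1#*:}"
}

# Succeed if namespace $2 starts with username $1, as the policy requires
# for user-created namespaces (for example rlc or rlc_myana for user rlc).
is_personal_namespace() {
    [ -n "$1" ] || return 1
    case "$2" in
        "$1"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, make_did rlc mcs.rlc.dh_test.001.art prints rlc:mcs.rlc.dh_test.001.art, and is_personal_namespace "$USER" "${USER}_myana" succeeds.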
Quick start
setup
setup mu2e
setup mdh
This sets up all of the related data-handling tools.
Authentication
As of 1/2024, all collaboration members should have a metacat account, but Rucio accounts have to be created by hand.
To use metacat commands to list files and other database content, you do not need authentication. To use metacat commands that involve a database write, you must authenticate yourself to metacat.
metacat auth login -m token $USER
- if you get a "token file not found" error, run getToken, or see the token docs
- if you get an "Authentication failed" error, you might not have an account
Your authentication lasts as long as your token's validity period. To check your authentication:
metacat auth list
There is no logout
If your authentication is expired, and you attempt a write command, the only error you get may be Connection reset by peer.
Data dispatcher uses the same authentication plan.
To use mdh commands, you only need a kerberos ticket and mdh will manage your authentication.
Rucio uses an x509 proxy, which you can get from a kerberos ticket using vomsCert. Rucio requires the proxy for all commands. Rucio should change to tokens at some point.
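Because an expired token may show up only as Connection reset by peer, a small wrapper can make that failure easier to recognize. The sketch below is an illustration only: run_with_auth_hint is a hypothetical helper, not an mdh or metacat command, and it assumes the error text appears in the command's output. Note that it merges the command's stderr into stdout.

```shell
# Sketch only: run_with_auth_hint is a hypothetical wrapper.
# It runs the given command, prints its combined output, and if the command
# failed with "Connection reset by peer" it adds a re-authentication hint.
run_with_auth_hint() (
    out=$("$@" 2>&1)
    status=$?
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
    fi
    if [ "$status" -ne 0 ]; then
        case "$out" in
            *"Connection reset by peer"*)
                echo "hint: your token may have expired; try: metacat auth login -m token \$USER" >&2
                ;;
        esac
    fi
    # The function body is a subshell, so exit sets the function's status.
    exit "$status"
)
```

It would be used by prefixing a write command, for example run_with_auth_hint metacat namespace create $USER. The wrapper returns the exit status of the wrapped command.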
Listing
metacat namespace list # list all namespaces
metacat namespace list -u $USER # list your namespaces
metacat namespace list -u mu2epro # list production namespaces
metacat dataset list # list all datasets
metacat dataset list $USER:* # list your datasets
metacat dataset list mu2e:dig.*MDC2020* # list production datasets
metacat file show rlc:mcs.rlc.dh_test.001.001200_000000.art -m -p # print metadata about one file
metacat query "datasets matching mu2e:dig.mu2e*"
metacat query "files from rlc:mcs.rlc.dh_test.001.art"
metacat query "files from rlc:mcs.rlc.dh_test.001.art where rs.first_subrun=0"
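When the same kind of file query is run for many datasets, the query string can be assembled by a helper. This is a minimal sketch: files_query is a hypothetical function, not a metacat command, and it only reproduces the two "files from" query forms shown above.

```shell
# Sketch only: files_query is a hypothetical helper, not a metacat command.
# Prints "files from <dataset did>", with an optional "where <condition>".
files_query() {
    if [ -n "${2:-}" ]; then
        printf 'files from %s where %s\n' "$1" "$2"
    else
        printf 'files from %s\n' "$1"
    fi
}
```

For example, metacat query "$(files_query rlc:mcs.rlc.dh_test.001.art rs.first_subrun=0)" would pass the same query string as the last listing command above.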
creating
metacat namespace create $USER # your personal namespace
metacat namespace create -o pro sim # a namespace owned by a role
metacat dataset create rlc:test1 -M -m @ds.json "some comment" # create an ad-hoc dataset
# ds.json contains:
# { "ds.myMetadata" : "myValue" }
metacat dataset create rlc: -M -m @ds.json "some comment" # create an ad-hoc dataset
metacat dataset create rlc:mcs.rlc.dh_test.001.art "first test" # a formal file dataset
metacat dataset create sim:mcs.mu2e.dh_test.002.art "test files for a role" # a new pro dataset
# make json metadata for a file
# -s = scope (namespace) f,n,v,p options to add to metadata
mdh file-json -s rlc -f production -n Reco -v 000-000-000 -p mcs.mu2e.dh_test_parent.001.001200_000000.art mcs.mu2e.dh_test.001.001200_000000.art
# all files must be added to a dataset when they are created
# the dataset must be official dataset name, by policy
metacat file declare -f mcs.rlc.dh_test.001.001200_000000.art.json rlc:mcs.rlc.dh_test.001.art
ls *.json | while read FF; do metacat file declare -f $FF sim:mcs.mu2e.dh_test.002.art; done
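In the declare example above, the file mcs.rlc.dh_test.001.001200_000000.art goes into the dataset mcs.rlc.dh_test.001.art, which is the file name with its fifth dot-separated field removed. The sketch below derives the dataset name that way and prints (without running) the matching declare command. It assumes every file name has exactly six dot-separated fields, which is inferred from the examples here rather than taken from the naming-convention documentation. Both function names are hypothetical.

```shell
# Sketch only: dataset_of and print_declare are hypothetical helpers.

# Print the dataset name for a six-field file name by dropping field 5.
# Fails if the name does not have exactly six dot-separated fields.
dataset_of() (
    set -f          # no filename globbing while splitting
    IFS=.
    set -- $1       # split the name on dots into $1..$N
    if [ "$#" -ne 6 ]; then
        echo "expected 6 dot-separated fields" >&2
        exit 1
    fi
    printf '%s.%s.%s.%s.%s\n' "$1" "$2" "$3" "$4" "$6"
)

# Print the declare command for file $2 in namespace $1, assuming the
# metadata file is named <file>.json as in the example above.
print_declare() (
    ds=$(dataset_of "$2") || exit 1
    printf 'metacat file declare -f %s.json %s:%s\n' "$2" "$1" "$ds"
)
```

Printing the command first allows a check of the dataset name before anything is written to the database.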
Implementation
export METACAT_SERVER_URL=https://metacat.fnal.gov:9443/mu2e_meta_prod/app
export METACAT_AUTH_SERVER_URL=https://metacat.fnal.gov:8143/auth/mu2e
export DATA_DISPATCHER_URL=https://metacat.fnal.gov:9443/mu2e_dd_prod/data
export DATA_DISPATCHER_AUTH_URL=https://metacat.fnal.gov:8143/auth/mu2e
Rucio request
/pnfs/mu2e/tape
/pnfs/mu2e/persistent/datasets
/pnfs/mu2e/scratch/datasets (expect to have a greedy cleanup of two weeks)
Rucio 6/21/23
two nondeterministic RSEs
FNAL_DCACHE_SCRATCH
FNAL_DCACHE_PERSISTENT
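The four exported variables in this section are what the metacat and Data Dispatcher clients need to find their servers. As a quick sanity check, a helper like the following could report any that are missing. This is a sketch only: check_dh_env is a hypothetical function, and it checks only that each variable is non-empty, not that the URL is correct.

```shell
# Sketch only: check_dh_env is a hypothetical helper.
# Reports each unset or empty variable on stderr and returns non-zero
# if any of the four is missing.
check_dh_env() (
    missing=0
    for name in METACAT_SERVER_URL METACAT_AUTH_SERVER_URL \
                DATA_DISPATCHER_URL DATA_DISPATCHER_AUTH_URL; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            missing=1
        fi
    done
    # The function body is a subshell, so exit sets the function's status.
    exit "$missing"
)
```

Running check_dh_env after the setup commands should print nothing and succeed if the environment is complete.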
Admin
- create new metacat accounts via the GUI
- "anonymized user Id" is for token access and is the text from the "sub" field from the user's token. Enabling token access is required.
- DN is for proxy access and is easiest to get from the user's account with metacat auth mydn -c /tmp/x509up_u$UID. It can also be left blank.
- password can be left blank since we expect only token access
- create new metacat roles via the GUI
- add the role on the role page of the metacat GUI
- add user to a role via the user's page (not role page)
- create new Rucio accounts via the command line
create user namespace
First make sure your namespace exists
metacat namespace list $USER
If you don't see your namespace, you can run these commands (this only ever needs to be done once).
kinit
getToken
metacat auth login -m token $USER
metacat namespace create $USER