Rucio

From Mu2eWiki
Revision as of 19:35, 12 September 2023 by Rlc (talk | contribs) (→‎Listing)

Introduction

Rucio is a CERN software system for storing file metadata and organizing the delivery of that data to users. Its primary features are scalability, flexibility, adaptive file replication, and built-in monitoring. It can use various backends for databases, various platforms for its servers and daemons, and various transfer and storage method plug-ins, and it offers users both a command-line and a Python interface.

The new system would consist of these parts:

  • Metacat - a database of file metadata (docs GUI)
  • Rucio - a database of file locations, and servers which can move and track data, responding to user rules (docs)
  • Data Dispatcher - a modern replacement for SAM project file delivery (docs GUI)
  • mdh - Mu2e data-handling commands added to supplement the above systems (see mdh -h)

A few overarching concepts to keep in mind:

  • these systems only recognize authentication with tokens
  • metacat requires you to be authenticated to write to the database (create files, datasets)
  • all files belong to a namespace, also known as the scope. Namespaces can't be deleted.
    • if you create a namespace, it must start with your username (by policy)
    • the combination of namespace:filename uniquely identifies a file and is called a did
  • files must be named by the naming convention (by policy)
  • all files are readable by all users
  • metacat has roles. Users can be members of a role, and then the role can create and own objects
  • file records are not deleted; they are either retired (no new file of the same name can be created) or modified
  • when declared, a file must belong to at least one existing dataset (which might not be in the same namespace)
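
The did format described above (namespace:filename) can be taken apart with plain shell parameter expansion. This is a minimal sketch, not part of the Mu2e tools; the example did is the one used in the listing examples on this page.

```shell
# Split a did of the form namespace:filename into its two parts.
did="rlc:mcs.rlc.dh_test.001.001200_000000.art"

namespace="${did%%:*}"   # everything before the first colon
filename="${did#*:}"     # everything after the first colon

echo "namespace=$namespace"
echo "filename=$filename"
```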

Quick start

Setup

setup mu2e
setup mdh

This sets up all related data-handling tools.

Authentication

Authenticate yourself

metacat auth login -m token $USER
  • if you get a "token file not found" error, run getToken, or see the token docs
  • if you get "Authentication failed", you might not have an account

Your authentication lasts as long as your token's validity period. To check your authentication:

metacat auth list

There is no logout. If your authentication has expired and you attempt a write command, the only error you get may be "Connection reset by peer".

Listing

 metacat namespace list             # list all namespaces
 metacat namespace list -u $USER    # list your namespaces
 metacat namespace list -u mu2epro  # list production namespaces

 metacat dataset list              # list all datasets
 metacat dataset list $USER:*      # list your datasets
 metacat dataset list mu2e:dig.*MDC2020*      # list production datasets

 metacat file show rlc:mcs.rlc.dh_test.001.001200_000000.art -m -p    # print metadata about one file

 metacat query "datasets matching mu2e:dig.mu2e*"

 metacat query "files from rlc:mcs.rlc.dh_test.001.art"
 metacat query "files from rlc:mcs.rlc.dh_test.001.art where rs.first_subrun=0"

Creating

metacat namespace create $USER    # your personal namespace
metacat namespace create -o pro sim  # a namespace owned by a role

metacat dataset create rlc:test1 -M -m @ds.json "some comment"   # create an ad-hoc dataset
   ds.json contains:
  {
    "ds.myMetadata" : "myValue"
  }

metacat dataset create rlc: -M -m @ds.json "some comment"   # create an ad-hoc dataset
metacat dataset create rlc:mcs.rlc.dh_test.001.art "first test"  # a formal file dataset
metacat dataset create sim:mcs.mu2e.dh_test.002.art "test files for a role"  # a new pro dataset
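
The ds.json file passed with -m @ds.json in the first dataset create command can be written directly from the shell. A minimal sketch, using only the key and value shown above; it writes ds.json into the current directory.

```shell
# Write the metadata file used by "metacat dataset create ... -m @ds.json".
# The quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > ds.json <<'EOF'
{
  "ds.myMetadata" : "myValue"
}
EOF

# Show what was written.
cat ds.json
```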

# make json metadata for a file; -s sets the scope (namespace), and the -f, -n, -v, -p options add values to the metadata
mdh file-json -s rlc -f production -n Reco -v 000-000-000  -p mcs.mu2e.dh_test_parent.001.001200_000000.art   mcs.mu2e.dh_test.001.001200_000000.art

# all files must be added to a dataset when they are created
# the dataset must have an official dataset name, by policy
metacat file declare  -f mcs.rlc.dh_test.001.001200_000000.art.json rlc:mcs.rlc.dh_test.001.art
for FF in *.json; do metacat file declare -f "$FF" sim:mcs.mu2e.dh_test.002.art; done
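
In the examples on this page, the dataset name is the file name with its fifth dot-separated field removed (mcs.rlc.dh_test.001.001200_000000.art belongs to mcs.rlc.dh_test.001.art). The sketch below derives one from the other. It assumes every file name has exactly six dot-separated fields, as in these examples; dataset_of is a hypothetical helper name, not an mdh or metacat command.

```shell
# Derive the dataset name from a file name by dropping the fifth field.
# Assumption: names have exactly six dot-separated fields, as in the
# examples on this page. Returns non-zero for any other field count.
dataset_of() {
    echo "$1" | awk -F'[.]' '
        NF == 6 { print $1 "." $2 "." $3 "." $4 "." $6; next }
        { exit 1 }'
}

dataset_of mcs.rlc.dh_test.001.001200_000000.art

# Possible use with the declare command shown above:
#   metacat file declare -f "$FF" "rlc:$(dataset_of "${FF%.json}")"
```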

Implementation

export METACAT_SERVER_URL=http://dbweb5.fnal.gov:9094/mu2e_meta_prod/app
export METACAT_AUTH_SERVER_URL=https://metacat.fnal.gov:8143/auth/mu2e
export DATA_DISPATCHER_URL=https://metacat.fnal.gov:9443/mu2e_dd_prod/data
export DATA_DISPATCHER_AUTH_URL=https://metacat.fnal.gov:8143/auth/mu2e
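
When a command fails with a connection error, it can help to confirm that the four variables above are defined in the current shell. A minimal sketch; check_dh_env is a hypothetical helper name, and the check only tests that each variable is non-empty, not that the URL is reachable.

```shell
# Report which of the four service URLs are set in the current shell.
# Sets $missing to the number of unset variables and returns non-zero
# if any are missing.
check_dh_env() {
    missing=0
    for name in METACAT_SERVER_URL METACAT_AUTH_SERVER_URL \
                DATA_DISPATCHER_URL DATA_DISPATCHER_AUTH_URL; do
        eval "value=\${$name:-}"      # indirect lookup, POSIX-compatible
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=$((missing + 1))
        else
            echo "$name=$value"
        fi
    done
    [ "$missing" -eq 0 ]
}

check_dh_env || echo "some data-handling URLs are missing"
```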


Rucio request
  • /pnfs/mu2e/tape
  • /pnfs/mu2e/persistent/datasets
  • /pnfs/mu2e/scratch/datasets  (expect to have a greedy cleanup of two weeks)

Rucio 6/21/23: two nondeterministic RSEs
  • FNAL_DCACHE_SCRATCH
  • FNAL_DCACHE_PERSISTENT

Admin

  • create new accounts via the GUI
    • "anonymized user Id" is for token access and is the text from the "sub" field from the user's token. Enabling token access is required.
    • DN is for proxy access; it is easiest to get from the user's account with metacat auth mydn -c /tmp/x509up_u$UID. It can also be left blank
    • password can be left blank since we expect only token access
  • create new roles via the GUI
    • add the role on the role page of the metacat GUI
    • add user to a role via the user's page (not role page)
