An example of how to archive files to tape

From Mu2eWiki

This procedure saves "random" assortments of files as tape archives. To access the content later, one has to retrieve and expand the tar file; the individual files are not transparently accessible from tape storage. This corresponds to the "Random files" option here. The current page walks through the steps of an example use case.


Decide how to divide data

Decide how to divide the data to be archived into chunks. A good target size is 2 GB per chunk. For example, the following command lists the size of each directory in KB, preceded by a running total:


  $ du -ks /disk2/data/tmnguyen/socketudp/data*2020  | awk '{s+=$1; print s"\t"$0}'
  433728  433728  /disk2/data/tmnguyen/socketudp/data_Feb02_2020
  1058092 624364  /disk2/data/tmnguyen/socketudp/data_Feb03_2020
  1271012 212920  /disk2/data/tmnguyen/socketudp/data_Feb04_2020
  1476712 205700  /disk2/data/tmnguyen/socketudp/data_Feb06_2020
  2498948 1022236 /disk2/data/tmnguyen/socketudp/data_Feb07_2020
  2498956 8       /disk2/data/tmnguyen/socketudp/data_Feb09_2020
  3022512 523556  /disk2/data/tmnguyen/socketudp/data_Feb10_2020
  3022712 200     /disk2/data/tmnguyen/socketudp/data_Feb11_2020
  3042356 19644   /disk2/data/tmnguyen/socketudp/data_Feb12_2020
  3770908 728552  /disk2/data/tmnguyen/socketudp/data_Feb17_2020
  4705628 934720  /disk2/data/tmnguyen/socketudp/data_Feb19_2020
  4795148 89520   /disk2/data/tmnguyen/socketudp/data_Feb20_2020
  5092448 297300  /disk2/data/tmnguyen/socketudp/data_Feb21_2020
  5751636 659188  /disk2/data/tmnguyen/socketudp/data_Feb22_2020
  5755548 3912    /disk2/data/tmnguyen/socketudp/data_Feb25_2020
  5762860 7312    /disk2/data/tmnguyen/socketudp/data_Feb26_2020
  6084616 321756  /disk2/data/tmnguyen/socketudp/data_Jan03_2020
  6094680 10064   /disk2/data/tmnguyen/socketudp/data_Jan22_2020
  6129020 34340   /disk2/data/tmnguyen/socketudp/data_Jan23_2020
  6236776 107756  /disk2/data/tmnguyen/socketudp/data_Jan24_2020
  6241600 4824    /disk2/data/tmnguyen/socketudp/data_Jan29_2020
  6628736 387136  /disk2/data/tmnguyen/socketudp/data_Jan30_2020
  7372852 744116  /disk2/data/tmnguyen/socketudp/data_Jan31_2020


The last line of the running total shows that the "data*2020" directories add up to 7372852 KB, a bit over 7 GB. They can therefore be archived as a single tar file, or split into a few files.
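The grouping into chunks can also be automated. The following is a minimal sketch, not part of the Mu2e tools: the function name chunk_by_size is made up for illustration, and it assumes paths without whitespace. It reads the "size path" lines printed by du -ks and keeps assigning consecutive entries to the same chunk until the running total would exceed a limit given in KB.

```shell
# Hypothetical helper: greedy grouping of "size_kb path" lines from stdin.
# A new chunk starts whenever adding the next entry would push the
# running total of the current chunk past the limit (in KB).
chunk_by_size() {
  limit_kb="$1"
  awk -v limit="$limit_kb" '
    {
      if (n == 0 || sum + $1 > limit) { n++; sum = 0 }
      sum += $1
      print n "\t" $2   # chunk number, then the path
    }'
}

# Usage with the example directories (2 GB target = 2097152 KB):
#   du -ks /disk2/data/tmnguyen/socketudp/data*2020 | chunk_by_size 2097152
```

An entry that is larger than the limit on its own still gets a chunk to itself; the sketch does not try to split directories.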

Prepare tar files

The naming of files for tape upload is important. File names must follow the Mu2e convention explained here.

File names will look like

  data_tier.owner.description.configuration.sequencer.file_format

A critical point is that every file name must be unique; the "sequencer" field makes this possible when uploading a "dataset" of similar files.
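A quick sanity check of candidate names can catch mistakes before the upload. The sketch below is an illustration only: the helper names has_six_fields and duplicate_names are made up, and the check covers just the number of dot-separated fields and uniqueness within a list, not the Mu2e rules for what each field may contain.

```shell
# Hypothetical helper: succeeds only if the name has exactly six
# non-empty dot-separated fields. Field contents are not checked.
has_six_fields() {
  printf '%s\n' "$1" | awk -F'[.]' '
    {
      ok = (NF == 6)
      for (i = 1; i <= NF; i++) if ($i == "") ok = 0
      exit !ok
    }'
}

# Hypothetical helper: print every name that occurs more than once
# among the arguments (prints nothing if all names are unique).
duplicate_names() {
  printf '%s\n' "$@" | sort | uniq -d
}

# Usage:
#   has_six_fields bck.tmnguyen.socketudp.v0.2020part1.tbz && echo "six fields"
#   duplicate_names bck.*.tbz
```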

Continuing with the example,

  $ cd  /disk2/data/tmnguyen/socketudp
  $ tar jcvf /disk2/bck.tmnguyen.socketudp.v0.2020part1.tbz data*Jan*2020
  $ tar jcvf /disk2/bck.tmnguyen.socketudp.v0.2020part2.tbz data*Feb0*2020
  $ tar jcvf /disk2/bck.tmnguyen.socketudp.v0.2020part3.tbz data*Feb1*2020
  $ tar jcvf /disk2/bck.tmnguyen.socketudp.v0.2020part4.tbz data*Feb2*2020
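Since the source files are eventually removed, it can be worth confirming that each archive is readable before going further. The following is a sketch under assumptions: archive_ok and check_archives are hypothetical helper names, and the tar in use is assumed to detect the compression automatically when reading (GNU tar and bsdtar do). Listing a bzip2-compressed archive decompresses the whole stream, so a truncated file is reported as broken.

```shell
# Hypothetical helper: succeeds if tar can list the whole archive.
archive_ok() {
  tar -tf "$1" > /dev/null 2>&1
}

# Check every archive given as an argument and report the result.
# Returns non-zero if at least one archive could not be read.
check_archives() {
  status=0
  for f in "$@"; do
    if archive_ok "$f"; then
      echo "OK      $f"
    else
      echo "BROKEN  $f"
      status=1
    fi
  done
  return "$status"
}

# Usage with the example archives:
#   check_archives /disk2/bck.tmnguyen.socketudp.v0.2020part*.tbz
```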

Copy the files to a mu2egpvm machine

  ssh mu2egpvm01.fnal.gov
  $ mkdir /mu2e/data/users/tmnguyen
  $ cd  /mu2e/data/users/tmnguyen
  $ scp -p mu2etest.fnal.gov:/disk2/bck.\*.tbz .

(and remove them from mu2etest)
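Before removing the originals, one may want to confirm that the copies are identical. A minimal sketch using sha256sum from GNU coreutils follows; the helper names and the checksum file name bck.sha256 are made up for this illustration.

```shell
# Hypothetical helper: write checksums of all bck.*.tbz files in DIR
# to DIR/bck.sha256. Run this on the source machine.
record_checksums() {
  ( cd "$1" && sha256sum bck.*.tbz > bck.sha256 )
}

# Hypothetical helper: re-compute the checksums listed in DIR/bck.sha256
# and compare. Run this on the destination after copying the archives
# and bck.sha256. Returns non-zero on any mismatch or missing file.
verify_checksums() {
  ( cd "$1" && sha256sum -c bck.sha256 )
}

# Usage in this example:
#   on mu2etest:    record_checksums /disk2
#   on mu2egpvm01:  scp -p mu2etest.fnal.gov:/disk2/bck.sha256 /mu2e/data/users/tmnguyen/
#                   verify_checksums /mu2e/data/users/tmnguyen
```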

Tape upload

This step is best done using VNC or a terminal multiplexer such as "screen" or "tmux", because the command will probably take more than a day to complete. A multiplexer keeps the command running if the ssh session disconnects for any reason.

  ssh mu2egpvm01.fnal.gov
  $ screen
  $ mu2einit
  $ setup mu2efiletools
  $ kx509
  $ cd /mu2e/data/users/tmnguyen
  $ mu2eFileMoveToTape bck.*.tbz

Another way to ensure that the command completes is to run it under nohup. A complication is that output written to the nohup.out file is buffered, so the file may lag behind the actual progress of the command.
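A possible shape of the nohup variant is sketched below. The wrapper name run_detached and the log file name upload.log are assumptions made for this illustration. Sending the output to a named log file avoids nohup.out, but it does not change how the program buffers its own output, so the log may still lag.

```shell
# Hypothetical helper: run a command under nohup in the background,
# sending stdout and stderr to the given log file instead of nohup.out.
run_detached() {
  log="$1"
  shift
  nohup "$@" > "$log" 2>&1 < /dev/null &
  echo "$!"   # PID of the background job, for later checks with ps
}

# Usage for the upload in this example:
#   cd /mu2e/data/users/tmnguyen
#   run_detached upload.log mu2eFileMoveToTape bck.*.tbz
```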

Check

Check the next day whether the process is complete. You can start a new ssh shell and re-connect to the existing "screen" session:

  ssh mu2egpvm01.fnal.gov
  $ screen -d -r

If everything worked, the source files given to the mu2eFileMoveToTape command will have been deleted. If they are still there and you are SURE that the original mu2eFileMoveToTape process is no longer running (on any mu2egpvm node!), re-run the same command, and repeat until the upload succeeds.
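The check for leftover files can be scripted. This is a sketch only; remaining_archives is a hypothetical helper name. An empty result means that all archives have been removed from the directory.

```shell
# Hypothetical helper: print the bck.*.tbz archives still present in DIR.
remaining_archives() {
  for f in "$1"/bck.*.tbz; do
    # An unmatched glob stays literal, so test that the file exists.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage in this example:
#   remaining_archives /mu2e/data/users/tmnguyen
#
# To look for a running upload on the current node only:
#   pgrep -u "$USER" -f mu2eFileMoveToTape
```

Note that pgrep sees only processes on the node where it is run, so it has to be repeated on every mu2egpvm node that might have been used.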