Disks: Difference between revisions
Revision as of 17:36, 9 November 2023
Introduction
There are several categories of disk space available at Fermilab. These include limited home areas, Mu2e project disks for building code and small datasets, dCache (/pnfs) for large datasets and sending data to tape, and a wide area read-only disk (/cvmfs) for distribution of code and some auxiliary data files.
During the fall of 2023 the Mu2e project disks are being transitioned to a new disk technology and their names will change. See the Ceph Transition section below for details.
When reading this section pay careful attention to which disks are backed up. It is your responsibility to ensure that files you require to be backed up are kept on an appropriate disk. It is equally your responsibility to use the backed up space wisely and not fill it with files that can easily be regenerated, such as root files, event-data files, object files, shared libraries and binary executables.
To learn where you may create your own directory on the project and dCache disks, see Recommended use patterns and the data transfer page. When you make your own directory, you must name it using your kerberos principal (your Fermilab username).
The table below summarizes the information found in the sections that follow.
Name | Quota (GB) | Backed up? | Worker | Interactive | Purpose/Comments |
---|---|---|---|---|---|
Home Disks | |||||
/nashome | 5.2 | Yes | --- | rwx | mu2egpvm*, mu2ebuild*, and FNALU only |
/sim1 | 20 | Yes | --- | rwx | detsim only; in NAS space |
Mu2e Project Disk on Ceph (Phased in during fall 2023 - please start new work here) | |||||
/exp/mu2e/data | 87,961 | No | --- | rwx | Event-data files, log files, ROOT files. |
/exp/mu2e/app | 3,848 | No | --- | rwx | Exe's and shared libraries. No data/log/root files. |
Mu2e Project Disk on NAS (Being phased out in fall 2023) | |||||
/mu2e/data | 75,161 | No | --- | rwx | Event-data files, log files, ROOT files. |
/mu2e/data2 | 10,737 | No | --- | rwx | Event-data files, log files, ROOT files. |
/mu2e/app | 3,758 | No | --- | rwx | Exe's and shared libraries. No data/log/root files. |
/grid/fermiapp/mu2e | 232 | Yes | --- | rwx | Exe's and shared libraries. No data/log/root files. |
Special Disks | |||||
/cvmfs | - | Yes | r-x | r-x | readonly code distribution - all interactive and grid nodes |
/pnfs | - | No/Yes | --- | rwx | distributed data disks - all interactive nodes |
Local Scratch | |||||
/scratch/mu2e/ | 568 | No | --- | rwx | detsim only; local disk. |
Mu2e web Site | |||||
/web/sites/mu2e.fnal.gov/htdocs | 8 | Yes | --- | rwx | mounted on mu2egpvm* and FNALU (not detsim); see website instructions |
Marsmu2e Project disk on NAS | |||||
/grid/data/marsmu2e | 400 | No | rw- | rw- | Event-data files, log files, ROOT files. |
/grid/fermiapp/marsmu2e | 30 | Yes | r-x | rwx | Grid accessible executables and shared libraries |
Notes on the table:
- The project and scratch spaces each have a subdirectory named users. To use these disks, make a subdirectory users/your_kerberos_principal and put your files under that subdirectory.
- The home disks and the /mu2e disks have individual quotas. The Ceph disks have directory based quotas.
- The columns headed Worker and Interactive show the permissions with which each disk is mounted on, respectively, the grid worker nodes and the interactive nodes (detsim, mu2egpvm*). In the above table, full permissions are rwx, which denote read, write and execute, respectively. If one of rwx is replaced with a - then that permission is missing on the indicated machine. If the permission is given as ---, then that disk is not mounted on the indicated machine. Some partitions lack w or x permission as a security measure, discussed below.
Ceph Transition
Starting in the summer of 2023, we are asking that people who are starting new work do so using the /exp/mu2e disks, not the /mu2e disks.
On Nov 15, 2023, CSAID will migrate all files on /mu2e/app to /exp/mu2e/app. You will be responsible for moving your own files from /mu2e/data and /mu2e/data2 to /exp/mu2e/data. The reason for two data areas was an accident of what disk space was available when we asked that /mu2e/data be extended. We are consolidating both into /exp/mu2e/data.
Before copying your files to /exp/mu2e we ask that you audit your files to identify old files that you can delete or archive to tape. Please do not copy these files to /exp/mu2e. You can find files older than, say 4 years (1460 days), with the command:
find /mu2e/app/users/<your-username> -type f -not -mtime -1460 -exec ls -ld {} \;
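Before deciding what to delete or archive, it can help to know how much space the old files occupy. The following is a minimal sketch, not an existing Mu2e tool: old_file_summary is a hypothetical helper name, and it reuses the same age test as the find command above, with a default of 1460 days.

```shell
#!/bin/sh
# Sketch only: count the files under a directory that pass the same age
# test as the find command above, and total their size in bytes.
# old_file_summary is a hypothetical helper name.
old_file_summary() {
    dir=$1
    days=${2:-1460}   # default: about 4 years
    # ls -ln prints the size in bytes as the fifth field of each line.
    find "$dir" -type f -not -mtime "-$days" -exec ls -ln {} + |
        awk '{ n++; bytes += $5 } END { printf "%d files, %.0f bytes\n", n, bytes }'
}
```

For example, old_file_summary /mu2e/data/users/<yourname> would report on your area of the data disk. File names that contain newlines would confuse the count.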
If your old files have no archival value, please delete them. If they do have archival value, please archive them to tape; contact the Mu2e computing leadership if you need help archiving files to tape.
The recommended way to copy files from a /mu2e disk to a /exp/mu2e disk is:
cd /exp/mu2e/data/users/<yourname>
rsync -ar /mu2e/data/users/<yourname>/<directory_name> .
rm -rf /mu2e/data/users/<yourname>/<directory_name>
Check that rsync completed correctly before deleting the original. This rsync command recursively (-r) copies the directory named as the first positional argument into the current working directory, and it transfers the files in archive mode (-a), which preserves file metadata such as permissions and dates. Archive mode already implies recursion, so the -r is redundant but harmless. In one recent test it took about 5 minutes to copy 8 GB.
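One way to check the copy before deleting the original is a recursive comparison of the two directory trees. The sketch below assumes that comparing file contents with diff -r is an adequate check for your files; verify_copy is a hypothetical helper name, and it does not compare permissions or dates.

```shell
#!/bin/sh
# Sketch only: succeed when both trees contain the same files with
# identical content. verify_copy is a hypothetical helper name.
verify_copy() {
    original=$1
    copy=$2
    # diff -r exits non-zero if a file is missing on either side or differs.
    diff -r "$original" "$copy" >/dev/null
}

# Intended use, following the commands above:
#   cd /exp/mu2e/data/users/<yourname>
#   rsync -a /mu2e/data/users/<yourname>/<directory_name> .
#   verify_copy /mu2e/data/users/<yourname>/<directory_name> <directory_name> &&
#       rm -rf /mu2e/data/users/<yourname>/<directory_name>
```

Because the rm runs only when verify_copy succeeds, an interrupted or incomplete rsync leaves the original in place. Broken symbolic links in the original will make diff report an error, so the original is kept in that case too.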
Everyone with a quota on the /mu2e disks has a similarly sized quota on the /exp/mu2e disks.
The NAS disks had user based quotas. The Ceph disks have directory based quotas. That means that /exp/mu2e/app has a quota and we can set smaller quotas at any directory level. For example each user directory has a quota and each project directory has a quota.
Reseating Symbolic Links
Many people have used the following pattern to make it easy to keep source code and binaries on the app disk while providing low-keystroke access to related files on the data disk:
cd /mu2e/app/users/<yourname>/<my project>
mkdir -p /mu2e/data/users/<yourname>/<my project>
ln -s /mu2e/data/users/<yourname>/<my project> out
Different people have used different names for the symbolic link, with the two most common being "data" and "out".
After you move your files from /mu2e/data(2) to /exp/mu2e/data, you will need to reseat your symbolic links, as follows:
cd /exp/mu2e/app/users/<yourname>/<my project>
rm out
ln -s /exp/mu2e/data/users/<yourname>/<my project> out
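If you have many projects, reseating each link by hand is tedious. The sketch below finds every symbolic link under a directory whose target begins with an old prefix and recreates it with a new prefix. reseat_links is a hypothetical helper name, and the sketch assumes your links use absolute targets as in the pattern above.

```shell
#!/bin/sh
# Sketch only: for every symbolic link under $1 whose target starts with
# the old prefix $2, recreate the link with the prefix replaced by $3.
# reseat_links is a hypothetical helper name.
reseat_links() {
    top=$1
    old_prefix=$2
    new_prefix=$3
    find "$top" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            "$old_prefix"*)
                new_target=$new_prefix${target#"$old_prefix"}
                # -n replaces the link itself instead of creating a new
                # link inside the directory that the old link points to.
                ln -sfn "$new_target" "$link"
                echo "$link -> $new_target"
                ;;
        esac
    done
}
```

For example, reseat_links /exp/mu2e/app/users/<yourname> /mu2e/data/ /exp/mu2e/data/ handles links into /mu2e/data, and a second call with /mu2e/data2/ as the old prefix handles links into /mu2e/data2. The trailing slashes keep the first call from matching /mu2e/data2 by accident.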
Reseating Symbolic Links For the Computing Tutorials
Many people worked on the ComputingTutorials at the Mu2e Tutorial Day, Saturday Oct 4, 2023, or soon after. At that time the CEPH disks were named differently than they are now:
/srv/mu2e/app
/srv/mu2e/data
The tutorial instructions told you to use the symbolic link pattern described in the previous section.
Since that time, the /srv prefix of these directories has been renamed to /exp. Your files are now in the newly named locations.
If you worked on the tutorials at that time, when you return to your working area you will need to reseat the symbolic links to the data area.
cd /exp/mu2e/app/users/<yourname>/Tutorial
rm out
ln -s /exp/mu2e/data/users/<yourname>/Tutorial out
Ceph Disk Notes
To see the quota for any directory:
getfattr -n ceph.quota.max_bytes /exp/mu2e/data/projects/tracker/
To see more attributes, including used space:
getfattr -d -m 'ceph.*' /exp/mu2e/data/projects/tracker
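The quota attribute is reported in bytes, which is hard to read for multi-terabyte areas. The sketch below converts the value to GiB. to_gib and ceph_quota_gib are hypothetical helper names, and the choice of GiB (1024^3 bytes) is an assumption; the table above does not say which unit convention it uses.

```shell
#!/bin/sh
# Sketch only: convert a byte count to GiB (1024^3 bytes).
# to_gib is a hypothetical helper name.
to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f GiB\n", b / (1024 * 1024 * 1024) }'
}

# Hypothetical wrapper around the getfattr command above. The
# --only-values option makes getfattr print just the attribute value,
# without the "name=" prefix.
ceph_quota_gib() {
    to_gib "$(getfattr --only-values -n ceph.quota.max_bytes "$1")"
}
```

For example, ceph_quota_gib /exp/mu2e/data/projects/tracker/ would print the quota of that directory in GiB.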
NAS Disks
Fermilab operates a large disk pool that is mounted over the network on many different interactive machines. It is not mounted on grid nodes. The pool is built using Network Attached Storage (NAS) systems from the BlueArc Corporation. This system has RAID 6 level error detection and correction.
As of 2023, Mu2e has a quota of about 90 TB, distributed as shown in the Mu2e Project disk section of the table above.
The disk space on /mu2e/data and /mu2e/data2 is intended as our primary disk space for event-data files, log files, ROOT files and so on. This space is not backed up.
If you want to run an application on the grid, the executable files and the shared libraries can be delivered in two ways. If it is a pre-built release of the code, it will be available, read-only, on cvmfs, which is mounted on all grid nodes. If you are building your own custom code, build it on /mu2e/app, which is available on all the interactive nodes. See Muse for code building and for making tarballs for submission to the grid.
In the summer of 2023, these disks will be replaced with new disks based on the Ceph technology.
Snapshots
In the table above, some of the NAS disks are shown to be backed up. The full policy for backup to tape is available at the Fermilab Backup FAQ.
In addition to backup to tape, the NAS file system supports a feature known as snapshots, which works as follows. Each night the snapshot code runs and it effectively makes a hard link to every file in the filesystem. If you delete a file the next day, the blocks allocated to the file are still allocated to the snapshot version of the file. When the snapshot is deleted, the blocks that make up the file will be returned to the free list. So you have a window, after deleting a file, during which you can recover the file. If the file is small, you can simply copy it out of the snapshot. If the file is very large you can ask for it to be recreated in place.
On /mu2e/app and /exp/mu2e/app a snapshot is taken nightly and retained for 14 nights; so a deleted file can be recovered for up to 14 calendar days. Many years ago snapshots were also used on /mu2e/data but that is no longer done.
If you create a file during the working day, it will not be protected until the next snapshot is taken, on the following night. If you delete the file before the snapshot is taken, it is not recoverable.
After a file has been deleted, but while it is still present in a snapshot, the space occupied by the file is not charged to the mu2e quota. This works because the disks typically have free space beyond that allocated to the various experiments. However, it is always possible for an atypical usage pattern to eat up all available space. In such a case we can request that snapshots be removed.
How does this work? While the NAS file system looks to us as an nfs mounted unix filesystem, it is actually a much more powerful system. It has a front end that allows a variety of actions such as journaling and some amount of transaction processing. The snapshots take place in the front end layer.
You can view the snapshots of the file systems at, for example, /mu2e/app/.snapshot/, /grid/fermiapp/.snapshot/ and /exp/mu2e/app/.snap. Snapshots are read-only to us.
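To recover a deleted file you first need to find which snapshots still contain it. The sketch below lists every copy of a file found under a snapshot directory. list_snapshot_copies is a hypothetical helper name, and the sketch assumes that each snapshot appears as a subdirectory that mirrors the file system from the same top-level directory; check the actual layout on the disk you are using.

```shell
#!/bin/sh
# Sketch only: print the path of a file in every snapshot that contains it.
#   $1  the directory that holds the snapshots, e.g. /mu2e/app/.snapshot
#   $2  the path of the file relative to the top of each snapshot
# list_snapshot_copies is a hypothetical helper name.
list_snapshot_copies() {
    snap_dir=$1
    rel_path=$2
    for snap in "$snap_dir"/*/; do
        if [ -e "$snap$rel_path" ]; then
            printf '%s\n' "$snap$rel_path"
        fi
    done
}
```

A small file can then be copied out of one of the listed locations with cp.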
Home Disks
The interactive nodes in GPCF and FNALU share the same home disks. Fermilab policy is that large files such as ROOT files, event-data files and builds of our code, should live in project space, not in our home areas. Therefore these home disks have small quotas. The home disks do not have enough disk space to build a release of mu2e Offline. Therefore the Mu2e getting started instructions tell you to build your code on our project disks. You can contact the Service desk to request additional quota but you will not get multiple GB.
The grid worker nodes do not see the home disk. When your job lands on a grid worker node, it lands in an empty directory.
As of the SL7 OS version, access to the home disk requires a kerberos ticket. The nfs system can cache your ticket, so you can continue to access the home area in an old window even after the ticket has expired, and you probably won't notice. The issue comes up more often in cron jobs, which may fail because they do not use the interactive kerberos patterns. If you have a cron job that sometimes can't access the home disk, see this article. You can also set up your cron job to use kcron, a kerberos-aware cron command.
Sharing files
By default all of your home area is private, which makes it hard to share files with collaborators. You can copy files to /mu2e/app, or make a mu2e directory:
cd $HOME
mkdir mu2e
chmod 750 mu2e
This directory can remain group-readable, but other areas will revert to private automatically. The same can be done with other experiments, like "nova".
cvmfs
This is a distributed disk system that is described in cvmfs. It is used to provide pre-built releases of the code and UPS products to all users, interactive nodes, and grids.
dCache
This is a distributed disk system that is described in dCache. It has a very large capacity and is used for high-volume and high-throughput data interactively or in grid jobs. All grid jobs may read and write event data to/from dCache; it is not possible for grid jobs to move data to/from the NAS disks.
stashCache
Sometimes a rather large file (more than a GB) has to be sent to every grid node. This might be a library of fit or simulation templates, or a set of pre-computed simulation distributions. CVMFS is best for many small files, but has a size limit. For this case stashCache is the ideal solution.
Mu2e website
The mu2e web site lives at /web/sites/mu2e.fnal.gov; this is visible from mu2egpvm*. Selected Mu2e members have read and write access to this area - ask offline management if you need to get access. For additional information see the instructions for the Mu2e web site. The space is run by the central web services and space is monitored here.
Disks for the group marsmu2e
There are two additional disks that are available only to members of the group marsmu2e; only a few Mu2e collaborators are members of this group. The group marsmu2e was created to satisfy access restrictions on the MCNP software that is used by MARS. Only authorized users may have read access to the MARS executable and its associated cross-section databases. This access control is enforced by creating the group marsmu2e, limiting membership in the group and making the critical files readable only by marsmu2e.
The two disks discussed here are /grid/fermiapp/marsmu2e, which has the same role as /grid/fermiapp/mu2e, and /grid/data/marsmu2e, which has the same role as /grid/data/mu2e.
This is discussed further on the pages that discuss running MARS for Mu2e.
Recommended use patterns
Here is a summary of recommended use patterns
- Personal utility scripts, analysis scripts, histograms and documents should go on the home area.
- Builds of the offline code should go under /mu2e/app/users/$USER.
- Small (< 100 GB) datasets, such as analysis ntuples, should go under /mu2e/data/users/$USER.
- Large datasets (> 100 GB), and any dataset that is written or read in parallel from a grid job, should reside on scratch dCache: /pnfs/mu2e/scratch/users/$USER. This area will purge your old files without warning.
- Datasets of widespread interest or semi-permanent usefulness should be uploaded to tape.