Disks
Introduction
There are several categories of disk space available at Fermilab. These include limited home areas, Mu2e project disks for building code and small datasets, dcache (/pnfs) for large datasets and sending data to tape, and a wide-area read-only disk (/cvmfs) for distribution of code and some auxiliary data files.
When reading this section pay careful attention to which disks are backed up. It is your responsibility to ensure that files you require to be backed up are kept on an appropriate disk. It is equally your responsibility to use the backed up space wisely and not fill it with files that can easily be regenerated, such as root files, event-data files, object files, shared libraries and binary executables.
To learn where you may create your own directory on the project and dCache disks, see #Recommended_use_patterns and the data transfer page. When you do make your own directory, you must name it using your kerberos principal (your Fermilab username).
The table below summarizes the information found in the sections that follow.
Name | Quota (GB) | Backed up? | Worker | Interactive | Purpose/Comments |
---|---|---|---|---|---|
User Home Disks | |||||
/nashome | 5.2 | Yes | --- | rwx | mu2egpvm*, mu2ebuild*, and FNALU only |
Mu2e Project Disk on Ceph (Phased in during fall 2023 - please start new work here) | |||||
/exp/mu2e/data | 87,961 | No | --- | rwx | Event-data files, log files, ROOT files. |
/exp/mu2e/app | 3,848 | No | --- | rwx | Exe's and shared libraries. No data/log/root files. |
/grid/fermiapp/mu2e | 232 | Yes | --- | rwx | Deprecated. Not for use by general users. |
Special Disks | |||||
/cvmfs | - | Indirectly | r-x | r-x | readonly code distribution - all interactive and grid nodes |
/pnfs | - | No/Yes | --- | rwx | distributed data disks - all interactive nodes |
Mu2e web Site | |||||
/web/sites/mu2e.fnal.gov/htdocs | 8 | Yes | --- | rwx | mounted on mu2egpvm* and FNALU; see website instructions |
Marsmu2e Project disk on NAS | |||||
/grid/data/marsmu2e | 400 | No | rw- | rw- | Event-data files, log files, ROOT files. |
/grid/fermiapp/marsmu2e | 30 | Yes | r-x | rwx | Grid accessible executables and shared libraries |
Notes on the table:
- The project and scratch spaces each have a subdirectory named users. To use these disks, make a subdirectory users/your_kerberos_principal and put your files under that subdirectory.
- The Ceph disks have directory tree based quotas.
- The columns headed Worker and Interactive show the permissions with which each disk is mounted on, respectively, the grid worker nodes and the interactive nodes (mu2egpvm*, mu2ebuild02). In the table above, full permissions are rwx, denoting read, write, and execute. If one of rwx is replaced with a -, that permission is missing on the indicated machines. If the permission is given as ---, the disk is not mounted on the indicated machines. Withholding w or x permission from some partitions is a security measure, discussed below.
Ceph Transition
You will be responsible for moving your own files from /mu2e/data and /mu2e/data2 to /exp/mu2e/data. The reason for two data areas was an accident of what disk space was available when we asked that /mu2e/data be extended. We are consolidating both into /exp/mu2e/data.
Before copying your files to /exp/mu2e we ask that you audit your files to identify old files that you can delete or archive to tape. Please do not copy these files to /exp/mu2e. You can find files older than, say 4 years (1460 days), with the command:
find /mu2e/app/users/<your-username> -type f -not -mtime -1460 -exec ls -ld {} \;
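To estimate how much space such old files occupy before deciding whether to delete or archive them, a small sketch (assuming GNU find and du, as on the interactive nodes):

find /mu2e/app/users/<your-username> -type f -not -mtime -1460 -print0 | du -ch --files0-from=- | tail -n 1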
If your old files have no archival value, please delete them. If they do have archival value, please archive them to tape; contact the Mu2e computing leadership if you need help archiving files to tape. Please complete this by Jan 12, 2024.
The recommended way to copy files from a /mu2e disk to a /exp/mu2e disk is:
cd /exp/mu2e/data/users/<yourname>
rsync -ar /mu2e/data/users/<yourname>/<directory_name> .
rm -rf /mu2e/data/users/<yourname>/<directory_name>
Check that rsync completed correctly before deleting the original. This rsync command recursively (-r) copies the directory named as the first positional argument to the current working directory, and it transfers the files in archive mode (-a), which preserves file metadata such as permissions and dates. In one recent test it took about 5 minutes to copy 8 GB.
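One way to make the check explicit is to chain the copy and the removal so that the removal runs only if rsync exits successfully; a minimal sketch using the same placeholders as above (rsync -arn first gives a dry-run preview):

cd /exp/mu2e/data/users/<yourname>
rsync -ar /mu2e/data/users/<yourname>/<directory_name> . && rm -rf /mu2e/data/users/<yourname>/<directory_name>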
Everyone with a quota on the /mu2e disks has a similarly sized quota on the /exp/mu2e disks.
The NAS disks had user based quotas. The Ceph disks have directory based quotas. That means that /exp/mu2e/app has a quota and we can set smaller quotas at any directory level. For example each user directory has a quota and each project directory has a quota.
Existing directories on /exp/mu2e/app
If you do not already have a directory /exp/mu2e/app/users/<yourname>, then the migration on Nov 15 will be simple: when the interactive machines are rebooted following the downtime, your files will be in the new location. Some people already have a directory /exp/mu2e/app/users/<yourname>; for those people, the migrated files will be at
/exp/mu2e/app/sync/users/<yourname>
Please use mv to move directories and files from the sync area to /exp/mu2e/app/users/<yourname>, taking care to not overwrite existing files. When done, delete your directory in the sync area.
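A minimal sketch of such a move, assuming GNU mv and using myproject as a hypothetical directory name; the -n option refuses to overwrite anything that already exists at the destination:

cd /exp/mu2e/app/sync/users/<yourname>
mv -n myproject /exp/mu2e/app/users/<yourname>/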
Reseating Symbolic Links
Many people have used the following pattern to make it easy to keep source code and binaries on the app disk while providing low-keystroke access to related files on the data disk:
cd /mu2e/app/users/<yourname>/<my project>
mkdir -p /mu2e/data/users/<yourname>/<my project>
ln -s /mu2e/data/users/<yourname>/<my project> out
Different people have used different names for the symbolic link, with the two most common being "data" and "out".
After you move your files from /mu2e/data(2) to /exp/mu2e/data, you will need to reseat your symbolic links, as follows:
cd /exp/mu2e/app/users/<yourname>/<my project>
rm out
ln -s /exp/mu2e/data/users/<yourname>/<my project> out
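To locate any remaining links in your area that still point at the old disks, a sketch assuming GNU find, whose -lname test matches the link target against a shell pattern:

find /exp/mu2e/app/users/<yourname> -type l -lname '/mu2e/*' -exec ls -l {} \;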
Reseating Symbolic Links For the Computing Tutorials
Many people worked on the ComputingTutorials at the Mu2e Tutorial Day, Saturday Oct 4, 2023, or soon after. At that time the CEPH disks were named differently than they are now:
/srv/mu2e/app
/srv/mu2e/data
The tutorial instructions told you to use the symbolic link pattern described in the previous section.

Since then, these directories have been renamed to start with /exp instead of /srv. Your files are now in the newly named locations.
If you worked on the tutorials at that time, when you return to your working area you will need to reseat the symbolic links to the data area.
cd /exp/mu2e/app/users/<yourname>/Tutorial
rm out
ln -s /exp/mu2e/data/users/<yourname>/Tutorial out
Ceph Disk Notes
Route SNOW Tickets Directly to Ceph

Tickets can be routed directly to the Ceph service offering via this ServiceNow link:

https://fermi.servicenowservices.com/nav_to.do?uri=%2Fservice_offering.do%3Fsys_id%3Df3907a4e1b1321906ee0ea42f54bcb0e%26sysparm_view%3Dess%26sysparm_affiliation%3D
Quotas
To see your quota and used space on /exp/mu2e/app/users/<yourname>, /exp/mu2e/data/users/<yourname> and ~<yourname>, use the commands:
mu2einit
mu2eQuota
You can also look at the quotas and space used on the ceph disks for another user:
mu2eQuota <other_user_name>
For more details, see the next section.
Quotas And Other Attributes
The Ceph disks have directory based quotas. For example, /exp/mu2e/app has a quota and each directory in /exp/mu2e/app/users has a quota. If a directory does not explicitly set a quota, then you should walk up the directory tree to find the first directory for which a quota is set; that quota is controlling. The default quota for a user directory in /exp/mu2e/app/users is 25 GiB and the default quota for a user directory in /exp/mu2e/data/users is 150 GiB. A Mu2e collaborator may request to the Mu2e Offline Computing Coordinators that their quota be increased; you will need to provide a good reason for your request.
To see the quota for a directory that has a quota:
getfattr -n ceph.quota.max_bytes /exp/mu2e/data/projects/tracker
On SL7 you can see all of the attributes of a directory using:
getfattr -d -m 'ceph.*' /exp/mu2e/data/projects/tracker
However, on AL9 the wildcard feature has been turned off and you can only get individual attributes by name, for example:
getfattr -n "ceph.dir.rbytes" /exp/mu2e/data/projects/tracker
where the named attribute is the total number of bytes in the directory tree descended from the specified directory.
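Combining the quota and usage attributes gives a quick usage-versus-quota report for a directory; a sketch, assuming your getfattr supports the --only-values option:

dir=/exp/mu2e/data/users/<yourname>     # replace <yourname> with your kerberos principal
quota=$(getfattr --only-values -n ceph.quota.max_bytes "$dir")
used=$(getfattr --only-values -n ceph.dir.rbytes "$dir")
awk -v u="$used" -v q="$quota" 'BEGIN { printf "%.1f GiB used of %.1f GiB quota\n", u/2^30, q/2^30 }'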
Here is the full set of named attributes that match the wildcard on SL7:
Name | Meaning |
---|---|
ceph.dir.entries | Number of entries in the specified directory, including files and directories. |
ceph.dir.files | Number of files in the specified directory. |
ceph.dir.rbytes | Number of bytes allocated to files in the directory tree (recursive). |
ceph.dir.rctime | It is intended to be the highest modification time of anything in the directory tree. It is known to be buggy. |
ceph.dir.rentries | Number of entries in the directory tree (recursive), files and directories |
ceph.dir.rfiles | Number of files in the directory tree (recursive). |
ceph.dir.rsubdirs | Number of subdirectories in the directory tree (recursive). |
ceph.dir.subdirs | Number of subdirectories in the specified directory. |
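On AL9, where the wildcard form is disabled, the attributes in the table can still be dumped one at a time; a simple sketch:

for attr in entries files rbytes rctime rentries rfiles rsubdirs subdirs; do
  getfattr -n "ceph.dir.$attr" /exp/mu2e/data/projects/tracker
done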
Ceph Snapshots
Ceph supports snapshots. See the general discussion of #Snapshots. There are two details of snapshots that are unique to the ceph disks.
The snapshots exist at each level, for example:
/exp/mu2e/app/users/mu2epro/nightly/secondary/repo/.snap/_scheduled-2023-12-13-00_00_00_UTC_1099511627788/REve.log
There is a small glitch with ceph snapshots: if you ls a directory that is below a snapshot directory, the command will sometimes hang. However, if you open a file in that directory, the open will work correctly. After you have opened the file, the ls will also work, perhaps with a small delay the first time.
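A sketch of the workaround, with <snapshot_name> and <some_file> as placeholders; reading any file under the snapshot first makes the subsequent ls respond:

head -c 1 /exp/mu2e/app/users/<yourname>/.snap/<snapshot_name>/<some_file> > /dev/null
ls /exp/mu2e/app/users/<yourname>/.snap/<snapshot_name>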
Sharing Files
If you wish for your colleagues to be able to read or write files in Ceph diskspace that you own, use the normal unix group permissions. All members of mu2e are in the unix group named "mu2e".
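For example, to let the whole mu2e group read a directory tree that you own, a sketch (shared_results is a hypothetical directory name):

chgrp -R mu2e /exp/mu2e/data/users/<yourname>/shared_results
chmod -R g+rX /exp/mu2e/data/users/<yourname>/shared_results   # capital X adds execute for directories (and already-executable files)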
Moving Files Across Quota Domains
Using the unix mv command to move files from one quota domain to another will actually do a copy and delete, not a true mv. Instead, use rsync. In the example below, the user directories a and b lie in different quota domains.
Instead of:
mv /exp/mu2e/app/users/a/my_directory /exp/mu2e/app/users/b
Use:
rsync -ar /exp/mu2e/app/users/a/my_directory /exp/mu2e/app/users/b
rm -rf /exp/mu2e/app/users/a/my_directory
NAS Disks
Fermilab operates a large disk pool that is mounted over the network on many different interactive machines. It is not mounted on grid nodes. The pool is built using Network Attached Storage (NAS) systems from the BlueArc Corporation. This system has RAID 6 level error detection and correction.
As of 2023, Mu2e has a quota of about 90 TB, distributed as shown in the Mu2e Project disk section of the table above.
The disk space on /mu2e/data and /mu2e/data2 is intended as our primary disk space for event-data files, log files, ROOT files and so on. This space is not backed up.
If you want to run an application on the grid, the executable file(s) and the shared libraries can be delivered in two ways. If it is a pre-built release of the code, it will be available, read-only, on cvmfs, which is mounted on all grid nodes. If you are building your own custom code, it should be built on /mu2e/app, which is available on all the interactive nodes. See Muse for code building and for making tarballs for submission to the grid.
Starting in the summer of 2023, these disks are being replaced with new disks based on the Ceph technology: https://fifewiki.fnal.gov/wiki/Ceph
Snapshots
In the table above, some of the NAS disks are shown to be backed up. The full policy for backup to tape is available at the Fermilab Backup FAQ.
In addition to backup to tape, the NAS file system supports a feature known as snapshots, which works as follows. Each night the snapshot code runs and it effectively makes a hard link to every file in the filesystem. If you delete a file the next day, the blocks allocated to the file are still allocated to the snapshot version of the file. When the snapshot is deleted, the blocks that make up the file will be returned to the free list. So you have a window, after deleting a file, during which you can recover the file. If the file is small, you can simply copy it out of the snapshot. If the file is very large you can ask for it to be recreated in place.
On /mu2e/app and /exp/mu2e/app a snapshot is taken nightly and retained for 14 nights; so a deleted file can be recovered for up to 14 calendar days. Many years ago snapshots were also used on /mu2e/data but that is no longer done.
If you create a file during the working day, it will not be protected until the next snapshot is taken, on the following night. If you delete the file before the snapshot is taken, it is not recoverable.
After a file has been deleted, but while it is still present in a snapshot, the space occupied by the file is not charged to the mu2e quota. This works because the disks typically have free space beyond that allocated to the various experiments. However, it is always possible for an atypical usage pattern to eat up all available space. In such a case we can request that snapshots be removed.
How does this work? While the NAS file system looks to us like an nfs-mounted unix filesystem, it is actually a much more powerful system. It has a front end that allows a variety of actions such as journaling and some amount of transaction processing. The snapshots take place in the front-end layer.
You can view the snapshots of the file systems at, for example, /mu2e/app/.snapshot/, /grid/fermiapp/.snapshot/ and /exp/mu2e/app/.snap . Snapshots are readonly to us.
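For example, to recover a small deleted file by copying it back out of a NAS snapshot (the snapshot and file names below are placeholders):

ls /mu2e/app/.snapshot/
cp /mu2e/app/.snapshot/<snapshot_name>/users/<yourname>/<lost_file> /mu2e/app/users/<yourname>/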
Some features of snapshots are unique to the ceph disks; see #Ceph Snapshots.
Home Disks
The interactive nodes in GPCF and FNALU share the same home disks. Fermilab policy is that large files, such as ROOT files, event-data files and builds of our code, should live in project space, not in our home areas; therefore these home disks have small quotas. The home disks do not have enough space to build a release of Mu2e Offline, so the Mu2e getting started instructions tell you to build your code on our project disks. You can contact the Service Desk to request additional quota, but you will not get multiple GB.
The grid worker nodes do not see the home disk. When your job lands on a grid worker node, it lands in an empty directory.
As of the SL7 OS version, access to the home disk requires a kerberos ticket. The nfs system can cache your ticket, so you can continue to access the home area in an old window even after the ticket has expired; you probably won't notice. But this does come up more in cron jobs, which may fail because they do not use the interactive kerberos patterns. If you have a cron job that sometimes can't access the home disk, see this article. You can also set up your cron job to use kcron, a kerberos-aware cron command.
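For reference, a hypothetical sketch of the kcron pattern; it assumes the standard Fermilab kcron tools and a made-up script path:

kcroninit                 # one-time setup that creates the keytab kcron will use
# then, in your crontab, prefix the command with kcron:
15 2 * * * kcron /exp/mu2e/app/users/<yourname>/scripts/nightly.sh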
Snapshots
Snapshots of your home disk are visible at:
ls ~/.snapshot
You will see that snapshots are made 4 times per day and kept for 30 days.
Sharing files
By default all of your home area is private, which makes it hard to share files with collaborators. You can copy files to /mu2e/app, or make a mu2e directory:
cd $HOME
mkdir mu2e
chmod 750 mu2e
This directory can remain group-readable, but other areas will revert to private automatically. The same can be done with other experiments, like "nova".
cvmfs
This is a distributed disk system that is described in cvmfs. It is used to provide pre-built releases of the code and UPS products to all users, interactive nodes, and grids.
dCache
This is a distributed disk system that is described in dCache. It has a very large capacity and is used for high-volume, high-throughput data access, both interactively and in grid jobs. All grid jobs may read and write event data to/from dCache; it is not possible for grid jobs to move data to/from the NAS disks.
stashCache
There are cases where a rather large file (more than a GB) has to be sent to every grid node. This might be a library of fit or simulation templates, or a set of pre-computed simulation distributions. CVMFS is best for many small files, but has a size limit. For this case stashCache is the ideal solution.
Mu2e website
The mu2e web site lives at /web/sites/mu2e.fnal.gov; this is visible from mu2egpvm*. Selected Mu2e members have read and write access to this area - ask offline management if you need to get access. For additional information see the instructions for the Mu2e web site. The space is run by the central web services and space is monitored here.
Disks for the group marsmu2e
There are two additional disks that are available only to members of the group marsmu2e; only a few Mu2e collaborators are members of this group. The group marsmu2e was created to satisfy access restrictions on the MCNP software that is used by MARS. Only authorized users may have read access to the MARS executable and its associated cross-section databases. This access control is enforced by creating the group marsmu2e, limiting membership in the group, and making the critical files readable only by marsmu2e.
The two disks discussed here are /grid/fermiapp/marsmu2e, which has the same role as /grid/fermiapp/mu2e, and /grid/data/mars, which has the same role as /grid/data/mu2e.
This is discussed further on the pages that describe running MARS for Mu2e.
Recommended use patterns
Here is a summary of recommended use patterns; a sketch for creating the corresponding per-user directories follows the list.
- Personal utility scripts, analysis scripts, histograms and documents should go on the home area.
- Builds of the offline code should go under /mu2e/app/users/$USER.
- Small (<100 GB) datasets, such as analysis ntuples, should go under /mu2e/data/users/$USER.
- Large datasets (>100GB), and any dataset that is written or read in parallel from a grid job should reside on scratch dCache: /pnfs/mu2e/scratch/users/$USER. This area will purge your old files without warning.
- Datasets of widespread interest or semi-permanent usefulness should be uploaded to tape.
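As a minimal sketch, following note 1 in the table notes above, the standard per-user directories named after your kerberos principal can be created like this; adjust the paths if you are working on the newer /exp/mu2e areas:

mkdir -p /mu2e/app/users/$USER
mkdir -p /mu2e/data/users/$USER
mkdir -p /pnfs/mu2e/scratch/users/$USER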