Gridexport: Difference between revisions

From Mu2eWiki
In step 5, you can put the fcl files anywhere within your scratch space on dCache; a convenient option is to make a subdirectory in the same dCache directory that holds the Code.tar.bz file.  Then put all of the fcl files in that subdirectory, duplicating the tiered subdirectory structure.
In step 6, you can put the --fcllist file anywhere in your scratch space on dCache; a convenient option is to put it in the same directory as the Code.tar.bz file.  If you do multiple submissions using the same tar file, this pattern will keep all of the related files in one place.
In step 7, be sure that the version of mu2eprodsys is v5_00_00 or greater.  If it is not, set up the highest version explicitly.
In step 8 the --code option replaces the --setup option; the two are mutually exclusive.  The --code option first appeared in mu2eprodsys v5_00_00.



Revision as of 16:26, 6 February 2018

Introduction

The ups package gridexport will export a build of Mu2e Offline or a build of a satellite release for use on a grid worker node. In the future it may also support a similar function for mu2emars.

Until January 2018, the bluearc disk /mu2e/app (see Disks) was mounted on worker nodes on Fermigrid; it was possible to build code on that disk and run it on Fermigrid. That is no longer possible. If you have a build of Mu2e code on a computer that can see /pnfs/mu2e/scratch, then you can use gridexport to produce a bzipped tar file of your build area. You can then submit a grid job that copies the bzipped tar file to the worker node, unpacks it and runs the code that it finds inside. In particular you can use gridexport on any of the mu2egpvm* machines to export a build that is located on /mu2e/app or in your home area.

When you have exported code in this way, you can submit jobs to any site in OSG; previously, if you were running a build on /mu2e/app, you were restricted to running on Fermigrid, the only site that mounted that disk on its worker nodes.

Restriction on Satellite Releases

gridexport will work with satellite releases provided one of the following is true:

  1. The base release is in cvmfs
  2. The base release is not on cvmfs but it is based on a build of Offline that contains the commit bccf3438 (January 29, 2018).

If you have a satellite release based on a non-cvmfs base release that is older than the commit indicated in item 2, contact Rob Kutschke for a small change to the satellite release setup.sh that will make it work.

Cheat Sheet

  1. setup mu2e
  2. setup gridexport
  3. cd to a valid build of Mu2e Offline or a valid Mu2e satellite release and source its setup.sh
  4. gridexport ( and capture the absolute path to the Code.tar.bz file that is printed at the end of the command )
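
The capture in step 4 can be scripted. The sketch below assumes only what is stated on this page, namely that the last line printed by gridexport is the absolute path to Code.tar.bz. The helper name capture_last_line and the variable CODE_TARBALL are illustrative and are not part of gridexport.

```shell
# Hypothetical helper (not part of gridexport): run a command and print
# only the final line of its standard output.
capture_last_line() {
    "$@" | tail -n 1
}

# Intended use, after steps 1 to 3 (not executed here):
#   setup mu2e
#   setup gridexport
#   cd /path/to/your/build && source setup.sh
#   CODE_TARBALL=$(capture_last_line gridexport)
#   echo "Tar file for the grid job: $CODE_TARBALL"
```

Note that this hides the rest of the gridexport output; if you want to see it, run gridexport normally and copy the last line by hand.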


If the Code.tar.bz for a full build of Mu2e Offline is more than about 350 to 400 MB, you have almost certainly captured junk that you do not need on the worker node. Read the following material to learn how to exclude additional files from Code.tar.bz; then remake Code.tar.bz.
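
A quick size check can catch an oversized tar file before you submit jobs. This is a minimal sketch; the function name check_tarball_size is made up for illustration, and the default limit of 400 MB is simply the rule of thumb quoted above.

```shell
# Hypothetical helper: warn if a tar file is larger than a limit given in MB.
#   $1 = path to the tar file
#   $2 = limit in MB (optional, defaults to 400)
check_tarball_size() {
    tarball=$1
    limit_mb=${2:-400}
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 2; }
    # wc -c prints the size in bytes; tr strips padding that some systems add.
    bytes=$(wc -c < "$tarball" | tr -d ' ')
    size_mb=$(( bytes / 1024 / 1024 ))
    if [ "$size_mb" -gt "$limit_mb" ]; then
        echo "WARNING: $tarball is ${size_mb} MB (limit ${limit_mb} MB); consider more exclude patterns" >&2
        return 1
    fi
    echo "$tarball is ${size_mb} MB"
}

# Intended use (not executed here):
#   check_tarball_size "$CODE_TARBALL"
```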

Details

In step 2 you do not need to provide a version number for gridexport; you will automatically get the version that has been declared current. If you need to choose a different version, you can specify it using the usual ups syntax.

Step 3 is important; gridexport will only work if your current working directory is the root directory of a build of Mu2e Offline or the root directory of a satellite release. In addition, gridexport requires that the build be set up prior to running gridexport. This is required because gridexport uses some environment variables that are created by setting up the build; the alternative is for gridexport to parse setup.sh, which seems fragile.

Step 4 will create a bzipped tar file, named /pnfs/mu2e/scratch/users/<your username>/gridexport/tmp.xxxxxxxx/Code.tar.bz, where xxxxxxxx is a random string. The random string ensures that a run of gridexport will not step on the output of a previous run. The last line of output from the command is the absolute path to the Code.tar.bz file - save this because you need it in step 6.


The default set of exclude patterns is found in $GRIDEXPORT_DIR/etc/OfflineExcludePatterns.txt

If you wish to give additional exclude patterns to gridexport, you can give the option:

 gridexport --append-exclude-from=FILE

where FILE is a text file that contains the additional exclude patterns. You can abbreviate this to:

 gridexport -A FILE

If you wish to override the default set of exclude patterns, you can use the syntax:

 gridexport --exclude-from=FILE
 gridexport -E FILE

where FILE is a text file that contains the exclude patterns.

If you specify both -E and -A, the two files will be concatenated.
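
As an illustration, the sketch below writes a small file of additional patterns and shows, in comments, how it would be passed to gridexport with -A. The patterns are examples only and were chosen for illustration; they are not taken from the default pattern file. The function name write_extra_excludes is also hypothetical.

```shell
# Write a file of additional exclude patterns, one tar-style pattern per line.
# The patterns are only examples; choose ones that match the junk in your
# own build area.
write_extra_excludes() {
    cat > "$1" <<'EOF'
*.tmp
core.*
my_plots
EOF
}

# Intended use (not executed here):
#   write_extra_excludes my_excludes.txt
#   gridexport -A my_excludes.txt
```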

You can see other options of gridexport by:

 gridexport --help
 gridexport -h

Reminder that /pnfs Scratch is an LRU Cache

The /pnfs scratch space is a Least Recently Used (LRU) cache; when space is needed for new files, the oldest files are deleted to make room for the new ones. The cache is shared across all experiments. The LRU algorithm works on creation date, not last-accessed or last-modified date. As of January 2018 cache lifetimes are as short as 1 to 2 weeks, although they can sometimes be longer. The pnfs scratch space is segmented into "pools"; the allocation algorithm for a new file first chooses a pool and then does LRU within that pool. Not all of the pools are the same size, so lifetimes in smaller pools are shorter than lifetimes in larger pools.

If you plan to use the Code.tar.bz file for a period of more than about a week, you should make a backup copy on /mu2e/app and periodically copy it back to dCache. Two details about this:

  1. You need to delete the original in pnfs before copying in the new one (pnfs does not support overwriting an existing file).
  2. Do not do this while running jobs might be accessing the file.
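
The copy-back step can be sketched as a small function that removes the old file before copying, as required by item 1. This assumes that /pnfs is mounted on the machine where you run it, so that plain rm and cp work; the function name restore_to_scratch and the paths in the comments are illustrative.

```shell
# Sketch: put a backup copy back into scratch. The existing file is removed
# first because, as noted above, pnfs does not allow overwriting in place.
#   $1 = backup copy (for example on /mu2e/app)
#   $2 = target path in /pnfs scratch
restore_to_scratch() {
    backup=$1
    target=$2
    # Check the backup first so that a typo never deletes the only copy.
    [ -f "$backup" ] || { echo "no backup at $backup" >&2; return 1; }
    rm -f "$target" || return 1
    mkdir -p "$(dirname "$target")" || return 1
    cp "$backup" "$target"
}

# Intended use (not executed here; only when no jobs are reading the file):
#   cp "$CODE_TARBALL" /mu2e/app/users/$USER/Code.tar.bz     # make the backup
#   restore_to_scratch /mu2e/app/users/$USER/Code.tar.bz "$CODE_TARBALL"
```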

Finally, if you need to preserve your .fcl files "forever", be sure to put them in SAM. If you need to keep a copy "for a while", put that copy in your space on /mu2e/data or /mu2e/app; be sure to keep them as a compressed tar file, not as individual files.


Implementation Details

gridexport will make a temporary directory with a name like:

  /mu2e/app/users/<your username>/gridexport/tmp.xxxxxxxx

where xxxxxxxx is a randomly chosen string. It is the same randomly chosen string as is used for the path to the Code.tar.bz in /pnfs space.
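
One way to obtain a matching pair of directories is to let mktemp choose the random suffix once and then reuse it. The sketch below illustrates the idea and is not the actual gridexport code; the function name make_paired_dirs is made up.

```shell
# Sketch: create a uniquely named directory under app_base with mktemp, then
# create a directory with the same tmp.xxxxxxxx name under pnfs_base.
#   $1 = app_base,  for example /mu2e/app/users/$USER/gridexport
#   $2 = pnfs_base, for example /pnfs/mu2e/scratch/users/$USER/gridexport
make_paired_dirs() {
    app_base=$1
    pnfs_base=$2
    mkdir -p "$app_base" "$pnfs_base" || return 1
    app_tmp=$(mktemp -d "$app_base/tmp.XXXXXXXX") || return 1
    pnfs_tmp="$pnfs_base/$(basename "$app_tmp")"
    # Plain mkdir fails if the name is already taken, which preserves uniqueness.
    mkdir "$pnfs_tmp" || return 1
    printf '%s\n%s\n' "$app_tmp" "$pnfs_tmp"
}
```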

Inside this directory gridexport will make:

  1. Code - a subdirectory
  2. exclude.txt - a file containing exclude patterns for tar

Under the Code subdirectory there will be a file named setup.sh and one or two symbolic links. In all cases there will be a symbolic link to the directory from which you ran gridexport; the name of that symbolic link will be the name of the directory. If you ran gridexport from within a satellite release and if the base release upon which the satellite release is based is NOT located on cvmfs, there will also be a symbolic link to the root directory of the base release.

When you source setup.sh, it forwards to the setup.sh file in the appropriate subdirectory; if there is only one symbolic link then setup.sh forwards to setup.sh in that one subdirectory. If there are two symbolic links, it forwards to the setup.sh in the satellite release; in this case it also instructs the setup.sh in the satellite release to look for its base release in the correct place on the worker node.
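
To make the forwarding concrete, here is a hypothetical generator for such a Code/setup.sh. It is a sketch, not the text that gridexport actually writes. It assumes that the generated file is sourced from the directory that contains Code, and it uses the MU2E_GRID_BASE_RELEASE_OVERRIDE variable described later on this page for the two-link case. The function name write_forwarding_setup is made up.

```shell
# Hypothetical generator for Code/setup.sh.
#   $1 = path to the Code directory
#   $2 = name of the symbolic link to the release that was exported
#   $3 = name of the symbolic link to a non-cvmfs base release (optional)
write_forwarding_setup() {
    code_dir=$1
    release=$2
    base=${3:-}
    {
        echo "# Generated sketch: forwards to the exported release."
        if [ -n "$base" ]; then
            # Two links: tell the satellite release where its base release is.
            echo "export MU2E_GRID_BASE_RELEASE_OVERRIDE=\"\$PWD/Code/$base\""
        fi
        echo ". \"\$PWD/Code/$release/setup.sh\""
    } > "$code_dir/setup.sh"
}
```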

gridexport will use tar to archive the subdirectory:

 /mu2e/app/users/kutschke/gridexport/tmp.xxxxxxxx/Code/

and put it in:

 /pnfs/mu2e/scratch/users/kutschke/gridexport/tmp.xxxxxxxx/Code.tar.bz

It tells tar to follow symbolic links. In this way it captures the content of the directory tree from which gridexport was run; if there is a corresponding base release on a non-cvmfs disk, it too is captured in the bzipped tar file.

The tar command is told to exclude files that are not needed on the grid worker node, such as source code, object files, SConscript files, Makefiles, the .git subdirectory, and so on. It also excludes any files ending in .art or matching the pattern *.log* . The exclude-pattern mechanism is described in the Details section above.
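
The archiving step can be approximated with a single tar command. The sketch below is an illustration under stated assumptions, not the command that gridexport runs: it uses the standard tar options -h (follow symbolic links), -j (bzip2 compression), -X (read exclude patterns from a file) and -C (change directory before archiving). The function name make_code_tarball is made up.

```shell
# Sketch of the archiving step.
#   $1 = temporary directory that contains the Code subdirectory
#   $2 = file of exclude patterns
#   $3 = output file, for example .../Code.tar.bz
make_code_tarball() {
    # -h stores the targets of the symbolic links under Code, not the links.
    tar -c -j -h -f "$3" -X "$2" -C "$1" Code
}

# Intended use (not executed here):
#   make_code_tarball /mu2e/app/users/$USER/gridexport/tmp.xxxxxxxx \
#       exclude.txt /pnfs/mu2e/scratch/users/$USER/gridexport/tmp.xxxxxxxx/Code.tar.bz
```

Because of -C, the paths stored in the archive start with Code/, so unpacking on the worker node produces a Code subdirectory.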

As of January 2018 it typically takes 3 to 5 minutes to produce the bzipped tar file of a full build of Mu2e Offline. It typically takes seconds to produce the bzipped tar file of a small satellite release that uses a base release in cvmfs; if dCache is heavily loaded it may take up to 30 seconds. For satellite releases that use a base release that is NOT on cvmfs, the base release is also copied into the bzipped tar file; this typically takes 3 to 5 minutes.

For a full build of Mu2e Offline, as of January 2018, the bzipped tar file has a size of about 360 MB. Tests showed that bzip slightly outperformed gzip for both size and CPU time.

Why is the temporary space on /mu2e/app and not on /pnfs? There are sometimes delays of up to 30 seconds when creating small files on /pnfs.

Why not put the temporary space under the current working directory? The reason is that it seemed fragile. The temporary directory contains a symbolic link back to the current working directory. The tar command has to be told to follow symbolic links; this will result in a recursive tar, which will eventually stop when the symbolic link count has been exceeded. One can solve this by excluding the temporary directory from the tar command. However, we provide users with the ability to specify their own exclude patterns; if they forget to exclude the temporary directory, the recursion problem will bite. That is what is fragile.

I experimented with writing the bzipped tar file to a bluearc disk or /tmp and then copying it to pnfs. Both were slower than writing directly to pnfs.

About non-cvmfs Satellite releases

Until commit bccf3438 (January 29, 2018) a satellite release had the path to its base release hard coded into its setup.sh file. To support running on the grid, createSatelliteRelease was modified so that the setup.sh it writes does the following:

  1. If the environment variable MU2E_GRID_BASE_RELEASE_OVERRIDE is defined, then it will look for its base release at the path given by that environment variable.
  2. If that environment variable is not defined, then it uses the hard coded path written by createSatelliteRelease.
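
The two rules above amount to a default-value lookup. A minimal sketch follows; it is not the actual code in setup.sh. The function name resolve_base_release and the example paths are made up, and the sketch treats an empty value of the variable the same as an undefined one, which is an assumption.

```shell
# Sketch of the lookup described above.
#   $1 = the hard coded path written by createSatelliteRelease
resolve_base_release() {
    hard_coded=$1
    # Use the override if it is set and non-empty, otherwise the hard coded path.
    printf '%s\n' "${MU2E_GRID_BASE_RELEASE_OVERRIDE:-$hard_coded}"
}

# Intended use (not executed here):
#   base=$(resolve_base_release /path/to/base/release)
#   source "$base/setup.sh"
```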

When you run interactively MU2E_GRID_BASE_RELEASE_OVERRIDE is not defined and the satellite release uses its hard coded base release.

When gridexport writes Code/setup.sh, it will define MU2E_GRID_BASE_RELEASE_OVERRIDE to point at the base release that has been copied onto the worker node; in this way the satellite release correctly uses the copy of the base release.

Interaction with mu2eprodsys

Starting with v5_00_00, mu2eprodsys supports the --code argument. This option and the --setup option are mutually exclusive.

mu2eprodsys will use ifdh to transfer the specified tar.bz file to its current working directory. It will extract the content, which results in a subdirectory named Code in the current working directory. Then it will source Code/setup.sh.
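
The same sequence can be reproduced by hand to test an exported tar file interactively. This is a sketch of the steps described above, not the code inside mu2eprodsys; the function name unpack_and_setup is made up, and the transfer step is shown only as a comment.

```shell
# Sketch of the worker-node steps once the tar file is in the current directory.
#   $1 = path to the local copy of Code.tar.bz
unpack_and_setup() {
    # Extraction creates ./Code because the archive paths start with Code/.
    tar -x -j -f "$1" || return 1
    . ./Code/setup.sh
}

# Intended use (not executed here):
#   ifdh cp "$CODE_TARBALL" ./Code.tar.bz
#   unpack_and_setup ./Code.tar.bz
```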

As of January 2018, an exported build of Offline occupies about 1.5 GB of disk space; the bzipped tar file occupies about another 350 to 400 MB. So plan on this code occupying about 2 GB of the available disk space on the worker node. Include this in your accounting of the disk space required for your workflow.

To Do List

  1. gridexport does NOT clean up after itself. Should gridexport clean up its temporary space automatically after the tar command completes? If so, it should be retained if the verbose option is set; we can also add a --keep-tmp option.
  2. An alternative is to periodically clean up the temporary area on /mu2e/app. Should we automate this? Perhaps we should move the temporary directories to /mu2e/data/gridexport/users/ and have cron jobs that expire files older than a month? Also, files on /pnfs will expire from cache but directories do not; so should we think about a cron job to delete empty directories?
  3. Is there a good algorithm to make more meaningful temporary directory names? We still need to guarantee uniqueness. Maybe project-name-yyyy-mm-dd-hh-mm-ss-random, where project-name can be supplied as an argument or defaults to the path to the build area with / turned into _ ? Is the random string really necessary to guarantee uniqueness in this model?
  4. Do we like the capitalized options or should they be lower case? I made the short form uppercase because if you have -a and --append-exclude-from, and then forget the double dash on the long form, it results in a difficult to understand error message.
  5. gridexport should check that the files specified by -A and -E exist and are files, not directories. Should it check that they are not empty? It should give understandable error messages if the tests fail.
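
One possible shape for items 3 and 5 is sketched below. Neither function exists in gridexport; the names check_exclude_file and make_named_tmpdir, the error messages, and the exact name format are proposals for discussion only.

```shell
# Item 5: check that an exclude file exists, is a regular file, and is not empty.
check_exclude_file() {
    if [ ! -e "$1" ]; then
        echo "gridexport: exclude file '$1' does not exist" >&2
        return 1
    elif [ ! -f "$1" ]; then
        echo "gridexport: '$1' is not a regular file" >&2
        return 1
    elif [ ! -s "$1" ]; then
        echo "gridexport: exclude file '$1' is empty" >&2
        return 1
    fi
}

# Item 3: make a directory named project-yyyy-mm-dd-hh-mm-ss-random.
#   $1 = parent directory
#   $2 = project name (optional; defaults to the current working directory
#        with / turned into _)
make_named_tmpdir() {
    parent=$1
    project=${2:-$(pwd | tr '/' '_')}
    # mktemp replaces the trailing X characters, which keeps the name unique
    # even if two runs start within the same second.
    mktemp -d "$parent/${project}-$(date +%Y-%m-%d-%H-%M-%S)-XXXXXXXX"
}
```

Keeping the random suffix means uniqueness does not depend on the timestamp resolution, at the cost of a slightly longer name.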