GitPartialCheckout

From Mu2eWiki
Revision as of 14:25, 16 April 2019 by Rlc

Introduction

Following the standard git checkout pattern clones the entire Offline repository and checks it out locally. Building the entire repo can take an annoyingly long time. Partial checkout is one powerful way to reduce this build time. In a partial checkout, you rely on a base Offline build for most of the Offline libraries, and you check out only a small part of Offline locally and build that subset of code as you work. When you build, if a needed header file is found locally, that copy is used; if it is not in the local working area, it is taken from the base build. After building, the executables pick up your local libraries first, then link the rest from the base release. The same pattern, placing your local working area ahead of the base build, is followed for all the paths (fcl, python, data, bin, etc.).
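The "local first, then base" lookup described above can be illustrated with a small shell sketch. This is not how pgit or the build sets up its paths; the function name first_match and the directory and file names are hypothetical, chosen only to show the search order.

```shell
#!/bin/sh
# Return the first path, in search order, where the given file exists.
# Usage: first_match FILE DIR [DIR...]
first_match() {
    file=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$file" ]; then
            printf '%s\n' "$dir/$file"
            return 0
        fi
    done
    return 1
}

# Two throwaway directories stand in for the local working area and
# the base build (both names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/local/Pkg" "$demo/base/Pkg"
touch "$demo/base/Pkg/A.hh" "$demo/base/Pkg/B.hh"   # base build has everything
touch "$demo/local/Pkg/A.hh"                        # only A.hh is checked out locally

first_match Pkg/A.hh "$demo/local" "$demo/base"     # resolves to the local copy
first_match Pkg/B.hh "$demo/local" "$demo/base"     # falls back to the base build
```

Listing the local directory first is what lets a locally modified file shadow the copy in the base build.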

The local area is a valid, complete git repo, so you can pull, push, tag, etc. while you are building a subset of the code. Commands and concepts such as pull, tag, hashes, and branches, which normally apply to an entire repo, still apply to the whole repo, not just the checked-out part.

To implement the partial checkout, we have provided a script called "pgit" which is in your path after "setup mu2e". We have also provided a set of pre-built Offline areas, to serve as the base builds. These can be provided for the head of selected branches.

Two Warnings

The downside to not building the entire repo locally is that your local partial build may get out of sync with the base build. If code in the base release is compiled with a different header file than the local partial build, then the executables will not run correctly and will probably produce odd memory errors, such as seg faults. It is also possible that the code will fail silently, so these are very serious issues.

There are probably other ways to cause trouble, but all users must be aware of the following two issues.


First is the issue of intermediate commits. For example, the base build is at a certain commit, call it N, and you create a partial checkout based on this base build. After you are set up, someone else commits N+1 to the branch you are working on. In terms of the git repo, you can handle this like a full repo: you can pull or merge the new commit into your local repo, work on that, and commit N+2 to the head of the branch. The problem is that when you do pull the intermediate commit N+1, the header files in your working area may become inconsistent with the header files used for the base build, leading to serious errors. Also, you may have checked out a subset of code X when the intermediate commit concerned a disjoint subset Y, so you will not see the effect of the intermediate commit at all.

If you see an intermediate commit, then you can "git diff" and decide if it is harmless with respect to dependencies. For example, if the commit was only to a cc file, then you can checkout this part locally if you want to get its effect, or ignore it in your local checkout, if you are sure it doesn't matter to your work.
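As a rough aid for that decision, the list of files touched by an intermediate commit can be filtered for headers. The sketch below is only an illustration: the function name changed_headers is hypothetical, the header extensions (.hh, .h, .icc) are assumptions about the code base, and the remote and branch names in the comments are assumed defaults.

```shell
#!/bin/sh
# Read file names on stdin and print only those that look like headers.
# The extension list is an assumption; adjust it to the code base.
changed_headers() {
    grep -E '\.(hh|h|icc)$' || true   # "|| true": no match is not an error
}

# Typical use in a working area (remote and branch names are assumptions):
#   git fetch origin
#   git diff --name-only HEAD origin/master | changed_headers

# Stand-alone demonstration with hypothetical file names:
printf '%s\n' CaloReco/src/Foo_module.cc CaloReco/inc/Foo.hh | changed_headers
```

An empty result suggests the commit touched no headers, which is the "only a cc file" case above. A non-empty result means the dependencies need a closer look.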

The second major issue concerns your local header files. If you modify a local header file, the only way to get a fully correct build is to recompile every piece of code that includes that header file, and what depends on this header may not be at all obvious.
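A first approximation of "what includes this header" can be obtained with grep. The sketch below is an assumption-laden illustration, not part of pgit: the function name direct_includers and the demo file names are hypothetical, and it finds only direct includes. Code that picks up the header through another header is not reported, so an empty result does not prove a change is safe.

```shell
#!/bin/sh
# List files under the given directories that directly #include a header.
# Usage: direct_includers HEADER DIR [DIR...]
# Note: HEADER is inserted into a regular expression, so a "." in the
# name matches any character; this only makes the search slightly looser.
direct_includers() {
    header=$1
    shift
    pattern="^[[:space:]]*#[[:space:]]*include[[:space:]]*[\"<]([^\">]*/)?${header}[\">]"
    # -r: recurse, -l: print matching file names only, -E: extended regex
    grep -rlE "$pattern" "$@"
}

# Demonstration on a throwaway tree with hypothetical package names.
demo2=$(mktemp -d)
mkdir -p "$demo2/PkgA/src" "$demo2/PkgB/src"
printf '#include "PkgA/inc/Hit.hh"\n'   > "$demo2/PkgA/src/Maker.cc"
printf '#include "PkgB/inc/Other.hh"\n' > "$demo2/PkgB/src/User.cc"

direct_includers PkgA/inc/Hit.hh "$demo2"   # prints the path of Maker.cc
```

To be useful, the search has to cover source for the whole repo, not only the directories checked out locally.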

"pgit check" provides some checks for these problems, but there is no substitute for being personally aware of these dependency issues. There are probably other ways to cause trouble, but all users must be aware of the following two issues. When in doubt, you can always "pgit quit" and go back to a normal full checkout, disconnecting from the base build.


Commands

Make sure you're using the Mu2e version of git:

setup mu2e
setup git

For instructions see

pgit help

See the base builds available:

pgit list

2018-12-13 13:50 master/6d77f6b8/SLF6/prof
2018-12-13 13:42 master/6d77f6b8/SLF6/debug

The hex string is the first 8 characters of a commit hash. The list is presented with the most recent build at the top.
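Because the base build name carries the commit hash, it can be used to look for the intermediate commits discussed under Two Warnings. The sketch below assumes the name always has the four-field layout branch/hash/platform/buildtype seen in the listing, with no "/" inside the branch name. The function name base_hash is hypothetical, and the remote and branch names in the comments are assumptions.

```shell
#!/bin/sh
# Print the abbreviated commit hash embedded in a base build name.
# Assumes the layout branch/hash/platform/buildtype.
base_hash() {
    printf '%s\n' "$1" | cut -d/ -f2
}

base_hash master/6d77f6b8/SLF6/prof

# In a working area, commits made to the branch after the base build
# could then be listed with (remote and branch names are assumptions):
#   git fetch origin
#   git log --oneline "$(base_hash master/6d77f6b8/SLF6/prof)..origin/master"
```

An empty log means the head of the branch is still at the commit the base build was made from.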

To start a new partial build, backed by the given base build:

pgit init master/6d77f6b8/SLF6/prof

If you know you want the latest master base release, there are reasonable defaults:

pgit init master

Setup:

cd Offline
source setup.sh

See your paths:

proff

To check out a couple of directories:

pgit get CaloReco ParticleID

One note on how git interprets these names. If you "pgit get fcl" you will get "./fcl" and also "*/fcl/*", which includes many other directories you probably didn't intend. To force a specific full path, use a leading "/" to anchor the name at the top of the directory tree. So "./fcl" can be added by requesting "/fcl", and a specific subdirectory can be retrieved with "pgit get Print/fcl" or "pgit get /Print/fcl".

To remove a directory:

pgit rm CaloReco

To check dependencies:

pgit status

Build normally:

scons -j 4

At this point you should be able to use regular git commands.

To exit partial checkout and go back to a full checkout:

pgit quit

How it works

The checkout

The partial checkout uses git's built-in sparse checkout mechanism. The functionality is turned on by

git config core.sparsecheckout true

Once this is set, only the files and directories that you specify are copied from the index to the working directory. The checkout list is kept in .git/info/sparse-checkout; for example, to add an entry:

echo "/SConstruct" >>  .git/info/sparse-checkout

The leading slash means the entry has to be in the top-level directory. Without the slash, the entry is used like a search term, so "X/SConstruct" would also be checked out. The file contents can also be set to "*" to check out everything.
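When maintaining the list by hand, repeated appends leave duplicate entries. The sketch below adds a pattern only if it is not already present. It illustrates the file format and is not a description of what pgit does internally: the function name add_sparse_pattern is hypothetical, and a temporary file stands in for the real list.

```shell
#!/bin/sh
# Append a pattern to a sparse-checkout list unless it is already there.
# Usage: add_sparse_pattern PATTERN [FILE]
add_sparse_pattern() {
    pattern=$1
    list=${2:-.git/info/sparse-checkout}
    # -q: quiet, -x: match the whole line, -F: treat the pattern literally
    if ! grep -qxF -e "$pattern" "$list" 2>/dev/null; then
        printf '%s\n' "$pattern" >> "$list"
    fi
}

# Demonstration on a temporary file instead of the real list.
list_demo=$(mktemp)
add_sparse_pattern /SConstruct "$list_demo"
add_sparse_pattern /Print/fcl  "$list_demo"
add_sparse_pattern /SConstruct "$list_demo"   # already present, not added again
cat "$list_demo"

# In a real working area, editing the list does not change the working
# tree by itself; with plain git it can be refreshed afterwards with:
#   git read-tree -mu HEAD
```

Both demo entries start with "/", so each is anchored at the top-level directory as described above.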

The remote of the local partial checkout is set to the remote of the .git in the base build, which will typically be the main repo.

The last initialized base build is saved here:

git config mu2e.baserelease $base_repo

The pgit command resides at:

/cvmfs/mu2e.opensciencegrid.org/bin/pgit

and is put in the path in

/cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh

The base releases

The base releases are kept up to date by a jenkins project called "mu2e_ci_branch". The script is kept in

setup codetools
ls $CODETOOLS_DIR/bin/jenkinsCIBranch.sh

This project is triggered by commits to the repo. The project is configured with a set of branches to monitor. If the commit doesn't change the head of one of those branches, it doesn't build.

Since it is not easy for jenkins to push a result, there is a cron process on mu2epro@mu2egpvm01 which polls the jenkins project and pulls new tarballs when they appear. This script is

~mu2epro/cron/git/moveCIBranch.sh

Then, since only a special user can write to cvmfs, this script runs another:

cvmfsmu2edev@oasiscfs.fnal.gov:pullCIBranch.sh