GitHubWorkflow

Introduction

The primary source code management system for the Mu2e Offline software is a git repository. Git is a popular open-source version control system. Using the repository's URL, you can check out code and, if you are a code developer, check in modifications. Git also supports tagging code, tracking history, and splitting off and merging development branches, among many other features.


Our main code repositories are stored on the commercial hosting site GitHub. The Mu2e computing group manages the GitHub Mu2e organization. To have full access to Mu2e software, Mu2e collaborators should create a GitHub account and request to join the Mu2e organization; please see the git introduction page.


This page describes two recommended git workflows for use with the Mu2e Offline code in GitHub, one workflow for regular users and one for developers.


  1. To execute either of these workflows, make sure you have defined the Mu2e environment by executing "setup mu2e" in your shell. This ensures that you have a known version of git.
  2. To execute the developer workflow, make sure you have a GitHub account and that you have been added to the Mu2e GitHub organization (https://github.com/orgs/Mu2e/people).
  3. In order to authenticate with GitHub, you will need to set up your ssh keys on the machine from which you plan to clone/push, following the instructions here.
    • Password-based authentication on GitHub is now deprecated and is scheduled to be disabled soon, as described here.
  4. Before you do any developing, go to your GitHub account and create your own fork of the official Mu2e Offline repo using the GitHub web interface (instructions here).

The main Mu2e repository is 'Offline', which contains the algorithms, data structures, art modules and configuration used in simulation production, online filtering, offline reconstruction, and other tasks. The main development branch within Offline is 'master', which should be used for all code development. Offline also contains legacy branches associated with particular data sets, which are maintained for a limited time after the master branch has been developed past those data, as described in the table below.

Mu2e/Offline branches
branch name    branch purpose              support end date
master         development                 end of Mu2e
MDC2018        MDC2018 dataset support     end of 2020
Mu2eII_SM21    Mu2eII code for Snowmass    end of 2022

Some additional smaller repositories important to Mu2e are in the Fermilab redmine git server.

Authentication

If you are asking for read-only access to a public repository, such as Mu2e Offline and most other Mu2e repositories, you do not need to authenticate yourself to GitHub. For getting started with Mu2e, this is all you need to know; just follow the instructions. You should also recognize two styles of clone URLs:

https://github.com/Mu2e/Offline 
git@github.com:Mu2e/Offline

The first style gives unauthenticated read-only access to public repositories; the second style requires that you authenticate to GitHub.
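
If a clone was made with the first style and you later need authenticated access, the remote URL can be switched in place instead of cloning again. The sketch below illustrates this on a throwaway repository; the temporary directory is an assumption made only for illustration, and no network access is needed because git remote set-url only edits the local configuration.

```shell
set -e
# Work in a throwaway repository so nothing real is touched (illustrative assumption).
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# Start with the unauthenticated, read-only https style.
git remote add origin https://github.com/Mu2e/Offline
git remote get-url origin

# Switch the same remote to the ssh style, which requires GitHub authentication.
git remote set-url origin git@github.com:Mu2e/Offline
new_url=$(git remote get-url origin)
echo "$new_url"
```

In a real clone only the git remote set-url line is needed; git remote -v shows the result.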

There are two important cases in which you will need to authenticate to GitHub: to write to your own GitHub fork and to read the few Mu2e repositories that have access restricted to Mu2e members. You will never write directly to the Mu2e GitHub repositories. We recommend that you establish authentication to GitHub using ssh keys, and all GitHub examples on the Mu2e wiki assume that you have done so. You may delay this step until it is needed. The page Authenticating to GitHub has instructions. Remember that password-based authentication to GitHub will be disabled in the near future.


Downloading Offline as a user and NOT a developer

Option 1: You want the default primary version of the code (most people):

  1. Clone the repo:
    git clone https://github.com/Mu2e/Offline
    cd Offline
  2. Done!

Option 2: A particular collaborator has a version or branch you want to use:

  1. Find their GitHub user name.
  2. Learn the name of the branch they are working on; this may be master but it normally should not be.
  3. Clone their fork:
    git clone https://github.com/<their GitHub user name>/Offline
    cd Offline
    git checkout origin/<branch name>
  4. Done!

Option 3: You want to use pgit to avoid a long compilation time (EXPERIMENTAL)

  1. Create a new directory to put your Offline repo in and move to that directory:
    mkdir Offline
    cd Offline
  2. As in Option 2, determine the fork and branch name you wish to use.
  3. Create a partial checkout clone:
    pgit2 setup https://github.com/<user name (or mu2e)>/Offline <branch name>
  4. You can now use it as normal:
    source setup.sh
    scons -j 4
  5. You might need to source the following before 'source setup.sh':
    source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh

Developer Workflow

This section assumes basic familiarity with git.


  1. Create your own fork of the official Mu2e Offline repo using the GitHub web interface (instructions here).
    • This fork will be your personal sandbox on GitHub; you can do anything you want to it and it will have no effect on anyone else!
    • You only need to do this once; you can reuse this fork for all your development projects.
  2. On your development machine, create a local clone of your Offline fork. You only need to do this once as well; you can reuse this local clone for multiple development projects if you want to. When you push code to another repository, the default will be to push to your fork; you will never push code directly to the GitHub Mu2e/Offline repository.
    setup mu2e
    git clone git@github.com:<your GitHub username>/Offline
    cd Offline
    git remote add -f mu2e https://github.com/Mu2e/Offline
  3. The above steps will create a local clone of your fork of the Offline repository with the following properties:
    • There are two remotes: your fork (origin) and mu2e.
    • All of the branches from your fork and from Mu2e/Offline will be visible as remote branches.
    • The only local branch will be the HEAD branch (usually master) of your fork.
    • The checked-out local branch will be the HEAD branch of your fork.
    • You can inspect the status of remotes and branches using git remote -v and git branch -avv.
    • In general the head of the master branch of your fork will be out of date; you should never work from this branch. Therefore you must do the next step before you can begin development work.
  4. Make a new local branch on which to do your development work.
    • If two or more development efforts are not intrinsically coupled, each should be done on its own branch.
    • This branch is local to your clone, so it has no impact on the main Offline; when you push it, it will go to your fork. Therefore it has no effect on the GitHub Mu2e version of Offline.
    • The branch name is used for sharing your development work with other people while it is still in progress, and for making your pull request, but has no special meaning to git. Choose a name that will be meaningful.
    git checkout --no-track -b <development branch name> mu2e/master
  5. Do your work and commit it.
    git commit -m "brief comment describing the changes you are committing" file1 [ file2 file3 .... ]
  6. When you wish to back up your work, or share your work with others, push your branch to your GitHub fork. If you are working on a disk that is not backed up, such as /mu2e/app, we encourage you to push frequently in order to back up your work:
    git push -u origin <development branch name>
    • The -u option tells git that your local branch should track the branch in your fork. If you push the branch again, the -u option is not needed but it won't hurt if it is present.
    • You can use git branch -avv to confirm that your local development branch is now tracking its counterpart in your GitHub fork.
  7. When your development is complete and tested, go to the web site of your GitHub fork and, using the GUI, request that your branch be pulled into Mu2e/Offline. Your pull request (PR) will start the code review process (see Code Review), which may take anywhere from a few hours to a few days.
    1. In a web browser, open https://github.com/<your GitHub user name>/Offline
    2. Click on the icon that shows all branches.
    3. Click on the 'New pull request' button associated with your development branch.
    4. There will be an informational message near the top of the page saying if your branch is "Able to merge" or if conflicts exist.
    5. After conflicts, if any, are resolved, fill in the requested information and click the "Create Pull Request" button.
    6. More info is available at the GitHub instructions for Pull Requests.
  8. After you submit your PR, GitHub will automatically start a Continuous Integration (CI) run, which includes the following:
    1. In a scratch area it will merge your PR into master and will build your code (prof build only).
    2. It will run several standard fcl scripts; the test passes if the art executable returns a status of 0. There are no checks on the output.
    3. It will run two tests that check that the geometry description has no illegal constructions: the Geant4 surface check and the root overlap check.
    4. It also runs code formatting and static analysis checks; at this time these are informational only and their recommendations are not enforced.
    5. It reports how many times it sees the strings "FIXME" or "TODO" in the code in the PR.

    The results of these tests are posted to the PR Conversation page. These tests must pass before your PR will be merged.

  9. If changes are requested during the code review process, make those on the same development branch as your PR. When the changes are complete, commit them and push your changes back to your fork. GitHub will automatically update your PR to include your new commits. This is because the target of a PR is a branch, not the commit that happened to be at the head of the branch at the time of your initial PR.
    <edit code as requested by reviewer>
    git commit -m "Address review comment X" file1 [file2 file3 ...]
    git push origin <development branch name>
  10. When the code reviewers are satisfied, one of the software coordinators will merge the PR into Mu2e/Offline. Once your PR is merged, your changes (commits) will be part of Mu2e/Offline master, and your development branch can be deleted. If you are uncertain whether your branch has been merged, select the branch and push the 'compare' button. If this comes back stating there is 'nothing to compare to', all your changes were already merged. If it shows differences, those have NOT been merged, so do NOT delete your branch. To delete your branch in GitHub, just push the trash can icon. You can also delete the branch in your shell:
    git branch -d <my branch name>
    git push origin --delete <my branch name>    # this deletes the branch from your GitHub fork as well
  11. Every night the head of the master branch is used as input to a series of validation tests; these are similar to the CI tests discussed above, but some of the jobs run many events and the output of these jobs is compared to reference output. On the morning following the merge of your PR, you may be asked if the nightly validation behaved as you expected.
  12. To reuse your working directory for a new development, first refresh to the current head of master, then create a new branch from it as described above. Do NOT reuse branches for new development, as updating those to the head of Mu2e/Offline master will confuse the git history.
    git fetch mu2e master
    git checkout --no-track -b <new development branch name> mu2e/master
  13. We encourage you to commit your work frequently and push to your GitHub fork frequently; this is the best way to back up your work. You do NOT need to wait until you are ready for a PR to push to your fork.
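
The clone, branch, commit and push steps above can be rehearsed without touching GitHub. The sketch below is an illustration under stated assumptions, not part of the official procedure: two local bare repositories stand in for Mu2e/Offline (upstream.git) and for your fork (fork.git), and the branch name my_feature, the file README and the demo identity are all hypothetical.

```shell
set -e
# Local stand-ins for GitHub (assumptions for illustration only):
#   upstream.git plays the role of Mu2e/Offline, fork.git the role of your fork.
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Seed the stand-in for Mu2e/Offline with one commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "initial" > README
git add README
git commit -q -m "Initial commit"
cd "$sandbox"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git

# Clone the "fork" and add the "official" repository as the remote named mu2e.
git clone -q "$sandbox/fork.git" Offline
cd Offline
git remote add -f mu2e "$sandbox/upstream.git"

# Branch from mu2e/master without tracking it, commit, and push to the fork.
git checkout -q --no-track -b my_feature mu2e/master
echo "change" >> README
git commit -q -m "Describe the change" README
git push -q -u origin my_feature

# After "push -u" the local branch tracks its counterpart in the fork.
tracking=$(git rev-parse --abbrev-ref 'my_feature@{upstream}')
echo "$tracking"
```

Because the branch was created with --no-track, it has no upstream until the first push with -u, which is why pushes go to the fork rather than to the mu2e remote.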

Tips for Good GitHub Hygiene

  1. Prefer many PRs, each on a self-contained topic, instead of a single PR that includes many topics.
    1. Of course, extensive changes are sometimes necessary and will require a single large PR.
  2. Within a PR, prefer many commits with a small number of related changes to few commits with many changes each.
  3. Do not make spurious white space changes or formatting changes; if you want to make such changes, do so in a separate PR that includes only those changes.
  4. Do not use hard tabs in your code; instead program your editor to change tabs into the appropriate number of spaces. See Editors. Fixme: Add links to .emacs and .vimrc.

Collaborating on a feature

Sometimes you may want to collaborate on a feature branch with other developers. Since the main Offline repository no longer has all the development branches, a couple of extra steps are needed:

  1. First make sure you actually need to work on the same branch. Are you actually working on the same feature? Can the problem be split into smaller features that can be developed asynchronously? Just because features are related doesn't mean they need to be developed on the same branch.
  2. Determine if a large number of people will be developing on the same branch for a significant amount of time. In this case it should become an official branch in Mu2e/Offline, like MDC2018.
  3. Decide which user's fork will be the primary repo for this feature branch, and which branch on that fork you are going to use. If a new branch is needed, the owner of that fork starts the new branch as follows. First make sure that you have done steps 1 and 2 in #Developer_Workflow. Then do the following:
    git fetch mu2e master
    git checkout --no-track -b <branch name> mu2e/master
    git push -u origin <branch name>
  4. There are then two options for moving forward: either add all other developers as collaborators on the primary fork, or use pull requests to the primary fork.
  5. To add developers as collaborators:
    1. The owner of the primary fork opens https://github.com/<their user name>/Offline
    2. Click Settings on the right, then Collaborators.
    3. In the collaborators box, type the GitHub user name of each other developer and hit "Add collaborator".
    4. The other collaborators can then either create a read/write clone of the primary fork, or add it as a remote to an existing Offline repo. Write access requires the authenticated (ssh) URL style:
      git clone git@github.com:<primary user name>/Offline
      or
      git remote add primaryfork git@github.com:<primary user name>/Offline
    5. The other collaborators can now push directly to the primary fork as if it were their own:
      git push primaryfork <branch name>
  6. To use pull requests:
    1. The owner of the primary fork can just push to it as normal, following the normal developer workflow.
    2. Other developers clone their own fork, but add the primary fork as a remote:
      git remote add primaryfork https://github.com/<primary user name>/Offline
    3. Other developers can pull in and merge changes from collaborators by fetching/pulling/merging from this remote:
      git fetch primaryfork
      git merge primaryfork/<branch name>
    4. Other developers push to their own fork:
      git push origin <branch name>
    5. As in the normal developer workflow, they open a pull request. But in the compare window, before creating the request, they change the "base repository" from Mu2e/Offline to <primary user name>/Offline (see here).
    6. The owner of the primary fork will need to accept and merge it in.
    7. Everything else goes like the normal workflow.

Rebasing

There will be times when you want to, or need to, bring your development branch up-to-date with the head of GitHub Mu2e/Offline/master. One such time is when GitHub reports that your PR has conflicts. There are two ways to bring your branch up-to-date. This section will discuss the preferred method, rebasing your development branch onto the head of GitHub Mu2e/Offline/master; you should not use the other method, merging the head of GitHub Mu2e/Offline/master into your development branch.

You can learn about rebasing in the GitHub documentation:

  1. git-rebase Documentation
  2. merging vs rebasing.

Until you are comfortable with rebasing we suggest that, before rebasing, you back up your work by making a gzipped tar file of your working area, excluding .so and .os files.

The instructions below presume that your GitHub fork is the remote named "origin" and that the GitHub Mu2e/Offline repo is the remote named "mu2e". The simplest workflow is:

git checkout <your development branch>
git fetch mu2e master
git rebase mu2e/master
# resolve conflicts if needed; see the git-rebase Documentation and #Tips_For_Resolving_Conflicts
git push origin <your development branch>

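Note that "fetch" wants whitespace between "mu2e" and "master" but "rebase" needs a slash "/". If you had already pushed the branch to your fork before rebasing, the rebase rewrites commits that the fork already has, so a plain push is rejected as non-fast-forward; in that case push with git push --force-with-lease origin <your development branch>. You can now put in a pull request on your development branch.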
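
The simplest rebase workflow can also be rehearsed locally before trying it on real work. The following is a sketch under stated assumptions: local bare repositories stand in for Mu2e/Offline and for your fork, and the branch name my_feature, the file names and the demo identity are hypothetical. It includes the case where the branch was already pushed before the rebase, which is why the final push uses --force-with-lease.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-ins (assumptions): upstream.git for Mu2e/Offline, fork.git for your fork.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "Base"
cd "$sandbox"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q "$sandbox/fork.git" Offline
cd Offline
git remote add -f mu2e "$sandbox/upstream.git"

# A development branch with one commit, already pushed to the fork.
git checkout -q --no-track -b my_feature mu2e/master
echo feature > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git push -q -u origin my_feature

# Meanwhile, master of the "official" repository moves ahead.
cd "$sandbox/seed"
echo more > other.txt
git add other.txt
git commit -q -m "Upstream work"
git push -q "$sandbox/upstream.git" master
cd "$sandbox/Offline"

# The rebase recipe from the text.
git checkout -q my_feature
git fetch -q mu2e master
git rebase -q mu2e/master
# The branch was pushed before the rebase, so a plain push would be rejected.
git push -q --force-with-lease origin my_feature

# Zero means my_feature now contains everything on mu2e/master.
behind=$(git rev-list --count my_feature..mu2e/master)
echo "commits on mu2e/master missing from my_feature: $behind"
```

--force-with-lease refuses to overwrite the remote branch if someone else has pushed to it since your last fetch, which makes it a safer choice than a bare --force on a shared branch.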

A second option is to keep your development branch as a backup, start a new branch and rebase that branch:

git checkout <your development branch>
git checkout -b <a new development branch> 
git fetch mu2e master
git rebase mu2e/master
# resolve conflicts if needed;  see the git-rebase Documentation and #Tips_For_Resolving_Conflicts
git push origin <a new development branch>

When this process is complete, you will have two branches in your clone: <your development branch> and <a new development branch>. If you pushed both branches, they will also be in your GitHub fork of Offline. You can now create a pull request on <a new development branch>, leaving <your development branch> unchanged.

Choose the second option if it is important to retain the original branch, perhaps because you performed detailed validation using that branch and you wish to preserve the validation work and its source code for future reference.


Tips For Resolving Conflicts

When conflicts are identified by a Pull Request it is your responsibility to resolve them before continuing. There is no formula for this step; you will have to look at the two versions of the conflicting code blocks and decide how best to merge the functionality of both. If you have questions about the intent of the previously-merged conflicting code, work together with the author of those changes to figure that out. You can figure out who last changed a line in a file using the 'git blame' command.

 git blame mu2e/master <name of file that has conflicts>

When you think you are done, it's a good idea to grep the code to look for unresolved conflict markers. If you make extensive changes during rebasing, it's a good idea to check that the code builds; normally this is not necessary because the CI tests are there to catch such problems.
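
One possible way to search for leftover conflict markers is git grep, which only looks at tracked files. The sketch below builds a throwaway repository containing a deliberately unresolved file; the file name Example.cc, its contents and the demo identity are made up for illustration.

```shell
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A file that still contains conflict markers (contents are illustrative only).
printf '%s\n' 'int a = 1;' '<<<<<<< HEAD' 'int b = 2;' '=======' \
    'int b = 3;' '>>>>>>> mu2e/master' > Example.cc
git add Example.cc
git commit -q -m "File with unresolved markers"

# List tracked files with lines that start with a conflict marker.
# git grep exits with status 1 when nothing matches, hence the "|| true".
leftover=$(git grep -l -E '^(<<<<<<<|=======|>>>>>>>)' || true)
echo "files with unresolved conflict markers: $leftover"
```

In a real working area only the git grep line is needed. A line of seven equals signs can also occur legitimately, for example as a text underline, so treat matches as candidates to inspect.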

Once all conflicts are resolved, commit the result and push it back to your fork. After this, GitHub will allow you to request a pull. If the conflicts arose during a rebase, run git rebase --continue after the git add step instead of git commit.

 git add <files that were edited as part of resolving conflicts>
 git rm <any files that need to be removed to resolve conflicts>
 git commit -m "Resolve conflicts message"
 git push origin <branch name>

Code Review

An important part of the GitHub workflow is reviewing new code before putting it back into the repository. Reviews are intended to minimize the risk that the requested changes break anything, check that the content of the changes is sensible, and enforce Mu2e coding standards and policies. Some reviews are automated, such as testing that the code builds and can run a few events of some standard apps. Automated code formatting checks will also be deployed soon.

Offline repo managers are responsible for assigning reviewers to each Pull Request (PR), as well as a manager in charge of each particular PR. The PR author may also assign or suggest reviewers. All assigned reviewers must approve the PR before the assigned manager will merge it in.

PRs can cover multiple subject areas. Reviewers should concentrate on reviewing code in areas in which they have personal expertise and/or subject knowledge. Reviewers are not expected to learn about areas outside their experience, as other reviewers will cover those. If you feel you were incorrectly assigned to a review, contact the repo manager assigned to the PR to request clarification or to be removed as reviewer. Reviewers should attempt to complete their reviews within a few days. Large PRs may take longer to review, and PR authors should plan accordingly. If an assigned reviewer is unavailable, they or the PR author should contact the assigned repo manager to request a substitution. The Offline repo managers should be alerted if a review becomes stuck for any reason.

Reviewers should look at the content of the PR commits for code correctness, good design and efficient implementation. Reviewers don’t need to build or run the code; that’s for the automated tests. The GitHub commit differences referenced in the PR are the easiest way to see and review the changes. Review feedback should be inserted as comments at the relevant lines in the GitHub diff where the reviewer has a concern. After reviewing all files and commits, reviewers should complete their review using the GitHub interface. If you feel changes are required, submit your review with that box checked. If you simply have questions, submit your review checking the 'neutral' box; this neither approves the PR nor requires changes, it just requires a response on the part of the author. PR requesters should respond to all review comments or questions in the PR thread and/or by making a new commit inside the PR. Once all the reviewer's concerns and questions have been addressed, the reviewer should re-submit their review checking the 'approved' box. The repo manager assigned to the PR should merge the PR after all reviewers have approved it.

One of the reviewers should check for the following:

  • All modules, services and tools included in the PR must be upgraded to use validated fhicl.
  • Parameters that affect physics performance must not have default values in the code; the recommended values must be specified in the appropriate .fcl files. Parameters that affect debugging and verbosity may be initialized in code and need not be present in the .fcl files.


GitHub Pull Request Procedures and FNALbuild

When you open a Pull Request (PR) on the Mu2e/Offline GitHub repository, you will receive a greeting message from @FNALbuild, which is Fermilab's build bot account. @FNALbuild acts as a glue between the GitHub collaborative environment and Jenkins, and is used by the repository maintainers to trigger tests that may be required to pass before your changes are merged. The tests (or continuous integration, CI, actions) currently are a supplement to the existing review process.

N.B. buildmaster.fnal.gov links (Jenkins server) can only be accessed via Fermilab onsite VPN.

The supported CI actions are listed below. The first of these is triggered automatically by the creation of a PR. In addition, any CI action can be triggered at any time by posting a command in a comment on a Pull Request. The fnalbuild-users GitHub Team indicates which GitHub users and Teams in the Mu2e organisation are able to trigger CI actions; some branches have additional users authorized to trigger actions; finally, the requestor of a PR may trigger actions on that PR.

@FNALbuild run build test

In regular use. Merges the PR branch into the current HEAD of the base branch and builds the code, then runs a series of small jobs (10 events) to check for runtime errors.

@FNALbuild run code checks

Currently not in use. Checks the files that were changed by a PR for trailing whitespace and hard tabs - fails if modifications are needed. FNALbuild will provide a patch to fix these problems, if any.

This is not an integration test - it only tests the PR branch as-is.


@FNALbuild run validation

Not in regular use. Pulls the built code from a build test, runs a validation job (ceSimReco) over 5000 events, and produces validation plots. It then builds, or pulls a cached build of, master (not containing any PR changes) and runs the same validation job, producing plots. Finally it runs valCompare between the two sets of plots and produces a comparison, which is published on the PR in a comment.

The Jenkins jobs that run the tests are located here


Where the code is stored

The scripts that are run to test a PR are kept on GitHub in Mu2e/codetools.

The FNALbuild handler scripts are kept here.

The Mu2e version of this bot was based on and re-written using the original process_pr.py script from FNALbuild/cms-bot:

  • process_pr.py: Where everything happens. given a PR, process all comments and figure out which tests to trigger based on those comments.
  • comment_gh_pr.py: this is called from the Jenkins jobs to post a comment on the PR when the result of a test has been determined, or when the status of a test has changed e.g. it is running.
  • test_suites.py: some configuration variables to set up the regular expressions that are used to search for commands that trigger tests.
  • watchers.yaml: Users may add themselves to this to 'watch' a specific folder in Offline for changes, such that they are notified when someone wishes to change it. Regex is supported.
  • auth_teams.yaml: Configuration file of which GitHub teams are able to launch CI actions on a base branch.




Tips and Tricks

Branch prompt

Many people work in a style where they are switching between branches frequently, and everyone does this switch at times. You can put the branch name in your prompt so you are unlikely to get confused about what branch you are on.

In .bash_profile:

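parse_git_branch() {
    # Keep only the current branch line ("* name") and rewrite it as "(name)".
    git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

# The escaped \$ defers the call so the branch is re-evaluated at every prompt.
export PS1="\h \w \$(parse_git_branch) "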

Finding deleted files

As a policy we have decided to not keep code that does not compile. Some code may be useful as examples or to revive an abandoned effort, so we should be able to recover it. Here are three methods:

  1. grep the release notes:
    grep mystring ReleaseNotes/*/*
  2. If you know the file name, look at its history. The "--" is needed because the file no longer exists in the working tree:
    git log -- myfile
  3. Search the git history. The following git command will list every commit that contained the deletion of a file, and it will list the names of all files deleted in that commit. The output can be piped to grep.
    git log --diff-filter=D --summary

Once you find the file name, you can check it out based on the commit that deleted it.

git checkout <deleting_commit>^ -- <file_path>

If you are deleting non-trivial files, please note the full file paths in the release notes.
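
The search and recovery steps above can be combined into a short sequence. The sketch below demonstrates it on a throwaway repository; the path Sandbox/OldModule.cc, its contents and the demo identity are hypothetical and exist only so that the example is self-contained.

```shell
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Create a file, commit it, then delete it in a second commit.
mkdir -p Sandbox
echo "old example" > Sandbox/OldModule.cc
git add Sandbox/OldModule.cc
git commit -q -m "Add example module"
git rm -q Sandbox/OldModule.cc
git commit -q -m "Remove obsolete module"

# Find the most recent commit that deleted the file.
# The "--" is required because the path no longer exists in the working tree.
deleting_commit=$(git log --diff-filter=D --format=%H -- Sandbox/OldModule.cc | head -n 1)

# Restore the file from the parent of the deleting commit.
git checkout "${deleting_commit}^" -- Sandbox/OldModule.cc
restored=$(cat Sandbox/OldModule.cc)
echo "$restored"
```

The restored file is staged as a new addition, so it still has to be committed if you want to keep it.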

What branch is Tracking What Other Branch?

 git branch -avv

will show a list of all branches in your working area, both local and remote branches. The additional information provided by -vv is:

  1. The SHA of the commit at the head of the branch
  2. The comment text associated with that commit
  3. If the branch is tracking another, there is a notation of the form [branch name] between items 1 and 2.

In the example below the local branch master is tracking Mu2e/master.

* work                             9e5e9f0 Merge pull request #152 from goodenou/MT_fcl_fix
  master                           9e5e9f0 [Mu2e/master] Merge pull request #152 from goodenou/MT_fcl_fix
  remotes/Mu2e/MDC2018             7da9814 Merge pull request #74 from gianipez/trkTrig1
  remotes/Mu2e/master              9e5e9f0 Merge pull request #152 from goodenou/MT_fcl_fix
  remotes/origin/HEAD              -> origin/master
  remotes/origin/MDC2018           7da9814 Merge pull request #74 from gianipez/trkTrig1
  remotes/origin/bfield_xyzvec_1   8207ed0 Add accessor that uses XYZVec for input argument and return value.
  remotes/origin/branch_with_error 7b70a3e Deliberate error in order to see how CMS-BOT behaves with an error.
  remotes/origin/master            26bb554 Merge pull request #1 from Mu2e/master
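
If only the tracking relationships are of interest, git for-each-ref can print them in a more compact form than git branch -avv. The sketch below is one possible approach, shown on a throwaway clone; the repository names, the branch name untracked_work and the demo identity are assumptions for illustration.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A small repository to clone from (stand-in for a GitHub remote).
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"
cd "$sandbox"

# In the clone, master tracks origin/master; the second branch tracks nothing.
git clone -q "$sandbox/upstream" work
cd work
git checkout -q --no-track -b untracked_work origin/master

# One line per local branch: branch name, then its upstream (empty if none).
report=$(git for-each-ref --format='%(refname:short) %(upstream:short)' refs/heads)
echo "$report"
```

A branch created with --no-track, as recommended in the developer workflow, shows an empty second column until it is pushed with -u.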

What Remotes are Available in my Working area

 git remote -v

Combined with git branch -avv this allows you to determine which remote branches are attached to which remote repositories.

Long commit histories

avoid posting long commit histories


Looking at the Merge History

So long as we only modify master by merging in pull requests, the following procedure will allow us to find the state of master at an arbitrary time in the past. The procedure is robust against people merging master into their working branches.

This information is taken from: [1] and I made a few small modifications in the details of the printout.

The following git command will list all merge commits of merges into master and print some information about each.

git log --merges --first-parent master --pretty=format:"%h %<(10,trunc)%ae %C(yellow)%<(15)%aI%Creset %C(red bold)%<(15)%D%Creset %s" 

The result on the morning of Jan 29, 2020 looks like:

f6bdbaebc kutschke.. 2020-01-28T14:33:45-06:00 HEAD -> master, tag: v08_02_01, Mu2e/master Merge pull request #125 from goodenou/art3_MT
651cf9425 kutschke.. 2020-01-28T14:31:30-06:00 tag: v08_02_00  Merge pull request #124 from resnegfk/g4105
14eeda454 kutschke.. 2020-01-25T17:27:40-06:00 tag: v08_01_00  Merge pull request #123 from ryuwd/refactor-useprodTracker
ae02fa010 kutschke.. 2020-01-24T13:32:14-06:00                 Merge pull request #121 from resnegfk/g4stepper
2c1fd533c Dave_Bro.. 2020-01-23T17:25:06-08:00                 Merge pull request #122 from bonventre/reflections
040a6fd0d kutschke.. 2020-01-22T20:31:50-06:00                 Merge pull request #120 from brownd1978/schema1

The color information was removed by the shell when I captured the output. If you run the command yourself you will see the coloring.
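
To go from a point in time directly to the corresponding commit, rather than reading it off the list, git rev-list can be combined with --first-parent and --before. The sketch below is an illustration on a throwaway repository with made-up commit dates and a demo identity. Note that --before filters on the committer date, while the listing above prints the author date (%aI); for merge commits created by the GitHub merge button these are normally the same.

```shell
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Three commits on master with fixed, made-up dates.
for day in 22 25 28; do
    export GIT_AUTHOR_DATE="2020-01-${day} 12:00:00 +0000"
    export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
    git commit -q --allow-empty -m "State of master on Jan ${day}"
done
unset GIT_AUTHOR_DATE GIT_COMMITTER_DATE

# Most recent commit on the first-parent history of master before the given time.
state=$(git rev-list -n 1 --first-parent --before="2020-01-26 00:00:00 +0000" master)
subject=$(git log -n 1 --format=%s "$state")
echo "$subject"
```

The resulting commit can then be checked out, or used as the starting point of a branch, to reproduce the state of master at that time.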