GitHub CI Maintenance

From Mu2eWiki

Introduction

CI stands for continuous integration. The CI repo, inspired by CMS-BOT, contains code that handles a number of tasks related to ensuring that new code merged into the Offline and Production repos passes the necessary tests. Specifically, it handles all tasks that involve communicating with GitHub, such as checking whether a new comment contains a test command or posting the test results on the PR, using the bot account FNALbuild. It does not contain the tests themselves; those are in the codetools repo. Code in the CI repo is written in Python, while codetools is written in bash.

CI and codetools are used in our Jenkins jobs that handle build tests for Offline and Production. Each Jenkins job makes a fresh clone of the CI and codetools repos and downloads any dependencies that are new or whose required versions have changed. This means any changes made to these repos take effect immediately: they will be seen the next time someone runs the Offline or Production build tests.

For more information about how the Jenkins jobs get launched in the first place, see Jenkins#Github_PR_Tests.

The procedure for launching build tests from GitHub is given at GitHubWorkflow#GitHub_Pull_Request_Procedures_and_FNALbuild.

LArSoft also uses the GitHub/Jenkins system for CI. Their system shares some commonalities with ours but has additional features. Some useful background information is available in the LArSoft CI documentation.

Some important parts of the CI repo

This section discusses how the code in the CI repo is used to manage testing PRs to Offline and Production.

Standalone scripts for interacting with GitHub

These are executable scripts which may be used by code outside the CI repo, but which use code in the CI repo to interact with GitHub. For examples of how to call these scripts from a bash script, see this codetools function, or any of the functions starting with "cmsbot" on that page. These scripts allow the Jenkins jobs mu2e-offline-build-test and mu2e-production-build-test to merge PRs correctly and update the PR as tests complete.

Bash scripts in the codetools repo can call CI scripts by the following process:

  1. create a Python virtual environment using the venv command, and activate it
  2. clone the CI repository and install the dependencies in requirements.txt using pip install
  3. call the script
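The three steps can be sketched in Python as below, for illustration only: the codetools functions do this in bash. The clone URL https://github.com/Mu2e/CI.git, the directory names, and the helper names are assumptions. Instead of activating the venv, the sketch calls the venv's own python, which for running a single script is roughly equivalent. Like all CI code, it avoids syntax newer than Python 3.6.

```python
import subprocess
import sys
from pathlib import Path

CI_URL = "https://github.com/Mu2e/CI.git"  # assumed clone URL


def bootstrap_commands(venv_dir, ci_dir, script=None, script_args=(), ci_url=CI_URL):
    """Return the commands (argv lists) for the three steps above."""
    venv_python = str(Path(venv_dir) / "bin" / "python")
    commands = [
        # 1. create the virtual environment with the current (OS default) python
        [sys.executable, "-m", "venv", str(venv_dir)],
        # 2. clone the CI repository and install requirements.txt into the venv
        ["git", "clone", ci_url, str(ci_dir)],
        [venv_python, "-m", "pip", "install", "-r",
         str(Path(ci_dir) / "requirements.txt")],
    ]
    if script is not None:
        # 3. call a standalone script from the top level of the clone
        commands.append([venv_python, str(Path(ci_dir) / script)] + list(script_args))
    return commands


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print what would be executed without running anything.
    for cmd in bootstrap_commands("ci_venv", "CI"):
        print(" ".join(cmd))
```

Running the file only prints the commands; passing the list to run_all() executes them.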

Once the first two steps have been done, you can re-activate the virtual environment and all the scripts and dependencies will be there. This is useful if you need a CI script after running muse setup: activating the venv and running the script in a subshell keeps the two sets of dependencies separate (see the note at the end of Python version issues).

The standalone scripts are at the top level of the CI repo. Two examples are:

comment-github-pullrequest - Posts a comment on a pull request as the bot user FNALbuild. Takes three arguments: -p NNN, where NNN is the number of the PR; -r Mu2e/ABC, where ABC is the name of the repo; and -R filename, where filename is a file containing the comment to be posted. For example: comment-github-pullrequest -r Mu2e/Offline -p 555 -R gh-report.md
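The arguments above map naturally onto argparse, which the standalone scripts use. The following is a hypothetical reconstruction, not the script's actual code: only the flags -p, -r and -R come from this page; the dest names and the repo-format check are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse comment-github-pullrequest style arguments (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Post a comment on a pull request as FNALbuild")
    parser.add_argument("-p", dest="pr_number", type=int, required=True,
                        help="number of the pull request, e.g. 555")
    parser.add_argument("-r", dest="repo", required=True,
                        help="repository as owner/name, e.g. Mu2e/Offline")
    parser.add_argument("-R", dest="report_file", required=True,
                        help="file whose contents are posted as the comment")
    args = parser.parse_args(argv)
    # Illustrative sanity check: -r must include the owner, as in Mu2e/ABC.
    if args.repo.count("/") != 1:
        parser.error("-r must look like Mu2e/ABC")
    return args
```

argparse reports a missing or malformed argument by printing usage and exiting with status 2, which a calling bash script can detect.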

get-pr-base-sha - Retrieves the sha of the last commit on the branch a PR is asking to merge into, and writes it to a file. Takes three required arguments and one optional argument. The required arguments are: -p NNN, where NNN is the number of the PR; -r Mu2e/ABC, where ABC is the name of the repo; and -f filename, where filename is the file the output will be written to. The optional argument is -j anything; its value doesn't matter, and if it is included, the script writes the ref name (i.e. the branch name) to the file instead of the commit sha. Examples: get-pr-base-sha -r Mu2e/Production -p 333 -f myShaFile.txt or get-pr-base-sha -r Mu2e/Production -p 333 -f myRefFile.txt -j true
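A sketch of the value get-pr-base-sha writes, assuming the data comes from two GitHub REST endpoints: the pull request (whose base.ref names the target branch) and that branch (whose commit.sha is its latest commit). The real script may obtain the same information differently, for example through PyGithub; the function names are hypothetical.

```python
def base_info(pr, branch, want_ref=False):
    """Return what get-pr-base-sha writes to its output file.

    pr:     JSON from GET /repos/{owner}/{repo}/pulls/{number}
    branch: JSON from GET /repos/{owner}/{repo}/branches/{pr["base"]["ref"]}
    want_ref=True corresponds to passing -j: the branch name is returned
    instead of the sha of the branch's latest commit.
    """
    ref = pr["base"]["ref"]
    if want_ref:
        return ref
    if branch["name"] != ref:
        raise ValueError("branch data is for {}, not {}".format(branch["name"], ref))
    return branch["commit"]["sha"]


def write_base_info(pr, branch, path, want_ref=False):
    """Write the selected value to path (the -f argument)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(base_info(pr, branch, want_ref) + "\n")
```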

adding a new script

If you need to create a new script that interacts with GitHub and needs to be callable from bash scripts, use comment-github-pullrequest, get-pr-base-sha, or report-test-status as a model. All of them use argparse to define and process command-line arguments. Once you're satisfied that the script does what you want, make it executable with the command git update-index --chmod=+x your-file-name and then commit. If you skip this step, the script will not be executable.
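Before committing a new script, you can confirm that git has recorded it as executable: git ls-files -s prints the mode stored in the index, which is 100755 for an executable file and 100644 for a regular one. A small helper (hypothetical, not part of the CI repo) might look like:

```python
import subprocess


def index_mode(ls_files_line):
    """Return the mode from one line of `git ls-files -s` output.

    Lines look like "<mode> <object> <stage><TAB><path>".
    """
    meta = ls_files_line.split("\t", 1)[0]
    return meta.split()[0]


def is_executable_in_index(path, repo_dir="."):
    """True if git has recorded `path` with the executable mode 100755."""
    out = subprocess.run(
        ["git", "ls-files", "-s", "--", path],
        cwd=repo_dir, stdout=subprocess.PIPE,
        universal_newlines=True, check=True,
    ).stdout
    lines = out.splitlines()
    return bool(lines) and index_mode(lines[0]) == "100755"
```

Run is_executable_in_index("your-file-name") from the top of your CI clone after git update-index; it returns False for a file that is untracked or still 100644.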

Code for controlling whether build tests are run

This is Python code used by the mu2e-github-bot job in our PR tests workflow. It collects information about the pull request and determines which, if any, tests need to be run. In certain cases, it communicates with the PR creator via comments posted on the PR, informing the creator of its decisions.

process-pull-request - an executable script similar to the ones described above. It takes the repo name and the PR number, as in process-pull-request repo Mu2e/Offline pr_id 555, and launches the process_pr script in the Mu2eCI folder.

Mu2eCI/process_pr.py - The main script for mu2e-github-bot. Any time Jenkins receives an event from GitHub, it uses this script to determine what, if anything, to do. The script proceeds as follows:

  1. If the PR has been merged, collect all the open PRs on this repo and run process_pr.py on them -- this will have the effect of posting a comment on them saying the HEAD has changed.
  2. Establish the list of "authorized users": people who can initiate tests on this PR.
  3. Check which files were changed by the PR and notify the people who have asked to watch these files -- this information is in config/watchers.yaml.
  4. Determine which tests should be run by default on these changed files -- this information is in Mu2eCI/test_suites.py.
  5. Collect all the commit statuses attached to the last commit in the PR. Use these to determine if tests are already running and if the base branch HEAD has changed.
  6. Look for a test command: loop through the comments on the PR, ignoring comments that have already been seen or that predate the last commit. Look for comments by authorized users that contain valid test commands. Mu2eCI/test_suites.py contains the regexes used for this. If a valid command is found, add the appropriate test suite to the list of tests to trigger, and have FNALbuild post a thumbs-up reaction on the comment.
  7. If the PR is brand new and the PR creator is in the Mu2e organization, make sure the default tests are in the list to trigger.
  8. Update the PR with a label indicating the state of any tests, and actually trigger the tests by creating a properties file (code in Mu2eCI/common.py) containing the information about the test to be run. The existence of this file is what tells Jenkins to run the actual tests.
  9. Update the commit statuses to reflect the test suite to be run, if any.
  10. Post a comment on the PR. Depending on the situation (new PR, tests requested, no tests requested but the base branch HEAD changed), different messages will be posted. These can be found in Mu2eCI/messages.py.
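Step 6 is the heart of the decision. A simplified sketch of that loop follows; the Comment type, the function name, and especially the command regex are hypothetical (the real regexes are in Mu2eCI/test_suites.py, and the accepted commands are documented at GitHubWorkflow#GitHub_Pull_Request_Procedures_and_FNALbuild).

```python
import re
from datetime import datetime
from typing import NamedTuple


class Comment(NamedTuple):
    id: int
    user: str
    body: str
    created_at: datetime


# HYPOTHETICAL command pattern for illustration; the real regexes are in
# Mu2eCI/test_suites.py.
TEST_COMMAND = re.compile(
    r"^@FNALbuild\s+run\s+(?P<suite>[\w-]+)\s+tests?\b",
    re.IGNORECASE | re.MULTILINE)


def find_test_requests(comments, authorized, seen_ids, last_commit_time):
    """Sketch of step 6: return (suites to trigger, ids of comments to thumbs-up)."""
    suites, to_react = [], []
    for c in comments:
        # skip comments already handled or written before the last commit
        if c.id in seen_ids or c.created_at < last_commit_time:
            continue
        # only authorized users may start tests
        if c.user not in authorized:
            continue
        match = TEST_COMMAND.search(c.body)
        if match is None:
            continue
        suite = match.group("suite").lower()
        if suite not in suites:
            suites.append(suite)
        to_react.append(c.id)
    return suites, to_react
```

Each suite is triggered once even if several comments ask for it, but every valid command comment still gets its thumbs-up.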

Commit status, test status, and labels

Commit statuses are collections of information attached to a specific commit on GitHub. The GitHub commit status API is documented here. We use these to keep track of actions Jenkins has taken regarding a pull request, and the results of these actions.

To see the statuses for the last commit in a PR, scroll down to the bottom of the PR's conversation tab:

GitHub display of the commit statuses attached to the last commit in a PR. The build tests had been triggered but were not completed when this image was taken.

One can see the statuses of any commit by querying the API:

curl \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer TOKEN_REDACTED" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/Mu2e/Repo_Name/commits/commit_sha_here/statuses

If you're doing this from your own console, the token is the same one you use to push code to GitHub from your local repositories. When Jenkins interacts with GitHub, it has its own.
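The same query can be made from Python with only the standard library. A minimal sketch, assuming you pass in your own token; the function names are hypothetical:

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def statuses_request(repo, sha, token):
    """Build the same GET request as the curl command above."""
    url = "{}/repos/{}/commits/{}/statuses".format(API_ROOT, repo, sha)
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": "Bearer " + token,
        "X-GitHub-Api-Version": "2022-11-28",
    })


def fetch_statuses(repo, sha, token):
    """Send the request and return the decoded list of status objects."""
    with urllib.request.urlopen(statuses_request(repo, sha, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

fetch_statuses("Mu2e/Offline", sha, token) returns the JSON list; keep the token out of any script you commit.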

The commit status fields used by the CI system are:

  • context - the action this status refers to. Options include "jenkins/ghprb" for Jenkins receiving an event from GitHub related to this PR, "mu2e/buildtest/last" for finding the latest commit on the base branch, "mu2e/buildtest" for the build tests, and the names of certain individual tests.
  • state - the current state of the action described in "context". May be "success", "pending", "failure", or "error".
  • description - the current result of the action. May simply say "The build is running", or may contain more detailed information about a completed test. In the case of "mu2e/buildtest/last", it will include the commit sha for the last commit in the base branch.
  • target_url - URL that points to more information. May point to a log file in the case of a completed test, or to a Jenkins console in the case of a running one.
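When reading statuses back (as process_pr.py does in step 5 above), note that GitHub keeps every status ever posted and lists them newest first, so a context usually appears several times. A sketch of reducing the list to the current status per context, plus a base-branch check that assumes the "mu2e/buildtest/last" description contains the base commit sha verbatim (helper names are hypothetical):

```python
def latest_by_context(statuses):
    """Map each context to its most recent status.

    created_at values are ISO 8601 UTC strings, which sort chronologically;
    on a tie the first one seen (GitHub lists newest first) is kept.
    """
    latest = {}
    for s in statuses:
        ctx = s["context"]
        if ctx not in latest or s["created_at"] > latest[ctx]["created_at"]:
            latest[ctx] = s
    return latest


def base_head_changed(latest, current_base_sha):
    """True/False if "mu2e/buildtest/last" exists, None if it was never posted.

    Assumes the description contains the base commit sha verbatim.
    """
    last = latest.get("mu2e/buildtest/last")
    if last is None:
        return None
    return current_base_sha not in (last.get("description") or "")
```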


Test status in process_pr.py

The process_pr.py script in CI/Mu2eCI keeps track of the status of tests run on a PR, using a dict object called test_statuses. This is populated partly by looking at the latest commit status with the context "mu2e/buildtest". Note that multiple fields in the commit status contribute to the test status. The possible values of a test status partially overlap with the possible values of the "state" field of a commit status, but can also carry information pulled from the description, such as whether a test is "running" or "stalled".

PR labels

Labels can be placed on a PR; they are attached to the PR as a whole and not to individual commits. We add labels to the PR indicating which tests are needed and what their status is. For example, if the build tests are running, the label will say "build running". Labels such as "build pending" and "build running" are set by the mu2e-github-bot Jenkins job in process_pr.py, using information in the test_statuses object mentioned above.
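A hypothetical sketch of turning test_statuses into labels, assuming labels of the form "<test> <status>" as in the examples above; process_pr.py may name or manage labels differently.

```python
def labels_for(test_statuses):
    """Turn a test_statuses dict into labels such as "build running"."""
    return sorted("{} {}".format(test, status)
                  for test, status in test_statuses.items()
                  if status is not None)


def label_changes(current_labels, test_statuses):
    """Return (labels to add, labels to remove) for the test-related labels.

    Labels unrelated to the tracked tests (e.g. "bug") are left alone.
    """
    wanted = set(labels_for(test_statuses))
    tests = set(test_statuses)
    stale = {label for label in current_labels
             if label.split(" ", 1)[0] in tests and label not in wanted}
    return sorted(wanted - set(current_labels)), sorted(stale)
```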

Managing the CI Github repo

This section is about managing the CI repo itself.

automated tools

Three main automated tools work on the CI repo: dependabot, pre-commit-hooks, and tests.

dependabot

Dependabot manages updates to the dependencies listed in CI/requirements.txt. It checks weekly for updates, and makes a PR to CI to change requirements.txt if it finds them. Dependabot actions are controlled by the config file at CI/.github/dependabot.yml. Dependabot runs on GitHub infrastructure.

5/2023: On this page, rlc set these two switches to disabled:

  • Dependabot alerts
  • Dependabot security updates

pre-commit-hooks

pre-commit-hooks ensures uniform style and formatting. When you make a PR to the CI repo, it checks for whitespace errors, Python warnings, etc., and then adds its own commit to your PR fixing these issues. These actions are controlled by the config file at CI/.pre-commit-config.yaml. When the packages used by the hooks have updates, pre-commit-hooks makes a PR updating the versions in .pre-commit-config.yaml. pre-commit-hooks runs on GitHub infrastructure.

tests

The tests are run when new code is pushed to a branch of the Mu2e/CI repo. These are NOT the build tests for Offline and Production; they only cover changes to the CI repo. Both dependabot and pre-commit-hooks make their own branches for their PRs, so these tests always run on those. The tests are controlled by the config file at CI/.github/workflows/tests.yml. Right now they only check that all the dependencies install correctly in the Python versions we want. The tests run on GitHub infrastructure, referred to elsewhere on this page as the "GitHub test runner".

Python version issues

The OS default python version on the Jenkins machines is Python 3.6. This is the version Jenkins uses when it creates the virtual environment where the CI repo code runs, so all code in the CI repo needs to work properly with that version. Some dependencies have stopped supporting Python 3.6 in their newest versions. Additionally, the GitHub test runner has dropped Python 3.6 from its latest version. This means we need to avoid problems caused by unwanted updates.

If a dependency in CI/requirements.txt is updated to a version that doesn't support Python 3.6, the next Jenkins job will stop downloading dependencies when it reaches the unsupported one, and will continue with whatever dependencies it happens to already have. If a dependency listed later in requirements.txt also received an update, the updated version will not be used. To avoid this problem, always check that the automatic tests run on GitHub all pass, especially the Python 3.6 test, before merging any updates from dependabot. If they don't pass, simply close the PR. To stop dependabot from proposing updates for a dependency in CI/requirements.txt, add it to the "ignore" section of CI/.github/dependabot.yml. Conversely, if the Jenkins version of Python is upgraded, it may be necessary to remove items from the "ignore" section so they can be updated. An example "ignore" section:

# CI/.github/dependabot.yml
updates:
  - package-ecosystem: "pip"
    ... other things ...
    ignore:
      - dependency-name: "PyGithub"
      - dependency-name: "requests"

If the GitHub test runner is set to use the latest Ubuntu image, it will fail to set up Python with the error message "Version 3.6 with arch x64 not found". PRs meant to update the CI repo will then fail the tests, even if the PR itself would cause no problems. To avoid this, make sure the "runs-on" item in CI/.github/workflows/tests.yml is set to ubuntu-20.04 and not to ubuntu-latest. If Jenkins is later updated to a newer version of Python, this can be changed.
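One way to catch an accidental switch to ubuntu-latest is a small check of tests.yml. This sketch is a plain-text scan with no YAML parser, so it does not understand matrix expressions such as ${{ matrix.os }}; the function names are hypothetical.

```python
import re

RUNS_ON = re.compile(r"^\s*runs-on:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)


def runner_images(workflow_text):
    """Return every runs-on value found in a workflow file."""
    return RUNS_ON.findall(workflow_text)


def check_runner(workflow_text, required="ubuntu-20.04"):
    """Raise ValueError if any job uses a runner other than `required`."""
    images = runner_images(workflow_text)
    bad = [img for img in images if img != required]
    if not images or bad:
        raise ValueError("runs-on should be {}, found {}".format(required, images))
```

Calling check_runner(open("CI/.github/workflows/tests.yml").read()) passes silently when every job is pinned to the required image.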

Important note: The Python version required for the CI repo is not synchronized with the version required for Offline and Production. They use the version of Python required by art, which is regularly updated and gets set up when muse setup is run. Unfortunately, the build test workflow needs to use the CI repo before muse setup can happen, so it can't simply piggyback off Offline to get the same Python version; hence the CI repo uses the OS default Python. This also means that, to keep the two sets of dependencies separate, a Jenkins job that calls a script from the CI repo after running muse setup should use a subshell to activate the virtual environment and run the script.
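In codetools this is simply a bash subshell, something like ( source venv/bin/activate; script args ). A Python analogue is sketched below with hypothetical names: the activation happens in a child bash process, so it cannot leak into the calling job. Dropping PYTHONPATH and PYTHONHOME from the child's environment is an extra precaution assumed here, since activating a venv does not clear variables that muse setup may have set.

```python
import os
import shlex
import subprocess


def venv_command(venv_dir, script, args=()):
    """Build `bash -c "source VENV/bin/activate && SCRIPT ARGS"`.

    The activation happens only inside the child bash process, like the
    ( ... ) subshell used in bash, so it cannot leak into the calling job.
    """
    activate = shlex.quote(os.path.join(venv_dir, "bin", "activate"))
    command = " ".join(shlex.quote(part) for part in [script] + list(args))
    return ["bash", "-c", "source {} && {}".format(activate, command)]


def clean_env(base=None):
    """Copy of the environment without Python path overrides (assumed precaution)."""
    env = dict(os.environ if base is None else base)
    for var in ("PYTHONPATH", "PYTHONHOME"):
        env.pop(var, None)
    return env


def run_in_venv(venv_dir, script, args=()):
    """Run a CI script in its venv without touching the caller's environment."""
    return subprocess.run(venv_command(venv_dir, script, args),
                          env=clean_env(), check=True)
```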

codetools scripts for testing PR's

The codetools repository contains scripts that handle many different building and testing tasks, such as building Musings or running nightly validation. Most relevant to this section, they are also used to run tests on pull requests to Offline and Production.

These scripts are used by the Jenkins jobs mu2e-offline-build-test and mu2e-production-build-test.

bin/gh_pr_bootstrap.sh - the script launched directly by Jenkins. It sets the necessary environment variables and prints some information before launching the correct job script. It also contains a number of functions for communicating with GitHub, such as posting a comment or updating a commit status. These all work by activating the virtual environment for the CI repo and then running the appropriate standalone script. These functions' names all start with "cmsbot".

All job scripts for PR tests are located in codetools/bin/github/jenkins_tests. Inside this directory there is one subfolder per test job, named after the job: there is a folder called "mu2e-offline-build-test" and another called "mu2e-production-build-test". Each of these contains two scripts: job.sh and build.sh.
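The layout can be expressed as a small path helper. It is hypothetical (gh_pr_bootstrap.sh does the equivalent in bash) and assumes the job name is used directly as the folder name, as described above.

```python
from pathlib import Path

PR_TEST_JOBS = ("mu2e-offline-build-test", "mu2e-production-build-test")


def job_scripts(codetools_dir, job):
    """Return the (job.sh, build.sh) paths for a PR-test job."""
    if job not in PR_TEST_JOBS:
        raise ValueError("unknown PR test job: " + job)
    base = Path(codetools_dir) / "bin" / "github" / "jenkins_tests" / job
    return base / "job.sh", base / "build.sh"
```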

job.sh - this establishes the list of tests to be run, clones the Offline and Production repos and fetches the PR, performs TODO/FIXME and whitespace checks, and launches the script build.sh. When that is completed, it posts a comment on the PR detailing the tests and whether they passed or failed.

build.sh - builds Offline and runs the build tests. As tests complete, it updates the commit status of the last commit, so the updated test results are visible at the bottom of the PR's conversation page.
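As an illustration of the kind of TODO/FIXME and whitespace check job.sh performs, here is a Python sketch that inspects only the lines a PR adds in a unified diff. It is hypothetical: the codetools checks are written in bash, and their exact rules (which files, which markers, whole files versus the diff) may differ.

```python
import re

MARKER = re.compile(r"\b(TODO|FIXME)\b")


def check_added_lines(diff_text):
    """Return (file, problem, line) tuples for problems on added lines only."""
    problems = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # new-file header, e.g. "+++ b/Foo/src/Bar.cc"
            path = line[4:]
            current = path[2:] if path.startswith("b/") else path
            continue
        if current is None or not line.startswith("+"):
            continue  # context, removed lines, hunk headers
        added = line[1:]
        if MARKER.search(added):
            problems.append((current, "TODO/FIXME", added))
        if added != added.rstrip():
            problems.append((current, "trailing whitespace", added))
    return problems
```

Checking only added lines means pre-existing TODOs elsewhere in a file do not fail a PR that merely touches it.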


PR test debugging tips

If someone posts a test command and does not get a thumbs-up reaction after a few minutes: this usually means the mu2e-github-bot job did not complete. Sometimes it means the job did complete but did not recognize the test command, or the user is not an authorized user for this PR.
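To check for the thumbs-up from the API rather than the PR page, query the reactions on the comment (GET /repos/{owner}/{repo}/issues/comments/{comment_id}/reactions, with the same headers as the curl example above). A sketch of interpreting the result, with a hypothetical helper name:

```python
def bot_acknowledged(reactions, bot="FNALbuild"):
    """True if `bot` left a thumbs-up ("+1") reaction.

    `reactions` is the JSON list returned by the reactions endpoint; each
    item has a "content" field and a "user" object with a "login".
    """
    return any(
        r.get("content") == "+1"
        and (r.get("user") or {}).get("login", "").lower() == bot.lower()
        for r in reactions)
```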

To see if Jenkins received a test command: Find the receive-gh-event (or receive-gh-event-production) job at buildmaster.fnal.gov, in the Mu2e tab, under the directory GitHubPRTests. If you click into that, it should list the recent individual jobs it ran on the left of the page. Job names always contain the PR number, so if there's no job for that PR at about the right time, it never ran. If it did run, you can click into that individual job, and there should be a "Parameters" item in the left menu. The parameters include the comment (if any) that triggered the webhook. This lets you find the comment with the test command, and see whether the command contained a typo or a Markdown-formatted link. Note that every comment on a PR, including those from the bot user FNALbuild, causes this event to fire, so you may have to dig through a few of them to find the right one.

To see if mu2e-github-bot ran for a particular test command: Find the mu2e-github-bot job at buildmaster.fnal.gov, in the Mu2e tab, under the directory GitHubPRTests. Just like receive-gh-event, you can click into that to find the individual jobs that ran recently, and look for ones with the correct PR number. If the job did run, you can click into that individual job, and the landing page should tell you which upstream job triggered this one. There should be an option on the left to look at the console output. Quite a bit gets printed here, including the success/failure of installing the CI repo dependencies, whether it found an actionable test command, whether it ultimately made a properties file to trigger tests, etc.

To investigate why certain tests unexpectedly failed or stalled: Find the mu2e-offline-build-test (or mu2e-production-build-test) job, in the same directory as mu2e-github-bot, and just as for the others, find the recent individual job with the correct PR number. Use the option on the left of the individual job page to load the console output. Note that the console output for these jobs tends to be very long, and by default the first part (containing the printouts from cloning and merging, and much of the output from the build process itself) will be truncated. There should be an option to view the whole thing if you suspect a build problem. The later part of the console output concerns the tests, and should tell you which ones started, roughly how long they took, and their results.

If some tests did run and you want to follow the chain of events that led there: Because GitHub sends a webhook any time anyone comments, including the bot user, an active PR will have a lot of entries for receive-gh-event and mu2e-github-bot even if the tests have only run once. If tests did run and you're trying to follow the chain of jobs that led to them, it's best to find the appropriate entry in the mu2e-offline-build-test (or mu2e-production-build-test) job page first, and then follow the link for the job that triggered it.

local test of CI scripts

To test the CI procedure locally, copy gh_pr_bootstrap.sh and change cmsbot_report() and cmsbot_report_test_status() to print to either the terminal or a file. Then make a launch script like the one below, which sets some needed variables and sources the modified copy (here NEW_gh_pr_bootstrap.sh) with the job to test.

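#!/bin/bash
# Set the variables the bootstrap script needs. They are exported so that
# any job script launched as a separate process also sees them.
export toTest=mu2e-offline-build-test
export REPOSITORY=Mu2e/Offline
export PULL_REQUEST=642
export COMMIT_SHA=166417616a7ba8f47bc490ad8d28583361a7e6f2
export MASTER_COMMIT_SHA=b83e8bbcc12b6acfe054359e916dc2df68a8c4ed
export WORKSPACE=/home/hcasler/CI_work/TestArea/Workspace
# Source the modified copy of gh_pr_bootstrap.sh with the job to test
source NEW_gh_pr_bootstrap.sh "$toTest"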