CheckoutAndBuildCodeTutorial

From Mu2eWiki
Revision as of 19:22, 21 June 2019


Introduction

In the case of Mu2e, an event records the detector response during one microbunch, consisting of a short burst of millions of protons hitting the production target, thousands of muons stopping in the aluminum stopping target, and those muons interacting or decaying. This happens nominally every 1695 ns, and each of these bursts forms a Mu2e event.

When we read and process events to simulate the detector, reconstruct raw data, or analyze data events, we need a framework to organize our executables, a way to control the behavior of the executable, and a format to write the events into. These functions are provided by the art software package.

Art

The art package provides:

  • a framework for creating executables
  • the event format (we define the contents)
  • a control language called "fhicl" or "fcl" (pronounced fickle)
  • services such as random numbers

When we write code, typically to simulate the detector, reconstruct the data, or analyze the data, we write our code in C++ modules. We then run the framework and tell it to call the code in our module. We also tell it what input files to use, if any. The framework opens the input files, reads the data, provides services, calls our modules with the data event by event, handles any filtering of events, and writes the output.

When we run executables in the art framework, we have to tell the framework which modules to run and how to configure the modules. A module configuration might include how to find the input data inside the file, and what parameters and cuts to use in performing its task. This function is provided by a fcl file - you always provide a fcl file when you start up an art executable.

art executables read and write files in the art format. This is a highly-structured format which contains the event data, processing history, and other information. Under the covers, art uses the root package (used widely in HEP) to read and write the event files. Each event contains pieces which are referred to as products. For example, in each event, there may be a product which contains raw hits, one that contains reconstructed hits, and one that contains reconstructed tracks. In simulation, there are products with truth information.

All of these concepts will be fleshed out in subsequent tutorials.


Code

We will start with a tour of the Mu2e code base. We keep our code organized using a piece of software called git. This freeware package stores our code in a compact format and keeps track of lots of useful things:

  • all the history of each file as it is developed
  • coherent versions of the whole set of code (tags)
  • tracking parallel development by many people, and merging all that work back together when ready.

So let's look at what is stored in our git repository. If you are in the tutorial docker container,

mkdir -p /home/tutorial_code
cd /home/tutorial_code
source /setupmu2e-art.sh

or if you are on the interactive central servers (mu2egpvm)

mkdir -p /mu2e/app/users/$USER/tutorial_code
cd /mu2e/app/users/$USER/tutorial_code
source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh 

Then retrieve the code base from the git repository:

git clone http://cdcvs.fnal.gov/projects/mu2eofflinesoftwaremu2eoffline/Offline.git

After a few minutes you should see a directory Offline. Go into that directory:

cd Offline

and look around. Note the directory .git. This is where git keeps information such as code history and tags. Most of the directories you see here contain our working code. Each directory contains a relatively small, logically-grouped set of code.

Here are a few git commands to get you started.

git status

tells you which branch you are on. When multiple people are working on code development, they will probably be working on their own branches - a version of the code they can control. When the changes are tested and approved, the personal, working branch is merged back into the main code branch, called "master". See the active branches:

git branch -al

You can switch to a different active working branch:

git checkout MDC2018
git status

or to a fixed release:

git checkout v7_4_1

All that output text is saying is that this is a fixed point of the code and you shouldn't try to modify it, which is fine and makes sense.

git status

see the files which are different from the previous release:

git diff --numstat v7_4_0 | head

and then see the code that changed in that first file listed:

git diff v7_4_0 Analyses/src/KalRepsPrinter_module.cc

and then the file history:

git log Analyses/src/KalRepsPrinter_module.cc

In the git status output you should also see "working tree clean"; this means your area is still the same as the central repository you just cloned. If there were new or changed files, it would tell you. You can read much more about git and how we use it, including making your own branches and committing code modifications back to the central repo.

Let's look at the code structure. For example, under

ls -l TrkHitReco

you can see directories: fcl, inc, src, test and data. Most of the top-level directories follow this pattern. The C++ includes related to the code are kept under inc, the C++ source code in src, and the fcl scripts that configure modules are in fcl; scripts are sometimes under test, and text files used by the code are kept under data. Many directories don't need test or data - see the TrkReco directory for example. Take a look at what is in these directories.

Recall that we write modules, which are then run by the framework to act on the event data. Any module we write is compiled into a shared object library which is loaded by the framework if our fcl tells it to. For example,

TrkHitReco/src/StrawHitReco_module.cc

would be compiled into

lib/libmu2e_TrkHitReco_StrawHitReco_module.so

which we can then ask the framework to load and run (or not). The shared object is not there, because we haven't done the compiling, which we will get to in a minute.

Just as a quick peek, open the .cc file:

emacs (or vi, cat) TrkHitReco/src/StrawHitReco_module.cc

Find the line

class StrawHitReco : public art::EDProducer

Since the module code inherits from a framework base class (EDProducer), the framework can manipulate it.

Find the line

StrawHitReco::beginJob

this method is called once at the beginning of the job. Similarly

StrawHitReco::beginRun

is called when the input events change run number.

StrawHitReco::produce

is called for every event. It is called "produce" because it typically produces data to insert into the event, in this case the reconstructed tracker hits. The other options are "filter", if the module can select a subset of events to write out, and "analyze", if the module does not write any new data to the event.

Find the line

event.getValidHandle(_sdtoken)

this is retrieving the raw hits from the event. Find the line event.put(std::move(chCol)); This adds a newly created data product to the event. All of these concepts will be covered in more detail later.

Building Code

In a typical use pattern, you would modify some of the code, or add code, and then you would want to compile it so you can run it. This is the job of the build system. In our case we use a system called scons. The build system has to look over the code, decide what needs to be compiled, do the compilation according to a recipe, then see what needs to be linked and do that linking. The first time you run scons, it will compile and link everything. After that, it should only do the steps which are affected by how you changed or added to the code.

The scons action is driven by the SConstruct file, which is written in python and tells scons how to do its job. You can take a look at it and see lots of stuff related to code building. One of the things it does is look for SConscript files in the directory tree, such as

cat TrkHitReco/src/SConscript

these files, mostly in src subdirectories, can customize the scons behavior for operating on the code in the directory where it sits.

When you build the Mu2e code, you have some options. Your primary choice is whether to build prof (optimized) or debug (not optimized - easier to debug). You can see your options with

./buildopts

You can switch between prof and debug with

./buildopts build=debug

or the reverse. The other options are beyond the scope of this page. Once you start building you have to stick with your choices until you start over (delete all the built files and set up in a new process). There is help available:

./buildopts --help


There are many software packages written by non-Mu2e parties (like art and root). We need to provide the build system with pointers to those packages. We do this by

source setup.sh

You can see everything you now point to:

ups active

Observe that your environment now points to particular versions of art, root, and many other packages. (You have to finish any buildopts choices before running setup.sh, since the latter locks in the choices.)

ups is a Fermilab software package that allows you to organize these third-party packages and their versions, operating-system flavors, and inter-dependencies. You can see your OS flavor:

ups flavor

or see what versions of a package, like root, are available:

ups list -aK+ root


Once you're done with buildopts and setups, you can build:

> scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
g++ -o Analyses/src/BkgRates_module.os -c 
...

Once it starts compiling, you might as well kill it with ctrl-c, since it will take a long time to run. In practice you will want to build on the node mu2ebuild01, which has 16 cores, so you can build with 20 threads:

scons -j 20

You can now remove any new files that were written by the build.

scons -c

This leaves all the source code alone. It also doesn't affect the setup choices you made with buildopts and setup.sh; those are still in effect and fixed in your process.

Releases

At appropriate times, we mark the code with a tag such as "v7_4_1". This marks a state of the code we want to save. Maybe we are recording the state of the code at a point where we make a production run for simulation, or maybe we are marking major changes in the code. Take a look at the release list and the notes for a recent release.

After tagging, we build the code and save it so anyone can use it - this is a "release".

If you are in the tutorial container,

ls -l /Offline

If you are on a mu2egpvm machine:

ls -l /cvmfs/mu2e.opensciencegrid.org/Offline | tail

If you wanted to set up this fixed release, in the tutorial container:

source /setupmu2e-art.sh
source /Offline/v7_4_1/SLF6/prof/Offline/setup.sh

If you are on a mu2egpvm machine:

source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh 
source /cvmfs/mu2e.opensciencegrid.org/Offline/v7_4_1/SLF6/prof/Offline/setup.sh

Since we set up the code above, we can't also set up these versions of the code in the same process. To do that we need to start a new process, so either start a new tutorial container, or log into the mu2egpvm machines again.


Partial Builds

When we started building the code in the exercise above, we killed it because it would take too long. On a single node it will take about 45 min, and on mu2ebuild01 with its 16 cores, it will take about 10 min. This is a common problem, so we have developed two ways to build part of the code.

In both methods we need a fully built "base release" that already exists, such as the ones we just looked at. We then make a local directory with just a small piece of the code. We build this small piece, and then take all the rest of the compiled code from the "base release".

In both methods you must be aware of a possible pitfall. If any header file that is compiled into your local small piece of code is different from the same header file compiled into the base release, then the code will have corrupt memory and will probably fail in unpredictable ways.

The first method is called Satellite releases. You can create one now:



  • pgit
  • set up a satellite release
  • create a partial-checkout release of a single package from Offline