GitIntro

From Mu2eWiki

Introduction

This is an introduction to git as we use it in Mu2e. It is a general discussion of git concepts and procedures. Please start with the procedures linked from the git top page, especially for committing code.

git is a Source Code Management System that is more powerful than earlier systems such as cvs or svn. As a new user of git in the Mu2e environment you won't use any of the advanced features, and git will feel like a verbose version of cvs or svn. As you gain more experience, the power and utility of the advanced features will become clear. Source code management systems are also called Version Control Systems or Revision Control Systems.

The Mu2e Offline git repository is hosted on GitHub; it was previously hosted on the Fermilab Redmine server.


Getting Help

Mu2e maintains a page that has links to code management systems; this includes both general git information and Mu2e-specific information. The most complete resource on git itself is the git website: http://git-scm.com/doc

We strongly recommend that you read chapters 1 through 3 as soon as is practical.

The built-in git documentation can be accessed using either of the following commands (using the git clone command as an example):

git help clone

or

man git-clone

The command:

git status

gives useful reminders and suggestions. For example, it reminds you to commit uncommitted changes. It also tells you how to back out of some operations, such as git add, git rm or git mv.


Getting git into your environment

If you are logged into the interactive machines, you should begin every Mu2e login session with the command:

 setup mu2e

This adds git to your PATH. For those of you who are familiar with UPS, git is provided as a UPS product.

If you are doing Mu2e work at a non-Fermilab site, consult an expert at that site to learn the command that does the equivalent of setup mu2e. If you need to install git yourself, see http://git-scm.com/downloads .

To see which version of git you are using, type the command:

git --version

When you are reading git documentation, pay attention to version numbers mentioned in the documentation. git has many safety features, some of which are on by default and some of which must be invoked explicitly. In general, the higher the version of git, the more safety features are available and the more that are enabled by default.


Configuring git

The first time that you use git, run the following commands:

git config --global user.name "Your Name" 
git config --global user.email you@example.com
git config --global push.default current

The values are stored in the file $HOME/.gitconfig, so this configuration needs to be done only once per home directory. The first line tells git who you are so that your changes to the Mu2e code can be properly labeled; the second line sets the email address that git records with each of your commits, and which may be used to send you email if you have signed up for notices; the last line changes one of git's defaults to a more intuitive, and safer, behaviour.
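To confirm what was stored, you can read the values back with git config --get. The sketch below does this in a throwaway HOME directory, an assumption made only so that the example can be tried without touching your real configuration; the name and email are the same placeholders used above.

```shell
#!/bin/sh
# Sketch: HOME points at a throwaway directory so your real $HOME/.gitconfig
# is not touched. On your own account, leave out the export line.
set -e
export HOME="$(mktemp -d)"

git config --global user.name "Your Name"
git config --global user.email you@example.com
git config --global push.default current

# Read individual values back.
git config --global --get user.name       # prints: Your Name
git config --global --get push.default    # prints: current

# The settings live in a plain-text file that you can inspect directly.
cat "$HOME/.gitconfig"
```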

Getting a Copy of Mu2e Offline

You will need tens of MB of disk space to check out the Offline code and GBs to build it. You would typically work on the app disk:

/mu2e/app/users/$USER

To get a read-only copy of the Mu2e Offline code, cd to a working directory and type the command:

 git clone https://github.com/mu2e/Offline

To get a read/write version use the GitHub workflow instructions.

Outdated instructions for Redmine
To get a copy of the Mu2e Offline code, cd to a working directory and type the command:
git clone ssh://p-mu2eofflinesoftwaremu2eoffline@cdcvs.fnal.gov/cvs/projects/mu2eofflinesoftwaremu2eoffline/Offline.git
If git tells you that you do not have permission, you can get a read-only copy of the code using the command:
git clone http://cdcvs.fnal.gov/projects/mu2eofflinesoftwaremu2eoffline/Offline.git
The ssh url allows you to both read from the repository and write to it; the http url allows read-only access. If you do not have permission to use the ssh url, you need to check two things:
  • Check that you have a valid Kerberos ticket and that it is forwardable (see ComputingLogin).
  • Make sure that you have been added as a member of the mu2e group in Redmine. This should happen when you get your original login account.

First Look at the Layout

The git clone command created a single subdirectory, named Offline, in your current working directory. cd to that directory and look at its contents, including the files that begin with a dot:

cd Offline
ls -a

Most of the entries that you see are either directories that contain Mu2e code or files that are part of the Mu2e build system. The two exceptions are:

  • The subdirectory .git
  • The file .gitignore

The subdirectory .git contains a complete copy (a clone) of the information stored in the Mu2e GitHub (or Redmine) repository; that is, it contains the complete history of the Mu2e code from the beginning of Mu2e up to the time that you issued the git clone command. If someone later modifies the GitHub repository, the clone in your .git directory will not be modified until you ask for it to be updated. The directory tree rooted at .git is large, tens of MB.

The .git subdirectory is sometimes called your local repository, while the GitHub repository is sometimes called the remote repository. Actually, the GitHub repository is just a remote repository, not the remote repository. Git is able to deal with multiple remote repositories, as explained in the remotes section. This feature exists because git is a distributed version control system. For our use case, we still use the GitHub (or Redmine) repository as the unique, authoritative central repository.

The file .gitignore tells git to ignore files that match certain patterns; see the docs for more details. If you wish to add additional patterns, please make a proposal to the Mu2e Offline software team. You may also define a personal ignore file that is not seen by other people and which applies to all git projects in which you participate; by default, this file lives in ~/.config/git/ignore; see gitignore for details.
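A small illustration of how ignore patterns behave, in a throwaway repository. The patterns here are hypothetical examples based on the file types mentioned later on this page (editor backups ending in ~, .o, .os and .so files); they are not the actual contents of the Mu2e .gitignore. git check-ignore -v reports which pattern, from which file and line, caused a path to be ignored.

```shell
#!/bin/sh
# Sketch: a throwaway repository with a hypothetical .gitignore.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo

# Hypothetical patterns: editor backups, object files, shared libraries.
printf '%s\n' '*~' '*.o' '*.os' '*.so' > .gitignore
touch Foo.cc Foo.cc~ Foo.o libFoo.so

# Only .gitignore and Foo.cc are reported as untracked.
git status --short

# For each ignored path: source file, line number and pattern, then the path.
git check-ignore -v Foo.cc~ Foo.o libFoo.so
```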

The other files and directories that you see in your Offline directory, and recursively down through the directories, are called your working tree.

About Branches

The next section of this page presumes a minimal knowledge of git branches; this, in turn, requires a minimal knowledge of git commits. This section will endeavor to give you the minimal information that you need to make sense of the remaining material. We strongly encourage you, at the first practical opportunity, to read Chapters 1 to 3 of the git docs; this will give you a good understanding of git commits and git branches. The more you understand from these 3 chapters, the easier the rest of the material will be.

The fundamental unit of git management is called the commit. Suppose that you clone an existing git repository, edit two files and tell git to commit the change. The action of doing a commit stores a copy of the two modified files somewhere under the .git directory (read the git documentation if you want details). git also creates a new internal git object, called a commit, that contains:

  • A "snapshot" of all files that you would have to check out to recover this commit: not just the two files you committed with this commit, but all files that are part of the working tree and are already committed, either as part of this commit or as part of an earlier commit. This snapshot is not a copy of the files but a compact representation of where to find them; the files themselves are stored somewhere under the .git directory.
  • A 40 hex-digit hash code computed from the rest of the commit's contents (the snapshot, the parent hash code and the bookkeeping information); this hash code is the name of the commit.
  • The hash code of the parent commit; the parent commit describes the state of the local repository just before you started to make changes.
  • Other bookkeeping information that you can look up in the git manual.

A git repository is nothing more than a graph of commit objects, where the word graph is used in its mathematical sense. The following figure shows a cartoon of a very simple git repository:

[Figure: a simple git repository with commits c1 to c13, branch names and a tag]


In this figure, the red outline boxes represent commits and each commit has been given a mnemonic name, c1 through c13; in reality git commits are named with 40 hex-digit hash codes. In this figure the arrows that connect the commit boxes denote parentage. The repository begins with commit c1, which has no parent. Commit c1 is the parent of the commit c2. The commit c3 is the parent of both commits c4 and c6. The commits c6, c7, c8 form a branch. This branch is merged back into the main branch at commit c5; note that commit c5 has two parents, c4 and c8. The commit c5 is known as a merge commit. Finally commits c10 and c11 form another branch; this branch is not merged back into the main line; maybe it will merge in at a later date or maybe it won't; both are legal.
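To see this parentage in a real repository, the sketch below rebuilds the c1 to c8 part of the figure in a throwaway directory, using empty commits whose messages stand in for the mnemonic names. The branch name feature1 is an assumption; the text does not give the figure's label for that branch.

```shell
#!/bin/sh
# Sketch: commit messages c1..c8 stand in for the 40 hex-digit names.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Demo User"           # identity for this throwaway repo only
git config user.email demo@example.com

for c in c1 c2 c3; do git commit -q --allow-empty -m "$c"; done
git checkout -q -b feature1                # the side branch starts at c3
for c in c6 c7 c8; do git commit -q --allow-empty -m "$c"; done
git checkout -q main
git commit -q --allow-empty -m c4
git merge -q -m c5 feature1                # c5 is a merge commit

git log --graph --oneline --all
git rev-list --parents -n 1 HEAD           # hash of c5, then its two parents
```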

The items in the figure that are shown as solid red boxes denote the names of git branches. A git branch is just a move-able, lightweight pointer to a commit; you can think of the present value of a branch as an alias for the 40 hex-digit hash code that is the name of a commit. In the figure the git branch names point to their matching commit. Suppose that we were to start from commit c11 and make a change to some files; if we committed those changes, git would make a new commit object, whose parent is c11 and it would move the branch "feature2" to point at the new commit object. Whenever you do a commit, git automatically advances the appropriate branch.

A new repository starts with a single branch. git's own built-in default name for it is master, but many projects, including Mu2e Offline, name it main. Many git users, including Mu2e, use git in such a way that the head of the main line of development is always at the head of the main branch.

The last idea illustrated in the figure is a git tag, which is shown as the solid blue box. Like a branch, a tag is just a pointer to a commit. The difference is that once it is set, a tag stays put; it is not advanced by future commits.
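The difference between a branch and a tag can be checked directly. In this sketch (a throwaway repository; the tag name v1 and the commit messages are hypothetical), feature2 and v1 start out pointing at the same commit, c11; after one more commit, feature2 has advanced and v1 has not.

```shell
#!/bin/sh
# Sketch: a branch advances on commit, a tag stays put.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "base"

git checkout -q -b feature2
git commit -q --allow-empty -m c11
git tag v1                                    # v1 and feature2 both point at c11
git commit -q --allow-empty -m "new commit"   # feature2 advances; v1 stays at c11

git log -1 --format=%s feature2               # prints: new commit
git log -1 --format=%s v1                     # prints: c11
```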

You may have noticed that this section overloads the word branch; it has two different meanings. One meaning is the "move-able, lightweight pointer to a commit". The other use is "a set of commits connected by parent-child relationships", such as (c6,c7,c8) or (c10,c11). This is common usage and you will need to learn to distinguish the two meanings from their context.


About remotes

The next sections also presume a minimal knowledge of git remotes. This section will endeavor to give you the minimal information that you need to make sense of the remaining material. We strongly encourage you, at the first practical opportunity, to read Section 2.5 of the git docs.

Remotes in git are basically links to other repositories for the same code. For example, when you clone the repository on GitHub, git automatically creates a remote called "origin" that points to the GitHub repository. You can check which remotes exist in a git repository using:

 git remote -v

If you clone the GitHub repository twice into two different directories, Offline-1 and Offline-2, you can give Offline-1 a remote that links to Offline-2 by running, from inside Offline-1, the command:

 git remote add offline2 /path/to/Offline-2

When you add a remote, your repository records only its location; it learns nothing about the commits and branches that exist there. To copy over a list of those commits and branches, you must run the fetch command:

 git fetch <remote name>

If you now do

 git branch -r

you will get a list of all the branches that exist on all the remotes you have fetched from (but note they will only be as up to date as the last time you fetched). Similarly, you can now merge any remote branch into your local branch using the command

 git merge <remote name>/<remote branch name>
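The whole sequence can be tried without network access. This sketch makes a small local repository to stand in for GitHub, clones it twice, creates a branch in Offline-2 and then merges it into Offline-1 through a remote. The names upstream, newidea and offline2 are hypothetical; the commit identity is supplied through git's standard GIT_AUTHOR_* and GIT_COMMITTER_* environment variables so that no configuration is needed.

```shell
#!/bin/sh
# Sketch: everything is local; "upstream" stands in for the GitHub repository.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com
top="$(mktemp -d)"
cd "$top"

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial"
cd "$top"
git clone -q upstream Offline-1
git clone -q upstream Offline-2

# Some new work on a branch in Offline-2.
cd "$top/Offline-2"
git checkout -q -b newidea
git commit -q --allow-empty -m "work done in Offline-2"

# In Offline-1: add a remote pointing at Offline-2, fetch it, then merge.
cd "$top/Offline-1"
git remote add offline2 "$top/Offline-2"
git remote -v                  # origin and offline2, each with fetch and push URLs
git fetch -q offline2
git branch -r                  # now includes offline2/newidea
git merge -q offline2/newidea
```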


Branches in the Mu2e Offline Repository

Now issue the git command that lists all branches in the .git subdirectory:

git branch -a


For a clone done in late 2017, this produced the output

* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/CaloDigi0816
  remotes/origin/CaloGeom
  remotes/origin/GenVector

The branches of interest to us are main and remotes/origin/HEAD. The rest represent temporary development work. All of these branches, including their full history, are cloned in the .git subdirectory. The line

 remotes/origin/HEAD -> origin/main

tells git that, when someone clones the repository, git should:

  • Create a new local branch named main and initialize it to point at the same commit as remotes/origin/main
  • Create a working tree that contains the files that are part of the local main branch

The line:

* main

says that the local repository contains a local branch named main; we can tell it is a local branch because its name does not begin with "remotes". The asterisk beside main tells us that our working tree is a checkout of the local branch named main. If you have git colorization enabled, the branch that is checked out will be in a unique color; if you do not have colorization enabled, but would like to, see config.

You can list only the local branches by removing the -a option from git branch command:

git branch

which produces the output

* main

It was mentioned earlier that git allows your local repository to be aware of multiple remote repositories and to copy branches from them to your local repository. Any branch that begins with "remotes" is a remote branch. The next field in the name identifies which remote repository it came from. The name "origin" is a shorthand for "the place from which I first made the clone"; you can see its definition in the file .git/config . The last field in the name of a remote branch is the branch name, proper.
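To look at the definition of origin without opening .git/config in an editor, git config can query it. The sketch below clones a local stand-in repository (the name upstream is hypothetical) and prints the origin settings and the resulting branch list; in a real clone of Offline, remote.origin.url would show the GitHub URL you cloned from.

```shell
#!/bin/sh
# Sketch: a local stand-in for the GitHub repository, cloned once.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com
top="$(mktemp -d)"
cd "$top"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial"
cd "$top"
git clone -q upstream Offline
cd Offline

git config --get remote.origin.url            # where the clone came from
git config --get-regexp '^remote\.origin\.'   # url and fetch refspec for origin
git branch -a   # * main, remotes/origin/HEAD -> origin/main, remotes/origin/main
```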

One of the most important git commands is,

git status

If you issue it right after a clone it will produce the output

# On branch main
nothing to commit, working directory clean

The first line tells us the same information as did the asterisk beside main in the previous output listing, that the working tree is a checkout of main. The next line tells us that we have not made any changes to the working tree; if the working tree exactly matches its corresponding local branch then it is said to be clean.

Among other things, git status will tell you if you have uncommitted changes in your working tree. It is important to watch for these, since your work can be lost if you issue a git checkout command without committing your changes. If you are working on the local main branch, git status will also let you know if it is ahead of or behind the corresponding local tracking branch. Your local branch is ahead of the local tracking branch if the local branch contains commits that have not yet been added to the local tracking branch. Your local branch is behind your local tracking branch if your local tracking branch contains commits that have not yet been added to the local branch.

Use git status regularly to verify that the state of your work is indeed what you think it is.
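The ahead/behind check can be reproduced in a throwaway clone (the stand-in repository name upstream is hypothetical). After one local commit, git status reports that main is ahead of origin/main; git rev-list prints the same information as two counts.

```shell
#!/bin/sh
# Sketch: one local commit puts main ahead of its local tracking branch.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL=demo@example.com
top="$(mktemp -d)"
cd "$top"
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "initial"
cd "$top"
git clone -q upstream Offline
cd Offline

git commit -q --allow-empty -m "local change"
git status      # includes: Your branch is ahead of 'origin/main' by 1 commit.
# Commits only on main (ahead), then commits only on origin/main (behind).
git rev-list --left-right --count main...origin/main    # prints 1 and 0
```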

For most git commands, the branch named remotes/origin/main can be called simply origin/main; the rest of this documentation will use the shorter name.

All of the branches whose names begin with "remotes/origin" are called local tracking branches (git's own documentation calls them remote-tracking branches); they track the state of a branch in a remote repository. In particular, the branch origin/main is also known as the local main tracking branch.


Changing and Committing Files

Please note that this tutorial describes the conceptual process of committing changes. The Mu2e committing workflow must be followed when committing code to the central Mu2e repository.

The general pattern below is to make a modification, then "commit" it to your local repository. In these examples, you would be committing the changes to the current local branch.


Branching

The Mu2e software team recommends that, when you wish to change the repository, you first checkout a local, temporary, working branch and commit your changes to that branch.

This is described on the workflow pages for Mu2e. The reason for this recommendation is that it will create a commit history that is much easier to navigate.

After the git clone, to create a local branch:

git checkout -b work

Here, "work" is the name of the local branch, which is arbitrary. The command creates a new branch that points at the commit you currently have checked out and makes it your current branch; your working tree is unchanged, but from now on your commits will advance work rather than main.
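In a throwaway repository, the effect of git checkout -b can be seen directly: the new branch starts at the same commit as main, and only the new branch moves when you commit. The branch name work is the arbitrary name used above; the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: create a working branch and commit on it.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"

git checkout -b work            # Switched to a new branch 'work'
git branch                      # the asterisk is now beside work
git commit -q --allow-empty -m "first change on work"
git log --oneline main..work    # the commit that is on work but not on main
```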

Editing a file

Edit the file using your favorite editor. If you give the git command:

git status

the output will show that the file you just edited has uncommitted changes.

Committing a file

To commit the change to your current branch, use the following git command:

git commit -m "Your commit comment goes here." filename

If you do not specify the -m option git will open an editor session in which you can type your comments. You can control the editor that git chooses for you; git inspects the following resources, in the specified order, and the first one that is defined wins:

   GIT_EDITOR environment variable
   core.editor variable in $HOME/.gitconfig
   VISUAL environment variable
   EDITOR environment variable
   vi

In case you get stuck in vi, here is a vi cheat sheet.

If you change many files, you may choose to commit each file individually, to commit all files at once, or to commit the files in several groups. The Mu2e software team recommends that you commit files in related groups. For example, if one logical change touches 3 files, commit all three as a single commit; in this case the commit comment should focus on a big-picture statement of what the change does, not on the individual changes to each file, since git diff shows those.

To commit several files at once:

git commit -m "Comment"  file1 file2 file3

To commit all modified and deleted files that git already tracks (new files must first be added with git add, as described below):

git commit -m "Comment" -a

All of the above comments apply to commits for any reason: edited file, new file, deleted file, renamed file. But they will not be repeated in those sections.
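The ways of committing above can be compared in a throwaway repository (the file names are hypothetical). Note that -a picks up changes to files git already tracks, but does not add new, untracked files; those still need git add, as described under Creating a new file.

```shell
#!/bin/sh
# Sketch: commit a related group of files, then use -a.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
printf 'a\n' > file1; printf 'b\n' > file2; printf 'c\n' > file3
git add file1 file2 file3
git commit -q -m "Add three files"

# One logical change touching file1 and file2: commit them together.
printf 'a2\n' >> file1
printf 'b2\n' >> file2
git commit -q -m "Related change to file1 and file2" file1 file2

# -a commits every modified tracked file; the new file4 is left untracked.
printf 'c2\n' >> file3
printf 'd\n' > file4
git commit -q -m "Update file3" -a
git status --short              # prints: ?? file4
```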


Creating a new file

Use your editor to create the file and its initial content. If you give the command

git status

you should see your file is on git's list of untracked files. The next two steps are:

git add filename
git commit -m "Comment" filename

If you do a git status following the git add but before the git commit, you will see your file on the list of files that have been added but not committed. If you decided after the add but before the commit that you want to undo the add operation, you can find the command to do that in the output of git status! The answer is:

git reset HEAD filename
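Watching git status --short between the steps makes the add, undo and commit sequence concrete. A sketch in a throwaway repository; newfile.cc is a hypothetical name, and an empty initial commit is made first so that HEAD exists.

```shell
#!/bin/sh
# Sketch: add a new file, undo the add, then add and commit it.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"

printf 'int x;\n' > newfile.cc
git status --short               # ?? newfile.cc  (untracked)
git add newfile.cc
git status --short               # A  newfile.cc  (added, not committed)
git reset -q HEAD newfile.cc     # undo the add; the file itself is untouched
git status --short               # ?? newfile.cc
git add newfile.cc
git commit -q -m "Add newfile.cc" newfile.cc
```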


Deleting a file

Delete the file from your working tree using the Unix rm command. If you give the command

git status

you should see that the file is present on the list of deleted files. To tell git that the file should be deleted:

git rm filename
git commit -m "Comment" filename

If you have removed the file, but not yet issued the git rm command, you can recover the file by:

git checkout -- filename

The git status command has a reminder about this last command.

If you later need access to the deleted file, you can check out an earlier commit in which the file exists. When you do so, the file will appear in your working tree.
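The delete, recover and delete-again steps above can be tried in a throwaway repository (old.cc is a hypothetical file name). The last command uses git show with a commit:path argument, which prints an earlier version of a deleted file without changing your working tree; this is an alternative to checking out the earlier commit.

```shell
#!/bin/sh
# Sketch: delete a file, recover it, then delete it for real.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
printf 'old contents\n' > old.cc
git add old.cc
git commit -q -m "Add old.cc"

rm old.cc
git status --short            # " D old.cc": deleted, deletion not yet recorded
git checkout -- old.cc        # changed your mind: restore it from the branch

rm old.cc
git rm -q old.cc              # tell git the deletion is intended
git commit -q -m "Remove old.cc" old.cc

# Print the file as it was one commit back, without changing your checkout.
git show HEAD~1:old.cc        # prints: old contents
```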

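You do not need the separate rm step: git rm filename on its own deletes the file from your working tree and records the deletion for the next commit, so the sequence

git rm filename
git commit -m "Comment" filename

does the same job as rm followed by git rm.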


Renaming files

To rename a file, git does all of the work:

git mv sourceFile destinationFile
git commit -m "Comment" sourceFile destinationFile

Note that you must commit both the source and the destination files. This action preserves the full history; that is, you can start with destinationFile and follow the commit history back to the creation of sourceFile (for example with git log --follow destinationFile).
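Following the history across a rename needs the --follow option of git log; without it, git log stops at the commit that created destinationFile. A sketch in a throwaway repository, using the file names from the commands above:

```shell
#!/bin/sh
# Sketch: rename a file and follow its history back to the original name.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
printf 'some code\n' > sourceFile
git add sourceFile
git commit -q -m "Create sourceFile"

git mv sourceFile destinationFile
git commit -q -m "Rename sourceFile to destinationFile" sourceFile destinationFile

git log --oneline -- destinationFile            # only the rename commit
git log --follow --oneline -- destinationFile   # the rename and the creation
```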


Adding directories

With git you cannot commit an empty directory; you must put at least one file in it before you can commit it.

To begin, create a directory and add one file to it:

mkdir dir1
emacs dir1/file1

Then issue the git commands:

git add dir1
git commit -m "new directory" dir1


Note that you did not need to add or commit dir1/file1 - this is automatically taken care of because dir1/file1 already existed at the time that command "git add dir1" was executed. If you add dir1/file2 after the add and before the commit, you will have to explicitly "git add" that file.

The first few times you do this, you should run git status after each git command to familiarize yourself with its output in these circumstances; it's self-explanatory.
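A quick way to convince yourself that git tracks files rather than directories, in a throwaway repository (emptydir is a hypothetical name; dir1 and file1 follow the example above, with printf standing in for the editor).

```shell
#!/bin/sh
# Sketch: an empty directory is invisible to git; a directory with a file is not.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"

mkdir emptydir
git status --short          # prints nothing: git does not see empty directories

mkdir dir1
printf 'x\n' > dir1/file1
git add dir1
git commit -q -m "new directory" dir1
git ls-files                # prints: dir1/file1
```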

Untracked Files

When you give the command

git status

it will sometimes tell you that you have untracked files in your working tree. Look at this list carefully; it may contain files that you intended to add but forgot to.

You may notice that editor backup files that end in the tilde character (~) are not present in the list of untracked files. Nor are object files (ending in .o or .os) or dynamic libraries (ending in .so) listed. These are not listed because they match one of the patterns listed in the Mu2e .gitignore file. The Mu2e software team has supplied a .gitignore file that should result in a short list of untracked files. If you would like to add a new pattern to the .gitignore file, please suggest it to the Mu2e software team; we ask you to run this past the team because the pattern you wish to exclude may be important for other users.

Some General Comments

Whenever you are editing files in a git environment you will always be working on a git branch, perhaps the local main branch but usually a temporary local working branch. You should be aware that there may be 5 copies of any file that you are working on:

  • The copy in your working tree
  • The copy in your local working branch
  • The copy in your local main branch
  • The copy in the local main tracking branch
  • The copy in the main branch of the central repository

The following commands move files between these copies:

  • git checkout copies files from a branch to your working tree
  • git commit copies files from your working tree to your local working branch
  • git merge copies files from one local branch to another
  • git push copies files from your local main branch to both the local main tracking branch and the central repository
  • git fetch copies files from the central repository to the local main tracking branch
  • git pull does a fetch and then merges the local main tracking branch into the local main branch

There is one more critical idea, git rebase, which will be described as part of the recommended workflow for Mu2e.

The recommended Git Workflow for Mu2e will give a recommended way of using these 6 commands to ensure that the commit history of the repository is simple and is easy to understand. This workflow is constructed so that all conflict resolution is done during rebase operations, never in any of the others.

One of the design features of git is that git commit can never generate a conflict. Commit regularly.

detached HEAD

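At times when you issue a checkout command, you may see a warning saying that you are in a "detached HEAD" state. This means that HEAD, git's record of what you have checked out, points directly at a commit rather than at a branch. It typically happens when you check out a tag or a specific commit:

git checkout <tagOrCommit>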

git is warning you that if you commit code here, the commits may become hard to find later, because no branch points at them. If you are only interested in reading or using the code, you can ignore this warning. If you need to make and save changes, you must be on the head of a branch so that your commits advance that branch. This is easy to do: when starting a project that will require commits, please just follow the commit workflows.
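A sketch of the situation in a throwaway repository: checking out a tag (v1, a hypothetical name) detaches HEAD, and git symbolic-ref fails because HEAD no longer names a branch. Creating a branch at that point (fix-from-v1, also hypothetical) gives later commits a branch to advance, which is what the commit workflows arrange for you.

```shell
#!/bin/sh
# Sketch: enter and leave the detached HEAD state.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
git commit -q --allow-empty -m "tagged release"
git tag v1
git commit -q --allow-empty -m "later work"

git checkout -q v1                        # HEAD now refers directly to a commit
git symbolic-ref -q HEAD || echo "detached HEAD"

# To keep changes made from here, first put a branch at this commit.
git checkout -q -b fix-from-v1
git symbolic-ref --short HEAD             # prints: fix-from-v1
git commit -q --allow-empty -m "change saved on fix-from-v1"
```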

Conflicts

Any source code management system has to deal with the following situation:

  • You have cloned a repository, edited some files, added some files and deleted others (and committed).
  • You would like to return your modified repository to the central repository.
  • Before you have a chance to send your work to the central repository, someone else has modified it.

This situation is loosely called a conflict, although only the third of the following flavors is a conflict in git's strict sense:

  • The files that you have modified are disjoint from the files that the other person has modified.
  • Both you and the other person have modified some of the same files but in each file the changes are at widely separated places.
  • Both you and the other person have modified the same lines in the same file.

Git's default behavior is this:

  • In cases 1 and 2 git merges the two sets of changes
  • In case 3 git gives up and asks you to fix the file before telling git to continue.

When git gives up, it writes both versions of the conflicting text to the file, delimited by conflict markers:

<<<<<<< HEAD
     version of the code from your file
=======
     version of the code from the other file
>>>>>>> b600c43a1af8fb632679c221a71c689c132e25fd

Your job is to ensure that the correct code is in place and to remove the conflict markers; this may involve consultation with the other author.

When you have conflicts, the merge or pull is partially complete. As soon as you can see the conflicts, you can back-out by issuing:

git merge --abort

This is probably the right move if the conflicts are complex. You can then use git diff to investigate, and update your branch to the code you want as the result of the merge.

Once you understand the situation, you can continue. Or if the conflict is minor, you can follow the pattern here without backing-out first. First run the merge (or pull) and get the conflict message:

git merge <branch>

At this point you are in the middle of a merge, but you are also preparing a commit, since the result of the merge is a commit. git knows you are in this state, so before doing any more development you must either abort the merge or fix the conflicts and commit. Because you are preparing a commit, you can fix the conflicts and run

git add <files>

and they will become part of the commit. This is the main method of resolving conflicts: open the file, find the conflicting code marked as shown above, edit it, and add the file. Once all of the conflicted files are fixed and added to the commit, you can continue with the merge:

git merge --continue

and the result will be a merge commit, which can be handled like any other commit.
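The complete cycle, from conflicting edits to the merge commit, can be rehearsed in a throwaway repository. In this sketch two branches change the same line of a hypothetical file calc.txt; the merge stops with conflict markers, the file is fixed by hand (here simply overwritten with the chosen version), staged with git add, and the merge is finished with git merge --continue (available in git 2.12 and later). Setting GIT_EDITOR=true for that one command accepts git's prepared merge message instead of opening an editor.

```shell
#!/bin/sh
# Sketch: create a same-line conflict, resolve it, and finish the merge.
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email demo@example.com
printf 'x = 1\n' > calc.txt
git add calc.txt
git commit -q -m "initial"

git checkout -q -b other                 # the other author's branch
printf 'x = 2\n' > calc.txt
git commit -q -am "other author sets x = 2"
git checkout -q main
printf 'x = 3\n' > calc.txt
git commit -q -am "you set x = 3"

git merge other || true                  # stops with a conflict in calc.txt
cat calc.txt                             # shows the <<<<<<< ======= >>>>>>> markers

# Resolve: write the correct code, without markers, then stage it.
printf 'x = 3\n' > calc.txt
git add calc.txt
GIT_EDITOR=true git merge --continue     # completes the merge commit
git log --graph --oneline
```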

Browsers of Repository History

There are many tools for getting a graphical view of the history of a repository. Two of them are:

   gitk
   SourceTree

If you know of other tools, please add them to this list and describe them briefly below.

If you are working at Fermilab, gitk is distributed as part of the git UPS product and is available after you "setup mu2e". To use gitk, cd to your Offline directory and:

gitk --all&

If you are at a non-Fermilab site consult with whoever supports your git installation.

SourceTree is only available for Windows and Mac OSX, not for Linux variants. You can download it from their website: http://www.sourcetreeapp.com