
page 1 of 15 · git · leandro facchinetti
git
Lecture notes
Leandro Facchinetti ⟨[email protected]⟩
Object-Oriented Software Engineering
Johns Hopkins University
2016-09-19
The goal of this lecture is to answer three questions: What does Git do?
How do you start using Git? Where do you learn more about Git? The
intention is not to turn you into a specialist, nor to have you memorize
commands, but to teach you the basic underlying concepts and how to
perform the most used operations. After the lecture and reading these notes,
you should know everything necessary to work on the group project.
Accompanying slides are available at:
pl.cs.jhu.edu/oose/resources/git.m4v.
what does git do?
Git solves the problem of version control. The name might be unfamiliar,
but the problem is not: keeping track of the changes on a project as it evolves,
and sharing and collaborating on it with other people. The version control
problem arises whenever someone copies a file to modify it and compare
versions, sends a project as an email attachment, or loses work due to a
corrupted or accidentally deleted file. Git is a version control system, which
solves the version control problem by tracking the project’s history.
As more companies and free-software projects use Git, knowing it
becomes a valuable skill. It can also improve your personal life: version diary
entries, recipes, lecture notes or any kind of personal project. Not only will
you never lose work to technical issues again, but you will also have a rich
history of the progress and more freedom to experiment, exploring and
comparing ideas on different versions of the project.
I use Git with almost everything I do on the
computer: from keeping my favorite vegan
recipes to the preparation of this lecture.
The person below does not.
how do you start using git?
There are two main ways to learn Git: install a Graphical User Interface
(GUI) that acts as a front-end for Git, and click around; or learn how to use
the Command-Line Interface (CLI) that comes with Git. For daily usage, a
GUI can be more productive, but the lecture is based on the CLI because it
is closer to the underlying concepts. Also, sometimes the GUI does not
support an operation, so it is important to learn the CLI even if not
using it most of the time.
The following sections cover the most common use cases for Git: first
the underlying concepts, then practical examples on the command line. The
use cases are divided into three overarching goals: (1) using Git alone on a
local computer; (2) using Git with multiple remote computers; and (3)
using Git for collaboration.
Most of my use of Git is through a GUI that
comes as an extension to my text editor—
Magit, in Emacs. If your preferred editor has
Git support, give it a try. Otherwise, use a
stand-alone Git GUI. See
git-scm.com/downloads/guis.
an aside about github
It is common for people to use the words Git and GitHub
interchangeably, but they are not the same thing. Git is a tool and GitHub is
a service provided by a company that facilitates the use of the tool. This
class covers both Git and GitHub—which is the service the staff chose for
the course—and it is important to know the difference.
Git is to email as GitHub is to Gmail. It
makes the use easier, but is not essential. It
is possible to send emails from providers
other than Gmail, and it is possible to use
Git without GitHub.
an aside about git commands
Git operations are available through the git executable. The format
of the command lines resembles natural speech: start with the sentence “Git,
please add the file cookies.txt to the index;” then note its essential parts,
“Git,” “add” and “cookies.txt;” finally, remove the rest
to write the command:
By convention, command lines are prefixed
with $, commentary with #, and verbs
and objects-and-options are
highlighted.
$ git add cookies.txt
In general, a Git command follows the pattern:
$ git verb objects-and-options …
local setup
One of Git’s features is to keep track of who did which work. In order
for it to do that, you have to identify yourself by running the following
commands that alter configuration files:
$ git config --global user.name "Bugs Bunny"
$ git config --global user.email "[email protected]"
There are files created by the operating system, text editors and
other tools that should not be under version control, as they are not related
to the project. For example, Apple’s OS X creates .DS_Store files to store
custom folder attributes. To teach Git that it must ignore such files:
$ echo ".DS_Store" >> ~/.gitignore_global
$ echo "*.text-editor-temp" >> ~/.gitignore_global
# …
$ git config --global core.excludesfile ~/.gitignore_global
first git command
The single most useful Git command asks it to report what it
thinks the world looks like. Run it after setup is complete:
$ git status
fatal: Not a git repository (or any of the parent
directories): .git
The result is a fatal error: Git cannot find a repository. The next section
explains what a repository is, and how to create one.
repository
Start by thinking of the analogy that the operating system GUI
makes: work on sheets of paper on the desktop and organize them in
folders. Suppose it is necessary to keep track of the history of a project: in
the physical world, one solution is to copy the papers after changes. But that
results in a lot of paper—to manage it, one could group the pieces that
belong together on a paper tray and put the sheets in boxes, label the boxes
and store them in a cabinet. To find the boxes later, keep index cards, similar
to those in libraries. Finally, to distribute your documents, use a
fax machine.
Installation procedures are different
depending on the operating system. Go to
the office hours to get individual assistance
if you are having trouble installing Git on
your machine.
It is important that you choose an email
address that you will own forever.
Institutional emails are bad choices
because, after the affiliation ends, the email
address could be reassigned and the new
owner would gain credit for all work
associated with it.
This is more of an issue on contributions for
public projects, but it is cumbersome to
have multiple profiles and distinguish
between personal work and institutional
work. So, unless an institution insists on the
use of their email address, avoid it.
Git extends the computer’s file system with equivalents of a cabinet,
paper tray and fax machine, and provides boxes, labels and index cards. All those
analogies are covered in the following sections; for the moment, it suffices
to know that Git calls working directory the folder in which the project lives,
and the cabinet is the repository.
Git as an extension to the desktop
metaphor. The working directory is the
existing folder. The new elements are a
cabinet, a paper tray, labeled boxes of
changes, a Rolodex of index cards and a
fax machine.
On the command line, create a new folder to contain a project and a new
repository in it:
$ mkdir recipes
$ cd recipes/
$ git init
Initialized empty Git repository in …/recipes/.git/
Git created a hidden folder called .git in the project’s directory. It is the
cabinet—use Git commands to modify its contents, do not do it manually.
The status has changed:
$ git status
On branch master
Initial commit
nothing to commit (create/copy files and use "git add" to
track)
Git is no longer complaining about the repository not existing, but the
output mentions two unknown concepts: branches and commits. The next
sections address those terms.
fine points about repositories
It is up for debate where to draw the line when creating a repository. Should a
project that is composed of a front-end and a back-end be in a single
repository, separated into two directories, or in two repositories? Practical
matters such as keeping changes in sync and using tools that integrate with
version control come into play—there is no right answer. To help with the
decision, keep in mind that creating repositories is cheap and easy, so they
may contain as little as a single file, if it stands on its own.
A hidden folder is one whose name starts
with a dot. It receives the name because file
browsers usually do not show it, but it is not
special in any other way.
commit
Continuing with the office metaphor, a typical workday looks like:
work on documents, make copies—do not use the originals, to allow for
history tracking—and organize them in a paper tray, put the group in a box,
label the box with information that helps finding it later, store the box in the
cabinet and make a note about it on the index card. The Git workflow is
similar—the paper tray is called index or staging area; the box is a commit; the
box label is the commit message; and the cabinet is the repository. The index cards
are references, which are the subject of a later section.
There is a gross over-simplification in the
metaphor. Storing whole copies of files on
the boxes over and over would waste
resources, because most of the content
remains the same. So Git does not work
with the concept of files, but that of
changes. A change can be the addition or
deletion of a line on a file, the creation of a
whole new file, and so on.
That is why the boxes are depicted with
modifications, not files, in them. To
recreate a point in the history, Git follows a
sequence of boxes and replays their
changes—either forwards or backwards.
On the command-line, start by doing some work and check the status:
$ echo 'Delicious recipe' > vegan-cookies.txt
$ git status
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
vegan-cookies.txt
nothing added to commit but untracked files present (use "git add" to track)
Git is saying that the file vegan-cookies.txt is untracked—i.e., Git has
not been introduced to the file, it has never been in the cabinet. Git also says
what to do next:
$ git add vegan-cookies.txt
$ git status
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file:   vegan-cookies.txt
Now the file is on the paper tray, the staging area. Git teaches how to remove
it from there, to unstage. But the changes are fine, so proceed to put them in
the box that is going to the cabinet:
$ git commit
# Write commit message in text editor.
".git/COMMIT_EDITMSG" 10L, 250C written
[master (root-commit) cc5d34f] Add cookie recipe
1 file changed, 1 insertion(+)
create mode 100644 vegan-cookies.txt
$ git status
On branch master
nothing to commit, working directory clean
Commit (noun): The box with changes.
Commit (verb): The act of creating a box
with changes.
On most machines, the default text editor is
Vim. It is possible to configure a different
editor with:
$ git config --global core.editor "text-editor-executable"
Git repositories are append-only. I.e., once a
commit is in the repository, it is there
forever. However, it is possible to lose the
access to a commit. This is covered on a
later section regarding references.
When issuing git commit, Git opens a text editor. Use it to
write a message describing the commit; the message composes the box label
along with other information such as your name and email (configured on
setup), the current time and which was the previous box on the chain. Once
the text editor is closed, Git finishes the commit—closes the box, puts it in the
cabinet, updates the index cards, and gets ready to start over.
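The editor step can be skipped by passing the message on the command line with -m. The following is a minimal sketch, assuming a throwaway repository in a temporary directory and a placeholder identity and email address; none of these names come from the lecture's own recipes repository.

```shell
set -eu
# Throwaway repository in a temporary directory (an assumption for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity local to this repository, so the sketch does not depend on global setup.
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Delicious recipe' > vegan-cookies.txt
git add vegan-cookies.txt              # put the change on the paper tray
git commit -q -m 'Add cookie recipe'   # close the box; -m supplies the label

# The label records the message and the identity configured above.
subject=$(git log -1 --format=%s)
author=$(git log -1 --format=%an)
echo "$subject by $author"
```

The -m option suits short messages; for the longer messages recommended in the next section, the editor remains the more comfortable choice.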
fine points about commits
Good commit messages are important when looking for a particular
commit in the history. The same way you would not keep your cabinet
messy, you should inform the reader of the commit message not only what is
in the commit, but also why it exists. Write about the motivating problem,
how the changes solve it, what would be alternative solutions and where to
find more information. It is common for the commit to change a single line
on the project and for its message to be several pages long.
The convention for writing Git commit messages is to start with a title of at most 50 characters, leave an empty line and write prose wrapped at 78
characters. This format allows history-visualization tools to better show the
repository contents. For example:
Add vegan cookie recipe

After several experiments, we discovered the best
replacement for eggs in vegan cookies is flax-seed meal
mixed in water. Before that, we tried …
The flax-seed meal mixture is called flax-eggs. The recipe is real—and it tastes great!
The fact that a commit stores information
about which were the previous commits in
the history is fundamental. This way, it is
only necessary to have a reference to one
commit, and from it the whole history can
be retrieved by traversing each commit and
following the information in it. Keep this in
mind when reading the later section
regarding references.
Do not confuse the index—the metaphorical
paper tray, a Git concept also called staging
area—with the index cards from the
metaphor, which are Git references.
To allow for this level of commit-message quality, it is necessary to group
the changes that belong together. They might be a single line or come from
files on the whole project. Commit early, commit often, do not wait for many
changes to accumulate.
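The title-and-prose convention described above can be checked from the command line. The sketch below assumes a throwaway repository and a placeholder identity; it feeds the message through standard input with -F - instead of the text editor, then reads the title and the prose back separately.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Flax-seed meal mixed in water' > vegan-cookies.txt
git add vegan-cookies.txt

# -F - reads the whole message from standard input: title, empty line, prose.
git commit -q -F - <<'EOF'
Add vegan cookie recipe

After several experiments, we discovered the best
replacement for eggs in vegan cookies is flax-seed meal
mixed in water.
EOF

title=$(git log -1 --format=%s)   # %s is the title
body=$(git log -1 --format=%b)    # %b is the prose after the empty line
echo "title is ${#title} characters long"
```

Tools that list one commit per line, such as git log --oneline, show only the title, which is why it pays to keep it short.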
Avoid committing code that does not compile or does not pass the
test suite. This confuses the readers and renders unusable a Git feature that
finds the commit that introduced a bug—git bisect.
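To give an idea of what git bisect does, here is a minimal sketch on a throwaway repository. The five commits, the file recipe.txt and the line marked BUG are all made up for illustration; the test passed to git bisect run is a stand-in for a real test suite.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

# Five commits; the third one introduces a line that stands in for a bug.
for i in 1 2 3 4 5; do
  if [ "$i" -eq 3 ]; then
    echo 'BUG: salt instead of sugar' >> recipe.txt
  else
    echo "step $i" >> recipe.txt
  fi
  git add recipe.txt
  git commit -q -m "Change $i"
done

first=$(git rev-list --max-parents=0 HEAD)   # oldest commit, known to be good

# Mark HEAD as bad and the oldest commit as good, then let Git search.
git bisect start HEAD "$first" >/dev/null
# The test exits 0 (good) when the bug line is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BUG recipe.txt' >/dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

The search only works if every commit in between can be tested, which is why commits that do not compile get in the way.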
Adding all the changes on a file to the index
is not the only—nor the best—way to
organize changes. It is possible to select
line-by-line what goes in the commit.
git add --interactive and
git diff allow for this level of precision,
but a GUI is better at the task.
These are high standards of commit quality. At first, focus on
getting the basics right, then work the way up to following the rules. Start
by committing all the time; when comfortable with Git, learn how to rewrite
history and craft better and better commits.
read history
The point of carefully keeping track of the project’s history is to read it
later. The simplest way of doing that is:
$ git log
commit cc5d34f5a53278aba79dd056ebd560d1db13da01
Author: Leandro Facchinetti <[email protected]>
Date:   Wed Sep 14 14:27:28 2016 -0400
Add cookie recipe
Visualizing history is another task in which
a GUI shines. Text alone is limited.
This command shows the latest commits, with their messages, authors and
unique identifier string. To see the details of a commit, including the
changes that went into it:
$ git show cc5d34f5a53278aba79dd056ebd560d1db13da01
commit cc5d34f5a53278aba79dd056ebd560d1db13da01
Author: Leandro Facchinetti <[email protected]>
Date:   Wed Sep 14 14:27:28 2016 -0400
Add cookie recipe
diff --git a/vegan-cookies.txt b/vegan-cookies.txt
new file mode 100644
index 0000000..faa3136
--- /dev/null
+++ b/vegan-cookies.txt
@@ -0,0 +1 @@
+Delicious recipe
The unique identifier string is also called
SHA-1, after the hashing algorithm used to
generate it.
Any prefix of the unique identifier string
that remains unique works as identifier as
well. This allows cc5d34f5a53278ab… to
be abbreviated to cc5d34f5, for example.
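Git can produce and resolve abbreviated identifiers itself. The sketch below assumes a throwaway repository with a single commit and a placeholder identity; it asks for a prefix of at least eight characters and checks that the prefix resolves back to the full identifier.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Delicious recipe' > vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Add cookie recipe'

full=$(git rev-parse HEAD)              # the full identifier
short=$(git rev-parse --short=8 HEAD)   # a unique prefix, at least 8 characters
resolved=$(git rev-parse "$short")      # the prefix resolves to the full identifier
echo "$short -> $resolved"
```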
Another approach to reading history is starting from a file and
asking what were the modifications that led to it. This might help finding
a bug by pointing to the commit that introduced a suspicious line—the commit
contains a time stamp, a message written by the author and the identity of
the person. Because this might start fights, the name of the command is
git blame:
$ git blame vegan-cookies.txt
^c8917a9 (Leandro Facchinetti 2016-09-14 14:27:28 -0400 1) Delicious recipe
retrieve history
Once reading the history reveals a commit of interest, it is possible to
open the box and put a copy of its contents on the desktop. I.e., it is possible to
go back in time on the project and have the working directory reflect what it
was at the time of a commit:
$ git checkout cc5d34f5a53278aba79dd056ebd560d1db13da01
Note: checking out 'cc5d34f5a53278aba79dd056ebd560d1db13da01'.
You are in 'detached HEAD' state. You can look around,
make experimental changes and commit them, and you can
discard any commits you make in this state without
impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you
create, you may do so (now or later) by using -b with the
checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at cc5d34f... Cookie recipe
Because git checkout changes the
working directory, it has to be clean—i.e.,
no files changed. Use git status to
check and commit if necessary. If worried
about maintaining a neat history, rewrite it
later or learn about git stash.
It is also possible to
git checkout file-name, in which
case only that particular file is affected—it is
restored to the condition it was on the last
commit. This is helpful when finishing a
quick experiment on a file and wanting to
discard the changes right away, leaving no
trace behind.
Now the files on the working directory are the same as they were at the time
of this commit. But Git also output a warning about a detached HEAD, which
is a bit gross. The next section explains why there is nothing to worry about.
reference
A reference is a pointer to a commit—the index card on the
desktop metaphor. HEAD is a special reference pointing to the commit that
represents the state of the working directory. Git updates this reference on
git checkout, git commit and many other operations. The HEAD can
point directly to a commit or do so indirectly, via a branch—more on
branches on the next section. The former is the detached HEAD state.
If the plan is just to look at the files, or compile and run them, then
being in detached HEAD state is fine. On the other hand, when modifying
the files, it is better to have another reference pointing to the current
commit. Otherwise the work might be lost upon the next git checkout,
when HEAD points to another commit. One other kind of reference available
in Git is the branch, the subject of the next section.
git checkout HEAD does nothing.
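Whether HEAD points to a commit directly or through a branch can be checked from a script. The sketch below assumes a throwaway repository and a placeholder identity; it uses git symbolic-ref, which succeeds only when HEAD goes through a branch.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Delicious recipe' > vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Add cookie recipe'

branch=$(git symbolic-ref --short HEAD)   # HEAD points to the commit via a branch
commit=$(git rev-parse HEAD)              # the commit HEAD ends up at

git checkout -q "$commit"                 # check out the commit itself: detached HEAD
if git symbolic-ref -q HEAD >/dev/null; then
  state=attached
else
  state=detached
fi
echo "HEAD is $state"

git checkout -q "$branch"                 # go back to pointing via the branch
```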
branch
A branch is a named reference—thus, a pointer to a commit as
well. When a repository is created, there are no commits—the cabinet is
empty—so there are no branches. Then, upon the first commit, Git
automatically creates a branch called master, pointing to the first commit.
From then on, whenever there is a new commit, Git advances the branch
along with HEAD.
To create a new branch:
$ git branch brownies
$ git branch
* (HEAD detached from cc5d34f)
brownies
master
$ git status
HEAD detached at cc5d34f
nothing to commit, working directory clean
The created branch points to the commit at which HEAD is pointing at the
moment, but it does not automatically associate HEAD with it. That requires
an explicit git checkout:
$ git checkout brownies
Switched to branch 'brownies'
From now on, when committing, the brownies branch is advanced, but
master is maintained where it was:
$ echo 'Chocolate is vegan' >> vegan-brownies.txt
$ git add vegan-brownies.txt
$ git commit
# Write commit message in text editor.
[brownies 1b0657d] Add vegan brownies
1 file changed, 1 insertion(+)
create mode 100644 vegan-brownies.txt
$ ls
vegan-brownies.txt
vegan-cookies.txt
$ git checkout master
Switched to branch 'master'
$ ls
vegan-cookies.txt
$ git checkout -b cookies
Switched to a new branch 'cookies'
$ echo 'Less flour' >> vegan-cookies.txt
$ git add vegan-cookies.txt
$ git commit
# Write commit message in text editor.
[cookies 60a0d3d] Fix cookies recipe---less flour
1 file changed, 1 insertion(+)
$ cat vegan-cookies.txt
Delicious recipe
Less flour
$ git checkout master
Switched to branch 'master'
$ cat vegan-cookies.txt
Delicious recipe
$ git checkout brownies
Switched to branch 'brownies'
$ cat vegan-cookies.txt
Delicious recipe
$ ls
vegan-brownies.txt
vegan-cookies.txt
There is nothing special about the name
master, it is just the default.
The idea that branches are just
references—not copies of the whole
project, for example—makes creating
them easy and cheap. This was one of
the features that set Git apart from other
version control systems and led to
its popularity.
Alternatively,
git checkout -b brownies creates a
branch and associates HEAD with it in
one command.
Only when the brownies branch is checked out is the vegan-brownies.txt file available on the working directory. Similarly,
only when the cookies branch is checked out is the fix to the cookies recipe
available. This independence allows work to occur concurrently on
different branches: test an idea on a branch, check another branch out, fix a
bug on it, and so on. The history starts to look like a tree of commits, thus
the name branch.
A tree—sort of—beginning to form. On
these illustrations, the commits point to
their parents and the tree grows upwards
as new commits happen.
If you are familiar with Directed Acyclic Graphs
(DAGs), then it helps to think of the
repository history as one.
It is as easy to delete a branch as it is to create one:
$ git branch non-vegan-recipe
$ git branch -d non-vegan-recipe
Deleted branch non-vegan-recipe (was cc5d34f).
Besides the regular operations that update branches to point to other
commits, it is possible to force a branch to point to a particular commit:
$ git checkout -b moved
Switched to a new branch 'moved'
$ git reset --hard 60a0d3d
HEAD is now at 60a0d3d Fix cookies recipe---less flour
git reset is the first example of a
potentially destructive command.
Double-check the working directory is
clean before running it.
fine points about branches
Beware that, even though Git repositories are append-only, if there are
no references to a commit (or to one of its children, as commits know their
parents), there is no way to retrieve it. So, before running a dangerous
command—such as the ones that rewrite history, covered in a later section—
it is advisable to create a temporary branch and delete it later.
In case of emergency, to retrieve a
commit that was recently checked out
but has no other references, try
git reflog.
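As an illustration of that emergency exit, the sketch below loses a commit on purpose and gets it back. It assumes a throwaway repository, a placeholder identity and made-up branch names (experiment, rescued); HEAD@{1} is the reflog notation for where HEAD pointed one step ago.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo a > a.txt
git add a.txt
git commit -q -m 'Add a'
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b experiment
echo b > b.txt
git add b.txt
git commit -q -m 'Add b'
lost=$(git rev-parse HEAD)

git checkout -q "$main"
git branch -q -D experiment             # no branch points to 'Add b' any more

# The reflog remembers where HEAD has been: HEAD@{1} is its position
# before the last checkout, i.e. the commit that just lost its branch.
git branch rescued 'HEAD@{1}'
rescued=$(git rev-parse rescued)
echo "recovered: $(git log -1 --format=%s rescued)"
```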
Git is a flexible tool by design. This means it can adapt to
different workflows, but it also means that it can be hard to find guidance on
how to start. One of the aspects that can be confusing is what warrants
the creation of a branch. The most common practice is to create a branch for
each feature, bug fix, idea or exploration. Keep in mind that branches can be
created from other branches arbitrarily—but avoid having complex
workflows and branching structures that get in the way of the actual work.
If the repository is huge—gigabytes in
size—and storage space is a concern,
there are garbage collection and
compression routines that delete
inaccessible commits forever.
tag
A tag is a named immutable reference. It is similar to a
branch, except that it forever points to the commit on which it was created.
Tags are useful for software releases, for example.
Creating a tag is similar to creating a branch:
$ git tag cookbook-1.0
Tags can be digitally signed with GPG to
guarantee the provenance of the
released code.
merge
When the work on a branch is complete, it is time to merge it back
into the main development line. The process can happen in one of two ways:
either only references are updated, or new commits are necessary as well.
The former happens when no work took place since the branch went off—
i.e., the merged branch is a descendant of the merging branch. Git calls this
fast-forward.
Fast-forward—a merge happened by
changing the commit pointed by the
master branch.
$ git checkout master
Switched to branch 'master'
$ git merge brownies
Updating cc5d34f..1b0657d
Fast-forward
vegan-brownies.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 vegan-brownies.txt
$ git branch -d brownies
Deleted branch brownies (was 1b0657d).
If some work happened since the branch went off (i.e., the merged branch
is not a descendant of the merging branch), then there are changes on both
sides. They need to be reconciled, which requires the creation of a new
commit. There are two special characteristics to the commit resulting from a
merge: it has two parents and contains the changes from both of them.
After the merge, it is safe to delete the
merged branch to keep the
repository clean.
Merge that requires the creation of a new
commit—it has two parents and the
changes from both sides.
$ git merge cookies
".git/MERGE_MSG" 7L, 250C written
Merge made by the 'recursive' strategy.
vegan-cookies.txt | 1 +
1 file changed, 1 insertion(+)
$ git branch -d cookies
Deleted branch cookies (was 60a0d3d).
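The two special characteristics of a merge commit can be verified from the command line. The sketch below rebuilds a similar situation in a throwaway repository with a placeholder identity; --no-edit accepts the default merge message so that no text editor opens.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Delicious recipe' > vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Add cookie recipe'
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Work on a branch...
git checkout -q -b brownies
echo 'Chocolate is vegan' > vegan-brownies.txt
git add vegan-brownies.txt
git commit -q -m 'Add vegan brownies'

# ...and, meanwhile, work on the main line, so a fast-forward is not possible.
git checkout -q "$main"
echo 'Less flour' >> vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Fix cookies recipe'

git merge -q --no-edit brownies

# rev-list --parents prints the commit followed by its parents.
set -- $(git rev-list --parents -n 1 HEAD)
parents=$(( $# - 1 ))
echo "the merge commit has $parents parents"
```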
fine points about merges
When the changes on both parent commits are around the same
lines of the same files, Git is unable to automatically reconcile them. A
conflict happens and manual intervention is required. git mergetool
integrates with text editors to show the differences and allow the resolution.
Conflicts on some kinds of files can be hard to resolve; for example, binary
files and big XML files generated by programming tools—e.g., iOS
storyboard files from Xcode. The best strategy is to coordinate the work on
those files in a manner to avoid conflicts arising in the first place.
The right moment to merge is another issue open to debate. Teams
differ on their notions of ready—some only require working code, others
insist on tests and documentation. The recommendation is to avoid long-running branches that progress independently of the main line of
development. The results are merge conflicts and frustration.
Having to coordinate the work on a few
files is annoying and defeats part of the
purpose of using branches. But it is a
necessary evil because these files, by
their nature, are hard to handle. At least
the problem is localized on a few files—
most files on most projects are text-only
and tractable.
rewrite history
As previously stated, repositories are append-only, but developers
are fallible, and it is common to have to rewrite some of the history. Git’s
solution is to create new commits with the modifications and update the
references accordingly—in effect, it looks like history has changed. It is
possible to arbitrarily manipulate the repository’s history tree, but there are
two risks to consider: the first is to lose all references to original commits
that are still useful—this can be mitigated by creating branches before
rewriting. The more serious concern is when working with other people:
collaborators may have based their work on the commit before the change in
history, which would void their commits. The solutions to this problem are
to never rewrite commits that are visible to other people (see later sections
about collaboration), or to coordinate the changes. Having separate
branches for each task helps isolate the work and minimize the issues.
The most useful history rewrite is to amend the last commit—either to
modify its message, or add or remove some changes. Use git commit --amend and Git rewrites the last commit instead of creating
a new one.
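A minimal sketch of amending, assuming a throwaway repository and a placeholder identity: the first message has a typo, and the amended commit takes its place. The history still has one commit, but its identifier changes, because under the hood the amended commit is a new one.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Delicious recipe' > vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Add cokie recipe'            # typo in the message
before=$(git rev-parse HEAD)

git commit -q --amend -m 'Add cookie recipe'   # replace the last commit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
echo "$count commit in the history: $subject"
```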
The next most useful history rewrite is to change the commit on which
a branch is based. This is necessary when working on a branch that is out-of-date in relation to the main development line—it is a solution to the
long-running branch problem. The command to use is git rebase new-base.
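The effect of a rebase can be sketched as follows, assuming a throwaway repository, a placeholder identity and a branch called brownies that fell behind the main line. After the rebase, the tip of the main line is an ancestor of the branch.

```shell
set -eu
# Throwaway repository and placeholder identity (assumptions for illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bugs Bunny"
git config user.email "bugs@example.com"
git config commit.gpgsign false

echo 'Delicious recipe' > vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Add cookie recipe'
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b brownies
echo 'Chocolate is vegan' > vegan-brownies.txt
git add vegan-brownies.txt
git commit -q -m 'Add vegan brownies'

# The main line moves on, so brownies is now out-of-date.
git checkout -q "$main"
echo 'Less flour' >> vegan-cookies.txt
git add vegan-cookies.txt
git commit -q -m 'Fix cookies recipe'

# Re-create the brownies commit on top of the tip of the main line.
git checkout -q brownies
git rebase -q "$main"

base=$(git merge-base brownies "$main")
tip=$(git rev-parse "$main")
echo "brownies is now based on $(git log -1 --format=%s "$base")"
```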
As is almost always the case with Git,
there is a way to get out of the situation
in which an ancestor commit changes. It
involves git rebase, covered next.
Note that some history rewrite
operations have to reconcile changes
from multiple sources, so they are
subject to conflicts—similar to merges.
git mergetool is useful in this case
as well.
git rebase brings the branch up to
date with the main development line. A
fast-forward can happen if the new base
descends from the rebased branch—
similar to what happens on git merge.
Finally, the last common kind of history rewrite is to construct an organized
history out of a series of commits. During normal work, commits should
happen early and often—this means committing broken code, failing tests,
works in progress and ideas that do not make it to the end of the development
cycle. It is not a history worth keeping around, and rewriting it is the
purpose of git rebase --interactive base-branch. After running
the command, the text editor pops up, showing each commit in the branch
on its own line and asking how to proceed. It is possible to completely remove
commits from the history, edit them, reorder, or squash them together—that
is, turn several commits into one.
Rewriting history can be hard for Git beginners, so do not worry
about it at first. Once past that stage, try to write small, simple notes to self
on the commit messages during development and use them to craft high-quality commits when the work is ready to be merged into the main
development line.
It is common for maintainers of free-software projects to ask contributors
to squash the commits together before
accepting the code. This keeps the
project’s history clean and avoids people
trying to claim more credit than they are
due by artificially climbing up the chart
of commits per contributor.
git rebase --interactive
allows for carefully crafted commits after
development is complete.
an aside about keys
The next section covers setup to work with Git on multiple
machines. To keep privacy and control access to information, it is important
that machines are able to identify each other over the network. They do that
by using a mechanism analogous to the following scenario: suppose Alice
has the opportunity to meet Bob once, and later, on a second meeting, she
has to prove her identity. What Alice can do is to give Bob a padlock and
keep the key—when they meet again, she opens the padlock with the key
that only she owns.
In cryptography lingo, the key to the padlock is called the private key and the
padlock itself is known as the public key. The private key, as the name implies,
should be safely stored, away from other people. The public key can be
copied and distributed freely—after all, what could attackers do with a
locked padlock?
remote setup
The first step is to create the pair of private and public keys—see the
previous section for more on that:
$ ssh-keygen -t rsa -b 4096 -C "[email protected]"
The private key is the contents of the file ~/.ssh/id_rsa and the public
key is the contents of the file ~/.ssh/id_rsa.pub, both created by the
command above. Keep ~/.ssh/id_rsa safe and take note of ~/.ssh/id_rsa.pub, as it is necessary later on.
Now, decide what the remote is: it can be any machine accessible over
the network via SSH. In this class, the staff chose GitHub as the remote—
create an account at github.com and add the contents of ~/.ssh/id_rsa.pub to the list of SSH keys.
remote repository
Create a repository on the remote. On GitHub, click on the
New repository button. For the group projects for the class, the staff
creates the repository and it just shows up on your account, so skip this step.
Then, grab the URL for the repository, which follows the pattern
git@github.com:<user-or-organization>/<repository>.git.
It is time to introduce the last element of the desktop metaphor: the
fax machine. It is used to share commits over the network to other computers
—it handles the authentication protocol based on the private and public
keys and sends data. It also comes with a list of frequently-called numbers. To
add GitHub’s number, run:
The padlock and the key—the principle
of how computers identify each other. An
alternative to holding a key is
remembering a secret, a password, like
that of padlock that requires a number
combination. In practice, this is
uncomfortable and insecure, because it
requires typing in at every use and
shorter secrets.
For the purposes of this discussion, think
of GitHub as a G UI over a machine with
SSH access and repositories created with
git init --bare—i.e., a repository
that lacks the working directory, or a
cabinet without a desktop.
For free private repositories on GitHub
and other goodies, sign up for the
Student Developer Pack at
education.github.com/pack.
It is also possible to create a remote to
host free private repositories by running
the git init --bare on an empty
folder of any machine accessible via the
network—for example those of the
undergraduate network. The UR L is
going to be of the form ssh://
<user>@ugradx.cs.jhu.edu:
/home/<user>/
<path-to-repository>.
pa g e 13 of 15 · git · leandro facchinetti
$ git remote add origin <remote-url>
In this command, remote add is a verb phrase, origin is the name added
to speed-dial, and <remote-url> is the number. From that point on, it is
possible to refer to <remote-url> by the name origin.
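The speed-dial list can be inspected at any time. The following is a small sketch, assuming a local bare repository in a temporary directory stands in for GitHub so that it runs without a network connection; the directory names are made up for the example.

```shell
# A bare repository in a temporary directory plays the role of the remote
# (assumption: on GitHub this would be the SSH URL of the repository).
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"

# The local repository, with the remote added under the name origin.
git init --quiet "$work/local"
cd "$work/local"
git remote add origin "$work/remote.git"

# List the speed-dial entries: each name with the address it stands for.
git remote -v

# Look up the address behind a single name.
remote_url=$(git remote get-url origin)
echo "$remote_url"
```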
send commits
There is nothing special about the name
origin—it is just the conventional
name for the remote that is the
authoritative source of truth for the
project. When using git clone, Git
creates a remote with that name—more
on it later.
the key feature of distributed version control systems—among them, Git—is
that repositories are copied to remotes. This means that all nodes on the
network have a complete copy of all the history. Thus the fax machine
analogy: both sender and receiver have access to the transferred document.
The commits go over the network to the other computer and the local and
remote repositories are equivalent.
The command is:
$ git push origin master
Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:<user-or-organization>/<repository>.git
 * [new branch]      master -> master
origin is where to send and master is the content of the fax. Git is
smart enough to figure out which commits it needs to send in order to bring
the remote up to date with the local branch, avoiding repeated work.
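One way to check that a push did what it promised is to compare the commit that master points to on both sides. This is only a sketch: a local bare repository stands in for GitHub, the file name and commit message are invented, and git symbolic-ref is used just to make sure the branch is called master regardless of the local Git configuration.

```shell
# Stand-in remote and a local repository (assumption: no network involved).
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master   # name the initial branch master

# Create one commit to send.
echo "hello" > notes.txt
git add notes.txt
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet -m "Add notes"

# Add the number to speed-dial and send the fax.
git remote add origin "$work/remote.git"
git push --quiet origin master

# Both sides now point to the same commit.
local_commit=$(git rev-parse master)
remote_commit=$(git --git-dir="$work/remote.git" rev-parse master)
echo "local:  $local_commit"
echo "remote: $remote_commit"
```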
The local repository sending copies of
commits to the remote. Note the remote
is a repository without a working
directory. There is only a cabinet without
a desktop.
receive commits
the converse of the above operation is asking the fax machine to
call the other side and request updates, if any. It is necessary to retrieve
collaborators’ work.
The command is:
$ git fetch origin
remote: Counting objects: 4013, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 4013 (delta 7), reused 0 (delta 0), pack-reused 3998
Receiving objects: 100% (4013/4013), 726.89 KiB | 0 bytes/s, done.
Resolving deltas: 100% (2800/2800), done.
From github.com:<user-or-organization>/<repository>
 * [new branch]      master -> origin/master
# …
$ git checkout master
Switched to branch 'master'
$ git merge origin/master
# …
Git can also work as a protocol to
transfer projects around. Deployment
tools, e.g., Heroku, receive the code to
execute via Git and there are package
managers that work on top of it.
Note that git fetch brings the commits to the local machine, but does
not automatically update the local branches. It does, however, automatically
update references of the kind <remote>/<branch>, e.g., origin/master. The next step is to update the local branches according to
those references, which requires either git merge or git rebase. The
difference is that git merge might create a new merge commit, whereas git
rebase rewrites history by replaying the local commits on top of the fetched
ones. In most cases, a fast-forward happens and the two are equivalent.
To streamline the sequence of git fetch and git merge, there exists
the git pull command. The shortcut is the most convenient—but keep in
mind what is happening under the hood.
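The difference between fetching and merging can be observed directly. The sketch below is an illustration under several assumptions: a local bare repository stands in for GitHub, a second repository plays a collaborator, all names and file contents are invented, and git clone is used only as a setup shorthand to obtain a copy of the remote.

```shell
# Stand-in remote whose default branch is master.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# A collaborator pushes the first commit.
git init --quiet "$work/collaborator"
cd "$work/collaborator"
git symbolic-ref HEAD refs/heads/master
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
git remote add origin "$work/remote.git"
echo "first" > notes.txt
git add notes.txt
git commit --quiet -m "First commit"
git push --quiet origin master

# Our repository starts as a copy of the remote.
git clone --quiet "$work/remote.git" "$work/ours"

# The collaborator pushes a second commit.
echo "second" >> notes.txt
git commit --quiet -am "Second commit"
git push --quiet origin master

# Fetch: origin/master moves, the local master does not.
cd "$work/ours"
before=$(git rev-parse master)
git fetch --quiet origin
after_fetch=$(git rev-parse master)
fetched=$(git rev-parse origin/master)

# Merge: master fast-forwards to origin/master.
git merge --quiet origin/master
after_merge=$(git rev-parse master)

# The collaborator pushes a third commit; this time use the shortcut.
cd "$work/collaborator"
echo "third" >> notes.txt
git commit --quiet -am "Third commit"
git push --quiet origin master
cd "$work/ours"
git pull --quiet origin master
after_pull=$(git rev-parse master)
remote_tip=$(git --git-dir="$work/remote.git" rev-parse master)
echo "$before $after_fetch $fetched $after_merge $after_pull"
```

Setting git config pull.rebase true makes the same git pull command rebase instead of merge.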
As is always the case involving
git merge or git rebase, conflicts
that require manual resolution are
possible outcomes.
A configuration is available to make
git pull behave as git fetch
followed by git rebase. This avoids
spurious merge commits when multiple
people commit on the same branch.
new collaborators can start their repositories with:
$ mkdir <project> && cd <project>
$ git init
$ git remote add origin <remote-address>
$ git fetch origin
$ git checkout master
$ git rebase origin/master
The process happens frequently enough that Git provides the shortcut
git clone <remote-address>.
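What the shortcut sets up can be checked after the fact. In this sketch a local bare repository seeded with one commit stands in for the remote; the directory names, file name and commit message are assumptions made for the example.

```shell
# Stand-in remote whose default branch is master.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Seed the remote with one commit from a first repository.
git init --quiet "$work/first"
cd "$work/first"
git symbolic-ref HEAD refs/heads/master
echo "hello" > notes.txt
git add notes.txt
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet -m "Add notes"
git remote add origin "$work/remote.git"
git push --quiet origin master
seed_commit=$(git rev-parse master)

# A new collaborator clones: one command instead of six.
git clone --quiet "$work/remote.git" "$work/newcomer"
cd "$work/newcomer"

# The clone already has origin on speed-dial, master checked out,
# and the full history available locally.
git remote -v
clone_origin=$(git remote get-url origin)
clone_branch=$(git rev-parse --abbrev-ref HEAD)
clone_commit=$(git rev-parse HEAD)
echo "$clone_branch at $clone_commit"
```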
collaborate
as stated a few times thus far, Git is a flexible tool—many workflows
exist around it and teams should feel free to adapt it for what is best for
them. The suggested workflow is one increasingly used in companies and
free-software projects: push the commits to a branch on the remote and
open a pull request. The pull request is a proposal to merge the contribution
back into the main line of development and the start of a conversation.
Contributors review the code and comment on it, the developer changes the
code some more, and so on.
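The Git side of this workflow is pushing a branch other than master. The following is a sketch under assumptions: a local bare repository stands in for GitHub, and the branch name fix-typo, the file and the commit messages are hypothetical. Opening the pull request itself happens on the hosting service, not in Git.

```shell
# Stand-in remote and a local repository with one commit on master.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"
echo "hello" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin master

# Do the work on a separate branch (the name fix-typo is hypothetical).
git checkout --quiet -b fix-typo
echo "hello, world" > notes.txt
git commit --quiet -am "Reword greeting"

# Send the branch to the remote; master on the remote stays untouched.
git push --quiet origin fix-typo

# The remote now has both branches, pointing to different commits.
remote_branches=$(git ls-remote --heads origin)
echo "$remote_branches"
remote_master=$(git --git-dir="$work/remote.git" rev-parse master)
remote_topic=$(git --git-dir="$work/remote.git" rev-parse fix-typo)
```

On GitHub, the pull request is then opened from the repository's web page, proposing to merge the pushed branch into master.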
A network of collaborators. The arrows
represent remotes—i.e., the source has
the destination on speed-dial. Note how
there is no central node: that is what puts
distributed in distributed version
control system. Developers git push
and git pull from other developers,
GitHub, deployment services such as
Heroku and anywhere to which they
have access.
Even when working on a project alone or in a group that meets in
person, it still makes sense to use the pull request workflow. It documents
the progress at a higher level than commit messages.
Similar to pull requests, GitHub also has a feature called Issues. They are
exactly like pull requests, except that they contain no code—their sole
purpose is to start a conversation. Issues are used for bug reports, feature
requests and support, depending on the project.
Using issues and pull requests is not a
requirement for group projects, but
helps the graders.
Before opening an issue or pull request,
check the project’s guidelines—some use
other tools to accept contributions and
GitHub only to host the repository.
where do you learn more about git?
besides the features already mentioned, Git can do a lot more: hook into
events and run procedures, stash uncommitted changes for later, bisect the
history looking for a commit that introduced a bug, arbitrarily rewrite the
project’s history, handle repository dependencies as submodules, and the list goes on.
There are also other ways to collaborate: sending and applying patches, pull
requests via email instead of GitHub, and more workflows defined by
particular teams.
The authoritative source of information about Git is its manual—
available via the man command and online. But, because it is complete, the
manual can be hard to navigate. The best way to become a Git expert is to
read the Pro Git book, available for free at git-scm.com/book. When trying
to solve a specific issue, try Stack Overflow (some of the most popular
questions there are about Git) and GitHub’s help at help.github.com.
For beginners, there are tutorials available online. Some work in the
browser and do not require installing Git on the machine, for example
try.github.io. Check out gitimmersion.com and codeschool.com/courses/try-git,
too.
To learn more about writing Git commit messages, go to
tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html and read
the repository history of Linux and Git; they are examples to follow.
For examples not to follow, head to
whatthecommit.com and refresh the
browser a few times.
conclusion
the lecture and these accompanying notes started by stating the problem
of version control, proceeded to show how Git can solve it and pointed to
sources to learn more. They covered the basic underlying concepts and
contained examples of the most useful operations. This knowledge should
be enough for the group projects and to contribute to free-software projects.
colophon
the lecture notes were composed by the author on OS X Pages
and the accompanying slides on Keynote. The serif font is Iowan Old Style,
designed by John Downer and released by Bitstream in 1990. The sans-serif
font is Source Sans Pro, an Open Font designed by Paul D. Hunt and
released by Adobe Systems. The typewriter font is Source Code Pro, created
as part of the Source Sans project. The page design is inspired by the works
of Matthew Butterick and Edward Tufte. Colors for the slides come from the
Solarized colorscheme, by Ethan Schoonover. The illustrations are from
Lingo, by The Noun Project.
◼
“I’m an egotistical bastard, and I name all
my projects after myself. First Linux,
now Git.”
—Linus Torvalds, creator of Git.