Managing your files by simply saving them in folders on a hard drive runs afoul of some core concerns of anyone working on a computer for a living:
- Preserving your work. It's easy to accidentally overwrite a file containing significant amounts of work. Depending on how much work is lost, this can be devastating. Pixar, for example, deleted nearly all of Toy Story 2 when an errant rm -r -f * command was executed (the -r and -f flags mean "recursive" and "force", respectively). They were saved by the Supervising Technical Director, who had made a copy of the file tree so she could work from home after giving birth to her son.
- Tracking history. If you have a way to know what you did and when you did it, you can perform more dynamic operations on your content. For example, suppose you recently made two rounds of edits on a document, and you decide that the first round should be discarded, because the circumstances that motivated those edits have changed. If you have a way to isolate the first-round edits, it's possible you'll be able to do that in an automated way. Otherwise, you'll have to do it manually.
- Managing versions. Slightly different use cases often require you to maintain different versions of a given codebase. For example, clients might have different requirements that require custom modifications. If you choose to maintain these versions in separate directories, you have to deal with transferring any changes to the common part of the codebase to all of the different copies. This quickly becomes a major maintenance headache.
- Facilitating teamwork. Each team member should have maximum flexibility to work on a project and have that work reflected in their teammates' copies of the project. Some care must be taken to achieve this, because if two people make changes to the same file at the same time, their new versions must be merged.
Software designed to address these concerns is called version control. We will be working with a specific version control system called
git which was created by
Git main concepts
Git keeps a record, called a repository, of the history and versions of the contents of a particular directory (including its subdirectories, their subdirectories, and so on). The typical setup is to create a single directory for all of the files relevant to a given project and initialize a repository in
Git uses two components to manage a repository in a given directory: a command-line program called
git and a
.git. Commands are issued to
git to manipulate the contents of
Unlike syncing services like Dropbox or Google Drive, Git doesn't do anything automatically. All interactions are deliberate. This is helpful, because it means that changes made by a colleague won't be pushed to your machine uninvited, where they might break your environment.
Conceptually, a Git repository consists of a collection of complete snapshots of the directory contents. These snapshots are called commits. The commit immediately preceding a given commit is called its parent. Commits and parent-child relationships between commits are the fundamental constructs of a Git repository.
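To make the parent-child relationship concrete, here is a minimal sketch that builds a scratch repository with two commits and asks Git for the parent of the second one. The temporary directory, the file contents, and the identity are placeholders chosen for illustration; the commands themselves are introduced in the Core Git workflow section.

```shell
# Scratch repository in a temporary directory (all names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "firstname.lastname@example.org"

# First snapshot.
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
first=$(git rev-parse HEAD)

# Second snapshot; its parent should be the first one.
echo 'there was a castle.' >> chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Add a castle'
parent_of_second=$(git log -1 --format=%P)

# The two printed hashes should be identical.
echo "first commit:     $first"
echo "parent of second: $parent_of_second"
```

The very first commit in a repository has no parent, so the same query on it prints an empty line.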
- The name of the hidden subdirectory containing the files Git uses to maintain a repository is
- Git keeps your folder synced to the cloud at all times
- Commits in a Git repository are organized using parent-child relationships between commits
- A commit corresponds most closely to a
Changes in a Git project migrate through a series of zones. When you make changes in your directory, Git initially knows nothing about them. You stage your changes to a staging area, then commit them to the repository. A project involving multiple contributors typically has a remote copy of the repository on a website like GitHub. When you are ready for your colleagues to get your changes, you push your local repository to the remote repository.
Why does Git have so many zones? The staging area is necessary to help you distinguish files you want Git to track from files you don't want Git to track, and to provide an area to prepare for a well-organized commit. Having both local and remote copies of the repository allows you to make commits even when you don't have network access. Although this workflow might seem at first to be overly complicated, its benefits for flexibility and organization are often regarded as a positive distinguishing feature of Git (as compared to version control systems with fewer such zones).
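One way to watch a change move through the first three zones is to ask Git for a short status report after each step. The sketch below does this in a scratch repository; the directory, file, and identity are placeholders, and the remote zone is left out because it needs a second repository.

```shell
# Scratch repository (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "firstname.lastname@example.org"

# Zone 1: the file exists only in the working directory.
echo 'Once upon a time,' > chapter-1.txt
untracked=$(git status --short)    # "?? chapter-1.txt" (untracked)

# Zone 2: the file is in the staging area.
git add chapter-1.txt
staged=$(git status --short)       # "A  chapter-1.txt" (added, not yet committed)

# Zone 3: the file is part of a commit in the local repository.
git commit -q -m 'Initial commit'
committed=$(git status --short)    # empty: nothing left to stage or commit

printf 'untracked: %s\nstaged:    %s\ncommitted: %s\n' "$untracked" "$staged" "$committed"
```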
- Removing changes that have been prepared to be included in the next commit is called
- In a typical Git project with 4 zones,
of them are stored on your computer (as opposed to the cloud).
Suppose that you and a colleague begin working on different parts of a project at the same time. The commits you make and the commits they make might share a parent (namely, the latest commit at the time when you begin working). If we visualize the set of commits as a graph, this corresponds to a split in the graph.
You can maintain these two separate lines of development in the same repository by labeling them as new branches, as illustrated in the figure above. The most common convention is to have a main branch called
master and label other branches descriptively. A branch is a pointer to a particular commit. When a commit is added to a given branch, the pointer moves to the new commit:
A branch is a
Typically you will want to merge the changes from your branch back into master. In the example above, the
mybranch commit is a descendant of the
master commit. In this case, there is no potential for conflicts, and the merge can be performed by simply pointing
master to the same commit as
mybranch. This is called a fast-forward merge. After merging, it's safe to delete the
After your branch is merged into master, your colleague wants to merge their branch as well. If you edited the same parts of the same files as your colleague, a decision will have to be made about what version of those sections to incorporate into master. Git handles this by putting markings in the file which look like:
<<<<<<< master
The quick brown fox jumped over the lazy dog
=======
The brown fox jumped over the quick lazy dog
>>>>>>> mybranch
Your colleague will have to resolve each conflict by choosing which text to keep, remove the conflict markers, and then stage and commit the resulting files. This commit will have two parents, indicating the two commits which were merged.
We will discuss the commands for performing these operations in the Core Git workflow section below.
Suppose that you make a copy of a popular repository on GitHub (called a fork), and you spend a couple of months working on a new feature in a new branch you create. If you propose to merge your new branch back into the master branch of the project (this is called a pull request), it's likely that the merge
When you first set up Git on your machine, there are a few configuration steps you want to take. The first is to let Git know about your name and email address.
git config --global user.name "Jane Doe"
git config --global user.email "firstname.lastname@example.org"
You might also want to turn on colors:
git config --global color.ui true
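To see what these settings look like once stored, the sketch below writes the same keys to a throwaway file with the --file option instead of --global, so your real configuration is not touched. The temporary file is an assumption made for the demonstration; with --global, Git would normally write to a file in your home directory instead.

```shell
# Throwaway configuration file (stands in for the global one).
cfg=$(mktemp)

# Same keys as above, written to the scratch file.
git config --file "$cfg" user.name "Jane Doe"
git config --file "$cfg" user.email "firstname.lastname@example.org"
git config --file "$cfg" color.ui true

# The file is plain text, organized in [section] blocks.
cat "$cfg"

# Individual values can be read back with --get.
stored_name=$(git config --file "$cfg" --get user.name)
echo "stored name: $stored_name"
```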
Core Git workflow
In this section, we'll work through all of the commands necessary to carry out the most common Git operations. We'll begin by creating a directory and initializing a Git repository inside it.
mkdir our-novel
cd our-novel
git init
ls -a
We can see that
git init did create a
.git directory. The other way to get a Git repository is to
Next, let's create a file for our initial commit. The git command for staging a file is
git add. The
--all option stages all of the files in the current working directory.
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt # or git add --all
We can inspect the status of our working directory and repository using
The contents of the staging area are indented under the heading
Changes to be committed.
Now we can commit the staged changes, including a descriptive commit message with
git commit -m 'Initial commit'
We can display a record of commits using
You'll notice that commits are identified by a long hexadecimal string like
d9599305d257a40c0b394a1af78dfe995f0010c7. This string is a
HEAD is a pointer to the branch you're currently on, so
HEAD -> master indicates that the
master branch is the currently checked out branch.
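The relationship between HEAD, the branch, and the commit hash can be inspected directly. The following is a small sketch in a scratch repository; the directory and identity are placeholders, the initial branch is explicitly named master so the example does not depend on local defaults, and reading .git/HEAD as a text file assumes Git's default on-disk storage.

```shell
# Scratch repository with one commit (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Jane Doe"
git config user.email "firstname.lastname@example.org"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

# HEAD names the branch that is checked out...
head_file=$(cat .git/HEAD)
current_branch=$(git symbolic-ref --short HEAD)

# ...and the branch in turn names a commit.
full_hash=$(git rev-parse HEAD)
branch_hash=$(git rev-parse master)
short_hash=$(git rev-parse --short HEAD)   # abbreviated form of the same hash

echo "HEAD file:      $head_file"
echo "current branch: $current_branch"
echo "HEAD commit:    $full_hash"
echo "master commit:  $branch_hash"
echo "short form:     $short_hash"
```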
The output of the
git log command is more helpful with a few of its options set to a non-default state. Let's go ahead and make a git alias so we don't have to type all of these options out every time. We'll use the name
lol, which is a customary choice for this alias.
git config --global alias.lol "log --graph --decorate --all --oneline"
git lol
Finally, if we want to store a copy of the repository on GitHub, we visit github.com and create a new repository. Then we connect our local Git repository to the remote one we just created.
git remote add origin git@github.com:jovyan/MyRepo.git
git push --set-upstream origin master
jovyan is replaced by your actual GitHub name, and
MyRepo is replaced by your repository's name. The first line makes the connection to the remote repository and names it
origin, while the second line sets the default remote repository to
origin and pushes to GitHub. Note that the
--set-upstream origin master part is only necessary on the first push; subsequent pushes can be done with
It's a good habit to begin each work session by running
git pull to fetch any changes that have been pushed by collaborators to the remote repository and merge those changes into your working directory. This operation aborts if you have changes in your working directory that conflict with the changes from the remote repository. One good way to resolve this issue is to
stash your local changes and then
apply them after you
git stash
git pull
git stash apply
git stash creates a new commit which is not on any branch, and
git stash apply merges the latest stash into the current branch.
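The whole push, pull, and stash cycle can be rehearsed on one machine. In the sketch below, a bare repository in a temporary directory stands in for the remote on GitHub, and two working copies play the two collaborators. All names (alice, bob, the story text) are invented for the demonstration, and the two edits deliberately touch different lines so that the stash applies cleanly.

```shell
# Everything lives under one temporary directory.
work=$(mktemp -d)

# A bare repository on disk stands in for the remote on GitHub.
git init -q --bare "$work/remote.git"

# First collaborator: create the project and push it.
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Alice Example"
git config user.email "alice@example.org"
printf 'Once upon a time,\nthere was a castle.\nIn it lived a cat.\nThe cat slept all day.\nThe end.\n' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
git remote add origin "$work/remote.git"
git push -q --set-upstream origin master

# Second collaborator: clone, reword the first line, and push.
git clone -q --branch master "$work/remote.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.org"
printf 'Long, long ago,\nthere was a castle.\nIn it lived a cat.\nThe cat slept all day.\nThe end.\n' > chapter-1.txt
git commit -q -a -m 'Reword the opening'
git push -q origin master

# First collaborator again: an uncommitted edit to the last line.
cd "$work/alice"
printf 'Once upon a time,\nthere was a castle.\nIn it lived a cat.\nThe cat slept all day.\nTo be continued.\n' > chapter-1.txt

git stash -q          # set the local edit aside
git pull -q           # bring in the collaborator's commit
git stash apply -q    # replay the local edit on top of it

cat chapter-1.txt
```

If the two edits had touched the same lines, git stash apply would report a conflict that has to be resolved by hand, in the same way as a merge conflict.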
- The command for initializing a new Git repository is
- The command for checking which files are staged is
- The command for staging a file is
- The command for committing is
- The command for showing a decorated history of commits is
Git Branching Commands
Suppose we want to experiment with dragons in the novel's storyline. We can make a new branch called
dragons for working on this idea.
git branch dragons
git lol
We've created a new branch called
dragons, but we still have the
master branch checked out (you can tell because
HEAD still points to
master). Let's switch to the new branch:
git checkout dragons
git lol
We can now add some dragon content and commit it:
printf '\n\nthere be dragons!\n' >> chapter-1.txt
git add chapter-1.txt
git commit -m 'Add some dragons'
Now let's switch back to the master branch and commit some different changes:
git checkout master
printf '\n\nin a galaxy far away\n' >> chapter-1.txt
git add chapter-1.txt
git commit -m 'Write another line'
git lol
Suppose we decide we do want to incorporate the dragons into the story. We want to
While we have the master branch checked out, we do
git merge dragons
Git tells us that this merge led to conflicts, and we'll have to resolve them before the merge commit can be made. Let's look at the new contents of
The next step is to edit the file and commit it. Typically you would edit the file in a text editor (we'll see a particularly good way to do it later in this course when we cover VS Code), but here we'll just use
echo 'Once upon a time..., in a galaxy far away..., there be dragons!' > chapter-1.txt
git add chapter-1.txt
git commit -m "Merge the dragons into the story"
git lol
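The conflict-and-resolve cycle can also be checked from the command line. The sketch below reproduces a conflicting merge in a scratch repository, lists the files Git marks as unmerged, resolves the conflict, and then counts the parents of the resulting commit. The directory, identity, and resolved text are placeholders for illustration.

```shell
# Scratch repository with one commit on master (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Jane Doe"
git config user.email "firstname.lastname@example.org"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

# Two branches append different text at the same place.
git checkout -q -b dragons
echo 'there be dragons!' >> chapter-1.txt
git commit -q -a -m 'Add some dragons'
git checkout -q master
echo 'in a galaxy far away' >> chapter-1.txt
git commit -q -a -m 'Write another line'

# The merge stops with a non-zero exit status because of the conflict.
if git merge dragons > /dev/null 2>&1; then
    echo 'merged cleanly (not expected in this setup)'
fi

# Files that still contain conflict markers.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve by writing the text we want, then stage and commit.
printf 'Once upon a time,\nin a galaxy far away\nthere be dragons!\n' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Merge the dragons into the story'

# A merge commit records both of the commits it joined.
parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "parents of merge commit: $parent_count"
```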
Now we can delete the
dragons branch. Since branches are just pointers to commits, this operation does not result in the loss of any snapshots in our project history.
git branch -d dragons
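To check that deleting a branch removes only the pointer, the sketch below records the hash of a commit made on a branch, merges and deletes the branch, and then confirms that the commit is still in the repository. It uses a scratch repository with placeholder names, and the merge here is a fast-forward because master has no new commits of its own.

```shell
# Scratch repository with one commit on master (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Jane Doe"
git config user.email "firstname.lastname@example.org"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

# Commit on a new branch and remember which commit it points to.
git branch dragons
git checkout -q dragons
echo 'there be dragons!' >> chapter-1.txt
git commit -q -a -m 'Add some dragons'
dragons_commit=$(git rev-parse dragons)

# Fast-forward master to the same commit.
git checkout -q master
git merge -q dragons
master_commit=$(git rev-parse master)

# Delete the branch: only the pointer goes away.
git branch -d dragons
remaining=$(git branch --list dragons)
object_type=$(git cat-file -t "$dragons_commit")

echo "master now points to: $master_commit"
echo "dragons commit was:   $dragons_commit"
echo "object still exists as a: $object_type"
```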
Write a sequence of Git commands to create two new branches, one with dragons in the story and one with wizards in the story. Commit a change to each branch, then merge the wizard branch into the dragons branch, and finally merge the dragons branch into
git lol to confirm that your repository log reflects the wizards to dragons to master merging sequence.
Suppose you want to have a look at the state of your novel one commit ago. You refer to the commit which is any number of commits back using a tilde followed by the desired number of commits, as in
The git show command lets us extract a single file from a given commit:
git show HEAD~1:chapter-1.txt
Alternatively, you can refer to a particular commit by a distinguishing initial segment of its hash (note that you'll have to
git lol to get an appropriate commit identifier for your session before you can run this cell):
git show 06d23b9:chapter-1.txt
We can see just the changes between two commits with a
git diff HEAD~1 HEAD chapter-1.txt
Let's say you decide you want to go back to the version of a file two commits ago. You can
checkout a single file.
git checkout HEAD~2 chapter-1.txt
git status
This operation changes the file in the local working directory and stages that change. You can then commit it as is, or edit the file further and then stage and commit.
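The history commands in this section can be tried together in a scratch repository with three commits. In this sketch the file contents ("draft 1" and so on), the directory, and the identity are placeholders. Checking out a file from an older commit updates both the working directory and the staging area, which is why the short status shows the file as staged.

```shell
# Scratch repository with three commits to the same file (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "firstname.lastname@example.org"
for n in 1 2 3; do
    echo "draft $n" > chapter-1.txt
    git add chapter-1.txt
    git commit -q -m "Draft $n"
done

# Contents of the file two commits ago.
two_back=$(git show HEAD~2:chapter-1.txt)

# Files that changed between the previous commit and the latest one.
changed=$(git diff --name-only HEAD~1 HEAD)

# Bring the old version back into the working directory and staging area.
git checkout HEAD~2 -- chapter-1.txt
restored=$(cat chapter-1.txt)
status_line=$(git status --short)

echo "two commits ago: $two_back"
echo "changed files:   $changed"
echo "restored:        $restored"
echo "status:          $status_line"
```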
Write a Git command to replace the contents of
main.py with their contents four commits ago.
Solution. We check out the file at that commit:
git checkout HEAD~4 main.py.
Write a Git command to show the changes in the file
main.py from four commits ago to two commits ago.
Solution. We use
git diff and specify the two revisions:
git diff HEAD~4 HEAD~2 main.py.