Bernice's Dev Bootcamp Blog ^_^

git, GitHub, and version control

October 5, 2015, Monday

This topic of version control has probably been talked about to death by everyone plus their mothers. But only because it's a really important concept. It cannot be emphasized enough!

Screenshot of version control search results.

The search results are OVER 9000!!!!!!!!!!!

Here's why.

What are the benefits of version control?

Have you ever played one of those videogames that has only one save slot, and it auto-saves in such a way that the game accidentally becomes unwinnable? For example: auto-saving right after you've destroyed or lost some plot-advancing item (Diablo 2); auto-saving right after your army has dwindled and your important general was killed (Total War); auto-saving after you've realized that a tactical decision you made several turns ago was a strategic error, and now all the other factions have ganged up against you (Civilization and Total War); auto-saving right when you've glitched into a wall (Dying Light); or auto-saving when the enemies are right beside you, so they kill you instantly as soon as you load the game. Oh great, now you have to start over again from the beginning.

(Yes, I play too many videogames, and in general I dislike games that auto-save and don't allow you multiple save slots. But that's another story for another day.)

Have you ever accidentally saved over your work, but then realized that you've deleted something really important, and you can't remember how to put it back?

Well none of those things will be a problem as soon as you start using version control! ^__^

What is "version control"? One of my friends so succinctly says:
"They are tools to help you AVOID massive fuckups ;-)"
A good way to think about version control is: it is like a videogame with multiple save slots, so if you've accidentally lost that plot-advancing item, or you've made some strategic decisions that gave you a bad outcome, you can just go back to that known good saved state, and you don't need to start the game from the beginning.

Or, if you don't play videogames, imagine that you back up your entire OS right before installing any new software, and you keep every one of the old backups. One day, you suddenly discover that the new driver or whatever is incompatible with everything else, or the new software is actually malware that corrupted your system. You can just revert to a recent backed-up state. If you find that this does not solve your problem, you go backwards through each backup until you find the one with a known good state. This way, you don't have to re-install your OS or figure out which software or driver is incompatible with what.

If those examples are still not relatable enough, here's an example that more people might have encountered.

Let's consider that you have crafted this awesomely complicated Excel spreadsheet (let's call this myExcelv1.xlsx). Everything works properly. One day, you needed to change some formulae for whatever reason, but you are not sure if the change that you've made will mess up the other calculations and references.

Screenshot of complicated Excel formulae. Imagine all the possible #REF! and #VALUE! errors if there's a mistake!

So instead of making your changes in this known good Excel file, you save a copy of myExcelv1.xlsx, and you can mess with the new copy (myExcelv2.xlsx) with the peace of mind that even if you mess it up, your incredibly complicated formulae are still safe in the original. You can keep trying different things until your changes don't break anything. Now, myExcelv2.xlsx will be your official main version.

In summary, "version control" is the idea: the practice of keeping track of versions. A "version control system" is the implementation of this practice, and nowadays that usually means the software that does the tracking for you.

git is just one of the version control systems out there. Others include SVN (Subversion) and CVS (not the pharmacy). See here for a more comprehensive list: https://en.wikipedia.org/wiki/List_of_version_control_software. See here for a review of different version control systems: http://www.smashingmagazine.com/2008/09/the-top-7-open-source-version-control-systems/

How does git help you keep track of changes?

Basically, a version control system like git can be downloaded and installed on your local machine, and it can do all these wonderful things like tracking the changes. Since I'm using git, here's an example of my git log output to show the tracked changes:

Screenshot of my git log output. (I have a lot of commitments...)

When writing your own code alone, you can make your backup copies of your files manually the old-fashioned way, making a new version and manually renaming the file with a different name. But having a version control system will automate this for you. All of that work will be simplified and streamlined. The date, time, who made a change, and what got changed will all be tracked for you. Sure you could do this manually, but wouldn't you want to spend your time doing other things?
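To make the "who, when, and what" part a bit more concrete, here is a small sketch that can be saved as a script and run. It assumes git is installed, and everything in it (the throwaway folder, the file name formulas.txt, the "Example User" identity, and the commit messages) is made up for illustration.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)"                        # throwaway folder; nothing real is touched
git init -q
git config user.name "Example User"      # placeholder identity, for this repo only
git config user.email "user@example.com"

echo "=SUM(A1:A10)" > formulas.txt
git add formulas.txt
git commit -q -m "Add first formula"

echo "=AVERAGE(A1:A10)" >> formulas.txt
git add formulas.txt
git commit -q -m "Add average formula"

# Who made each change, when, and a short description (newest first).
git log --pretty=format:"%h | %an | %ad | %s" --date=short
echo
# Which files changed in the latest commit.
git show --stat HEAD

author=$(git log -1 --pretty=%an)
commit_count=$(git rev-list --count HEAD)
```

Nobody had to rename formulas.txt to formulas_v2.txt here; git recorded both versions along with the author and the date.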

If you would like to try out git for yourself, you can download it from https://git-scm.com/downloads. Then follow the instructions for installing it.

There are also lots of useful guides found online.

OK, I have to admit that in the beginning this was all very intimidating, even with all the guides. Also, a lot of people don't want to add one more thing to their workflow, or set aside the time to learn this. So I understand the resistance to learning these or adding them to the workflow (not the case for coders, but for non-coders).

Someone may argue, "But Bernice, regular MS Word and regular MS Excel and Photoshop already have these features inside their program. You can undo and redo actions. Why do I need to do all this?" To this, I say, "When you save and close those programs, then later open them again and realize that something is not right, you cannot access that 'history' and undo stuff." I am also a lazy person, so I don't want to do something if it has no benefit to anyone at all. But in this case, a temporary and small inconvenience is worth it, if it prevents much wailing and gnashing of teeth later on.

Now, suppose this worst-case scenario has happened, and you've saved and shut down your program. How do you fix your most recent commit? The instructions in https://git-scm.com/book/en/v2/Git-Basics-Undoing-Things show us that git commit --amend lets us redo the last commit. It replaces that commit with a new one, so we can correct the commit message or include changes that we forgot to add.
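Here is a minimal sketch of amending a commit, meant to be saved as a script and run in a throwaway folder. The file names, the identity, and the commit messages are all hypothetical; only the git commands themselves are real.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)"                        # throwaway folder; nothing real is touched
git init -q
git config user.name "Example User"      # placeholder identity, for this repo only
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notse"             # oops: typo in the message, and a file is missing

echo "todo list" > todo.txt
git add todo.txt                         # stage the forgotten file
git commit -q --amend -m "Add notes and todo list"   # replace the last commit

commit_count=$(git rev-list --count HEAD)    # still one commit, not two
last_message=$(git log -1 --pretty=%s)
echo "$commit_count commit(s): $last_message"
```

The amended commit takes the place of the old one, so the log shows a single commit with the corrected message and both files.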

But what if you want to go back to something a few commits ago? Like the question of this person on Stack Overflow: http://stackoverflow.com/questions/4114095/revert-to-a-previous-git-commit. In this case, we use git revert and git reset. git revert makes a new commit that undoes the changes of an earlier commit, so the history stays intact. git reset moves the current branch back to an earlier commit.

In order to find the commit hash (the long string of letters and numbers that identifies each commit), we can check what commits have been logged. To check what commits have been logged, you type git log. To get a more detailed look at each commit, it's git log -p. But both of these commands will show all the commits since the beginning. If you want to limit the number of commits that you view, it's git log -# (replace '#' with a number). You can even combine them!! For example, typing git log -p -3 will show the detailed log of the last 3 commits. More information here: http://git-scm.com/docs/git-log

And just like that, your problems are fixed. Like magic.

This was all really confusing in the beginning, and it all looks like reversing time or time travel or parallel universes or time paradoxes. (And that's because in a sense, it kind of is.) But after understanding what each step does and deliberately practicing a lot and stepping away and coming back (rinse and repeat), I finally figured it out. I did not learn this overnight. Even now, with my newfound knowledge, I still need to look up some of the commands if they are not something that I use every day. But that will change with more use, as time goes by.
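To see the difference between the two commands, here is a small sketch to save as a script and run in a throwaway folder. The file story.txt, the identity, and the commit messages are invented for the example, and HEAD~2 is just a shorthand for "two commits before the latest one" (you could paste a hash from git log instead).

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)"                        # throwaway folder; nothing real is touched
git init -q
git config user.name "Example User"      # placeholder identity, for this repo only
git config user.email "user@example.com"

# Make three commits so there is some history to look at.
for n in 1 2 3; do
  echo "line $n" >> story.txt
  git add story.txt
  git commit -q -m "Add line $n"
done

git log --oneline                        # one line per commit, newest first
git log -p -2                            # detailed view of the last 2 commits only

# Option 1: git revert adds a NEW commit that undoes the latest one.
git revert --no-edit HEAD
after_revert=$(git rev-list --count HEAD)    # 4 commits: the history is kept

# Option 2: git reset --hard moves the branch back and DISCARDS later commits.
target=$(git rev-parse HEAD~2)           # the full hash of the "Add line 2" commit
git reset -q --hard "$target"
after_reset=$(git rev-list --count HEAD)     # 2 commits remain

echo "after revert: $after_revert commits, after reset: $after_reset commits"
```

Both options leave story.txt with two lines, but only git revert keeps a record that "Add line 3" ever happened. That makes it the gentler choice when other people already have your commits.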

Very important things to take note of:
  • A commit is a recording of a snapshot of where your code is at a particular time. You can capture as many of these snapshots as you would like, and then you can go back and visit any snapshot whenever you would like.
  • You cannot save what you did not git add and git commit.
  • About tracking changes: If a creator/git user does not git add and git commit -m 'Add comment here' early and often, git can only do so much to help you. Think of it like the videogame saves analogy. If you wait until much further in the game to save, then you decide that you want to restore to an earlier save point, that cannot happen if there's no early save point to go back to.
  • Adding messages to your commits is also important, so you will know which ones to look for later on in case you need to revert. Relying on the detailed view of git log -p will do no good if there were lots of changes and you're just greeted by a huge wall of text.
  • There is no penalty for committing multiple times. So git add . like no one's watching; git commit like there's no tomorrow!!
If you follow all the rules properly when you fork and checkout, you will have everything in place, and it will be less confusing. This way, your git repo will look something like this:
... instead of something like these:
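The rule "you cannot save what you did not git add and git commit" can be checked with a quick sketch like the one below (save it as a script and run it). The two file names and the identity are made up for the example.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)"                        # throwaway folder; nothing real is touched
git init -q
git config user.name "Example User"      # placeholder identity, for this repo only
git config user.email "user@example.com"

echo "level 1 cleared" > progress.txt
echo "scratch ideas"   > scratch.txt

git add progress.txt                     # only this file is staged
git commit -q -m "Record level 1 progress"

tracked=$(git ls-files)                  # files that are inside the snapshot
unsaved=$(git status --porcelain)        # anything git has not saved yet
echo "saved:   $tracked"
echo "unsaved: $unsaved"                 # "??" marks a file git is not tracking
```

Here scratch.txt never made it into a save slot, so if it were deleted or overwritten, git would have nothing to restore it from.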

Why use GitHub to store your code?

What is GitHub, then? Like the name implies, github.com is a website that is a hub where git repositories (the technical term for git 'information' is "repository") are stored. This is like an offsite/remote backup of your work. There are many other online repositories too, like BitBucket or Atlassian's Stash. But online repositories are more than offsite/remote/hosted storage. The true power of online repos (as they are abbreviated) is the ability for other people to see a project and collaborate on it AT THE SAME TIME, even though they are geographically apart.

Now, I'll tell you why this is super-useful -- even for non-coders. Let's go back to the example of the Excel file with tons of complicated formulae, and references to who-knows-where, which we called myExcelv1.xlsx.

The most likely natural habitat for such complex Excel files is a work environment. Someone who works on this file is most likely sharing it with other people in a team, or from different teams, who are also working on the same file.

Screenshot of complicated Excel spreadsheets
Nice Excel formulas you've got there. It would be a shame if anything happened to them.

If someone were to version-control myExcelv1.xlsx the old-fashioned way, they would make a copy and name it myExcelv2.xlsx. But what if the deadline is coming up, and multiple people need to work on the file? Does this mean that there are multiple copies of myExcelv2.xlsx floating around? If I updated one section of my copy of myExcelv2.xlsx, and you updated another section of your copy of myExcelv2.xlsx, then it stands to reason that each of our copies would have some parts that are updated and other parts that are outdated. The other way -- only one person can work on it at a time -- is equally bad and inefficient.

This is why online repos like GitHub are great tools for version control. Multiple people can work on something at the same time, even if they are all far apart from each other. And even if they work on something at the same time, they will not overwrite each other's work.
  • If nothing else, it's a really good idea to store your work somewhere other than just your local machine. If something happens to your local machine for whatever reason, nobody will need to cry because all of that work is lost. (Because it's not, it's saved remotely.)
  • If you want to work some more on your own code, you can create a branch from the original, and mess with it as much as you want, with impunity!!!!!
  • If other people collaborate with you, then they would also fork their own individual branches, and mess with the code to their heart's content. These collaborators don't even have to be in the same location as you.
  • Because everyone is working on a branch of the code, the original known good working version remains intact.
  • Eventually, people need to combine all the work back to the main branch (the master). The request to do this is called a pull request. People can review each other's changes, and catch mistakes before combining the work back to the master. This makes collaborating in teams much easier.

How does this whole thing work?

There are many guides online, but here is a quick summary for git and GitHub, since those are the version control tools that I'm currently using.

Instructions for creating a new repo:

  1. Step 1 - If you don't have this already, create a login and password in GitHub.
  2. Step 2 - In your own GitHub page, look for the "+" sign on the upper right corner beside your avatar/picture. You'll know you're in the right place because when you hover over it, a sign will appear that says "Create new...".
  3. Step 3 - Click on "+" sign on upper right corner beside your avatar/picture. A dropdown menu appears.
  4. Step 4 - On dropdown menu, please click "New repository". A new screen will appear.
  5. Step 5 - In this new screen, you can customize things about your repo. Please choose the name of your repo (this is the required field), and choose whether the repo is public or private. The others, such as the description and what kind of license is being used, are optional, but you can add information on them if you want.
  6. Step 6 - When you have made sure that everything in Step 5 is to your liking, please click "Create Repository". Congratulations! You have made a repository in GitHub.
  7. Step 7 - In order to add more files to this repository, let us link it to your local machine through git. For this, please make sure that you have git installed from here: https://git-scm.com/downloads. Then tell git who you are with these two commands:
    • git config --global user.name "Your GitHub Username Here"
    • git config --global user.email "your.email@address.com"
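As a rough sketch of what "linking" can look like on the command line, the script below sets the same two values and then points a local folder at a GitHub repo. Two things here are my own assumptions and not part of the steps above: it leaves out --global so that the settings apply only to the throwaway folder, and the URL is a hypothetical placeholder that you would swap for the one shown on your new repo's page.

```shell
set -e                                   # stop at the first failing command
cd "$(mktemp -d)"                        # throwaway folder; nothing real is touched
git init -q

# Same settings as in Step 7, minus --global, so they stay inside this one repo.
git config user.name "Your GitHub Username Here"
git config user.email "your.email@address.com"

# Hypothetical URL: use the one from your own repository's page instead.
git remote add origin "https://github.com/your-username/your-repo.git"

git config user.name                     # prints the name git will record in commits
git remote -v                            # lists where "origin" points for fetch and push
remote_url=$(git config --get remote.origin.url)

# With network access and a GitHub login, a first upload could then be:
# git push -u origin master
```

With --global, git stores the name and email once for every repo on your machine, which is usually what you want on your own computer.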

Instructions for forking an existing repo:

  1. Step 1 - If you don't have this already, create a login and password in GitHub.
  2. Step 2 - Please go to the GitHub page of whichever repo you'd wish to fork.
  3. Step 3 - On the upper right corner of the screen, under your avatar/picture, there is a button called "Fork". Please click on this.
  4. Step 4 - Congratulations! You have successfully forked a repo. You'll know that you have done this correctly if the name before the slash of the repo name is your own username, and under that, it says "forked from originalUser/source" or something like that, and the picture beside it is a branch instead of a book.
  5. Step 5 - In order to add more files to this repository, let us link it to your local machine through git. For this, please make sure that you have git installed from here: https://git-scm.com/downloads.
  6. Step 6 - On the right sidebar of the forked repo's page, there is a box under "HTTPS clone URL". Please click the box beside it that says "Copy to clipboard" when you hover over it.
  7. Step 7 - Please open your CLI. Please "cd" to the directory where you want to save your cloned git repos.
  8. Step 8 - Once you are in the directory that you want, please type this command:
    • git clone (and before hitting "Enter" on the keyboard, paste the URL that was copied in Step 6. NOW, hit Enter).
  9. Congratulations! You now have a copy of this forked repo on your local machine.

Instructions for tracking changes:

  1. git pull origin master (to make sure that you have the latest and greatest version of the project)
  2. git checkout -b [name_of_branch] (to make sure that when you make a change gone awry, you can always revert to the original)
  3. git add [can be . or can be the filename(s)] (prepares the changes to be committed)
  4. git commit -m "imperative message here" (saves a snapshot of the staged changes in your local repo, ready to be pushed to the remote repository)
  5. git push origin [your-branch-name-here] (pushes the changes to the remote repository)
    • It will ask for your username and password in the remote repo here.
    • Then a message will appear with the details of what was pushed to the remote repo.
  6. git checkout master (moves your view into the master branch)
  7. git pull origin master (brings your local master up to date with the remote repo; git fetch origin master only downloads the new changes without merging them into your local master)
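The seven steps above can be tried end to end with the sketch below, saved as a script. Since a real GitHub repo needs a network connection and a login, a local "bare" repository stands in for the remote here; that stand-in, the folder names, the README.txt file, and the identity are all assumptions made for the example. The branch name edit-blog is borrowed from my screenshots.

```shell
set -e                                   # stop at the first failing command
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"                        # throwaway folder; nothing real is touched

# Setup: build a stand-in for the GitHub remote with one commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is "master"
echo "My blog" > README.txt
git add README.txt
git commit -q -m "Add README"
cd ..
git clone -q --bare seed remote.git      # plays the role of the repo on GitHub
git clone -q remote.git project          # plays the role of your local clone
cd project

git pull -q origin master                # 1. get the latest version
git checkout -q -b edit-blog             # 2. work on your own branch
echo "new paragraph" >> README.txt
git add README.txt                       # 3. stage the change
git commit -q -m "Add new paragraph"     # 4. save a snapshot locally
git push -q origin edit-blog             # 5. send the branch to the remote
git checkout -q master                   # 6. switch back to master
git pull -q origin master                # 7. bring local master up to date

current_branch=$(git rev-parse --abbrev-ref HEAD)
remote_branches=$(git ls-remote --heads origin)
echo "on branch: $current_branch"
echo "$remote_branches"                  # lists both master and edit-blog
```

At the end, master still has the original README.txt, and the new paragraph lives only on edit-blog until someone merges it.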
Now comes the most powerful part -- the pull request.

This happens in the remote repo; in this case in GitHub. Anyone who logs in and has access to the remote repo in GitHub will see this:

Screenshot of 'You recently pushed branches: edit-blog.  Compare & pull request' in GitHub.
In GitHub, you recently pushed branches: 'edit-blog'. Compare & pull request.


Screenshot of 'Open a pull request' in GitHub.
Open a pull request in GitHub. (Right-click on the image and open on a new tab or new window to expand the image.)

As can be seen in the screenshots above, when someone clicks on "Compare & pull request", the screen takes you to the "Open a pull request" page. On this page, the submitter can write comments, see who did what and at what time, etc. Also, please note that the person who submits the pull request can compare a branch to another branch, not just to the master branch. The person who submits the pull request can also compare the changes line by line, by scrolling down in that pull request page. Please take note of the part that says "Able to merge. These branches can be automatically merged." When everything is completed, click "Create pull request", and wait for someone to review the pull request.

In the next step, someone else typically reviews the pull request. They will see this screen:
Screenshot of 'Merge pull request' in GitHub.
Merge a pull request in GitHub. (Right-click on the image and open on a new tab or new window to expand the image.)

If for whatever reason, the pull request is not accepted, the reviewer can comment and click "Close pull request". Reasons for not accepting a pull request are detailed here. But if the pull request gets accepted, then the reviewer clicks "Merge pull request". Now, the branch code gets merged back to the master branch. That's why it's very important to review the code and make sure that everything works properly before merging back!!

Now, say it with me: git and GitHub are 2 very different things!

Despite the similarities in their names, git and GitHub are 2 different things. As mentioned earlier, git can be downloaded and installed in your computer, but GitHub cannot be, because it is the website (github.com). git is a system of version control, but there are many other systems too like CVS (again, still not the pharmacy) and SVN. GitHub is an online repository where people store their work, and other people can see this repo to collaborate on their work.

Final Thoughts

I will be the first to admit that in the beginning, I was really scared of putting my work out there like that in GitHub for all to see. It was scary mainly because the work was still raw and not done yet. It was full of errors, and it looked really ugly to me. "It's not done yet. It's incomplete. It's all over the place. It's not ready for other people to see. People will see how much I suck," we tell ourselves. But let's face it: as people who create things, our works will never be "done" or "completed". There will always be some kind of tweak that needs to happen, and as we mature and gain more experience and techniques, we'll always look back at our old work, and see what could have been done better. Hindsight is always 20/20, and past work will always suck compared to current work. (Hopefully.)

Anyway, do you know about Brandon Sanderson? Now, this may seem like I'm changing the subject and it's coming out of nowhere, but bear with me and hear me out. Brandon Sanderson is a famous author who writes speculative fiction, specifically in the fantasy genre. He's one of my favorites because in his stories, the worlds are consistent and his writing style is elegant yet not overly flowery. I'd say he's a nice balance between Hemingway/Walt Whitman/Asimov and Umberto Eco/Jorge Luis Borges/Franz Kafka/Neal Stephenson. When people like us read their works, we think "Oh wow, they just made something automatically just like that." But truthfully, we don't see the process behind their writing -- the edits, the re-edits, the throwing away of ideas that don't work with the rest even though each is a great idea by itself.

The point is, Brandon Sanderson wrote a book called Warbreaker. That's not surprising, since he's a writer. But what's different about this book is that he put everything online, including the original drafts, and all the edits. In his own words:
And so, I did something crazy. I went to Tor and asked if they’d be okay with me posting the entire version of Warbreaker AS I WROTE IT. Meaning, rough drafts. The early, early stuff which is filled with problems and errors. For those who are aspiring novelists, I wanted to show an early version of my work so they could follow its editing and progress.
Think of what Brandon Sanderson could have done if he had version control or used GitHub! ^_^

The original essay in full context can be found here: http://brandonsanderson.com/books/warbreaker/warbreaker/, and you can download and read the different iterations of his book Warbreaker there too. He said a bunch of other things too, but the point that I was trying to make is that when I was reading the original drafts, I didn't really think that the writing was bad or problematic. It was good enough. I bet it's the same thing with our code. Regular people will not really think anything ill of it. Also, it's some kind of output rather than nothing, and it can always be fixed at a later iteration. The real reason why each of us is self-conscious of our own works is because we spend a lot of time with ourselves, so we know ourselves. But truthfully, other people don't know any better, and are being self-conscious of their own selves for the same reason that each of us is self-conscious of our own selves.

And when other smarter people pay attention to your code enough to want to fork it or make a branch to fix stuff, that's actually a compliment, because they think that it is worth their time to work on your code. So be happy!!! ^___^ (Speaking of which, wanna see my older repos? https://github.com/BerniceChua/BouncingDuckies and https://github.com/BerniceChua/Breakout_-_Java_Homework)

So put your work out there, and happy coding!!! ^__^