GitHub is at once one of the most useful technologies developers use and one of the most annoying, especially for novices. I think things are compounded for beginners by having to learn the terminal at the same time. This blog will go through what you need to do to get off the ground and offer some commentary, from someone slightly above beginner level, on common mistakes and on why some things just are not intuitive.
What is GitHub and why do I have to use it?
Being a successful coder will almost always involve working with other coders on the same project. There's simply not enough time or brainpower in the possession of one person to accomplish what companies need to get done. At the simplest level, GitHub is a way to make that process more streamlined and safer. GitHub, together with git, is also the standard for version controlling your projects. Version control means saving things in a way where changes are recorded and can be undone.
In GitHub, one of the main things you will use is a repository. For our purposes, a repo is basically a folder. You will download the repo to your hard drive, make changes locally, then upload those changes to GitHub.
First things first: you are going to need to make a GitHub account, which should be pretty self-explanatory. Once you are logged in, you can create your first repository (repo if you're a cool cat). In the upper right of your screen, there will be a bell, a plus sign, and your profile picture or lack thereof. Click the plus sign and you should get to this screen:
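The only thing you have to put in here is a repo name. I recommend you add a description as well. You can also tick the box to add a README, but note that GitHub only shows the setup steps on the next screen when the repo starts out empty, and those steps create the README for you. The next screen might be overwhelming, because GitHub gives you a lot of different options, and options, when you are just starting out, can be a lot to take in.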
We are going to do these steps! First, you're going to have to get to your command line. On macOS, you can just open Spotlight search and type in "Terminal". I only have experience with macOS and its command line, but I am sure it's straightforward on other systems as well. Once you get your terminal open, you may feel a mixture of emotions: confusion, but also the sensation of being a true programmer as you stare into its blank, text-driven abyss, reminiscent of The Matrix or early computer programming in general.
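From here you are going to need to install git. Semi-confusingly, git and GitHub are not the same thing. Git is the version control tool that runs on your computer; GitHub is a website that hosts git repositories and gives you tools for working with them. Try typing in git --version (that is two hyphens, no spaces between them). If you have it, it will tell you which version you are using, and if you don't, it will prompt you to install it.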
Now we can get back to that confusing list of steps GitHub threw at us from the jump. You can just copy and paste these in; I will quickly go through what each one is doing. ECHO writes a line of text into a file called README.md, creating the file if it does not exist yet. GIT INIT turns the folder you are in into a repository. GIT ADD README stages your readme file for the next commit. GIT COMMIT records the staged readme in your local repository; nothing has gone to GitHub yet. GIT BRANCH -M MAIN renames the branch you are on to main. GIT REMOTE ADD ORIGIN is one of the most important ones: it is what connects the folder you are in to the repository you created on GitHub. Lastly, GIT PUSH is another one you will use all the time. It pushes any committed changes to GitHub, so the repo online and the one on your computer have the same content.
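To see those steps run end to end without touching a real account, here is a minimal sketch. It assumes git is installed, and it uses a local "bare" repository in a temporary folder as a stand-in for the repo on GitHub. The folder name my-first-repo and the user name and email are placeholders I made up; with a real repo you would paste the URL GitHub shows you in place of the local path.

```shell
set -e
# Scratch area so nothing here touches a real project.
work=$(mktemp -d)

# Stand-in for the empty repo created on GitHub (assumption for this sketch).
git init --quiet --bare "$work/remote.git"

mkdir "$work/my-first-repo"
cd "$work/my-first-repo"

echo "# my-first-repo" >> README.md        # write a title line into README.md
git init --quiet                           # turn this folder into a git repository
# Placeholder identity so the commit works on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"
git add README.md                          # stage the README for the next commit
git commit --quiet -m "first commit"       # record it in the local repository
git branch -M main                         # rename the current branch to main
git remote add origin "$work/remote.git"   # connect the folder to the "online" repo
git push --quiet -u origin main            # upload the commit and remember where to push
```

After the last line, the stand-in remote holds the same commit as the local folder, which is the "same content in both places" state described above.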
OK, I’ve got a Repo, what the heck do I do with it?
In this section, I will take you through my workflow using Jupyter notebooks. This is only applicable to data scientists and other people using Python. Here's a primer on Jupyter notebooks. First I'll navigate into the repo I created and open a Jupyter notebook, then I can type some stuff and save it. Now how do I get this pushed to GitHub?
git add notebook1
First, you need to add the changes. Think of adding changes as writing down what you want to upload but not actually uploading it. Then once you have everything you want, you can commit:
git commit -m "commit message"
Commit is the actual command here, -m is how you add a message. Whatever is in quotes will appear as a comment along with your commit. SIDEBAR, make an effort here, please. When you’re working on a solo project it may feel like a waste of time, but a. it's a great habit, and b. there's a high chance you will forget what you did 10 commits ago. Here’s a good article on commit messages.
After that comes git push, the command we already ran when we copied and pasted all that stuff. It is what ultimately sends your changes through to GitHub. Add is kind of a checklist to keep track of the changes you made, commit is a way to make those changes permanent locally, and push is the way you upload them to GitHub. This finicky workflow can seem overwhelming, but when you're a full-fledged data scientist or developer, version control is incredibly important. Get into good habits now!
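Here is the add, commit, push cycle as one runnable sketch, with git status sprinkled in so you can watch the file move through each stage. As assumptions for the demo: a local bare repository stands in for GitHub, the notebook is a tiny placeholder file named notebook1.ipynb, and the identity and commit message are made up.

```shell
set -e
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"   # stand-in for the repo on GitHub
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

# Placeholder file standing in for a notebook you just saved.
echo '{"cells": []}' > notebook1.ipynb

git status --short                 # "??" means git is not tracking the file yet
git add notebook1.ipynb            # write it on the checklist (stage it)
git status --short                 # "A" means it is staged for the next commit
git commit --quiet -m "Add first notebook"   # make the change permanent locally
git branch -M main
git push --quiet -u origin main    # upload the commit to the remote
git log --oneline                  # one line per commit, with its message
```

Running git status between steps is a cheap way to check which step of the workflow you are at.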
Branches are one of the most important concepts when you are working with a partner or group on a GitHub project. They allow you to test your own changes without having any effect on what's already been done. In short, branches are one of the most magical parts of git, so let's go over their structure and how to make them.
Think of the main branch as your actual project. You don't want to make any changes directly here. Instead, you make a new branch and work there.
This command will both make a new branch and switch you into it.
git checkout -b "name_of_your_branch"
Now you can literally do anything you want, and until you merge your changes to the main branch nothing will be affected. If your changes completely suck then you can just delete the branch and you can learn from your mistakes without breaking anything.
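A small sketch of that safety net, run in a throwaway folder: make a branch, commit something questionable on it, then confirm main never noticed and delete the branch. The branch name experiment, the file project.txt, and the identity are all made up for the demo.

```shell
set -e
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"

echo "stable" > project.txt
git add project.txt
git commit --quiet -m "Initial commit"
git branch -M main

git checkout --quiet -b experiment      # make a new branch and switch into it
echo "risky idea" >> project.txt
git add project.txt
git commit --quiet -m "Try a risky idea"

git checkout --quiet main               # switch back to main
cat project.txt                         # main still has only the original line
git branch -D experiment                # capital -D deletes a branch that was never merged
```

The lowercase git branch -d refuses to delete a branch with unmerged work, which is a useful guard when you are not sure you want to throw the changes away.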
If you're confident in your changes, then you can switch back to main (git checkout main) and merge your branch into it with:
git merge "branchname"
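Here is a minimal sketch of a local merge from start to finish, assuming nobody else has changed main in the meantime (so git can simply move main forward). The branch name add-second-line, the file notes.txt, and the identity are made up for the demo.

```shell
set -e
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"

echo "line one" > notes.txt
git add notes.txt
git commit --quiet -m "Initial commit"
git branch -M main

git checkout --quiet -b add-second-line # do the work on a branch
echo "line two" >> notes.txt
git add notes.txt
git commit --quiet -m "Add a second line to notes"

git checkout --quiet main               # merges go INTO the branch you are on
git merge --quiet add-second-line       # bring the branch's commits into main
cat notes.txt                           # main now has both lines
git branch -d add-second-line           # safe delete, since the work is merged
```

The detail that trips people up is the direction: git merge pulls the named branch into whatever branch you currently have checked out.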
Don't forget, you need to add and commit the changes within the branch before you can merge to main. Another option is to push the changes ON YOUR BRANCH to GitHub, then merge on the GitHub site. This is the approach I use for a couple of reasons.
- The GitHub site has a nice GUI for dealing with merge conflicts if you mess something up.
- I would rather have the changes uploaded but not merged than have them merged locally and not be able to upload them to GitHub for whatever reason.
- If you are working on something with multiple people and they merge their branch locally then push it to GitHub, you may run into issues if you have poor communication.
- If everyone is sure to merge their changes on GitHub, then as long as you run git pull frequently to grab whatever has been merged, this issue should rarely come up. And if it does, you will still be able to work with the main branch, just not merge your changes. Merging locally creates a roadblock where you need to fix the merge conflicts before you can do anything else.
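The push-your-branch-then-pull-often routine can be sketched like this. Everything is simulated locally: a bare repository stands in for GitHub, and a second clone plays the teammate whose work has already landed on main (on the real site that would happen through a merged pull request). The branch name my-feature, the file names, and the identities are made up.

```shell
set -e
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"   # stand-in for the repo on GitHub

# Your copy of the project.
git init --quiet "$work/you"
cd "$work/you"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
echo "shared project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main
git remote add origin "$work/remote.git"
git push --quiet -u origin main

# Upload your work ON YOUR BRANCH, without merging it anywhere.
git checkout --quiet -b my-feature
echo "my change" > feature.txt
git add feature.txt
git commit --quiet -m "Add my feature"
git push --quiet -u origin my-feature

# A teammate's work lands on main (simulated with a second clone).
git clone --quiet --branch main "$work/remote.git" "$work/teammate"
cd "$work/teammate"
git config user.name "Example Teammate"
git config user.email "teammate@example.com"
echo "teammate change" > teammate.txt
git add teammate.txt
git commit --quiet -m "Add teammate work"
git push --quiet origin main

# Back in your copy: grab whatever has been merged into main.
cd "$work/you"
git checkout --quiet main
git pull --quiet
ls                                           # teammate.txt is now here too
```

Your feature is safely uploaded on its own branch, and your local main matches what is online, so the actual merge can be done on the site.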
That's really it to get your first project off the ground. It seems like a lot of things to remember, but just think through which step of the workflow you're at, and don't merge into something someone else has merged into without pulling their changes first. Good communication is one of the most important parts of a good GitHub workflow. Get into good habits of documenting your code and writing commit messages on your solo projects.