Nova Teque : December 2015

Repository structure

If you are unfamiliar with Git, a broad outline may help introduce terminology before reading one of the many Git tutorials online.

Git is one of several source control systems - others include Subversion, RCS, Mercurial - which are used to keep a record of program source code as development progresses, as a means of keeping track of what went into a particular release, tracking down where a bug was introduced, and any other operation that benefits from a full and detailed history.

A Git repository is where the entirety of a particular project lives. A repository is completely contained within its directory, so a normal system-level copy (or rename, or delete) of an entire Git repository is a safe operation. The resulting copy is both independent of and unaware of the original.
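
A minimal sketch of that independence, assuming a throwaway repository under a temporary directory (the directory names 'original' and 'copy' and the identity used are placeholders for illustration only):

```shell
# Create a scratch repository with one commit.
work=$(mktemp -d)
git init -q "$work/original"
cd "$work/original"
git config user.name "This Person"
git config user.email this.person@here.com
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# An ordinary recursive copy duplicates the whole repository, history included.
cp -R "$work/original" "$work/copy"

# A new commit made in the copy does not appear in the original.
cd "$work/copy"
echo more >> file.txt
git commit -q -a -m "second commit, copy only"
original_commits=$(git -C "$work/original" rev-list --count HEAD)
copy_commits=$(git -C "$work/copy" rev-list --count HEAD)
echo "original: $original_commits commit(s), copy: $copy_commits commit(s)"
```

Neither directory holds any reference to the other, which is why the commit counts diverge.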

Data

The working tree

The files and directories that you consider to be 'under source control', and that you change in your editor or otherwise, live under the top repository directory but outside its .git/ subdirectory. Only the top-level directory has a .git/ subdirectory, unlike (say) older versions of Subversion, which kept a .svn directory in every directory.

You can edit, delete, and rename files without telling Git until you want to: Git doesn't operate as a daemon or service, so it only runs while you execute a git command. It is very good at working out what changes you've made.

While you work, you can ask Git to remember some or all of the latest differences between how your files are now and how Git's permanent repository remembers them. The place it remembers them is called 'the staging area' and remembering them is called 'staging'. You can also remove or exclude selected changes from staging if it suits you. Staging diffs is easy, and is similar in concept to creating a patch file. Staging is the precursor to committing those changes as a permanent record in the Git repository.
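
As a rough illustration of staging and un-staging, here is a sketch in a scratch repository (the file names a.txt and b.txt and the identity are made up for the example):

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email this.person@here.com
printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial files"

# Edit both files without telling Git anything.
printf 'two\n' >> a.txt
printf 'two\n' >> b.txt

# Stage both changes, then take b.txt back out of the staging area.
git add a.txt b.txt
git reset -q -- b.txt

staged=$(git diff --cached --name-only)    # changes held in the staging area
unstaged=$(git diff --name-only)           # changes Git can see but has not staged
echo "staged: $staged"
echo "unstaged: $unstaged"
```

Un-staging b.txt leaves the edit itself in the working tree; only the staging area forgets it.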

.gitignore

This is a plain text configuration file which contains filename patterns that git should consider invisible for the purposes of source control - e.g. *.o, *.class, bin/

it is outside the .git directory, along with the 'normal' files, in order to keep it under source control: everyone in a project typically needs the same exclusions.
it can refer to itself, to excuse itself from source control!
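
A small sketch of those patterns in action, assuming an empty scratch repository and some invented file names (main.c, main.o, Main.class, bin/tool):

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"

# Patterns from the text: object files, class files, and a bin/ directory.
printf '%s\n' '*.o' '*.class' 'bin/' > .gitignore

mkdir bin
touch main.c main.o Main.class bin/tool

# Only files that match no pattern are reported as untracked.
visible=$(git status --porcelain --untracked-files=all)
echo "$visible"

# git check-ignore prints whichever of the given paths are ignored.
ignored=$(git check-ignore main.o Main.class bin/tool main.c)
echo "$ignored"
```

Here .gitignore itself still shows up as untracked, because nothing in it names .gitignore.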

Metadata

The 'behind the scenes' source control stuff all lives within the .git/ directory at the root of the repository.

The index

the index - a set of file changes you're preparing (with 'git add') to create in the repository as a single commit (version); it lives in .git/index

file changes you've made but haven't yet told Git about ('unstaged' changes) coexist alongside ones you have added to the index ('staged' changes).
creating a commit (with 'git commit') records the staged changes (i.e. those currently in the index) as a new commit; afterwards the index matches that commit, so nothing is left staged

creating a commit does not affect the working tree.
the new commit becomes the current commit and incorporates the staged changes, so they no longer show up as differences between the commit and the working tree.
unstaged changes still show up as differences, until added to the index (or reverted).

the index can even be told to stage selected 'hunks' (patches/blocks of change) within a file, while leaving other hunks within the very same file unstaged.

If this sounds useful to you, look for documentation on 'git add -p' or the tool 'git gui'.
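
The relationship between the index, a commit, and the working tree can be sketched like this (a scratch repository with invented file names; 'git add -p' is interactive, so whole files are staged here instead of hunks):

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email this.person@here.com
printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial files"

# Change both files, but stage only a.txt.
printf 'two\n' >> a.txt
printf 'two\n' >> b.txt
git add a.txt

# The commit records only what is in the index.
git commit -q -m "update a.txt"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)  # files changed by the new commit
still_staged=$(git diff --cached --name-only)                  # empty: the index matches HEAD
still_unstaged=$(git diff --name-only)                         # b.txt is still modified
echo "committed: $committed"
echo "staged after commit: ${still_staged:-nothing}"
echo "unstaged after commit: $still_unstaged"
```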

Branches, tags, and stash

these are ways to refer to particular commits, and they live in .git/refs and .git/packed-refs

a tag marks a particular commit and is only ever moved manually; other SCMs might call it a label.
a branch is a special mobile tag that is updated whenever a commit is made to that branch.

a branch is thus really a label that is always on the commit that most SCMs would call the branch/HEAD version.
local branches live in .git/refs/heads
branches from other repositories are maintained in .git/refs/remotes

the stash is a special set of commits used to hold temporary changes.

Branches and tags are very cheap and almost identical - you can make them whenever you want to remember a particular point in development.

If curiosity leads you to look in .git/refs/heads/master, you will see the 40-character identifier of the commit that the master branch currently points to (unless the ref has been packed into .git/packed-refs). Much Git metadata is in plain text files.
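
To see how similar tags and branches are, here is a sketch that creates one of each and reads the files behind them. It assumes a fresh scratch repository whose refs have not yet been packed; the names v0.1 and experiment are hypothetical.

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email this.person@here.com
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# A tag and a branch are both just names for a commit.
git tag v0.1                 # hypothetical tag name
git branch experiment        # hypothetical branch name

head_id=$(git rev-parse HEAD)
tag_id=$(cat .git/refs/tags/v0.1)
branch_id=$(cat .git/refs/heads/experiment)
echo "HEAD:       $head_id"
echo "tag v0.1:   $tag_id"
echo "experiment: $branch_id"
```

All three lines show the same identifier, because each name currently marks the same commit.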

Configuration

information about which remote repositories this repository can talk to, how local branches relate to remote branches, etc, lives in .git/config.
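
For example, registering a remote adds a few plain text lines to .git/config. This sketch uses a placeholder path for the remote (upstream.git, which is never contacted and need not exist):

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"

# Register a remote; the path is a placeholder and is never contacted here.
git remote add origin "$work/upstream.git"

# The setting is stored as plain text in .git/config...
cat .git/config

# ...and can be read back through git config.
remote_url=$(git config --get remote.origin.url)
echo "origin -> $remote_url"
```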

Hooks

special scripts that run for various source-control events to enforce policy.
unfortunately, hooks are within the .git directory and therefore not under source control.

For instance, there could be a hook that runs when you 'commit', to ensure that every commit message contains a JIRA ticket reference.
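
A minimal sketch of such a hook, assuming that a ticket reference looks like ABC-123 (capital letters, a hyphen, digits); a real project's pattern and policy may well differ:

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email this.person@here.com

# Install a commit-msg hook; Git passes it the path of the message file.
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
# Reject the commit unless the message mentions something like ABC-123.
grep -Eq '[A-Z]+-[0-9]+' "$1" || {
    echo "commit message needs a ticket reference, e.g. ABC-123" >&2
    exit 1
}
EOF
chmod +x .git/hooks/commit-msg

echo hello > file.txt
git add file.txt

# Without a ticket reference the hook blocks the commit...
if git commit -q -m "add file" 2>/dev/null; then rejected=no; else rejected=yes; fi
# ...and with one it goes through.
if git commit -q -m "ABC-123 add file" 2>/dev/null; then accepted=yes; else accepted=no; fi
echo "without ticket rejected: $rejected, with ticket accepted: $accepted"
```

Because the hook lives in .git/hooks, each clone has to install it separately.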

Global settings

User preferences - real name, email address, command aliases, etc - are held in a file called .gitconfig in the user's home directory (~/.gitconfig). If you set them earlier, with the commands

git config --global user.email this.person@here.com
git config --global user.name "This Person"
then you will be able to see them in that file. You should set them now if you haven't already, because if you fail to set your email address, you will not be able to upload to Gerrit.

If you made a commit with the wrong email address and have not yet sent it to Gerrit, correct the setting and then rewrite the commit's author details with:

git commit --amend --reset-author
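
As a sketch of that repair in a scratch repository (the addresses are placeholders, and the setting is made per-repository here rather than with --global so that nothing outside the temporary directory is touched):

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email wrong.address@here.com    # deliberately wrong
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"
before=$(git log -1 --format=%ae)

# Correct the setting, then rewrite the author of the latest commit.
git config user.email this.person@here.com
git commit -q --amend --reset-author --no-edit
after=$(git log -1 --format=%ae)
echo "author email before: $before, after: $after"
```

The --no-edit option simply keeps the existing commit message instead of opening an editor.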

Working Environment

The working environment looks much as it would if Git weren't around: the working tree is a normal directory with only the subtle presence of a .git directory to suggest otherwise.

After the first 'git clone' operation (or a 'git init' if you're trying to bring uncontrolled software under version control), git will set the 'current branch' to be 'master' and will make sure that the working tree reflects the commit to which the 'master' branch points. Each commit you make will have its parent set to the commit currently marked 'master', and 'master' will be moved to that new commit, ready for the next one.

On Unix, you will probably want to add something like the following near the end of your ~/.bashrc so the current Git branch shows up in your prompt:

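parse_git_branch () {
    # Prints "(branch)" for the current branch; prints nothing outside a
    # repository or when HEAD is detached.
    local ref
    ref=$(git symbolic-ref HEAD 2> /dev/null) || return
    echo "(${ref#refs/heads/})"
}

# Some variant of this to set your prompt. The backslash stops the function
# running once when ~/.bashrc is read; bash calls it each time it draws the prompt.
export PS1="${PS1}\$(parse_git_branch) "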

You can use 'git branch -a' to list the branches your repository knows about, and 'git checkout' to both change your current branch and update your working tree to reflect the files and directories appropriately.
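
Here is a sketch of switching branches and watching the working tree follow, using a scratch repository and a hypothetical branch called 'feature'. The initial branch name is read from Git rather than assumed.

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email this.person@here.com
echo "first version" > file.txt
git add file.txt
git commit -q -m "first commit"
start=$(git symbolic-ref --short HEAD)    # the initial branch, typically master

# Create a branch and switch to it; new commits now move 'feature' only.
git checkout -q -b feature                # hypothetical branch name
echo "second version" > file.txt
git commit -q -a -m "change on feature"
on_feature=$(cat file.txt)

# Switching back rewrites the working tree to match the other branch.
git checkout -q "$start"
on_start=$(cat file.txt)

git branch -a
echo "feature has: $on_feature; $start has: $on_start"
```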

The commits

you will often see people refer to strings of hexagibberish as a 'commit'.
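these 40-character identifiers are SHA-1 hashes of the data that they represent (including the file data, the commit message, the author and dates, and any reference to the parent)

the same commit (same files, parent, author, message and timestamps) created in two repositories will get the same (random-looking) identifier; change any one of those and the identifier changes too.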
e.g. commit d00e4478de7ae732751d256dcfb9eb157138de72

it may be worth pointing out that commits have no knowledge of their children

you can work backwards, along the chain of parents from a commit id (or a tag, or a branch, see above), but not 'forwards' without running a relatively expensive search for 'children of this parent'.
this may sound odd compared with other source control systems, but in practice it isn't a problem.

checked-in versions of all files and directories live in .git/objects

each commit represents a snapshot of the full tree of files current at the point the commit was created.
each commit is uniquely identified by its 40-character identifier, but the first seven characters are usually enough to pick it out.

git will complain if a shortened identifier is ambiguous.

each commit has a reference to its parent commit(s).
collectively, all the commits form a tree(ish) structure (strictly a directed acyclic graph, since a merge commit has more than one parent) based on the parent relationship, whence the concept of branches derives.
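
The points above can be poked at directly. This sketch builds two commits in a scratch repository, then looks at the full and short identifiers, the parent link, and the search needed to go 'forwards' (it only checks each commit's first parent, which is enough when there are no merges):

```shell
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "This Person"
git config user.email this.person@here.com
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
echo two >> file.txt
git commit -q -a -m "second commit"

full_id=$(git rev-parse HEAD)             # full identifier of the current commit
short_id=$(git rev-parse --short HEAD)    # abbreviated form, unambiguous here
parent_id=$(git rev-parse HEAD^)          # follow the parent link backwards

# The raw commit object names its tree (the snapshot) and its parent.
git cat-file -p HEAD

# Going forwards means searching: list commits whose first parent is $parent_id.
child_id=$(git rev-list --all --parents | awk -v p="$parent_id" '$2 == p { print $1 }')
echo "short id $short_id resolves to $(git rev-parse "$short_id")"
echo "child of $parent_id is $child_id"
```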

Now that I've spoiled most of the surprises, this is probably where you should read a tutorial.

Monday, 14 December 2015

Git pre-cheatsheet