Monday, 14 December 2015

Git pre-cheatsheet


Repository structure

If you are unfamiliar with Git, a broad outline may help introduce terminology before reading one of the many Git tutorials online.

Git is one of several source control systems - others include Subversion, RCS, Mercurial - which are used to keep a record of program source code as development progresses, as a means of keeping track of what went into a particular release, tracking down where a bug was introduced, and any other operation that benefits from a full and detailed history.

A Git repository is where the entirety of a particular project lives. A repository is completely contained within its directory, so a normal system-level copy (or rename, or delete) of an entire Git repository is a safe operation. The resulting copy is both independent of and unaware of the original.

Data

The working tree

The files and directories - the things you consider to be 'under source control', which you change in your editor or otherwise - live under the top repository directory but outside of its .git/ subdirectory. Only the top-level directory has a .git/ subdirectory, unlike, say, older versions of svn (before 1.7), which kept a .svn directory in every directory.

You can edit, delete, and rename files without telling Git until you want to: Git doesn't operate as a daemon or service, so it only runs while you execute a git command. It is very good at working out what changes you've made.

As you work, you can ask Git to remember some or all of the latest differences between how your files are now and how Git's permanent repository remembers them. The place it remembers them is called 'the staging area', and remembering them is called 'staging'. You can also remove or exclude selected changes from staging if it suits you. Staging diffs is easy, and is similar in concept to creating a patch file. Staging is the precursor to committing those changes as a permanent record in the Git repository.

.gitignore

This is a plain text configuration file which contains filename patterns that git should consider invisible for the purposes of source control - e.g. *.o, *.class, bin/
  • it is outside the .git directory, along with the 'normal' files, in order to keep it under source control: everyone in a project typically needs the same exclusions. 
  • it can refer to itself, to excuse itself from source control! 

Metadata

The 'behind the scenes' source control stuff all lives within the .git/ directory at the root of the repository.

The index

  • the index - a set of file changes you're preparing (with 'git add') to create in the repository as a single commit (version); it lives in .git/index 
    • file changes you've made but haven't yet told Git about ('unstaged' changes) coexist alongside ones you have added to the index ('staged' changes). 
    • creating a commit (with 'git commit') records the staged changes (i.e. those currently in the index) as a new commit, after which nothing remains staged 
      • creating a commit does not affect the working tree. 
      • the new commit becomes the current commit and incorporates the staged changes, so they no longer show up as differences between the commit and the working tree. 
      • unstaged changes still show up as differences, until added to the index (or reverted). 
    • the index can even be told to stage selected 'hunks' (patches/blocks of change) within a file, while leaving other hunks within the very same file unstaged. 
      • If this sounds useful to you, look for documentation on 'git add -p' or the tool 'git gui'. 

Branches, tags, and stash

  • these are ways to refer to particular commits, and they live in .git/refs and .git/packed-refs 
    • a tag marks a particular commit and is only ever moved manually; other SCMs might call it a label. 
    • a branch is a special mobile tag that is updated whenever a commit is made to that branch. 
      • a branch is thus really a label that is always on the commit that most SCMs would call the branch/HEAD version. 
      • local branches live in .git/refs/heads 
      • branches from other repositories are maintained in .git/refs/remotes 
    • the stash is a special set of commits used to hold temporary changes. 
  • Branches and tags are very cheap and almost identical - you can make them whenever you want to remember a particular point in development. 
    • if curiosity leads you to look in .git/refs/heads/master, you will see the 40 character identifier for the commit that is the master branch. Much Git metadata is in plain text files. 

Configuration

  • information about which remote repositories this repository can talk to, how local branches relate to remote branches, etc, lives in .git/config. 

Hooks

  • special scripts that run for various source-control events to enforce policy. 
  • unfortunately, hooks live within the .git directory and are therefore not under source control. 
    • For instance, there could be a hook that runs when you 'commit', to ensure that every commit comment contains a JIRA ticket reference. 

Global settings

User preferences - real name, email address, command aliases, etc - are held in a file called .gitconfig in the user's home directory (~/.gitconfig). If you set them earlier, with the commands

git config --global user.email this.person@here.com
git config --global user.name "This Person"

then you will be able to see them in that file. You should set them now if you haven't already, because if you fail to set your email address, you will not be able to upload to Gerrit.

If you made a commit with the wrong email address and have not yet sent it to Gerrit, correct the setting and then run:

git commit --amend --reset-author

Working Environment

The working environment looks much as it would if Git weren't around: the working tree is a normal directory, with only the subtle presence of a .git directory to suggest otherwise.

After the first 'git clone' operation (or a 'git init' if you're trying to bring uncontrolled software under version control), git will set the 'current branch' to be 'master' and will make sure that the working tree reflects the commit to which the 'master' branch points. Each commit you make will have its parent set to the commit currently marked 'master', and 'master' will be moved to that new commit, ready for the next one.


On Unix, you will probably want to add something like the following near the end of your ~/.bashrc so the current Git branch shows up in your prompt:

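parse_git_branch () {
 # Print "(branch)" when inside a Git repository on a branch; print nothing otherwise.
 local ref
 ref=$(git symbolic-ref HEAD 2> /dev/null) || return
 echo "(${ref#refs/heads/})"
}
# Some variant of this to set your prompt. The single quotes matter: they defer
# the call until each prompt is drawn, instead of running it once when ~/.bashrc loads.
PS1='\w $(parse_git_branch)\$ '

You can use 'git branch -a' to list the branches your repository knows about, and 'git checkout' to both change your current branch and update your working tree to reflect the files and directories appropriately.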

The commits

  • you will often see people refer to strings of hexagibberish as a 'commit'. 
  • these 40 character identifiers are hashes of the data that they represent (including the file data, the commit message and author details, and any reference to the parent) 
    • the same change made in two identical repositories will create the same (random-looking) identifier, provided the author, message and timestamps match too. 
    • e.g. commit d00e4478de7ae732751d256dcfb9eb157138de72 
  • it may be worth pointing out that commits have no knowledge of their children 
    • you can work backwards, along the chain of parents from a commit id (or a tag, or a branch, see later), but not 'forwards' without running a relatively expensive search for 'children of this parent'. 
    • this may sound odd compared with other source control systems, but in practice it isn't a problem. 
  • checked-in versions of all files and directories live in .git/objects 
    • each commit represents a snapshot of the full tree of files current at the point the commit was created. 
    • each commit is uniquely identified by a 40-character identifier, but the first seven characters are usually enough to pick it out. 
      • git will complain if a shortened identifier is ambiguous. 
    • each commit has a reference to its parent commit(s). 
    • collectively, all the commits form a tree(ish) structure based on the parent relationship, whence the concept of branches derives. 
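
To make 'hashes of the data that they represent' a little more concrete, here is a minimal sketch in Ruby (chosen only because it is to hand; the helper name git_blob_id is made up for this illustration). It computes the identifier Git gives to a file's contents, which Git calls a 'blob'. Commits are hashed in the same style, over a short text that names the tree, the parent(s), the author and the message. The sketch assumes the default SHA-1 object format.

```ruby
require 'digest/sha1'

# Compute the identifier Git would assign to some file contents (a "blob"):
# the SHA-1 of a short header, "blob <size in bytes>\0", followed by the raw bytes.
def git_blob_id(content)
  data = content.b # treat the content as raw bytes
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0".b + data)
end

full_id = git_blob_id("hello world\n")
puts full_id        # the full 40-character identifier
puts full_id[0, 7]  # the abbreviated form people usually quote
```

Identical content always produces the identical identifier, which is why two unrelated repositories can agree on the name of a file version without ever talking to each other.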

Now that I've spoiled most of the surprises, this is probably where you should read a tutorial.

Friday, 27 November 2015

Ruby is evil

I've been programming in Ruby for about nine months - make that several years - now, and... like the best and most elaborate practical jokes there are some pretty clever bits in it, and a lot of libraries to fool people into using it - INTERCAL failed because it was clearly stupid. I find Ruby makes coding slower: it's like a constant battle against a combination of evil-minded trolls and well-meaning but deluded innocents, in one of those forests where tree roots are exposed in a way that makes you think they're forever trying to trip you up. It sometimes reduces me to fantasies of going back in time with a baseball bat and stopping Ruby from ever being coded, because it's so chock-full of strange, bewilderingly stupid and/or lazy choices - as if the people involved were panic-strickenly desperate to distinguish themselves from other languages and couldn't think further than the ends of their noses in their rush to bodge something together. In truth, Ruby's mainly suffering from failing to learn from every mistake made by every other programming language, and having had no forward planning until it had already committed to painting itself into various stupid corners. It helps that my other colleagues also hate it - even the guy who likes Python! I thought duck-type fans would stick together!

I'm also very amused: Ruby's originator has called Perl a 'toy language'.

So, cons: Fundamentally, doing things that are surprising in a bad way.
  • Duck-typing is probably the single biggest bit of brain-damage: a single global namespace for all method names. Various languages have duck-typing because the designers were too star-struck that they were 'playing in the big leagues' by bodging together their own language, to consider the real-world ramifications. Honestly: given that Ruby felt it necessary to have modules to provide namespaces so that class names don't collide, why did Matz fail to engage his brain and just leave method names in some ersatz global namespace? Even IDEs can't distinguish one object's 'start' method from another - what hope does a human have?
    • Duck-typing is supposed to provide flexibility and dynamic programming, but that sounds like the self-modifying code arguments back in 1970. Militant duck typing where I can't put object.MyInterface::start even for a run-time check that this object provides a 'start' that implements the one in MyInterface, is either plain incompetence, a mark of a toy language intended only for teaching, or outright spiteful.
    • Failing to provide a concise check that an object implements a given method just makes me think that the Ruby authors want Ruby to be ugly: object.respond_to?(:method) ? object.method(...) : value_if_not_implemented. Makes it fairly clear Ruby is just a big ugly practical joke that got out of hand. Oh! Maybe that's what Matz meant by 'toy': Perl isn't nearly as ugly as Ruby, so Ruby wins on the practical joke front.
    • Somewhat related: nil&.method is now available as a short-cut run-if-not-nil check.
      • Shame nil&&.method wasn't defined as a check vs method-defined at the same time - that would have upped the practical joke value significantly.
      • I particularly love the array-access variant you sometimes have to use: xxx&.[](1) because not all array-looking things implement get() as an alias for [].
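
For anyone who hasn't met these idioms, a rough sketch of the three checks just mentioned. Engine and Rock are throwaway classes invented for the example, as are the helper method names.

```ruby
# Hypothetical classes, invented for this illustration only.
class Engine
  def start
    'engine started'
  end
end

class Rock; end

# The long-hand check: ask whether the method exists, then call it.
def start_or_default(object, default = 'cannot start')
  object.respond_to?(:start) ? object.start : default
end

# Safe navigation only guards against nil; it says nothing about
# whether a non-nil receiver actually implements the method.
def safe_length(value)
  value&.length
end

# The array-access variant: &. needs a method name, so [] is spelled out.
def safe_second(list)
  list&.[](1)
end

puts start_or_default(Engine.new)
puts start_or_default(Rock.new)
p safe_length(nil)
p safe_second(%w[a b c])
```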
  • Meta-methods mixed into application-level namespace, e.g. 'methods' isn't '_methods' to distinguish it from... oh, say, someone who needs to use the word for the actual programming task.
  • Methods which are *almost* identical, e.g. << and push, where a << *[:A, :B] is illegal and a.push(*[:A, :B]) isn't.
  • Methods which Rubocop implies are equivalent: 'Use each_with_object instead of inject' without mentioning that the memo object is at the OTHER end of the block's parameter list. Yes, I know 'Rubocop' isn't Ruby, but it's doing one of the jobs Ruby should be doing.
    • While I'm on Rubocop, it fails dismally because its rules are never justified and, from what I can tell, most of the default set of rules do not line up with actual Ruby developer practice. Take the whole nonsense about enforcing single vs double quoted strings.
    • What annoys me with Rubocop is the ridiculous tweaks Ruby demands: Ruby has weird well-known inefficiency gotchas that no developer should be bothered with, and which could clearly be handled in the language layer instead of cluttering and obscuring production code.
    • Like being told not to write 'if this == "a" || this == "b"' like any normal language, but to write 'if %w{a b}.include?(this)' as if that's not something that should be done IN THE INTERPRETER ITSELF, WITH TRIVIAL ANALYSIS OF THE PARSE TREE. Lazy Ruby maintainers. Lazy lazy lazy. I have contempt for Ruby - so much effort wasted in a quagmire of bad decisions.
    • And these little tweaks get in the way! Like if I do 'sipped'.sub('ped', '') it tells me to replace it with 'sipped'.delete('ped'), but if I want to change 'ped' for /ped/, then I HAVE TO CHANGE IT BACK TO sub(/ped/, ''). AND what moron thought that 'sub' should be shortened from 'substitute' or 'subst', but 'delete' should NOT be shortened to 'del'?!?!?
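
The inject / each_with_object trap mentioned at the top of this Rubocop rant, as a small sketch (the method names are hypothetical, chosen for the example):

```ruby
# inject: the memo is the FIRST block parameter, and the block's
# return value becomes the memo for the next iteration.
def lengths_with_inject(words)
  words.inject({}) do |memo, word|
    memo[word] = word.length
    memo
  end
end

# each_with_object: the memo is the LAST block parameter, and the
# block's return value is ignored.
def lengths_with_each_with_object(words)
  words.each_with_object({}) do |word, memo|
    memo[word] = word.length
  end
end

p lengths_with_inject(%w[apple bob cat])
p lengths_with_each_with_object(%w[apple bob cat])
```

Swap one call for the other without also swapping the block parameters and the block ends up indexing into a String with a Hash, which is not the kind of error message that points you back at the real mistake.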
  • Methods which *are* identical! Sure, it's annoying that some languages use size when others use length or whatever, but at least you're not worrying about whether there might be subtle differences of meaning between 'find' and 'detect' or 'wait2' and 'waitpid2' - because, you know, Ruby provides methods like a hedgehog has fleas.
  • Methods which should be default opposites: "a b c".split.join == "abc". The default should probably be splitting/joining as lines, which is almost certainly more common than splitting into words.
  • Wilfully mangled English: in natural language, one says 'If the file exists'. In Ruby, 'if File.exists?' is deprecated in favour of 'if File.exist?' which sounds like you're making fun of foreigners.
  • The Ruby community's wanton use of monkey patching and dynamic programming. Honestly guys, self-modifying code was acknowledged as a stupid idea back in 6502 days, but at least they had the excuse of having less than 12K of RAM to play with if they were in a high colour screen mode.
  • Strongly-typed-but-dynamic languages in general. It's like a bad joke. You should either have weak typing and dynamic typing - e.g. Perl, where it will forgive trying to perform "10" + 10 because you've just parsed "10" from a regexp, and return 20 like you meant - or you should have strong typing and static typing - e.g. Java, where it will tell you up front when you're doing something stupid anywhere in the code. Programs written in STDLs crash due to errors that the IDE can't warn you about. It's like programming in BCPL. Welcome to the 1960s, home of many of the mistakes that Ruby decided to implement.
    • Kotlin is inspired by Ruby (!) but has useful strong typing (and type inference).
    • Groovy has optional static typing - warning you of mistakes before you run.
    • Even Python has something like that now, I believe, although metaprogramming will always let you hang yourself - unless you've replaced the 'hang_self()' method with something that lets you shoot yourself in the foot instead.
  • Lack of statement separators. Well, that kind of goes hand in hand with STDLs: sloppy syntax for sloppy code(rs). I don't like this in Groovy either - it should be a compile flag or something.
  • Optional brackets for functions. Especially no-argument functions. Especially given that you're supposed not to write methods called 'getXxxx' so you end up with xxx = object.xxx, or just xxx = xxx.
  • Optional brackets for functions... except they're only sometimes optional: 'do_something if !filename.nil? && Dir.exist? filename' fails. You MUST have brackets in this case.
  • Use of threads and exceptions for timing out operations in a fundamentally broken way.
  • Hash.merge should have been called something clearly non-commutative, perhaps env.prefers(key: :thisvaluepreferred), giving the opportunity to provide the same method with the other priority as env.defaults(key: :thisvalueifunset)
    • Ruby programmers encouraging me to monkey-patch 'hash' to add these will be laughed at.
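
To show the non-commutativity without monkey-patching anything, a sketch using plain methods. The names prefers and defaults echo the suggestion above; they are not part of Ruby.

```ruby
# Hypothetical helpers, written as plain methods rather than patches to Hash.

# Values in `preferred` win over whatever `env` already holds.
def prefers(env, preferred)
  env.merge(preferred)
end

# Values already in `env` win; `fallback` only fills the gaps.
def defaults(env, fallback)
  fallback.merge(env)
end

env = { colour: 'red' }
p prefers(env, colour: 'blue', size: 1)
p defaults(env, colour: 'blue', size: 1)
```

Both are one call to merge; the only difference is which hash is the receiver, which is exactly the detail the name 'merge' does nothing to remind you of.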
  • Habitual monkey-patching of base classes. Also known as 'how to make your programs even harder to understand'.
  • Methods with familiar names but broken implementations, and yes I know 'grep' performs === matching, but don't sodding call it grep then: Globally search for a REGULAR EXPRESSION and Print. Call it 'is' or something.
    • $ echo foo | grep .
    • foo
    • $ ruby -le 'p ["foo"].grep(".")'
    • []
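
What that === matching means in practice, as a quick sketch (grep_examples is just a name for the illustration):

```ruby
# Enumerable#grep keeps the elements for which `pattern === element` is true,
# so the behaviour depends entirely on the class of the pattern.
def grep_examples
  {
    string_pattern: ['foo'].grep('.'),           # String#=== is plain equality
    regexp_pattern: ['foo'].grep(/./),           # Regexp#=== is a pattern match
    class_pattern:  [1, 'a', 2.0].grep(Integer), # Class#=== is an is_a? test
    range_pattern:  [1, 5, 10].grep(2..7)        # Range#=== is a membership test
  }
end

grep_examples.each { |name, result| puts "#{name}: #{result.inspect}" }
```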
  • Related concepts with unrelated names: If you have a method taking a block parameter, you say 'def function(&block)'. You can name 'block' something useful like 'filter' or 'setter' or whatever if you like clearer code. Now you might think the idiom for calling it would be 'if !block.nil? block.call else some-default-operation end', but in fact you're supposed to call 'yield' instead of 'block.call' and instead of '!block.nil?' or 'yield?', you use the 'block_given?' magic method.
  • Plain inconsistency: RuboCop deprecates the use of Perl regexp variables $1, $2, etc, but the block version of gsub doesn't provide the match object to its block, but the whole matched string. If you want to claim that Ruby is OO, then do that. And sure, I could use positional parameters in the regex, but that turns a simple one-liner into a multi-line monstrosity that is much slower to read.
  • More inconsistency: 'd'.split(':') => ["d"] but ''.split(':') => [] rather than ['']. It's entirely arguable that this is useful, but it's also one more nail in the coffin of inconsistency: empty strings aren't considered false, so why should splitting an empty string do something weird?
  • Deprecated 'return' keyword: while this looks like a clever idea (and return most recent evaluation is something Perl does too) the mantra that you don't use 'return' unless you have to means you end up with ugly 'last statement is a plain variable reference' which... is just ugly. It really doesn't help that when you select the final statement and 'extract variable' in RubyMine, it doesn't extract `{}` into `foo = {}; foo` but merely `foo = {}` so there's no 'between' to put intervening code. And for people accustomed to real programming languages, you end up returning nil or something.
  • Quirky line-continuation behaviour: [ :a ].tap { puts :t1 }.tap { puts :t2 }.tap { puts :t3 } would - in most languages - be split before the '.' character. Ruby originally only supported splitting after the '.' character, and it shows: If you try to comment out a '.tap {}' in the middle of the list - as you would in any other language - subsequent clauses are a syntax error. The workaround is either to use the laughably ugly \ line continuation character on the preceding line, or to put the dots at the other end, where they're hard to see/check unless you engage in ascii art extra spaces to line them up (which is extremely bad practice because a line length change in one line causes line length changes on all the other lines, which makes scanning patches/diffs for code review harder).
  • Even with its 'only throw an exception if there's a problem with the code' operation, it fails in a way that shell, for instance, doesn't: shell only dies if it actually encounters an error while executing: ruby fails if the script has a flaw anywhere (often, but not - of course - always: Ruby loves being inconsistent).
  • The obvious interpretation is often wrong:
    • 2.1.5 :001 > a = 1
    •  => 1 
    • 2.1.5 :002 > puts defined? b ? 1 : 2 ; puts defined? a ? 3 : 4 
    • expression
    • expression
    •  => nil 
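
The cause is precedence: defined? swallows the whole ternary as its operand. A sketch of the parenthesised form next to the surprising one (defined_check and no_such_name are names made up for the example):

```ruby
def defined_check
  a = 1
  [
    (defined?(no_such_name) ? 1 : 2), # parenthesised: tests only no_such_name
    (defined?(a) ? 3 : 4),            # parenthesised: tests only a
    (defined? no_such_name ? 1 : 2)   # bare: tests the whole ternary expression
  ]
end

p defined_check
```

The first two entries are the answers you presumably wanted; the third is the string describing the ternary, exactly as in the irb session.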
  • The obvious interpretation is often wrong, 2:
    • 2.1.5 :008 >   defined? a
    •  => nil 
    • 2.1.5 :009 > if false ; a = 2 ; end
    •  => nil 
    • 2.1.5 :010 > defined? a
    •  => "local-variable" 
  • The obvious interpretation is often wrong, 3:
    • 2.6.3 :018 > while true ; y = 3 ; break ; end ; y
    •  => 3 
    • 2.6.3 :019 > loop do ; x = 3 ; break ; end ; x
    • NameError (undefined local variable or method `x' for main:Object)
  • The latest issue: In a unit test I was pairing on, I created a 'for_testing' setter function for the @@logger class variable in a collaborator class. Yes, I know that's bad practice and I later mocked an internal accessor instead. However, my colleague replaced my setter by putting @@logger=TestLogger.new into the test... and it worked! Suddenly the TestLogger object was being used by a class not even mentioned in the unit test! After lots of head-scratching, we called yet another colleague in, who pointed out that @@logger was being set on the global Object. Even strings had access to the TestLogger. So while I know - and knew - this is documented Ruby behaviour, it's (1) fscking stupid, and (2) programmer-hostile. Surely setting things on Object should cause at least a warning, if not an outright error unless Object is being changed in a carefully and explicitly scoped context - maybe Object.unfrozen { def something ; end }.
  • People writing their own clever dispatch mechanisms, and making things hard. Fastlane, for instance, reports any mistyped identifier as a 'lane' that hasn't been defined, and suppresses any stack traces that might help track things down. And heaven forfend that you might want to try to work out where - say - the 'pilot' lane's code starts, so you can work out which of the welter of options enables or disables some random bit of match you want. Ruby has 'pry' and 'pry-nav' to allow single-stepping through, but the nonsense it goes through to examine classes for methods that it might want to run, instead of just... relying on the language to do it?!?!
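
A few of the smaller claims in the list above are easy to check for yourself. A sketch, with method names invented for the purpose:

```ruby
# split defaults to splitting on whitespace; join defaults to no separator,
# so the round trip quietly eats the spaces.
def split_join_round_trip(text)
  text.split.join
end

# Splitting a one-field string gives one field; splitting an empty string gives none.
def split_field_counts
  ['d'.split(':'), ''.split(':')]
end

# gsub's block is handed the matched String, not a MatchData object;
# Regexp.last_match is one way to reach the capture groups without $1 and $2.
def swap_pairs(text)
  text.gsub(/(\w+)=(\w+)/) do
    match = Regexp.last_match
    "#{match[2]}=#{match[1]}"
  end
end

# Report the class of the object that gsub passes to its block.
def gsub_block_argument_class(text)
  seen = nil
  text.gsub(/\w+/) { |whole| seen = whole.class; whole }
  seen
end

p split_join_round_trip('a b c')
p split_field_counts
p swap_pairs('a=1 b=2')
p gsub_block_argument_class('foo')
```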
Pros:
  • I'm boggled that when I debug a rakefile in RubyMine, Ruby programs executed from that rakefile through 'system' calls and similar... are also debugged! Crazy. Impressive. But probably RubyMine's fault.
    • That said, it's also therefore RubyMine's fault when debugging just plain refuses to work, with some mysterious error about being unable to load the first Gem in the gemfile and no hints, and switching the... oh, gosh, is it Preferences or Default settings Ruby between the 'global' one and the 'rvm' one makes no sodding difference.
  • The package system is pretty good, and RVM eases the pain of Ruby versions. When RubyMine doesn't interfere, and there aren't package clashes, or you're not forced to stay on some particular version because of Monkeypatching. I'd tend to equate the bundler with Maven's repository system: a great idea buried in a polished turd.
  • ... I'm trying to find real pros about Ruby, really I am, but it's a huge uphill struggle. I think we're eventually going to transition to Groovy or Kotlin via JRuby: somethin... anything that has static type checks available if you need them.