Matt Erkkila

digg.git – part 1

Over the last 12-16 months Digg has seen some pretty big growth internally.  Our engineering team has tripled in size and we added both a QA team and an R&D team.  We went from releasing code once every 3-4 months to releasing code every day.  We encountered a number of roadblocks during the transition, some easy to overcome, others not so easy.  We had to change some of our processes, adopt some new technology, and change a lot of people’s mindsets.

When I started with the company, the entire engineering team would work on the same project together.  Everyone in the company would get together in a conference room for an afternoon or two and do QA, and when it was all done, we released it and moved on to the next project.  Fast-forward to today: we have multiple projects, all being developed in parallel, with QA testing each feature as the engineer writes it.  And as soon as a feature is finished and signed off on, it’s released.

In the beginning we used Subversion.  We checked out, branched, committed, merged, and tagged.  It was far from perfect and it didn’t make anyone happy, but it’s what we had.  And it worked, mostly.  It was simple to understand and pretty easy to use.  But it didn’t scale.  As our engineering team grew and more and more work was being done in the same code base, we started stepping on each other’s toes.

The integration process (merging a feature into trunk for release) would sometimes take days and result in numerous conflicts in the code.  It was up to the person doing the merging to resolve them alone or to track down the original developer to determine which pieces needed to be kept.  This resulted in both a large number of bugs and code simply disappearing.  That was costing us time and money and hurting our users’ experience, all of which we take very seriously.

We knew there must be a better way, so we started looking for it.  We tried upgrading from Subversion 1.4 to 1.5, which added “merge tracking”.  We hoped it would make our integration process easier.  But after working with it for a few weeks we realized that the “merge tracking” feature was akin to putting lipstick on a pig.  As nice as it sounded, it just wasn’t going to cut it.

Then everyone started talking about Git.  A few of us had heard about it, only a couple had ever used it, and mostly just for personal projects.  But it was the next cool thing.  It was GREAT!  It was going to make our engineers more productive and make their lives so much easier.  It was like someone was handing us a nice tall cold glass of cherry Kool-Aid and telling us everything was going to be OK.  Would you drink it?  Well, we did.  Not the entire glass at first, just a little sip.  Then we realized it wasn’t so bad; it actually felt kind of good.  So what did we do?  In true Digg fashion, we chugged the rest of the glass and asked for seconds.

So why is Git so great?  Well, simply put, it’s powerful and it’s distributed.

It allows you to do complex tasks with only a few commands.  It makes things like branching and merging something you barely have to think about.  It even allows you, and goes so far as to encourage you, to rewrite history.  It really lets each engineer work the way they want to.  And when they’re ready, it allows them to push their code upstream for others to see.  It manages code and changesets so differently from Subversion that it takes a complete shift in how you think about source control to even begin to understand it.  But believe me, it’s a good thing.

But Git does have its downsides.  At first it can be difficult to understand what is really happening behind the scenes when you run a command, and if something does go wrong, you really need to understand how Git works internally to correct your mistake.  It also isn’t completely polished yet: there are no good GUIs for it, and the error messages you get when a command fails might as well be written in Sanskrit.

Even with all of its downsides, its steep learning curve, confusing error messages, and lack of polished UIs, it works.  It works so well, in fact, that after switching to Git we were able to reduce the time our integration process took from days to minutes.  Literally from 1-2 days down to about 30 minutes.  And after it’s done, we have much more confidence that the merge was done correctly.  We’re also able to release code on a daily basis with very minimal effort, something that was extremely painful to do under Subversion.

Stay tuned for part two.