Your version control system is not a file system

by Julian Simpson on August 22, 2008

If you find yourself needing to check binary files into your Version Control System, something isn’t right. Your VCS is optimised for tracking changes to source files. When you have multiple revisions of a source file, the VCS stores the original file plus the changes between revisions, not a full copy of each one. This is good.

When you check in a binary, the VCS doesn’t really do that. Most systems just keep a separate copy of the binary for each revision. So if you store 10 revisions of a 100 megabyte file, you can kiss a gigabyte goodbye. You might argue that disks are cheap. Unfortunately the cost of storage isn’t the issue. It’s the downtime to upgrade the server, and it’s the admin overhead and risk of moving all of your data to a new disk. Sure, you can do it.
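
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function names and the 5 KB delta size are made up for illustration, and real systems differ (some compress files, and some do store binary deltas), so treat it as arithmetic rather than a model of any particular VCS.

```python
def full_copy_storage(file_size, revisions):
    """Space used when every revision is kept as a separate full copy."""
    return file_size * revisions


def delta_storage(file_size, revisions, delta_size):
    """Space used when the first revision is kept whole and each later
    revision is kept as a delta of roughly delta_size (an assumed figure)."""
    if revisions == 0:
        return 0
    return file_size + (revisions - 1) * delta_size


# The example from the post: 10 revisions of a 100 MB binary, full copies.
binary_total_mb = full_copy_storage(100, 10)

# A hypothetical 100 KB source file whose revisions each change about 5 KB.
source_total_kb = delta_storage(100, 10, 5)

print(f"binary: {binary_total_mb} MB")   # binary: 1000 MB
print(f"source: {source_total_kb} KB")   # source: 145 KB
```

The binary's footprint grows by the full file size with every check-in, while the source file's grows only by the size of each change.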

Or you could stop using your VCS as the most expensive file system in your organisation.

Update: I wrote this in response to a contractor putting a 325 MB file into my previous employer’s Perforce repository. I should qualify some of the statements in the post. For example, there’s every reason to put small binary files in as part of your app. I think most people choose to check binary dependencies into their projects rather than take the Maven/Ivy route.

  • Claudio Bezerra
    Hi, first I'd like to praise a good initiative. I've not seen much material on the web about software building seen from the perspective of software engineering.
    I saw that one visitor, Fabrizio Dutra, mentioned that versioning derivative files is a bad practice and I agree. However not everyone at my office agrees. Do you know of books or articles that confirm this assumption?
    Thanks in advance!
  • simpsonjulian
    Dear Visitor,

    I love comment and debate on my blog. If you'd only make a comment that wasn't abusive, we'd talk about it and I'd probably update the post.

    Can I suggest that if you're going to make abusive comments, you don't do it from [presumably] your employer's netblock?

    J.
  • a visitor
    You're an idiot. Do you know how many kinds of binary files exist in organisations that need to be worked upon by a team, with all the changes tracked over time? What about creatives with images like PSD's? How do you propose tracking changes to these files? Create a new folder on the filesystem for each changed version? Oh no, wait - that's exactly what you're saying is a bad idea. Or how about just don't maintain old versions... no whoops that won't work either because then there's no versioning at all. Or how about... you have a version control system that just uses the binary deltas to track changes in binary files? Wow, guess you never thought of that before writing an article and bothering to publish it.

    douchebag
  • Fabrizio Dutra
    Excel and Word files are not derivative files; they must be considered source files, and versioning them is normal.
    The problem is versioning derivative files… (these can also be JSP files in some systems)
    Versioning derivative files is a bad practice: it can leave your versions non-repeatable and difficult to maintain.
  • Douglas Squirrel
    Hear hear! See my response post for a cautionary tale of woe caused by big binary files. But what are you supposed to do with Excel or Word files, which don't have a plain-text form but still may need versioning?