Version control for binaries?

Most VC systems just punt when it comes to binary files -- they can't be diffed for a human to compare, and they can't be merged. The best feature to have is binary-delta storage of changes, so a few small tweaks to a 100MB file don't become 300MB in your repository.

I found this post on storing binary files in Bazaar. In summary, it's difficult to store binary files without massive storage use because most formats can't be effectively diffed:

"Short answer, sure... we store binary deltas, but I wouldn't call them optimal binary deltas. ... If you are having problems with SVN, then I don't think binary diffs would help you much anyway, considering SVN has binary diffs. ... But the truth is, (most?, many?) binary files don't binary diff that well anyway. Frequently they are compressed, which means a modification near the beginning tends to have a chain reaction over a large distance (possibly the whole rest of the file)."

So if the files are small or don't change much, go ahead and use VCS. If they're large and change often, find or write a specialized tool for managing them.
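The "chain reaction" in compressed files is easy to see for yourself. Here is a minimal sketch, assuming zlib as the compressor and a made-up block of repetitive records standing in for a real binary file. It flips one byte near the start and counts how many byte positions differ before and after compression.

```python
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two byte strings differ (a length gap counts too)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical "binary file": repetitive records, so it compresses well.
original = b"".join(b"record %06d value %d\n" % (i, i * i % 997) for i in range(20000))

# One tiny tweak near the beginning of the file.
tweaked = bytearray(original)
tweaked[10] ^= 0xFF
edited = bytes(tweaked)

packed_original = zlib.compress(original)
packed_edited = zlib.compress(edited)

raw_changed = differing_bytes(original, edited)
packed_changed = differing_bytes(packed_original, packed_edited)

print(f"uncompressed: {raw_changed} of {len(original)} bytes differ")
print(f"compressed:   {packed_changed} of {len(packed_original)} bytes differ")
```

The exact numbers depend on the data and the compressor, so run it on a sample of your own files before deciding whether delta storage will help.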

My answer below @Owen discusses a special tool I wrote to handle versioning a binary. Could be helpful. – Owen Sep 19 '08 at 19:22

doh - I thought putting @Owen will automatically link to my answer. Guess not. – Owen Sep 19 '08 at 19:24

+1 For the quote and the wisdom. I'd think just like that (though I have VCS'd a 1G disk image from time to time and it was painful, although I could get back previous revisions without fear, which was nice to know). – Adam Hawes Jan 30 '09 at 7:59

"binary files don't binary diff that well anyway": this wasn't mentioned in the original post and may not be a requirement. One can't apply general source diffing to binaries because of the diverse things they represent: images, audio, proprietary file types. If diffing of the binary is required, the ultimate option would be to see what the tool that generates the file can do. Some diffing tools such as Beyond Compare will diff images and other binary formats. – Rob Dec 17 '10 at 10:57

To store large files in Git, look at git-bigfiles. To version and propagate binary files without actually storing them in git, try git-annex. To have git diff binary files by exporting a text version, see defining an external diff driver.
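As a rough illustration of the "export a text version" idea, here is a sketch that uses Git's `textconv` setting, a lighter-weight relative of a full external diff driver. The script name `hexdump.py`, the driver name `hexdump`, and the `*.bin` pattern are assumptions made up for this example. Git calls the configured command with the file path as its only argument and diffs whatever text it prints.

```python
#!/usr/bin/env python3
"""Hypothetical textconv helper: print a binary file as a hex dump.

Assumed wiring (names are illustrative, not from the original answer):
  .gitattributes:   *.bin diff=hexdump
  git config diff.hexdump.textconv "python3 /path/to/hexdump.py"
"""
import sys


def hexdump(data: bytes, width: int = 16) -> str:
    """Return one line per `width` bytes: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of the hex column when the row is full
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git passes the path of the file to convert as the single argument.
    with open(sys.argv[1], "rb") as handle:
        print(hexdump(handle.read()))
```

A hex dump is only the most generic conversion. For a specific format, a converter that prints metadata or a structured listing will usually give more readable diffs.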

+1 for good, terse answer to this question and a few other related ones: stackoverflow.com/questions/799507/… and superuser.com/questions/105048/version-control-for-binary-files – toolbear Apr 21 at 22:16

Most VCSs just store the current version, partly because binary diffs can be very large, so the cost of reconstituting a binary file from deltas can make it not worth storing the deltas. Things are different nowadays with super-fast CPUs, and Subversion now stores binary files as deltas. The rsync algorithm tends to work well with text files that change, but binary files (e.g. zipped ones) do not 'compress' nearly so well. I don't know how well the Subversion algorithm works, but they say it works equally well on binary as on text.

+1 for not making a big deal of storing binaries – Rob Dec 17 '10 at 9:34

I was slightly wrong above: svn does a good job of storing binaries and stores efficient deltas. You can see how by looking directly in the repo files: each rev is a new file, so you can see the size of each. – gbjbaanb Dec 21 '10 at 12:48

Try Git. It won't store reverse deltas, but chances are you don't really want that anyway. It will allow your repository to contain binaries, and will track when you put a new one in. It's not trying to be space-efficient, but it is effective.

+1 for this option, again not making a big deal of storing binaries – Rob Dec 17 '10 at 9:34

You might want to take a look at Boar: "Simple version control and backup for photos, videos and other binary files": code.google.com/p/boar.

Subversion could do it. The file size of the repository may get big rather quickly though.

For generic binaries, as @John Millikin said, it's basically a punt. Some systems, such as Perforce, integrate (perforce.com/perforce/products/integrati...) with several third-party products, such as image manipulation programs.

I store all my pictures in Git. They don't change much, so space is not as much of an issue as it would be for someone who edits heavily. My favourite features are that Git stores blobs by their hash and that it does diffs efficiently (where possible) over (effectively) all files in the repository.

This means that I can move files around, make copies and change metadata without bloating my repository.
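To see why copies and moves cost nothing extra, here is a small sketch of how Git names a blob in a SHA-1 repository: the ID is the hash of a short header plus the file contents, so the path plays no part. The sample bytes are placeholders, not a real image.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git (SHA-1 repositories) assigns to a blob."""
    header = b"blob %d\x00" % len(content)  # "blob <size>\0" precedes the data
    return hashlib.sha1(header + content).hexdigest()


photo = b"\x89PNG placeholder image bytes"  # stand-in for real file contents
same_photo_elsewhere = bytes(photo)         # the same bytes under another path

# Identical content gives an identical ID, so Git stores the blob only once.
print(git_blob_id(photo) == git_blob_id(same_photo_elsewhere))  # True
```

You can compare the result for a real file with `git hash-object <file>`.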

A few years ago my project was using SVN to version control an app we wrote in VBA for Microsoft Access. All of the code was inside an Access database, which is a binary file. It's not good to have all your code inside that. As pointed out, SVN doesn't do a great job of handling binary files: you can certainly check out and commit, but there's no diff or merging. I ended up writing a custom program in VB6 that extracted all the forms, reports, and code from the binary Access file, and then versioned the extracted files. Then I had to write another custom program in VB to piece it all back together into a functional Access file to be deployed.

If you're working with MP3 files maybe you can do something similar to extract the ID3 info to text files and then version those text files.
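As a sketch of what that extraction could look like, the following reads an ID3v1 tag (the fixed 128-byte block at the end of the file) and renders it as plain text that diffs cleanly. It is only an illustration: it ignores ID3v2 tags, which many files use instead, and the function name and sample data are made up.

```python
from typing import Optional

# ID3v1 layout after the 3-byte "TAG" marker: (field name, start, end).
FIELDS = (("title", 3, 33), ("artist", 33, 63), ("album", 63, 93), ("year", 93, 97))


def id3v1_as_text(mp3_bytes: bytes) -> Optional[str]:
    """Render the ID3v1 tag (last 128 bytes of the file) as diff-friendly text."""
    tag = mp3_bytes[-128:]
    if len(tag) != 128 or not tag.startswith(b"TAG"):
        return None  # no ID3v1 tag present; ID3v2 is not handled here
    lines = []
    for name, start, end in FIELDS:
        value = tag[start:end].split(b"\x00", 1)[0].decode("latin-1").strip()
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


def _field(text: bytes, size: int) -> bytes:
    """Pad a value with NUL bytes to the fixed field width."""
    return text.ljust(size, b"\x00")


# Fake file: filler "audio" bytes followed by a hand-built ID3v1 tag.
fake_mp3 = (b"\xff\xfb" * 100 + b"TAG" + _field(b"Song", 30) + _field(b"Band", 30)
            + _field(b"Album", 30) + b"2008" + _field(b"", 30) + b"\x0c")
print(id3v1_as_text(fake_mp3))
```

Writing that text to a sidecar file next to each MP3 and committing the sidecar gives you readable diffs of the metadata, while the audio itself can live elsewhere.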

You could check out bsdiff, which claims to support diffs of executables (specifically) well. I am still testing integrating it into a custom ASP mini-VCS, so I don't have much experience using it, but from the paper it looks like it would handle most non-stream-compressed binary files well.

Even for text documents, Word documents, etc., a source control system will keep track of your versions. As for which one to use, there are a number of free ones available that require different levels of administration and expertise. If you're mostly comfortable with Windows programs, SourceGear licenses their Vault product free for single users.

You can try using a custom Subversion property to store the timestamp. If you change this timestamp, the file is in a "changed" state and you can commit it. You can access the contents of the properties from the command line, and each file can carry its own time. However, you may need some minor additional scripts to set and read the properties.
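The set/read scripts could look roughly like this. It is a sketch, not a tested workflow: the property name `user:mtime` is made up, and the helpers simply wrap `svn propset` and `svn propget`, so they need the `svn` client on the PATH and a file that is already under version control.

```python
import os
import subprocess
from typing import List

PROP_NAME = "user:mtime"  # hypothetical property name; pick your own


def propset_command(path: str) -> List[str]:
    """Build the `svn propset` call that records the file's current mtime."""
    mtime = int(os.path.getmtime(path))
    return ["svn", "propset", PROP_NAME, str(mtime), path]


def propget_command(path: str) -> List[str]:
    """Build the `svn propget` call that reads the stored mtime back."""
    return ["svn", "propget", PROP_NAME, path]


def save_mtime(path: str) -> None:
    """Run before `svn commit` so the timestamp travels with the file."""
    subprocess.run(propset_command(path), check=True)


def restore_mtime(path: str) -> None:
    """Run after `svn checkout` or `svn update` to put the timestamp back."""
    result = subprocess.run(propget_command(path), check=True,
                            capture_output=True, text=True)
    mtime = int(result.stdout.strip())
    os.utime(path, (mtime, mtime))
```

Storing whole seconds since the epoch keeps the property value easy to parse. If you need sub-second precision, you would have to store a float or a formatted date instead.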

In git, a "commit" points to a single tree, marking it as what the project looked like at a certain point in time. It contains meta-information about that point in time, such as a timestamp, the author of the changes since the last commit, a pointer to the previous commit(s), etc.

Subversion has a use-commit-times option that makes the working copy use each file's last commit time as its timestamp. Also, svn export always sets the last commit time.

Unfortunately this would set the modified time to the date of the commit instead of keeping the original timestamp. – chris Mar 2 '09 at 16:11.

This is one of those requests that the Subversion developers have on the radar, but have not yet implemented. As 'Nerdling' suggests, git or mercurial might be a better option, but you have a couple of options in Subversion (since you are looking for something like SVN):

There was a discussion and Perl script on the SVN website that had some workarounds. It preserved the modified date on the initial commit; thereafter, it uses the commit time, but you could probably modify the script to use the modified time. It is Perl, so your mileage on Windows may vary (I've not used cygwin and/or Perl on Windows): svn.haxx.se/users/archive-2006-10/1345.s...

You can change the svn:date property of each revision after the commit to match the modified time. This will change the modified time of every file in the commit, though, so if that's undesirable, be sure to commit the binaries on their own: svnbook.red-bean.com/en/1.5/svn.ref.prop... and svnbook.red-bean.com/en/1.5/svn.advanced...

I think TortoiseSVN on Windows may have some options to handle setting the date property on a commit. I've only heard this rumour, and not used it, so this is hearsay :-)

None of these are particularly elegant, and they require some extra effort on your part, but they will work if you want to stick with SVN, which is certainly a good product. If you decide to explore another SCM like git or mercurial, note that you CAN use them just like SVN and ignore their other features to help ease the transition.

Thanks - your first solutions might actually work but we have hundreds of files and if I do a commit for each one of them it will get really messy. – chris Mar 2 '09 at 16:09

In that case, you may wish to write a custom script (perhaps a variant of the Perl script above) to look for binary files in your commit by file type, and then have the script set the svn:date for them. You can tie this script to the SVN post-commit hook so it runs automatically after each commit. – Jarret Hardie Mar 2 '09 at 18:57
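For the svn:date approach, the core of such a script might be as small as the sketch below. It only formats a file's modified time the way Subversion writes revision dates and builds the matching `svn propset --revprop` call. The function names, repository URL and revision number are placeholders. Note that revision properties can only be changed if the repository has a pre-revprop-change hook that allows it.

```python
import subprocess
from datetime import datetime, timezone
from typing import List


def svn_date(epoch_seconds: float) -> str:
    """Format a Unix timestamp the way Subversion stores svn:date (UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def set_revision_date_command(repo_url: str, revision: int,
                              epoch_seconds: float) -> List[str]:
    """Build the command that rewrites svn:date on one revision."""
    return ["svn", "propset", "svn:date", "--revprop", "-r", str(revision),
            svn_date(epoch_seconds), repo_url]


def set_revision_date(repo_url: str, revision: int, epoch_seconds: float) -> None:
    """Apply the change; fails unless the pre-revprop-change hook permits it."""
    subprocess.run(set_revision_date_command(repo_url, revision, epoch_seconds),
                   check=True)
```

A hook script would call something like `set_revision_date(url, rev, os.path.getmtime(path))` for the binary it cares about. Because one revision has one date, this still only works cleanly when each binary is committed on its own.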

I need a version control system that works like Subversion but is able to keep the 'modified' timestamp (date) of each file. We need to version our setup projects. In this case it is important that the input files (DLLs/EXEs) keep their timestamps.

What is the best tool to do this?

