Is using 2 different hash functions a good way to check for file integrity?

MD5 is probably safe for what you're doing, but there's no reason to continue to use a hash with known flaws. In fact, there's no reason you shouldn't be usign SHA256 or SHA512, unless you have some known major performance bottleneck Edit: To clarify, there's no reason to use two algorithms; just use one that fits what you need. If you're worried about people doing MD5 collisions on you (as in, is this a security threat?), then use an algorithm that isn't as weak, such as SHA256 Edit 2: To address an apparently still common misunderstanding: Finding a random collision on a hash is not a 1/2^n probability.

It's closer to 1/2^(n/2). So a 128-bit hash can probably be collided with 2^64 attempts. See birthday attack for details.

MD5 is probably safe for what you're doing, but there's no reason to continue to use a hash with known flaws. In fact, there's no reason you shouldn't be usign SHA256 or SHA512, unless you have some known major performance bottleneck. Edit: To clarify, there's no reason to use two algorithms; just use one that fits what you need.

If you're worried about people doing MD5 collisions on you (as in, is this a security threat? ), then use an algorithm that isn't as weak, such as SHA256. Edit 2: To address an apparently still common misunderstanding: Finding a random collision on a hash is not a 1/2^n probability.It's closer to 1/2^(n/2).

So a 128-bit hash can probably be collided with 2^64 attempts. See birthday attack for details.

Also, using two hashes doesn't add the number of bits getting exponentially harder. Cracking two hashes at the same time is just as easy (or hard) as cracking each separately. Ditch MD5, go for SHA-x alone.

– Javier Feb 11 '09 at 18:35 "Finding a random collision on a hash is not a 1/2^n probability" True if the goal is to come up with a pair of collided documents. False if the goal is to come up with a document that collides with a given document. – Jason S Feb 11 '09 at 18:48 @Jason S: This doesn't rule out scenario 1) generate two collided documents, 2) write one 3) overwrite with the other 4) profit.(assuming a vulnerable function, e.g. MD5) – Piskvor Feb 11 '09 at 18:57 let me rephrase: "given" means out of your control.

– Jason S Feb 11 '09 at 19:01.

Checking the MD5 hash by itself is sufficient for most purposes. Although if you must, there is no harm in checking the SHA1 in addition. Keep in mind the possibility of catching something you would miss with just the MD5 check is extremely remote.

Note that in terms of scalability, the additional check adds unnecessary load on your server.

For file integrity (e.g. Accidental/random corruption), one hash should suffice. 128 bits = 2-128 probability of an undetected error, which is for all practical purposes small enough. For file cryptographic integrity (e.g. Assurance that someone hasn't maliciously substituted an alternate file), I think you're talking about a belt-and-suspenders approach.

MD5 is considered "weak" in the sense that it is possible to construct two documents with the same hash with a much lower amount of CPU time needed than it would take for a brute-force search ("collision resistance" of MD5 has been broken). But it's not (as far as I know) "weak" from the standpoint of, if you have an arbitrary document X, someone else can create a document Y with the same hash with a much easier time than a brute-force search (MD5 still has "preimage resistance"). (The distinction is like the difference between going to a party and finding two people with the same birthday, vs. finding another person with the same birthday as yours.

) Even if MD5 is broken in that regard, it's improbable that someone can come up with an algorithm to create documents to match an arbitrary MD5 hash and an arbritrary SHA1 hash. This sounds kind of like the tension between the two maxims "don't put all your eggs in one basket" vs. "put all your eggs in one basket, and watch the basket". Or like spending money on two deadbolt locks vs. one deadbolt lock which is twice as good and costs twice as much.

Ideally it would be best to spend CPU time calculating one secure 256-bit hash instead of two less secure 128-bit hashes using different algorithms.(yes I know SHA1 is 160bit, this is just an illustration) You're more likely to get better performance this way for a desired level of security -- that is, if the 256-bit hash isn't broken. If it is broken, you may be better off with the two-algorithm approach just to hedge your bets. But again if this is just integrity to protect against errors, one MD5 hash is fine.

Edit: to cite some useful sources: 1 2 3, "MD5 considered harmful today", RFC4270, NIST's latest update on the SHA-3 competition, and "The SHA-3 Zoo".

Exactly. MD5 is safe for some things, but without details, there's no reason to use it. – MichaelGG Feb 11 '09 at 18:19.

In general, if the MD5 hashes don't match, the SHA1 (or any other similar hash) won't match either. I'm not going to say there aren't possible cases where it couldn't happen (because we all know there are collisions in both algorithms), but I would say it will probably never happen in your situation. My thoughts are that providing one hash is probably sufficient enough; more than one hash becomes arduous to verify (having to verify one is bad enough, depending upon available utilities for the platform), and I seriously doubt you're going to see such amazing corruption of a file as to lead to a perfect collision.

Note: Ignore the stuff about verification being a pain; upon re-reading the question, I revised this - I took the original meaning to be hash verification for users downloading the file. If, of course, that is what was meant, then what I said still applies, I think.

Not what I had in mind originally, but good point: should the files be downloadable, I can provide checksums for the users as they are pre-computed. Nice idea! – Piskvor Feb 11 '09 at 18:18.

Because the two hashes are calculated differently, two files with the same MD5 hash are no more likely to have the same SHA-1 hash than two random files. If your chance of random collision with either hash is (ballpark) 2^128, your chance of random collision in both will be 2^256. In effect, you go from extremely low to extremely, extremely low.It's the equivilent of going from 128-bit to 256-bit encryption in order to avoid having someone randomly guess your 128-bit key.

As a rough estimate, chance for a MD5 false positive is 1/(2^128), chance for a SHA-1 false positive is 1/(2^160), so the chance for a false positive for both algorithms is between 1/(2^128) and 1/(2^288), but you can be pretty sure that it is near 1/(2^288) as both algorithms have been thoroughly tested statistically. At least, when using two different hashes, you are protected very well against intentional attacks at one of the algorithms. EDIT: After some research, I stumbled upon this Wikipedia Note that MD5 birthday attacks can be done in under 1 minute, so it seems better to use a different algorithm as MD5 together with SHA-1 here.

Birthday attacks for SHA-1 take 2^69 operations at the moment.

Except MD5 has known problems and has already led to real-world collision attacks. – MichaelGG Feb 11 '09 at 18:16 Additionally, a random collision is not 1/2^n, it's roughly 1/2^(n/2). See birthday paradox.

– MichaelGG Feb 11 '09 at 18:18.

I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.

Related Questions