What is a performant string hashing function that produces a 32-bit integer with a low collision rate?

One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs.
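To illustrate how simple FNV is, here is a minimal C sketch of 32-bit FNV-1a, using the published 32-bit offset basis 2166136261 and FNV prime 16777619. The function name fnv1a_32 is a hypothetical choice, and it assumes NUL-terminated input; treat it as a sketch rather than a reference implementation.

```c
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string (fnv1a_32 is a hypothetical name).
   FNV-1a XORs each byte in first and then multiplies by the prime;
   plain FNV-1 does the multiply before the XOR. */
uint32_t fnv1a_32(const char *s) {
    uint32_t hash = 2166136261u;       /* 32-bit offset basis */
    for (; *s; ++s) {
        hash ^= (unsigned char)*s;     /* mix in one byte */
        hash *= 16777619u;             /* 32-bit FNV prime, wraps mod 2^32 */
    }
    return hash;
}
```

Because it consumes one byte per iteration, it is easy to get right but tends to be slower on long strings than hashes that process 4 or 8 bytes at a time.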

If you're going to use FNV, stick to FNV-1a, since it has acceptable results on the avalanche test (see home.comcast.net/~bretm/hash/6.html).

Or just use MurmurHash2, which is better in both speed and distribution (murmurhash.googlepages.com). – Steven Sudit Jul 10 '09 at 3:38

@Steven: MurmurHash has only been analyzed by its author. I've used it in a few different scenarios, and the newer version of FNV seems to do a better job. – Matthieu N. Jan 25 at 22:03

@sonicoder: While I'm not going to oversell MurmurHash, plain FNV is downright terrible and FNV-1a is only passable. As it happens, MurmurHash has been extensively analyzed and found useful. It's still not a cryptographic hash, and there are going to be collisions no matter what, but it's still a huge improvement over any type of FNV. – Steven Sudit Jan 26 at 20:50

@Steven Sudit: As I said, it was analyzed "only" by its author and no one else. Hence the results of the "analysis" aren't really objective. – Matthieu N. Jan 30 at 12:07

@sonicoder: I'll speak more bluntly: no, you're mistaken. It was analyzed by a number of third parties, including academic ones. Visit Wikipedia for links. What's more important is that not only did it do well in general, but the specific flaws that were found were addressed through the creation of MurmurHash3. – Steven Sudit Feb 3 at 7:25
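For comparison with FNV, here is a C sketch of the 32-bit x86 variant of MurmurHash3, transcribed from the public-domain reference algorithm as I understand it. The name murmur3_32 is hypothetical, and the 4-byte blocks are assembled explicitly as little-endian words, on the assumption that this should reproduce the reference implementation's results on x86 regardless of the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (0 < r < 32). */
static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* MurmurHash3 x86_32 sketch: hash len bytes of key with a 32-bit seed. */
uint32_t murmur3_32(const void *key, size_t len, uint32_t seed) {
    const uint8_t *data = (const uint8_t *)key;
    const size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
    uint32_t h = seed;

    /* Body: mix in whole 4-byte blocks, read as little-endian words. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + i * 4;
        uint32_t k = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5 + 0xe6546b64u;
    }

    /* Tail: mix in the remaining 0-3 bytes. */
    const uint8_t *tail = data + nblocks * 4;
    uint32_t k = 0;
    switch (len & 3) {
    case 3: k ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k ^= tail[0];
            k *= c1; k = rotl32(k, 15); k *= c2; h ^= k;
    }

    /* Finalization: fold in the length, then avalanche the bits. */
    h ^= (uint32_t)len;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```

Unlike the byte-at-a-time loops of FNV or One-at-a-Time, the main loop consumes four bytes per iteration, which is where most of its speed advantage on longer keys comes from.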

For a fixed set of strings, use gperf. If your set of strings changes, you have to pick a general-purpose hash function. That topic has been discussed before: stackoverflow.com/questions/98153.

A perfect hash is a very elegant solution, when available. – Steven Sudit Jan 27 at 18:23

Yes, this is the current leading general-purpose hash function for hash tables. It's non-crypto, of course, with a pair of obvious differentials. – obecalp Feb 16 '09 at 19:21

Yay for perfect hash generators! – Chris Jester-Young Sep 22 '08 at 10:08

Perfect hashing is NOT appropriate for this application, since the set of names is unknown and changes. Therefore, gperf won't work for this. – TimB Sep 24 '08 at 4:43

Another solution that could be even better, depending on your use case, is interned strings. This is how symbols work, e.g. in Lisp. An interned string is a string object whose value is the address of the actual string bytes. You create an interned string object by checking a global table: if the string is in there, you initialize the interned string to the address of that string.

If not, you insert it, and then initialize your interned string. This means that two interned strings built from the same string will have the same value, which is an address. So if N is the number of interned strings in your system, the characteristics are:

- Slow construction (needs a lookup and possibly a memory allocation)
- Requires global data, and synchronization in the case of concurrent threads
- Compare is O(1), because you're comparing addresses, not actual string bytes (this means sorting works well, but it won't be an alphabetic sort)
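A minimal C sketch of the interning scheme described above, assuming a single-threaded program (a multi-threaded one would need a lock around the table). The table size, the names intern, intern_table and intern_hash, and the choice of FNV-1a as the bucket hash are all hypothetical choices for illustration; interned strings are never freed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size chained hash table holding the canonical copies.
   Not thread-safe: concurrent callers would need a mutex around intern(). */
#define INTERN_BUCKETS 1024

struct intern_node {
    struct intern_node *next;
    char *str;
};

static struct intern_node *intern_table[INTERN_BUCKETS];

/* Any reasonable string hash works for picking a bucket; FNV-1a is short. */
static uint32_t intern_hash(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Return the canonical copy of s, so equal strings always yield the same
   pointer. Returns NULL if a memory allocation fails. */
const char *intern(const char *s) {
    uint32_t b = intern_hash(s) % INTERN_BUCKETS;

    /* Lookup: if an equal string is already interned, reuse its address. */
    for (struct intern_node *n = intern_table[b]; n; n = n->next) {
        if (strcmp(n->str, s) == 0)
            return n->str;
    }

    /* Insert: copy the bytes once and link the node into its bucket. */
    struct intern_node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    size_t len = strlen(s) + 1;
    n->str = malloc(len);
    if (!n->str) {
        free(n);
        return NULL;
    }
    memcpy(n->str, s, len);
    n->next = intern_table[b];
    intern_table[b] = n;
    return n->str;
}
```

After interning, two strings can be compared with a plain pointer comparison (a == b), which is the O(1) comparison the answer describes; the cost moves into the strcmp-based lookup at construction time.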

Cheers, Carl.

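Boost's hashing function is simple to use, and most of the stuff in Boost will soon be part of the C++ standard. Some of it already is. Using Boost.Hash is as easy as:

    #include <boost/functional/hash.hpp>
    #include <string>

    int main() {
        boost::hash<std::string> string_hash;      // hash functor for std::string
        std::size_t h = string_hash("Hash me");    // const char* converts to std::string
        (void)h;                                   // silence the unused-variable warning
    }

You can find Boost at boost.org.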

Both the STL and Boost TR1 have extremely weak hash functions for strings. – obecalp Feb 16 '09 at 19:19

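There's also a nice article at eternallyconfuzzled.com. Jenkins' One-at-a-Time hash for strings should look something like this:

    #include <stdint.h>

    uint32_t hash_string(const char *s)
    {
        uint32_t hash = 0;

        /* Mix in one byte per iteration. */
        for (; *s; ++s) {
            hash += (unsigned char)*s;
            hash += (hash << 10);
            hash ^= (hash >> 6);
        }

        /* Final avalanche. */
        hash += (hash << 3);
        hash ^= (hash >> 11);
        hash += (hash << 15);

        return hash;
    }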

CRC-32. There are about a trillion links on Google for it.
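For reference, a compact bitwise CRC-32 in C, using the common reflected IEEE polynomial 0xEDB88320 (the variant used by zlib and Ethernet). This is only a sketch: crc32_bitwise is a hypothetical name, and production code usually uses a table-driven or hardware-accelerated version. The standard check value for the ASCII input "123456789" is 0xCBF43926.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR
   0xFFFFFFFF). Processes one bit per inner iteration: compact but slow. */
uint32_t crc32_bitwise(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial;
               0u - (crc & 1u) is an all-ones mask exactly in that case. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Whether its distribution is good enough for a hash table is exactly what the comments below dispute.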

CRCs are designed for error detection and correction. Their distribution characteristics are typically not very good. – Nick Johnson Sep 22 '08 at 10:09

Arachnid has obviously never tried using CRC32 for hashes. They work well. – Nils Pipenbrinck Sep 22 '08 at 10:11

"CRC32 was never intended for hash table use. There is really no good reason to use it for this purpose." cf. home.comcast.net/~bretm/hash/8.html – obecalp Feb 16 '09 at 21:24

The Hsieh hash function is pretty good as a general-purpose hash function in C, and Hsieh's page includes some benchmarks and comparisons. Depending on what you want (it's not completely obvious), you might want to consider something like cdb instead.

There is some good discussion in this previous question, and a nice overview of how to pick hash functions, as well as statistics about the distribution of several common ones, here.

Bob Jenkins has many hash functions available, all of which are fast and have low collision rates.

The hashes are quite solid, and technically interesting, but not necessarily fast. Consider that the One-at-a-Time hash processes input byte by byte, whereas other hashes take 4 or even 8 bytes at a time. The speed difference is substantial! – Steven Sudit Jul 23 '09 at 15:23

Bob's hashes are very fast: azillionmonkeys.com/qed/hash.html – sixlettervariables Jul 23 '09 at 20:20

You can see what .NET uses for the String.GetHashCode() method using Reflector. I would hazard a guess that Microsoft spent considerable time optimising this.

They also state throughout the MSDN documentation that it is subject to change, so clearly it is on their "performance tweaking radar" ;-) It would be pretty trivial to port to C++ too, I would have thought.
