I'm trying to hash some strings to a number between 0 and a very low n in order to give one color per user.
Here is my (working) code:
function nameToColor(name) {
    var colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'darkblue', 'darkgreen', 'cadetblue', 'darkpurple'];
    var hash = hashStr(name);
    var index = hash % colors.length;
    return colors[index];
}

// djb2 hash
function hashStr(str) {
    var hash = 5381;
    for (var i = 0; i < str.length; i++) {
        var charCode = str.charCodeAt(i);
        hash = ((hash << 5) + hash) + charCode; /* hash * 33 + c */
    }
    return hash;
}
Unfortunately the low numbers are massively over-represented.
Question:
How can I write a deterministic JavaScript function that takes any string as argument and returns a number between 0 and n with a good (as uniform as possible) distribution?
Asked Jun 13, 2013 at 9:12 by L. Sanna; edited Nov 18, 2013 at 18:48 by George Stocker.
- Hashing is well understood and way out of scope of this question. Maybe just use one of the gazillion hash functions that already exist. – Hogan Commented Jun 13, 2013 at 9:14
- Also, Benford's law deals with leading digits (the leftmost). You aren't using the leading digit; you are using the modulus of the hash result. – Hogan Commented Jun 13, 2013 at 9:19
- A "deterministic javascript function that takes any string as argument and returns with a good distribution a number between 0 and n" is a hashing function. This is the definition of a hashing function. You are asking how to write a good hashing function; that question is the same. – Hogan Commented Jun 13, 2013 at 9:24
- Here is another link for you: erlycoder./49/… – Hogan Commented Jun 13, 2013 at 9:35
- This question was not a duplicate. He clearly asks for a uniform hash function. The other question does not specify this criterion. – MRocklin Commented Oct 17, 2013 at 17:40
3 Answers
Hogan gave, in a comment, a link to several hash implementations in JavaScript. It turns out that the simplest is the most appropriate:
function nameToColor(name) {
    var colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'darkblue', 'darkgreen', 'cadetblue', 'darkpurple'];
    var hash = hashStr(name);
    var index = hash % colors.length;
    return colors[index];
}

// very simple hash: sum of the character codes
function hashStr(str) {
    var hash = 0;
    for (var i = 0; i < str.length; i++) {
        var charCode = str.charCodeAt(i);
        hash += charCode;
    }
    return hash;
}
I think it works well because it only uses addition (no shifts or multiplications), which leaves the modulo unchanged, so the initial quality of the distribution is preserved.
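A separate pitfall worth checking with the shift-based djb2 version from the question: in JavaScript, `<<` operates on signed 32-bit integers, so the intermediate hash can become negative, and `%` applied to a negative number gives a negative remainder, which is not a valid array index. Below is a minimal sketch of a djb2 variant kept in the unsigned 32-bit range. The names `hashStrUnsigned` and `nameToColorIndex` are hypothetical, and `Math.imul` assumes an ES2015 environment; this is an illustration, not the code the answer above used.

```javascript
// djb2 kept in the unsigned 32-bit range (a sketch; hashStrUnsigned is a hypothetical name)
function hashStrUnsigned(str) {
    var hash = 5381;
    for (var i = 0; i < str.length; i++) {
        // Math.imul multiplies with 32-bit wraparound; >>> 0 reinterprets the sum as unsigned
        hash = (Math.imul(hash, 33) + str.charCodeAt(i)) >>> 0;
    }
    return hash;
}

// Hypothetical helper: map a name to an index in 0 .. n - 1
function nameToColorIndex(name, n) {
    // the hash is never negative, so the remainder is never negative either
    return hashStrUnsigned(name) % n;
}
```

Because the hash is always a non-negative integer, `nameToColorIndex(name, colors.length)` can be used directly as an array index.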
I also found this on Wikipedia, but did not have to use it:
In many applications, the range of hash values may be different for each run of the program, or may change along the same run (for instance, when a hash table needs to be expanded). In those situations, one needs a hash function which takes two parameters—the input data z, and the number n of allowed hash values.
A common solution is to compute a fixed hash function with a very large range (say, 0 to 2^32 − 1), divide the result by n, and use the division's remainder. If n is itself a power of 2, this can be done by bit masking and bit shifting. When this approach is used, the hash function must be chosen so that the result has fairly uniform distribution between 0 and n − 1, for any value of n that may occur in the application. Depending on the function, the remainder may be uniform only for certain values of n, e.g. odd or prime numbers.
We can allow the table size n to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let n be significantly less than 2^b. Consider a pseudo random number generator (PRNG) function P(key) that is uniform on the interval [0, 2^b − 1]. A hash function uniform on the interval [0, n − 1] is n * P(key) / 2^b. We can replace the division by a (possibly faster) right bit shift: (n * P(key)) >> b.
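The last quoted paragraph can be sketched in JavaScript roughly as follows. This is an illustration under assumptions: `hash32` is taken to be a uniformly distributed unsigned 32-bit value, only its top 16 bits are used as P(key) (so b = 16), and n must stay below 2^16 so that the product fits in 32 bits. The name `reduceToRange` is hypothetical.

```javascript
// Map a 32-bit hash to 0 .. n - 1 without a remainder or division (sketch, b = 16)
function reduceToRange(hash32, n) {
    // P(key): the top 16 bits, uniform on [0, 2^16 - 1] if hash32 is uniform
    var top16 = hash32 >>> 16;
    // n * P(key) / 2^b written as an unsigned right shift by b = 16
    return (n * top16) >>> 16;
}
```

For example, `reduceToRange(0, 10)` gives 0 and `reduceToRange(0xFFFFFFFF, 10)` gives 9, so the result always stays inside the table.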
The following hash function, by Brian White, is very generic: it accepts any kind of input (including strings), comes with simple examples, and is written for JavaScript (Node.js).
https://npmjs.com/package/xxhash
Hope this helps
Here is a variation of the code above:
function hashValue(theString, size) {
    var sum = 0;
    // declare i locally so the loop does not create a global variable
    for (var i = 0; i < theString.length; i++) {
        sum += theString.charCodeAt(i) * 3;
    }
    return sum % size;
}
Simply pass a string and the size you want, for example 36 if you want it to return numbers 0 through 35. The * 3 multiplier is arbitrary and can be whatever number you want, but pick one that shares no common factor with the size: with * 3 and a size of 36, only multiples of 3 can come out. I repurposed this idea from here (Hash function that can return a integer range based on string) by M_callens.
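To check how evenly a given hash spreads names over the buckets, one option is to tally the results for a batch of sample inputs. The sketch below does that for the sum-based function with two different multipliers. `bucketCounts`, `makeHash`, and the generated `'user' + k` names are all made up for the illustration.

```javascript
// Hypothetical helper: count how many inputs land in each bucket 0 .. size - 1
function bucketCounts(hashFn, inputs, size) {
    var counts = new Array(size).fill(0);
    for (var i = 0; i < inputs.length; i++) {
        counts[hashFn(inputs[i], size)]++;
    }
    return counts;
}

// Same sum-based idea as above, with the multiplier as a parameter (an assumption for this sketch)
function makeHash(multiplier) {
    return function (theString, size) {
        var sum = 0;
        for (var i = 0; i < theString.length; i++) {
            sum += theString.charCodeAt(i) * multiplier;
        }
        return sum % size;
    };
}

// Made-up sample data: 360 synthetic user names
var names = [];
for (var k = 0; k < 360; k++) {
    names.push('user' + k);
}

var countsTimes3 = bucketCounts(makeHash(3), names, 36);
var countsTimes5 = bucketCounts(makeHash(5), names, 36);
console.log(countsTimes3.join(' '));
console.log(countsTimes5.join(' '));
```

With a multiplier of 3 and a size of 36, every bucket whose index is not a multiple of 3 stays at zero, because the sum is always a multiple of 3. Running the same tally on real user names is a quick way to compare candidate hash functions.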