I have a problem where I need to be able to generate identical uniformly distributed numeric hashs for a GUID in both javascript and C#. I guess that would prevent me from using Guid.GetHashCode()
in C#, since I can't reproduce the behavior in JS without reverse engineering the C#.
Is there a fast way to produce hashes from guids/strings in JS? Are all digits of the string uniformly distributed in a .NET generated GUID? Should I just cast/convert the trailing chars into an int?
I have a problem where I need to be able to generate identical uniformly distributed numeric hashs for a GUID in both javascript and C#. I guess that would prevent me from using Guid.GetHashCode()
in C#, since I can't reproduce the behavior in JS without reverse engineering the C#.
Is there a fast way to produce hashes from guids/strings in JS? Are all digits of the string uniformly distributed in a .NET generated GUID? Should I just cast/convert the trailing chars into an int?
Share Improve this question asked Aug 18, 2010 at 5:44 Andrew MatthewsAndrew Matthews 3,1662 gold badges30 silver badges47 bronze badges3 Answers
Reset to default 5The bytes are apparently not evenly distributed.
I put together some code to sample the .NET Guids and plot the distribution:
First of all the test code, this creates one million Guids and counts the number of different values for each byte in the byte array. It outputs it all into a matrix that I plot in Scilab.
int[,] counter = new int[16, 256];
for (int i = 0; i < 1000000; i++)
{
var g = Guid.NewGuid();
var bytes = g.ToByteArray();
for (int idx = 0; idx < 16; idx++)
{
counter[idx, bytes[idx]]++;
}
}
StringBuilder sb = new StringBuilder();
sb.AppendLine("x = [");
for (int idx = 0; idx < 16; idx++)
{
for (int b = 0; b < 256; b++)
{
sb.Append(counter[idx, b]);
if (idx != 255)
{
sb.Append(" ");
}
}
if (idx != 15)
{
sb.AppendLine(";");
}
}
sb.AppendLine("]");
File.WriteAllText("plot.sce", sb.ToString());
Here are the distributions, the graphs plot the number of each distinct value for each of the positions in the byte array:
The value distribution for the positions 0-6 in the byte array:
The value distribution for the position 7 in the byte array:
The value distribution for the position 8 in the byte array:
The value distribution for the positions 9-15 in the byte array:
For byte positions 0-6 and 9-15 the distribution of values seems to be even, but for byte position 7 and 8 the distribution is fairly limited.
That is, for the guid (with the beginning of the byte positions below, note strange ordering)
{1369ea05-b9f9-408b-ac7c-7ebd0f35d562}
1 1 1 1 1 1
3 2 1 0 5 4 7 6 8 9 0 1 2 3 4 5
The position 7 can take the values from 64 (0x40) to 79 (0x4F).
The position 8 can take the values from 128 (0x80) to 191 (0xBF).
The rest of the bytes are evenly distributed.
Note: The tests was run on .NET4 on a 32 bit Windows 7 machine.
Lesson: don't assume stuff, test.
Answer: To use the .NET Guids for calculating your load balancing, you can use any part except the positions marked 7 and 8 in the Guid above.
Question: Does anybody know WHY the distribution is not evenly spread?
you can create a web service to generate the hash value on the server side, use whatever language you want. on client side, a simple web service call will do the trick.
Reflector says the .NET Guid.GetHashCode() is implemented like this
public override int GetHashCode()
{
return ((this._a ^ ((this._b << 0x10) | ((ushort) this._c))) ^ ((this._f << 0x18) | this._k));
}
_a, _b, _c and _f is defined in the constructor taking a byte[16] array
public Guid(byte[] b)
{
if (b == null)
{
throw new ArgumentNullException("b");
}
if (b.Length != 0x10)
{
throw new ArgumentException(Environment.GetResourceString("Arg_GuidArrayCtor", new object[] { "16" }));
}
this._a = (((b[3] << 0x18) | (b[2] << 0x10)) | (b[1] << 8)) | b[0];
this._b = (short) ((b[5] << 8) | b[4]);
this._c = (short) ((b[7] << 8) | b[6]);
this._d = b[8];
this._e = b[9];
this._f = b[10];
this._g = b[11];
this._h = b[12];
this._i = b[13];
this._j = b[14];
this._k = b[15];
}