I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for pression.
I was thinking of implementing a base 62 number.toString(62) and parseInt(pressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered.
The basic specs are: - Compress large number arrays into strings for JSONP transfer (So I think UTF is out) - Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip pression either.
Any ideas would be greatly appreciated.
Thanks
Guido Tapia
I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for pression.
I was thinking of implementing a base 62 number.toString(62) and parseInt(pressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered.
The basic specs are: - Compress large number arrays into strings for JSONP transfer (So I think UTF is out) - Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip pression either.
Any ideas would be greatly appreciated.
Thanks
Guido Tapia
Share Improve this question edited Apr 8, 2010 at 0:26 Justin Johnson 31.3k7 gold badges66 silver badges89 bronze badges asked Apr 7, 2010 at 23:03 gatapiagatapia 3,6625 gold badges42 silver badges51 bronze badges 12- Why not gzip? What's wrong with that? (or more likely, DEFLATE) You can also do something like just Huffman, or just LZW, if you want to keep it simpler. Huffman coding in Javascript: tom-ash/blogs/Blog.aspx?File=Programming/… – Cheeso Commented Apr 7, 2010 at 23:05
- Just to be clear - were sending data from the client to the server via javascript and ajax/jsonp ? – James Westgate Commented Apr 7, 2010 at 23:11
- @James: Correct we are sending a very large number array as a string to a server. The array is not in JSON (so no silly xml tags - edit oops I meant curly braces) just a string.join('.') between the ints. This string is then also stored in a database (which is the real size issue). – gatapia Commented Apr 7, 2010 at 23:23
- @Cheeso:I don't want to put the pression / depression load on the server. I want the client (browser) to incur this cost so gzip is out unless u can think of a way of a way of using a plex pression algorithm in JS. – gatapia Commented Apr 7, 2010 at 23:27
- 1 @nick: We are talking requests of ~200k each. And we are talking ~100,000 requests a day and we keep 2 weeks of data, however that 100,000 requests / day are increasing :) If I can get those requests to 100k it will let me live with my current hardware set up for at least another 6 months. – gatapia Commented Apr 8, 2010 at 21:25
3 Answers
Reset to default 5Another way of doing this might be to encode to binary types such as signed/unsigned ints, and manually decode as at http://snippets.dzone./posts/show/685 which would require server side code to create the binary data.
You could then huffman pression or something similar like RLE (see http://rosettacode/wiki/Run-length_encoding#JavaScript for an implementation, though it may have some issues in IE without modifying) to press the data further.
EDIT: Alternatively, you could convert the numbers themselves to a base (radix) in the unencoded URI character range (see http://en.wikipedia/wiki/Percent-encoding) which should work well if many of the numbers are larger than 2 digits. I converted the code at http://code.activestate./recipes/111286-numeric-base-converter-that-accepts-arbitrary-digi/ from python to do this.
It currently doesn't handle floats, but it could be done fairly easily:
function get_map(s) {
d = {}
for (var i=0; i<s.length; i++) {
d[s.charAt(i)] = i}
d.length = s.length
d._s = s
return d}
var separate_with = '~';
var encodable = get_map('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.'); // - is reserved for negatives obviously :-P
var base10 = get_map('0123456789')
// UNCOMMENT ME for length/speed testing in a wider base!
// You may wish to experiment with the ranges for a happy medium between bandwidth and DB space :-P
/*var encodable = ''
for (var i=1; i<128; i++) {
encodable += String.fromCharCode(i)
}
encodable = get_map(encodable)*/
function baseconvert(number, fromdigits, todigits) {
var number = String(number)
if (number.charAt(0) == '-') {
number = number.slice(1, number.length)
neg=1}
else {
neg=0}
// make an integer out of the number
var x = 0
for (var i=0; i<number.length; i++) {
var digit = number.charAt(i)
x = x*fromdigits.length + fromdigits[digit]
}
// create the result in base 'todigits.length'
res = ""
while (x>0) {
remainder = x % todigits.length
res = todigits._s.charAt(remainder) + res
x = parseInt(x/todigits.length)
}
if (neg) res = "-"+res
return res
}
function encodeNums(L) {
var r = []
for (var i=0; i<L.length; i++) {
r.push(baseconvert(L[i], base10, encodable))
}
return r.join(separate_with)
}
function decodeNums(s) {
var r = []
var s = s.split(separate_with)
for (var i=0; i<s.length; i++) {
r.push(parseInt(baseconvert(s[i], encodable, base10)))
}
return r
}
var test = [5, 654645, 24324, 652124, 65, 65289543, 65278432, 643175874158, 652754327543]
alert(encodeNums(test))
alert(decodeNums(encodeNums(test)))
Options
- Use a js library (see answer from Josh) but watch for script timeouts
- Use some kind of activex or silverlight uploader that also does zip
- Use a java plug-in
- Use a flash based pressing uploader
I'm now toying with the idea of encoding the length of the number into the number itself. I still have not perfected this algorithm but will post it once done. But roughly this is what I am currently trying to achieve:
Boundaries:
- Loss of precision is allowed (+- 3)
- Max number will be 3200
- Min is 0 (so no negatives)
- No decimal - floats
So now given my max allowed number I know that the length of the encoded digit in base 62 will have a max length of 2. So any encoded number is either 1 or 2 characters in length. Awesome. So now I'm going to make the number odd or even depending if its 1 or 2 characters (remember I can handle loss of precission). This removes the need for separators.
Now I'm seeing about 70%-80% pression with this, its very buggy at the moment but I'm excited about it, so the post to encourage discussion around this methodology.