I'm looking for a way to transfer the raw file data of any file type with any possible content (by that I mean the files and file content are all user generated) both ways, using xhr/ajax calls, between a Backbone front-end and a Django back-end.
EDIT: Maybe the question is still unclear...
If you open a file in an IDE (such as Sublime), you can view and edit the actual code that comprises that file. I'm trying to put THAT raw content into JSON so it can be sent to the browser, modified, and then sent back.
I posted this question because I was under the impression that, since the contents of these files can effectively be in ANY coding language, just stringify-ing the contents and sending them would be a brittle solution that would be easy to break or exploit. Content could contain any number of ', ", {, and } characters that would seem to break JSON formatting, and escaping those characters would leave artifacts within the code that would effectively break it (wouldn't it?).
If that assumption is wrong, THAT would also be an acceptable answer (so long as you could point out whatever it is I'm overlooking).
The project I'm working on is a browser-based IDE that will receive a complete file-structure from the server. Users can add/remove files, edit the content of those files, then save their changes back to the server. The sending/receiving all has to be handled via ajax/xhr calls.
- Within Backbone, each "file" is instantiated as a model and stored in a Collection. The contents of the file would be stored as an attribute on the model.
- Ideally, file content would still reliably throw all the appropriate events when changes are made.
- Fetching contents should not be broken out into a separate call from the rest of the file model. I'd like to just use a single save/fetch call for sending/receiving files including the raw content.
Solutions that require Underscore/jQuery are fine, and I am able to bring in additional libraries if there is something available that specializes in managing that raw file data.
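For context, here is a minimal sketch of the kind of setup I mean (the attribute names, urlRoot, and /api/files/ endpoint are placeholders, not existing code):

```javascript
// Minimal sketch of the setup described above; attribute names, urlRoot,
// and the /api/files/ endpoint are placeholders for illustration only.
var FileModel = Backbone.Model.extend({
  urlRoot: '/api/files/',
  defaults: {
    path: '',
    content: ''   // raw file text lives here, so change events fire on edits
  }
});

var FileCollection = Backbone.Collection.extend({
  model: FileModel,
  url: '/api/files/'
});

// A single save() should round-trip metadata and raw content together.
var file = new FileModel({ path: 'src/app.js', content: 'console.log("hi");' });
file.on('change:content', function (model, value) {
  console.log('content changed, new length:', value.length);
});
file.save();   // POST/PUT JSON that includes the content attribute
```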
3 Answers
Interesting question. The code required to implement this would be quite involved, so I'll mostly outline the approach (with a few rough sketches along the way); you seem like a decent programmer and should be able to implement what's mentioned below.
Regarding the sending of raw data through JSON, all you need to do to make it JSON-safe and keep it from breaking your code is to escape the special characters, which happens automatically when you stringify with Python's json.dumps and JavaScript's JSON.stringify. [1]
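For instance, a minimal round trip on the client side (the server side would be the matching json.dumps/json.loads) showing that stringify/parse leaves no escape artifacts behind:

```javascript
// Raw "file content" containing quotes, braces, and newlines.
var raw = 'function greet(name) {\n  return "Hello, " + name + \'!\';\n}';

// Escaping happens inside JSON.stringify; no manual escaping needed.
var payload = JSON.stringify({ path: 'greet.js', content: raw });

// Parsing restores the exact original string, with the escapes removed again.
var restored = JSON.parse(payload).content;
console.log(restored === raw);   // true
```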
If you are concerned about some form of basic tamper-proofing, then light encoding of your data will fit the purpose, in addition to having the client and server pass a per-session token back and forth with JSON transfers to ensure that the JSON isn't forged from a malicious address.
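For instance (a hedged sketch: with Django the per-session token could simply be the CSRF token, read from the default csrftoken cookie and sent back in the X-CSRFToken header on every ajax call):

```javascript
// Sketch: attach a per-session token to every outgoing ajax call.
// With Django this could be the CSRF token from the default 'csrftoken'
// cookie, sent back in the X-CSRFToken header that Django checks.
function readCookie(name) {
  var match = document.cookie.match(new RegExp('(^|; )' + name + '=([^;]*)'));
  return match ? decodeURIComponent(match[2]) : null;
}

$.ajaxSetup({
  beforeSend: function (xhr) {
    xhr.setRequestHeader('X-CSRFToken', readCookie('csrftoken'));
  }
});
```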
If you want to check the end-to-end integrity of the data, generate an MD5 checksum and send it inside your JSON, then generate another MD5 on arrival and compare it with the one inside the JSON.
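A minimal sketch of that checksum comparison, using Node's built-in crypto module purely for illustration (a browser client would need a third-party MD5 library, since Web Crypto only provides SHA variants):

```javascript
// Sketch of the integrity check using Node's crypto module for illustration;
// a browser client would need a third-party MD5 implementation.
const crypto = require('crypto');

function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Sender: attach the checksum alongside the (encoded) data.
const data = 'encoded-file-contents-go-here';
const outgoing = { data: data, md5: md5Hex(data) };

// Receiver: recompute and compare before accepting the payload.
const incoming = outgoing;   // stands in for the parsed JSON from the request
if (md5Hex(incoming.data) !== incoming.md5) {
  throw new Error('Checksum mismatch: payload corrupted in transport');
}
```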
Base64 encoding: the size of your data would grow by 33%, as it encodes every three bytes of data as four characters.
Base85: encodes four bytes as five characters and will grow your data by 25%, but has much more processing overhead than Base64 in Python. That's an 8% improvement in data size, at the expense of processing overhead. It's also not string-safe: because it uses all 95 printable ASCII characters, its output can contain double and single quotation marks, angle brackets, and ampersands, which cannot appear unescaped inside JSON, so it needs to be stringified before JSON transport. [2]
yEnc has as little as 2-3% overhead (depending on the frequency of identical bytes in the data), but is ruled out by impractical flaws (see [3]).
ZeroMQ Base-85, aka Z85: a string-safe variant of Base85 with a data overhead of 25%, which is better than Base64, and no stringifying is necessary before sticking it into JSON. I highly recommend this encoding algorithm (rough sketch below). [4] [5] [6]
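To make the Z85 option concrete, here is a rough JavaScript sketch of the encode/decode loops from the spec in [4]; it assumes the input length is already a multiple of 4 bytes (the spec's precondition), so real code would need a small padding scheme on top:

```javascript
// Rough Z85 sketch following rfc.zeromq.org/spec:32; assumes the byte length
// is a multiple of 4 (the spec's precondition) and leaves padding to the caller.
var Z85_CHARS =
  '0123456789abcdefghijklmnopqrstuvwxyz' +
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#';

function z85Encode(bytes) {            // bytes: Uint8Array, length % 4 === 0
  var out = '';
  for (var i = 0; i < bytes.length; i += 4) {
    // Pack 4 bytes into one big-endian 32-bit value.
    var value = bytes[i] * 0x1000000 + bytes[i + 1] * 0x10000 +
                bytes[i + 2] * 0x100 + bytes[i + 3];
    // Emit 5 base-85 digits, most significant first.
    for (var div = 85 * 85 * 85 * 85; div >= 1; div /= 85) {
      out += Z85_CHARS[Math.floor(value / div) % 85];
    }
  }
  return out;
}

function z85Decode(text) {             // text length must be a multiple of 5
  var bytes = new Uint8Array(text.length / 5 * 4);
  for (var i = 0, j = 0; i < text.length; i += 5, j += 4) {
    var value = 0;
    for (var k = 0; k < 5; k++) {
      value = value * 85 + Z85_CHARS.indexOf(text[i + k]);
    }
    // Unpack the 32-bit value back into 4 bytes, big-endian.
    bytes[j]     = Math.floor(value / 0x1000000) & 0xff;
    bytes[j + 1] = Math.floor(value / 0x10000) & 0xff;
    bytes[j + 2] = Math.floor(value / 0x100) & 0xff;
    bytes[j + 3] = value & 0xff;
  }
  return bytes;
}

// Spec test vector: 0x86 0x4F 0xD2 0x6F 0xB5 0x59 0xF7 0x5B -> "HelloWorld".
console.log(z85Encode(new Uint8Array([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])));
```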
If you're sending only small files (say a few KB), then the overhead of binary-to-text conversion will be acceptable. With files as large as a few MB, it might not be acceptable to have them grow by 25-33%. In that case you can try compressing them before sending. [7]
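For example, a hedged sketch using the JSZip library from [7] (assuming JSZip 3.x and jQuery are available; the endpoint and field names are made up for illustration):

```javascript
// Sketch using JSZip (reference [7], assuming the 3.x API): compress the file
// contents before the base64 step so the 25-33% encoding overhead applies to
// a smaller payload. Endpoint and field names are placeholders.
var rawFileContent = '/* imagine a large source file here */';

var zip = new JSZip();
zip.file('src/app.js', rawFileContent);

zip.generateAsync({ type: 'base64', compression: 'DEFLATE' })
  .then(function (zippedBase64) {
    // The base64 string is safe to drop straight into the JSON payload.
    return $.ajax({
      url: '/api/files/',
      type: 'POST',
      contentType: 'application/json',
      data: JSON.stringify({ path: 'src/app.js', archive: zippedBase64 })
    });
  });
```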
You can also send data to the server using multipart/form-data, but I can't see how this will work bi-directionally.
UPDATE
In conclusion, here's my solution's algorithm (a rough client-side sketch follows the steps):
Sending data
Generate a session token and store it for the associated user upon login (server), or retrieve from the session cookie (client)
Generate MD5 hash for the data for integrity checking during transport.
Encode the raw data with Z85 to add some basic tamper-proofing and JSON-friendliness.
Place the above inside a JSON and send POST when requested.
Reception
Grab JSON from POST
Retrieve session token from storage for the associated user (server), or retrieve from the session cookie (client).
Generate MD5 hash for the received data and test against MD5 in received JSON, reject or accept conditionally.
Z85-decode the data in received JSON to get raw data and store in file or DB (server) or process/display in GUI/IDE (client) as required.
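Here is a rough client-side sketch of the "Sending data" steps above; the endpoint, field names, and token handling are assumptions, z85Encode is the sketch shown earlier, and md5Hex stands in for whatever MD5 helper the client actually uses:

```javascript
// Client-side sketch of the "Sending data" steps; endpoint, field names, and
// token handling are assumptions. z85Encode is the earlier sketch, md5Hex is
// an assumed MD5 helper. The checksum is taken over the encoded string so the
// receiver can verify it before decoding.
function buildFilePayload(path, rawText, sessionToken) {
  var bytes = new TextEncoder().encode(rawText);             // UTF-8 bytes
  var padded = new Uint8Array(Math.ceil(bytes.length / 4) * 4);
  padded.set(bytes);                                         // pad to 4-byte multiple for Z85

  var encoded = z85Encode(padded);
  return {
    token: sessionToken,             // per-session token (step 1)
    path: path,
    length: bytes.length,            // original length so padding can be stripped
    data: encoded,                   // Z85-encoded contents (step 3)
    md5: md5Hex(encoded)             // integrity checksum (step 2)
  };
}

// Step 4: place the payload in JSON and POST it.
$.ajax({
  url: '/api/files/',
  type: 'POST',
  contentType: 'application/json',
  data: JSON.stringify(buildFilePayload('src/app.js', 'console.log("hi");', 'abc123'))
});
```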
References
[1] How to escape special characters in building a JSON string?
[2] Binary Data in JSON String. Something better than Base64
[3] https://en.wikipedia.org/wiki/YEnc
[4] http://rfc.zeromq.org/spec:32
[5] Z85 implementation in C/C++: https://github.com/artemkin/z85
[6] Z85 Python implementation: https://gist.github.com/minrk/6357188
[7] JavaScript zip library (JSZip): http://stuk.github.io/jszip/
[8] JavaScript Gzip: "JavaScript implementation of Gzip" (SO question)
As far as I'm concerned, a simple Base64 conversion will do it. Stringify, convert to Base64, then pass it to the server and decode it there. Then you won't have a raw file transfer and you will still keep your code simple.
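For example, a UTF-8 safe Base64 round trip in the browser might look like this (a sketch assuming TextEncoder/TextDecoder support; older browsers need a different byte-conversion trick):

```javascript
// UTF-8 safe Base64 round trip in the browser (assumes TextEncoder/TextDecoder
// support); this is the "stringify, convert to base64" step from the answer.
function toBase64(text) {
  var bytes = new TextEncoder().encode(text);      // string -> UTF-8 bytes
  var binary = '';
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);                             // bytes -> Base64
}

function fromBase64(b64) {
  var binary = atob(b64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);          // bytes -> original string
}

console.log(fromBase64(toBase64('if (x < 1) { return "déjà vu"; }')));   // round-trips exactly
```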
I know this solution could seem a bit too simple, but think about it: many cryptographic algorithms can be broken given the right hardware. One of the most secure approaches would be to use a digital certificate, encrypt the data with the private key, and then send it over to the server. But to reach that level of security, every user of your application would have to have a digital certificate, which I think would be an excessive demand on your users.
So ask yourself: if implementing a really safe solution adds a lot of hassle for your users, do you need a safe transfer at all? Based on that, I reaffirm what I said before: a simple Base64 conversion will do. You can also use some other algorithms like SHA-256 or something to make it a little bit safer.
If the only concern here is that the raw content of your code files (the "data" your model is storing) will cause some type of issue when stored in JSON, this is easily avoided by escaping your data.
Stringifying your raw code file contents can cause issues, as anything resembling JavaScript or JSON could otherwise be parsed into an actual JSON object. Your code file data can and should be stored simply as an escaped string. Your fear here is that said string may contain characters that could break being stored inside a string in JavaScript; this is alleviated by escaping the entire string, and thus double-, triple-, quadruple-, etc. escaping anything already escaped in the code file.
In essence, it is important to remember that raw code in a file is nothing but a glorified string when stored in a database, unless you are adding in-line metadata dynamically. It's just text, and standard escaping will make it safe to store as a string (inside "" or '') in JSON.
I recommend reading this SO answer, as I also referenced it to verify what I already thought was correct: How To Escape a JSON string containing newline characters using JavaScript
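To illustrate the escaping point with a tiny sketch: a file whose contents already contain escaped quotes and a JSON blob still round-trips exactly, because each layer of escaping added by JSON.stringify is removed again by JSON.parse:

```javascript
// A "code file" whose contents already include escaped quotes and a JSON blob.
var fileText = 'var config = "{\\"debug\\": true}";\nconsole.log(config);';

// The outer layer of escaping is added by JSON.stringify and removed again by
// JSON.parse, so the inner escapes in the code survive untouched.
var wire = JSON.stringify({ content: fileText });
console.log(JSON.parse(wire).content === fileText);   // true
```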