javascript - Use File Content to Determine MIME Type with Node JS - Stack Overflow

It seems all of the popular MIME type libraries for node.js just use the file name extension rather than peeking into the file to determine the MIME type.

Is there a good way to use Node to jump into the file and intelligently determine the file's MIME type in case an extension is not present?

asked Jul 9, 2014 at 20:10 by Kirk Ouimet

2 Answers

It is indeed a pity that most popular MIME modules just map the file extension to a type.

After searching further, I found a module called mmmagic, which seems to do exactly what you want.

Be aware that, in my experience, MIME detection is inherently not completely reliable, and there is a small chance of false detections.

Example of usage (taken from their site):

  var mmm = require('mmmagic'),
      Magic = mmm.Magic;

  var magic = new Magic(mmm.MAGIC_MIME_TYPE);
  magic.detectFile('node_modules/mmmagic/build/Release/magic.node', function(err, result) {
      if (err) throw err;
      console.log(result);
      // output on Windows with 32-bit node:
      //    application/x-dosexec
  });

Since a MIME type does not dictate anything about the format of a file's contents, you can only use heuristics to guess what is in a file:

  1. Some binary formats have something called a magic number, but those can be wrong or ambiguous. See the Wikipedia article on magic numbers for more info.

  2. Many text file formats contain grammar constructs that you can use for a simple pattern-matching test, e.g. XML, CSV, or JSON. However, some formats (e.g. HTML) have a rather "evolved" syntax definition, which makes them ambiguous and thus hard to pattern match.
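
As a rough sketch of both heuristics, assuming only Node's built-in `fs` and `Buffer`: the `SIGNATURES` table and the `sniffBuffer` / `sniffFile` names are made up for this illustration and do not come from any library. The table lists only a handful of well-known magic numbers, so treat the result as a guess rather than a verdict.

```javascript
const fs = require('fs');

// Hypothetical signature table: a few well-known magic numbers, far from exhaustive.
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },             // "GIF8"
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  // Ambiguous on purpose: .docx, .jar and many other containers are ZIP files too.
  { mime: 'application/zip', bytes: [0x50, 0x4b, 0x03, 0x04] },
];

// Guess a MIME type from the leading bytes of a file; returns null when unsure.
function sniffBuffer(buf) {
  // Heuristic 1: compare the first bytes against known magic numbers.
  for (const sig of SIGNATURES) {
    if (buf.length >= sig.bytes.length && sig.bytes.every((b, i) => buf[i] === b)) {
      return sig.mime;
    }
  }
  // Heuristic 2: simple pattern tests for a couple of text formats.
  const text = buf.toString('utf8').trim();
  if (text.startsWith('<?xml')) return 'application/xml';
  if (text.startsWith('{') || text.startsWith('[')) {
    try {
      // Only works if the whole document fits in the buffer we were given.
      JSON.parse(text);
      return 'application/json';
    } catch (e) {
      // Not valid JSON (or truncated); fall through to "unknown".
    }
  }
  return null;
}

// Read at most maxBytes from the start of the file and sniff them.
function sniffFile(path, maxBytes = 4096) {
  const fd = fs.openSync(path, 'r');
  try {
    const buf = Buffer.alloc(maxBytes);
    const bytesRead = fs.readSync(fd, buf, 0, maxBytes, 0);
    return sniffBuffer(buf.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `sniffFile` on a file without an extension would return one of the types above or `null`. The ZIP entry already shows the ambiguity problem: the same four bytes start many unrelated container formats.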

To better illustrate the issue of ambiguity, here is an example: browsers have developed a very high tolerance and accept anything that remotely resembles HTML, so an HTML (or even XHTML) file is hard to identify. On top of that, files that look like HTML could actually be written in a non-HTML template language (such as Jade, Handlebars, or Angular templates). This is just one of many cases where things get very ambiguous.
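
To make that concrete, here is a deliberately naive check, written only for this illustration (`looksLikeHtml` and the sample strings are hypothetical). It gets the plain document right, but it also accepts a Handlebars template as HTML and rejects a fragment that any browser would happily render.

```javascript
// Hypothetical, deliberately naive check: does the text start like an HTML document?
function looksLikeHtml(text) {
  return /^\s*(<!doctype html|<html[\s>])/i.test(text);
}

const plainHtml = '<!DOCTYPE html><html><body><p>Hi</p></body></html>';
const handlebars = '<html><body>{{#each items}}<p>{{this}}</p>{{/each}}</body></html>';
const fragment = '<p>No doctype, no html tag, yet browsers render this fine.</p>';

console.log(looksLikeHtml(plainHtml));  // true
console.log(looksLikeHtml(handlebars)); // true, although it is really a template
console.log(looksLikeHtml(fragment));   // false, although browsers treat it as HTML
```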
