javascript - Is there any way for me to work with this 100,000 item new-line separated string of words? - Stack Overflow

I've got a 100,000+ long list of English words in plain text. I want to use split() to convert the list into an array, which I can then convert to an associative array, giving each list item a key equal to its own name, so I can very efficiently check whether or not a string is an English word.

Here's the problem:

The list is new-line separated.

aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals

This means that var list = ' <copy/paste list> ' isn't going to work, because JavaScript quotes don't work multi-line.

Is there any way for me to work with this 100,000 item new-line separated string?

asked Jul 22, 2014 at 22:56 by user3818284
  • Are you working in the browser, in nodejs, or some other environment? My initial thought is just to read it in from a file rather than as a string literal, but the details of how to do that vary depending on the environment you're in. – Chris Tavares Commented Jul 22, 2014 at 22:58
  • Generate the correct array literal from the start. You might find this useful: stackoverflow./q/4833480/139010 – Matt Ball Commented Jul 22, 2014 at 22:58
  • @ChrisTavares I can use either. – user3818284 Commented Jul 22, 2014 at 22:59
  • Why do you think splitting into an array, then converting to an object is more efficient than say String.prototype.indexOf on the original string? – RobG Commented Jul 22, 2014 at 23:03
  • Look at what other people have done with scrabble dictionaries. ejohn/blog/dictionary-lookups-in-javascript – epascarello Commented Jul 22, 2014 at 23:10

7 Answers

Replace the newlines with commas in any text editor before copying the list into your JS file.

One workaround would be to paste the list into Notepad++, then select all and choose Edit > Line Operations > Join Lines.

This removes new lines and replaces them with spaces.
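
Once the list sits on a single line, separated by commas or spaces as in the two suggestions above, it fits in an ordinary string literal and can be split. A minimal sketch, assuming a short sample in place of the full list; the names `joined` and `wordArray` are only illustrative:

```javascript
// Sample stands in for the full joined list (hypothetical data).
var joined = 'aa, aah, aahed, aahing aahs aal';

// Split on any run of commas and/or whitespace; drop empty entries.
var wordArray = joined.split(/[\s,]+/).filter(function (w) {
  return w.length > 0;
});

console.log(wordArray.length); // 6
```

Splitting on a character class rather than a fixed separator means the same line works whether the editor produced commas, spaces, or a mix of both.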

If you're doing this client side, you can use jQuery's get function to get the words from a text file and do the processing there:

jQuery.get('wordlist.txt', function(results){
    //Do your processing on results here
});

If you're doing this in Node.js, follow the guide here to see how to read a file into memory.
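
For the Node.js case, a minimal sketch using the built-in `fs` module might look like this. The function name `loadWords` is hypothetical, and `wordlist.txt` is assumed to be the same file as in the jQuery example:

```javascript
var fs = require('fs');

// Read a newline-separated word list and return it as an array of words.
function loadWords(path) {
  var text = fs.readFileSync(path, 'utf8');
  // Accept both Windows (\r\n) and Unix (\n) line endings; skip blank lines.
  return text.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
}

// Usage (assumes the file exists next to the script):
// var words = loadWords('wordlist.txt');
```

Reading synchronously keeps the sketch short; for a one-off load at startup that is usually acceptable, but an asynchronous read may suit a server better.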

You can use Notepad++ or any semi-advanced text editor.

  1. Go to notepad++ and push Ctrl+H to bring up the Replace dialog.

  2. Towards the bottom, select the "Extended" Search Mode

  3. Find "\r\n" (or "\n" if the file uses Unix line endings) and replace it with ", "

This removes the newlines and replaces them with commas.
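
The same conversion can also be scripted rather than done by hand, which is one way to follow the suggestion in the comments to generate an array literal. A sketch, assuming the raw text has already been read into a string (the variable names are illustrative, and the sample data is tiny on purpose):

```javascript
// Raw text as it would come from the file; sample data only.
var raw = 'aa\r\naah\r\naahed\r\n';

// Split on Windows or Unix line endings and drop blank lines.
var lines = raw.split(/\r?\n/).filter(function (line) {
  return line !== '';
});

// JSON.stringify quotes each word, producing source text for an array literal.
var literal = 'var list = ' + JSON.stringify(lines) + ';';

console.log(literal); // var list = ["aa","aah","aahed"];
```

The printed line can then be pasted into a script file as-is.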

jsfiddle Demo

This answer addresses the question purely as stated: having a string and trying to work with it in JavaScript through copy and paste. Specifically, it addresses the statement "This means that var list = ' <copy/paste list> ' isn't going to work, because JavaScript quotes don't work multi-line." and the question "Is there any way for me to work with this 100,000 item new-line separated string?".

You can place the string inside a comment in a JavaScript function and read it back from the function's source. Although counter-intuitive, this is an interesting approach. Here is the main function:

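function convertComment(c) {
  // toString() on a function returns its source text, comment included.
  return c.toString()
    .replace(/^[^\/]+\/\*!?/, '')  // strip everything up to and including the opening /*
    .replace(/\*\/[^\/]+$/, '');   // strip the closing */ and everything after it
}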

It can be used in your situation as follows:

var s = convertComment(function() {
 /*

 aa
 aah
 aahed
 aahing
 aahs
 aal
 aalii
 aaliis
 aals

 */
});

At which point you may do whatever you like with s. The demo simply places it into a div for displaying.


jsFiddle Demo

Further, here is an example of taking the list of words, getting them into an array, and then referencing a single word in the array.

//previously shown code
var all = s.match(/[^\r\n]+/g);
var rand = Math.floor(Math.random() * all.length); // random integer index in [0, all.length)

document.getElementById("random").innerHTML = "Random index #"+rand+": "+all[rand];
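
To get from an array like `all` to the keyed lookup described in the question, each word can become a property of an object. A minimal sketch with sample data; it uses `Object.create(null)` (which assumes an ES5 or later environment) so that English words that are also built-in property names, such as "constructor", are not reported as present by accident. The names `lookup` and `isWord` are illustrative:

```javascript
// Sample array standing in for `all`.
var all = ['aa', 'aah', 'aahed', 'aahing'];

// Object with no prototype, so only the words themselves are keys.
var lookup = Object.create(null);
for (var i = 0; i < all.length; i++) {
  lookup[all[i]] = true;
}

function isWord(candidate) {
  return lookup[candidate] === true;
}

console.log(isWord('aah'));         // true
console.log(isWord('constructor')); // false
```

With a plain `{}` instead, `lookup['constructor']` would return the inherited `Object` function, which is truthy even though the word was never added.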

If the words are in a separate file, you can load them directly into the page and go from there. I've used a script element with a MIME type that should mean browsers ignore the content (provided it's in the head):

<script type="text/plain" id="wordlist">
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
</script>

<script>

var words = (function() {
  var words = '\n' + document.getElementById('wordlist').textContent + '\n';
  return {
    checkWord: function (word) {
      return words.indexOf('\n' + word + '\n') != -1;
    }
  }
}());

console.log(words.checkWord('aaliis')); // true
console.log(words.checkWord('ahh'));    // false

</script>

The result is an object with one method, checkWord, that has access to the word list in a closure. You could add more methods like addWord or addVariant, whatever.
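
As a sketch of that extension, the closure can expose an `addWord` method next to `checkWord`. The factory name `makeWordList` is hypothetical, and passing the text in as an argument (rather than reading it from the DOM) is a change made here so the example runs outside a browser:

```javascript
function makeWordList(rawText) {
  // Normalise line endings, trim blank edges, then pad with newlines so
  // every word, including the first and last, is delimited on both sides.
  var words = '\n' + rawText.replace(/\r\n/g, '\n').replace(/^\n+|\n+$/g, '') + '\n';
  return {
    checkWord: function (word) {
      return words.indexOf('\n' + word + '\n') !== -1;
    },
    addWord: function (word) {
      // Append only if the word is not already present.
      if (!this.checkWord(word)) {
        words += word + '\n';
      }
    }
  };
}

var list = makeWordList('aa\naah\naahed');
console.log(list.checkWord('ahh')); // false
list.addWord('ahh');
console.log(list.checkWord('ahh')); // true
```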

Note that textContent may not be supported in all browsers; you may need to feature-detect it and fall back to innerText or an alternative in some.
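
One way to write that feature detection is sketched below. The helper name `getText` is hypothetical, and the fallback order is an assumption; for a script element in some older browsers a different property, such as the element's `text`, may be what actually holds the content:

```javascript
// Return an element's text, preferring textContent and falling back to innerText.
function getText(element) {
  if (typeof element.textContent === 'string') {
    return element.textContent;
  }
  if (typeof element.innerText === 'string') {
    return element.innerText;
  }
  return '';
}

// In a browser: getText(document.getElementById('wordlist'))
```

Checking the property type on the element itself avoids browser sniffing and keeps working if a browser later adds support.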

For variety, another solution is to put the unaltered content into

  1. A data attribute - HTML attributes can contain newlines
  2. or a "non-script" script - eg. <SCRIPT TYPE="text/x-wordlist">
  3. or an HTML comment node
  4. or another hidden element that allows content

Then the content could be read and split/parsed. Since this would be done outside of JavaScript's string literal parsing it doesn't have the issue regarding embedded newlines.
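
For instance, option 1 might look like the following sketch. The attribute name `data-words`, the element id, and the helper name `wordsFromAttribute` are all assumptions; the helper only needs an object with a `getAttribute` method, so it can be tried outside a browser with a stand-in:

```javascript
// Read a whitespace-separated word list from an attribute value.
function wordsFromAttribute(element, attributeName) {
  var raw = element.getAttribute(attributeName) || '';
  // Split on any run of whitespace, which covers the embedded newlines.
  return raw.split(/\s+/).filter(function (w) {
    return w !== '';
  });
}

// In a browser, assuming markup such as
//   <div id="wordlist" data-words="aa
//   aah
//   aahed"></div>
// var words = wordsFromAttribute(document.getElementById('wordlist'), 'data-words');
```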
