
javascript - for loop string each word - Stack Overflow


If a character such as '這' matches NonEnglish, it takes up 2 units of space, while an English character takes up 1 unit. The maximum length is 10 units. How can I get the first 10 units of the string?
For the example below, how do I get the result This這 is?
I'm trying to use a for loop starting from the first character, but I don't know how to get each character in the string...

string = "This這 is是 English中文 …";

var NonEnglish = "[^\u0000-\u0080]+",
    Pattern = new RegExp(NonEnglish),
    MaxLength = 10,
    Ratio = 2;

asked Feb 27, 2014 at 5:19 by user1775888 (edited Feb 27, 2014 at 5:21)
  • Do you need to get first 10 symbols of string or what? – Y.Puzyrenko Commented Feb 27, 2014 at 5:29
  • If it's a mix of English and non-English, can't you just remove the non-English since you don't need it? Then do a split after that – fedmich Commented Feb 27, 2014 at 5:29
  • @Good.luck I need to get the first 10 symbols, but each non-English character counts as 2 symbols – user1775888 Commented Feb 27, 2014 at 5:30
  • @fedmich ?? The words are just an example; the string might be th中文isisiisi – user1775888 Commented Feb 27, 2014 at 5:32
  • @user1775888 Are we supposed to use the same regex you provide or something of our own? – HighBoots Commented Feb 27, 2014 at 5:38

2 Answers


If you mean you want the part of the string up to the point where its length reaches 10, here's the answer:

var string = "This這 is是 English中文 …";

function check(string){
  // A character in \u0000-\u0080 counts as 1; any other character (what the OP wants) counts as 2
  var length = 0, i = 0, len = string.length, width;

  // you can iterate over strings just like arrays
  for(;i < len; i++){

    // if the character is what the OP wants, it costs 2, else 1
    width = /[^\u0000-\u0080]/.test(string[i]) ? 2 : 1;

    // if this character would push the length past 10, come out of the loop
    if(length + width > 10) break;
    length += width;
  }

  // return string from the first letter till the index where we aborted the for loop
  return string.substr(0, i);
}

alert(check(string));


EDIT 1:

  1. Replaced .match with .test. The former returns a whole array while the latter simply returns true or false.
  2. Simplified the RegEx. Since we are checking only one character at a time, there is no need for the + quantifier that was there before.
  3. Replaced len with string.length. Here's why.

I'd suggest something along the following lines (assuming that you're trying to break the string up into snippets that are <= 10 bytes in length):

var string = "This這 is是 English中文 …";

function byteCount(text) {
    //get the number of bytes consumed by a string
    return encodeURI(text).split(/%..|./).length - 1;
}

function tokenize(text, targetLen) {
    //break a string up into snippets that are <= to our target length
    var result = [];

    var pos = 0;
    var current = "";
    while (pos < text.length) {
        var next = current + text.charAt(pos);

        if (byteCount(next) > targetLen) {
            result.push(current);
            current = "";
            pos--;
        }
        else if (byteCount(next) == targetLen) {
            result.push(next);
            current = "";
        }
        else {
            current = next;
        }

        pos++;
    }
    if (current != "") {
       result.push(current);
    }

    return result;
}

console.log(tokenize(string, 10));

http://jsfiddle/5pc6L/
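
Note that a UTF-8 byte count is not the same as the 2-unit rule in the question: 這 occupies 3 bytes, not 2. The sketch below repeats byteCount from this answer and cross-checks it against a hypothetical helper, utf8Length, which assumes the runtime provides TextEncoder (modern browsers and Node.js do).

```javascript
// Same approach as byteCount above: each %XX escape or plain character
// produced by encodeURI stands for one UTF-8 byte.
function byteCount(text) {
  return encodeURI(text).split(/%..|./).length - 1;
}

// Hypothetical cross-check; assumes the runtime provides TextEncoder.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

console.log(byteCount("這"), utf8Length("這"));                 // 3 3
console.log(byteCount("This這 is"), utf8Length("This這 is")); // 10 10
```

So This這 is is exactly 10 bytes, while under the question's rule the same text weighs only 9 units. Which measure is appropriate depends on whether the limit is about storage size or about display width.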
