javascript - Filtering a list of strings based on user locale - Stack Overflow

When working on a JavaScript project with AngularJS 1.6, I have a list of strings which I'd like to filter. For instance, assume my list contains árbol, cigüeña, nido and tubo.

When filtering strings in Spanish, if I filtered for "u", I'd expect both cigüeña and tubo to appear, which would be the most natural result for a Spaniard. However, this is not the case in German - u and ü are different letters and thus a German will not want to see cigüeña on the list. So I am looking for a way to make my list filtering aware of the user's locale.

I happen to have an object containing lots of diacritics, such that:

diacritics["á"] = "a";
diacritics["ü"] = "u";
// and so on...

This is what my filtering code looks like:

function matches(word, search) {
    var cleanWord = removeDiacritics(word.toLowerCase());
    var cleanSearch = removeDiacritics(search.toLowerCase());
    return cleanWord.indexOf(cleanSearch) > -1;
}

function removeDiacritics(word) {
    function match(a) {
        return diacritics[a] || a;
    }
    return word.replace(/[^\u0000-\u007E]/g, match);
}

The above code just removes all diacritics, so I thought to make it aware of the user's locale. Thus, I changed the match() function to this:

function match(a) {
    if (diacritics[a] && a.localeCompare(diacritics[a]) === 0) {
        return diacritics[a];
    }
    return a;
}

Unfortunately, this doesn't work. The localeCompare function returns the same values when comparing "u" and "ü" with the German and Spanish locales, so that was not the answer here. I've gone over the reference for the localeCompare method and tried the usage and sensitivity options, but they don't seem to help much here.

How could I tweak my code for this to work? Is there any library which can handle this properly for me?

Asked Nov 16, 2017 at 12:23 by unpollito; edited Nov 16, 2017 at 12:32.

2 Answers

I'd get the user's locale directly from the browser via navigator, an object representing the user agent:

var language = navigator.language;

This will assign language the locale code of the user's browser, in my case en-US. I found this site helpful for finding locale codes to test other regions of the world.
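A minimal sketch of a more defensive lookup, in case navigator is missing (for example outside a browser) or reports a tag that the runtime's collator does not support. The helper name detectLocale and the fallback parameter are hypothetical, not part of the answer above; navigator.languages and Intl.Collator.supportedLocalesOf are standard APIs.

```javascript
// Hypothetical helper: choose a locale tag for collation, with a fallback.
function detectLocale(fallback) {
    var candidates = [];
    if (typeof navigator !== 'undefined') {
        if (navigator.languages && navigator.languages.length) {
            // the user's preferred languages, most preferred first
            candidates = Array.prototype.slice.call(navigator.languages);
        } else if (navigator.language) {
            candidates = [navigator.language];
        }
    }
    try {
        // keep only the tags that Intl.Collator supports in this runtime
        var supported = Intl.Collator.supportedLocalesOf(candidates);
        return supported.length ? supported[0] : fallback;
    } catch (e) {
        // supportedLocalesOf throws a RangeError on a malformed tag
        return fallback;
    }
}

var language = detectLocale('en-US');
console.log(language);
```

The fallback value 'en-US' is only an illustration; an application would normally pass its own default locale.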

My strFromLocale function is comparable to your removeDiacritics function:

function strFromLocale(str) {
    function match(letter) {
        function letterMatch(letter, normalizedLetter) {
            // 0 means the locale treats both as the same base letter
            var location = new Intl.Collator(language, { usage: 'search', sensitivity: 'base' }).compare(letter, normalizedLetter);
            return location === 0;
        }
        // decompose the letter, then drop combining marks (U+0300 to U+036F)
        var normalizedLetter = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, "");
        if (letterMatch(letter, normalizedLetter)) {
            return normalizedLetter;
        } else {
            return letter;
        }
    }
    return str.replace(/[^\u0000-\u007E]/g, match);
}

Note the line with Intl.Collator. It compares the accented letter with its normalized (unaccented) form and checks whether the given language's collation rules treat them as different letters. Therefore:

/* English */
new Intl.Collator('en-US', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> 0

/* Swedish */
new Intl.Collator('sv', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> -1

/* German */
new Intl.Collator('de', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> -1

As you can see, the letterMatch function returns true if and only if compare returns 0, which indicates that the language's alphabet does not distinguish the two letters, so it is safe to replace one with the other.

With that, here are some tests of the strFromLocale function:

var language = navigator.language; // en-US
strFromLocale("cigüeña");
>>> ciguena

var language = 'sv' // Swedish
strFromLocale("cigüeña");
>>> cigüena

var language = 'de' // German
strFromLocale("cigüeña");
>>> cigüena

var language = 'es-mx' // Spanish - Mexico
strFromLocale("cigüeña");
>>> cigueña
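To connect this back to the filter in the question, here is a rough sketch that plugs the same collator check into matches(). It is an adaptation, not the answer's exact code: the name stripForLocale and the explicit locale parameter are assumptions (the answer reads a global language variable), the collator is built once per call rather than once per letter, and the input is normalized to NFC first so that pre-decomposed text behaves the same way. The results depend on the locale data shipped with the JavaScript runtime.

```javascript
// Hypothetical variant of strFromLocale() that takes the locale as a parameter.
function stripForLocale(str, locale) {
    // one collator per call instead of one per letter
    var collator = new Intl.Collator(locale, { usage: 'search', sensitivity: 'base' });
    return str.normalize('NFC').replace(/[^\u0000-\u007E]/g, function (letter) {
        // decompose, then drop combining marks (U+0300 to U+036F)
        var base = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
        // replace only if the locale treats both as the same base letter
        return collator.compare(letter, base) === 0 ? base : letter;
    });
}

// The question's matches(), with the locale passed through.
function matches(word, search, locale) {
    var cleanWord = stripForLocale(word.toLowerCase(), locale);
    var cleanSearch = stripForLocale(search.toLowerCase(), locale);
    return cleanWord.indexOf(cleanSearch) > -1;
}

var words = ['árbol', 'cigüeña', 'nido', 'tubo'];
var spanishHits = words.filter(function (w) { return matches(w, 'u', 'es'); });
var germanHits = words.filter(function (w) { return matches(w, 'u', 'de'); });
console.log(spanishHits, germanHits);
```

With full locale data, the Spanish filter should keep both cigüeña and tubo, while the German filter should keep only tubo, which is the behaviour the question asks for.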

You are probably looking for the ECMAScript Internationalization API (the Intl object, specified in ECMA-402). This will allow you to adjust sort order based on locale, e.g.:

// in German, ä sorts with a
console.log(new Intl.Collator('de').compare('ä', 'z'));
// → a negative value

// in Swedish, ä sorts after z
console.log(new Intl.Collator('sv').compare('ä', 'z'));
// → a positive value

With the sensitivity: 'base' option, letters that differ only in accents or case compare as equal, unless the locale treats the accented letter as a separate base letter.

// in German, ä has a as the base letter
console.log(new Intl.Collator('de', { sensitivity: 'base' }).compare('ä', 'a'));
// → 0

// in Swedish, ä and a are separate base letters
console.log(new Intl.Collator('sv', { sensitivity: 'base' }).compare('ä', 'a'));
// → a positive value

You can then sort your list into the correct order prior to populating your UI Widget.
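As a small illustration of that sorting step, a collator's compare function can be passed straight to Array.prototype.sort. The sample letters below are made up for the example; the variable names are not from the answer.

```javascript
var letters = ['z', 'ä', 'a'];

// copy the array first, because sort() works in place
var germanOrder = letters.slice().sort(new Intl.Collator('de').compare);
var swedishOrder = letters.slice().sort(new Intl.Collator('sv').compare);

console.log(germanOrder);  // ['a', 'ä', 'z']
console.log(swedishOrder); // ['a', 'z', 'ä']
```

Note that this only changes the order of the list; deciding which entries match a search term still needs a comparison like the sensitivity: 'base' one above.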
