Is there a way to make Lodash's orderBy function support accented characters?
Like á, é, ñ, etc. These are moved to the end of the array when the sort is performed.
Asked Jun 24, 2017 at 20:17 by Omar Cardona; edited Jun 24, 2017 at 23:18 by Peter Mortensen
2 Answers
The Problem
It sounds like it doesn't use localeCompare, defaulting instead to the equivalent of using < or >, which compares by UTF-16 code unit numeric values, not locale-aware collation (ordering).
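To see the difference concretely, here is a minimal sketch. The sample words and the explicit "es" locale are illustrative assumptions (not from the question), and it assumes a runtime with Intl locale data, as in current browsers and Node.js:

```javascript
// Illustrative data (not from the question).
const words = ["zebra", "árbol", "apple"];

// The default sort compares strings by UTF-16 code units, like < and > do.
// "á" (U+00E1) has a higher code unit than "z" (U+007A), so it lands last.
const byCodeUnit = [...words].sort();

// localeCompare applies locale-aware collation; "es" is an assumed locale.
const byLocale = [...words].sort((a, b) => a.localeCompare(b, "es"));

console.log(byCodeUnit); // [ 'apple', 'zebra', 'árbol' ]
console.log(byLocale);   // [ 'apple', 'árbol', 'zebra' ]
```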
Controlling Comparison Method
You can convert to array (if needed) and then use the native sort with localeCompare. For instance, instead of:
const result = _.orderBy(theArray, ["value"]);
you can do:
const result = theArray.slice().sort((a, b) => a.value.localeCompare(b.value));
or to sort in-place:
theArray.sort((a, b) => a.value.localeCompare(b.value));
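If you also relied on _.orderBy's "asc"/"desc" argument, the same idea can be wrapped in a small function. This is only a sketch: orderByLocale is a hypothetical helper (not part of Lodash), it handles a single string key, and the sample data and "es" locale are assumptions for illustration:

```javascript
// Hypothetical helper (not part of Lodash): returns a sorted copy of `items`,
// ordered by the string property `key` using localeCompare.
function orderByLocale(items, key, order = "asc", locale = undefined) {
  // Flip the comparator's sign for descending order.
  const direction = order === "desc" ? -1 : 1;
  return items
    .slice() // copy, so the caller's array is left untouched
    .sort((a, b) => direction * a[key].localeCompare(b[key], locale));
}

// Illustrative data (not from the question).
const people = [{ name: "Zoe" }, { name: "Álvaro" }, { name: "Ana" }];

const ascNames = orderByLocale(people, "name", "asc", "es").map((p) => p.name);
const descNames = orderByLocale(people, "name", "desc", "es").map((p) => p.name);

console.log(ascNames);  // [ 'Álvaro', 'Ana', 'Zoe' ]
console.log(descNames); // [ 'Zoe', 'Ana', 'Álvaro' ]
```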
localeCompare uses the default collation order for the default locale. Using Intl.Collator, you can have more control over the collation (like case-insensitivity, the handling of accents, the relative position of upper- and lower-case characters, etc.). For instance, if you wanted the default collation for the default locale but with upper-case characters first:
const collator = new Intl.Collator(undefined, {caseFirst: "upper"});
const result = theArray.slice().sort((a, b) => collator.compare(a.value, b.value));
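As a further illustration of the accent and case handling mentioned above, the sensitivity option of Intl.Collator controls which differences count at all. A small sketch, assuming the "en" locale (chosen here only to make the results predictable):

```javascript
// "base": letters that differ only by accent or case compare as equal.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

// "accent": accents are significant, case is not.
const accentCollator = new Intl.Collator("en", { sensitivity: "accent" });

const baseAccentResult = baseCollator.compare("a", "á");     // 0
const baseCaseResult = baseCollator.compare("a", "A");       // 0
const accentCaseResult = accentCollator.compare("a", "A");   // 0
const accentAccentResult = accentCollator.compare("a", "á"); // non-zero

console.log(baseAccentResult, baseCaseResult, accentCaseResult);
console.log(accentAccentResult !== 0); // true
```

With "base" sensitivity, "a", "á" and "A" are all treated as equivalent, so their relative order in a sorted array is whatever it was in the input.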
Live Example:
const theArray = [
    {value: "n"},
    {value: "N"},
    {value: "ñ"},
    {value: "á"},
    {value: "A"},
    {value: "a"},
];
const lodashResult = _.orderBy(theArray, ["value"]);
const localeCompareResult = theArray.slice().sort((a, b) => a.value.localeCompare(b.value));
const collator = new Intl.Collator(undefined, {caseFirst: "upper"});
const collatorResult = theArray.slice().sort((a, b) => collator.compare(a.value, b.value));
show("unsorted:", theArray);
show("lodashResult:", lodashResult);
show("localeCompareResult:", localeCompareResult);
show("collatorResult:", collatorResult);
function show(label, array) {
    console.log(label, "[");
    for (const element of array) {
        console.log(`    ${JSON.stringify(element)}`);
    }
    console.log("]");
}
.as-console-wrapper {
max-height: 100% !important;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.21/lodash.min.js"></script>
Stable vs Unstable Sort
When I first wrote this answer, there was a slight difference between _.orderBy and the native sort: _.orderBy, like _.sortBy, always does a stable sort, whereas at the time of the original answer JavaScript's native sort was not guaranteed to be stable. Since then, though, the JavaScript specification has been modified to require a stable sort (ES2019). So both _.orderBy/_.sortBy and native sort are stable now.
If "stable" vs. "unstable" sort aren't familiar terms: A "stable" sort is one where two elements that are considered equivalent for sorting purposes are guaranteed to remain in the same position relative to each other; in an "unstable" sort, their positions relative to each other might be swapped (which is allowed because they're "equivalent" for sorting purposes). Consider this array:
const theArray = [
    {value: "x", id: 27},
    {value: "z", id: 14},
    {value: "x", id: 12},
];
If you do an unstable sort that sorts ascending on just value (disregarding id or any other properties the objects might have), there are two valid results:
// Valid result 1: id = 27 remained in front of id = 12
[
    {value: "x", id: 27},
    {value: "x", id: 12},
    {value: "z", id: 14},
]
// Valid result 2: id = 27 was moved after id = 12
[
    {value: "x", id: 12},
    {value: "x", id: 27},
    {value: "z", id: 14},
]
With a stable sort, though, only the first result is valid; the positions of equivalent elements relative to each other remain unchanged.
But again, that distinction no longer matters, since JavaScript's sort is stable now too.
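You can check this yourself with the array from the example above. This sketch assumes an ES2019-compliant engine (any current browser or Node.js release); the variable names are illustrative:

```javascript
const items = [
  { value: "x", id: 27 },
  { value: "z", id: 14 },
  { value: "x", id: 12 },
];

// Sort on `value` only; the two "x" entries are equivalent for the comparator.
const stableSorted = items
  .slice()
  .sort((a, b) => a.value.localeCompare(b.value));

// With a stable sort, id 27 stays ahead of id 12 (their original order).
const sortedIds = stableSorted.map((item) => item.id);
console.log(sortedIds); // [ 27, 12, 14 ]
```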
I've solved it by comparing sanitized versions of the elements.
theArray.sort(function(a, b) {
    return a.toLowerCase().removeAccents().localeCompare(b.toLowerCase().removeAccents());
});
The removeAccents function:
String.prototype.removeAccents = function () {
    return this
        .replace(/[áàãâä]/gi, "a")
        .replace(/[éèëê]/gi, "e")
        .replace(/[íìïî]/gi, "i")
        .replace(/[óòöôõ]/gi, "o")
        .replace(/[úùüû]/gi, "u")
        .replace(/[ç]/gi, "c")
        .replace(/[ñ]/gi, "n")
        .replace(/[^a-zA-Z0-9]/g, " ");
}
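A possible variation on the same idea, not what this answer uses: instead of listing accented letters by hand, Unicode normalization (String.prototype.normalize with "NFD") can split each accented letter into a base letter plus combining marks, which are then stripped. The sketch below makes some assumptions: stripAccents is a hypothetical stand-alone function (it does not extend String.prototype), the sample names are made up, and unlike removeAccents it does not replace other non-alphanumeric characters with spaces:

```javascript
// Hypothetical helper: decompose accented letters (NFD), then drop the
// combining diacritical marks in the range U+0300 to U+036F.
function stripAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Illustrative data (not from the question).
const names = ["Óscar", "ana", "Ñu", "nube", "Álvaro"];

// Compare lower-cased, accent-free versions; the original strings are kept.
const sortedNames = names.slice().sort((a, b) =>
  stripAccents(a.toLowerCase()).localeCompare(stripAccents(b.toLowerCase()))
);

console.log(stripAccents("árbol")); // arbol
console.log(sortedNames); // [ 'Álvaro', 'ana', 'Ñu', 'nube', 'Óscar' ]
```

Note that this treats "ñ" as a plain "n", just like removeAccents does, which may not be the ordering a Spanish-language collation would give.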