url - JavaScript slug that also works for non-Latin characters - Stack Overflow


I found a slug function that looks like this:

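```javascript
function slug(string) {
    return string.toString().toLowerCase()
        .replace(/\s+/g, '-')       // Replace whitespace with -
        .replace(/[^\w\-]+/g, '')   // Remove everything that is not a word char or -
        .replace(/\-\-+/g, '-')     // Replace multiple - with a single -
        .replace(/^-+/, '')         // Trim - from the start of the text
        .replace(/-+$/, '');        // Trim - from the end of the text
}
```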

However, it doesn't work for Russian, Greek, and other non-Latin characters: they are removed at the step `.replace(/[^\w\-]+/g, '')`. I don't want that, but I do still want to remove the other special characters, i.e. anything that is not a regular letter in some language.
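The reason those letters disappear: in JavaScript regular expressions `\w` is shorthand for `[A-Za-z0-9_]`, so `[^\w\-]` matches every letter outside basic ASCII, including accented Latin letters. Unicode property escapes such as `\p{L}` (any letter, in any script) are one way around this; they require the `u` flag and an ES2018+ engine. A minimal check of both behaviors:

```javascript
// \w only covers ASCII letters, digits and underscore.
const word = /^\w+$/;
console.log(word.test('rains'));   // true
console.log(word.test('дождь'));   // false: Cyrillic letters are not \w
console.log(word.test('prší'));    // false: accented Latin letters are not \w either

// \p{L} (needs the u flag) matches a letter from any script.
const letters = /^\p{L}+$/u;
console.log(letters.test('дождь')); // true
console.log(letters.test('prší'));  // true
```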

Examples:

| Language | Input | Expected slug |
|---|---|---|
| English | Do you know it rains? | do-you-know-it-rains |
| Czech | víš, že prší? | vis-ze-prsi |
| Romanian | Ști că plouă? | sti-ca-ploua |
| Russian | ты знаешь, что идет дождь? | ты-знаешь-что-идет-дождь |

Note: for the Latin alphabet I want to keep the letters but remove the diacritics; for non-Latin alphabets I want to keep the letters as they are (I don't want to transliterate them into Latin characters).
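One way to express that rule, sketched under the assumption of an ES2018+ runtime (for `String.prototype.normalize` and Unicode property escapes); the name `stripLatinDiacritics` is hypothetical. It decomposes with NFD, drops combining marks only when they follow a Latin letter, then recomposes with NFC so marks on Cyrillic or Greek letters (for example the breve in й) survive. Letters with no canonical decomposition, such as ł, ø or ß, pass through unchanged.

```javascript
// Hypothetical helper: remove diacritics from Latin letters only.
function stripLatinDiacritics(text) {
  return text
    .normalize('NFD')                             // "í" -> "i" + combining acute accent
    .replace(/(\p{Script=Latin})\p{M}+/gu, '$1')  // drop marks that follow a Latin base letter
    .normalize('NFC');                            // recompose everything else, e.g. "й"
}

console.log(stripLatinDiacritics('víš, že prší?')); // "vis, ze prsi?"
console.log(stripLatinDiacritics('Ști că plouă?')); // "Sti ca ploua?"
console.log(stripLatinDiacritics('йогурт'));        // "йогурт" (the breve on й is kept)
```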


Asked Feb 18, 2019 at 9:22 by paulalexandru; edited Feb 18, 2019 at 10:34.
Comment by cmbuckley (Feb 18, 2019 at 10:12): See also stackoverflow.com/questions/13309620/…

1 Answer (score: 9)

Here is an approach that works for accented Latin characters. Using an array of objects, you group every special character you want to replace under the Latin character that replaces it.
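A small illustration of how one entry of the `sets` array is applied, and why the `'gi'` flags matter: `slugify()` lowercases the text before the sets run, while the classes list uppercase letters, so it is the `i` flag that lets `[ŹŻŽ]` match `ž`. The variable names here are illustrative only.

```javascript
// One entry in the style of the answer's `sets` array.
const zSet = { to: 'z', from: '[ŹŻŽ]' };

// slugify() lowercases first, so the input is lowercase here too.
const input = 'Že prší'.toLowerCase();

// Without the i flag, the uppercase-only class never matches the lowercase "ž".
console.log(input.replace(new RegExp(zSet.from, 'g'), zSet.to));  // "že prší"

// With the i flag, "ž" matches "Ž" case-insensitively and is replaced.
console.log(input.replace(new RegExp(zSet.from, 'gi'), zSet.to)); // "ze prší"
```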

However, to leave Greek and Russian untouched, the final cleanup regex has to treat Greek and Cyrillic letters as word characters. So after replacing the accented characters using the sets above, remove every character not matched by this class: `[^-a-zа-я\u0370-\u03ff\u1f00-\u1fff]`.

The class contains the dash, the Latin letters a-z, the Cyrillic letters а-я, and finally `\u0370-\u03ff\u1f00-\u1fff`, which covers the Greek and Coptic block (U+0370 to U+03FF) and the Greek Extended block (U+1F00 to U+1FFF).
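To see concretely which characters survive that class: the Cyrillic range а-я is U+0430 to U+044F, so ё (U+0451) falls outside it, and digits are not in the class at all. The "wide" variant below, which adds ё and 0-9, is a hypothetical adjustment and not part of the answer.

```javascript
// The keep-list from the answer: every run of characters NOT in this class is deleted.
const notSlugChar = /[^-a-zа-я\u0370-\u03ff\u1f00-\u1fff]+/g;

console.log('дождь'.replace(notSlugChar, ''));    // "дождь" (а-я covers U+0430 to U+044F)
console.log('ёлка'.replace(notSlugChar, ''));     // "лка" (ё is U+0451, outside а-я)
console.log('route-66'.replace(notSlugChar, '')); // "route-" (digits are not in the class)

// Hypothetical adjustment: also keep ё and the digits 0-9.
const notSlugCharWide = /[^-a-z0-9а-яё\u0370-\u03ff\u1f00-\u1fff]+/g;
console.log('ёлка-66'.replace(notSlugCharWide, '')); // "ёлка-66"
```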

You can use Wikipedia's language recognition chart to add more special characters to the sets.

```javascript
function slugify(text) {
  text = text.toString().toLowerCase().trim();

  // Each entry maps a class of accented Latin letters to a plain replacement.
  // The classes list uppercase letters; the 'i' flag below also matches the lowercased text.
  const sets = [
    {to: 'a', from: '[ÀÁÂÃÄÅÆĀĂĄẠẢẤẦẨẪẬẮẰẲẴẶ]'},
    {to: 'c', from: '[ÇĆĈČ]'},
    {to: 'd', from: '[ÐĎĐÞ]'},
    {to: 'e', from: '[ÈÉÊËĒĔĖĘĚẸẺẼẾỀỂỄỆ]'},
    {to: 'g', from: '[ĜĞĢǴ]'},
    {to: 'h', from: '[ĤḦ]'},
    {to: 'i', from: '[ÌÍÎÏĨĪĮİỈỊ]'},
    {to: 'j', from: '[Ĵ]'},
    {to: 'ij', from: '[Ĳ]'},
    {to: 'k', from: '[Ķ]'},
    {to: 'l', from: '[ĹĻĽŁ]'},
    {to: 'm', from: '[Ḿ]'},
    {to: 'n', from: '[ÑŃŅŇ]'},
    {to: 'o', from: '[ÒÓÔÕÖØŌŎŐỌỎỐỒỔỖỘỚỜỞỠỢǪǬƠ]'},
    {to: 'oe', from: '[Œ]'},
    {to: 'p', from: '[ṕ]'},
    {to: 'r', from: '[ŔŖŘ]'},
    {to: 's', from: '[ßŚŜŞŠȘ]'},
    {to: 't', from: '[ŢŤȚ]'},
    {to: 'u', from: '[ÙÚÛÜŨŪŬŮŰŲỤỦỨỪỬỮỰƯ]'},
    {to: 'w', from: '[ẂŴẀẄ]'},
    {to: 'x', from: '[ẍ]'},
    {to: 'y', from: '[ÝŶŸỲỴỶỸ]'},
    {to: 'z', from: '[ŹŻŽ]'},
    {to: '-', from: '[·/_,:;\']'}
  ];

  sets.forEach(set => {
    text = text.replace(new RegExp(set.from, 'gi'), set.to);
  });

  return text
    .replace(/\s+/g, '-')    // Replace spaces with -
    .replace(/[^-a-zа-я\u0370-\u03ff\u1f00-\u1fff]+/g, '') // Keep only -, a-z, Cyrillic а-я and Greek
    .replace(/--+/g, '-')    // Replace multiple - with single -
    .replace(/^-+/, '')      // Trim - from start of text
    .replace(/-+$/, '');     // Trim - from end of text
}

console.log(slugify('Do you know it rains?'));      // "do-you-know-it-rains"
console.log(slugify('víš, že prší?'));              // "vis-ze-prsi"
console.log(slugify('Ști că plouă?'));              // "sti-ca-ploua"
console.log(slugify('ты знаешь, что идет дождь?')); // "ты-знаешь-что-идет-дождь"
console.log(slugify('ἀεὶ Λιβύη φέρει τι καινόν'));
```
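For comparison, here is a sketch of a different technique, swapped in rather than taken from the answer: instead of hand-maintained sets, it uses Unicode normalization plus Unicode property escapes (assumes an ES2018+ runtime; `slugifyUnicode` is a hypothetical name). The trade-offs differ from the sets: letters without a canonical decomposition (ß, ł, ø, đ, æ, œ) are kept as-is instead of being mapped to ASCII, and letters, marks and digits from every script are kept, not only Latin, Cyrillic and Greek.

```javascript
// Alternative sketch: normalization + property escapes instead of per-letter sets.
function slugifyUnicode(text) {
  return text
    .toString()
    .normalize('NFD')                             // "í" -> "i" + combining acute accent
    .replace(/(\p{Script=Latin})\p{M}+/gu, '$1')  // strip marks only after Latin letters
    .normalize('NFC')                             // recompose the rest ("й", "ἀ", ...)
    .toLowerCase()
    .trim()
    .replace(/[\s·\/_,:;']+/g, '-')               // whitespace/punctuation runs -> "-"
    .replace(/[^\p{L}\p{M}\p{N}-]+/gu, '')        // drop all but letters, marks, digits, "-"
    .replace(/-{2,}/g, '-')                       // collapse repeated dashes
    .replace(/^-+|-+$/g, '');                     // trim dashes at both ends
}

console.log(slugifyUnicode('Do you know it rains?'));      // "do-you-know-it-rains"
console.log(slugifyUnicode('víš, že prší?'));              // "vis-ze-prsi"
console.log(slugifyUnicode('Ști că plouă?'));              // "sti-ca-ploua"
console.log(slugifyUnicode('ты знаешь, что идет дождь?')); // "ты-знаешь-что-идет-дождь"
console.log(slugifyUnicode('Йогурт 2, 3'));                // "йогурт-2-3"
```

Normalizing before `toLowerCase()` is deliberate: it strips the dot from a Latin "İ" before lowercasing, which would otherwise turn it into "i" plus a combining dot.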
