
javascript - Regex for a URL Connection String - Stack Overflow


Is there a known JavaScript regular expression to match an entire URL Connection String?

protocol://user:password@hostname:12345/segment1/segment2?p1=val1&p2=val2

I'm looking for a single regular expression that would help me translate such a connection string into an object:

{
    protocol: 'protocol',
    user: 'user',
    password: 'password',
    host: 'hostname:12345',
    hostname: 'hostname',
    port: 12345,
    segments: ['segment1', 'segment2'],
    params: {
        p1: 'val1',
        p2: 'val2'
    }
}

Also, I want every single part of the connection string to be optional, so the missing parameters can be filled by values from the environment.

examples:

  • protocol://
  • server:12345
  • :12345 - for the port only
  • user:password@
  • user@
  • :password@
  • /segment1
  • ?p1=val1
  • and so on...
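
To illustrate the "every part is optional" requirement, here is a minimal sketch of how missing parts could be filled in after parsing. The `withDefaults` helper and the default values are hypothetical names chosen for this example; in Node.js the defaults could just as well be read from `process.env`.

```javascript
// Hypothetical helper: copy the defaults, then overwrite them with every
// part that was actually present in the parsed connection string.
function withDefaults(parsed, defaults) {
    const result = Object.assign({}, defaults);
    for (const key of Object.keys(parsed)) {
        if (parsed[key] !== undefined) {
            result[key] = parsed[key];
        }
    }
    return result;
}

// Example: only the port was given (as in ":12345"), the rest comes from defaults.
const merged = withDefaults(
    { hostname: undefined, port: 12345 },
    { hostname: 'localhost', port: 5432, user: 'guest' }
);
console.log(merged);
```

Parts that parse to `undefined` keep their default, so a string such as `:12345` would only override the port.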

Standard RFC 3986 rules should apply to all the parts when it comes to the valid symbols.

I'm looking for something that would work in both Node.js and all browsers.

I've done separate piece-by-piece parsing within connection-string, but the problem with that approach is that it doesn't allow validation, i.e. telling whether the whole string is valid.

asked Jul 13, 2017 at 6:50 by vitaly-t (edited Jul 13, 2017 at 7:39)
  • 1 A dupe of How to parse a URL? – Wiktor Stribiżew Commented Jul 13, 2017 at 6:53
  • @WiktorStribiżew there is no answer there that would support all parts of the URL being optional, as per my example. – vitaly-t Commented Jul 13, 2017 at 6:56
  • 1 I don't think regex is a good idea for this problem. Why don't you just manually parse the URL and then construct the required object? – Dat Nguyen Commented Jul 13, 2017 at 7:27
  • Why do you want to use a regular expression for this case? Why not use the function, for example this one: locutus.io/php/url/parse_url ? – Sergey Khalitov Commented Jul 13, 2017 at 7:28
  • @SergeyKhalitov I don't know if it works, and if it does work with the conditions I described, it would make an answer, not a question why I don't use it - as I've never seen it before, obviously. – vitaly-t Commented Jul 13, 2017 at 7:33

3 Answers

Something like this?

function url2obj(url) {
    var pattern = /^(?:([^:\/?#\s]+):\/{2})?(?:([^@\/?#\s]+)@)?([^\/?#\s]+)?(?:\/([^?#\s]*))?(?:[?]([^#\s]+))?\S*$/;
    var matches = url.match(pattern);
    // The pattern does not match input containing whitespace; match() then returns null.
    if (matches === null) {
        return null;
    }
    var params = {};
    if (matches[5] != undefined) {
        // Split the query string into key/value pairs.
        matches[5].split('&').forEach(function (x) {
            var a = x.split('=');
            params[a[0]] = a[1];
        });
    }

    return {
        protocol: matches[1],
        // matches[2] is the "user:password" part before the "@".
        user: matches[2] != undefined ? matches[2].split(':')[0] : undefined,
        password: matches[2] != undefined ? matches[2].split(':')[1] : undefined,
        host: matches[3],
        // Split on the last ":" that is followed only by digits, so IPv6 hosts stay intact.
        hostname: matches[3] != undefined ? matches[3].split(/:(?=\d+$)/)[0] : undefined,
        port: matches[3] != undefined ? matches[3].split(/:(?=\d+$)/)[1] : undefined,
        segments: matches[4] != undefined ? matches[4].split('/') : undefined,
        params: params
    };
}

console.log(url2obj("protocol://user:password@hostname:12345/segment1/segment2?p1=val1&p2=val2"));
console.log(url2obj("http://hostname"));
console.log(url2obj(":password@"));
console.log(url2obj("?p1=val1"));
console.log(url2obj("ftp://usr:pwd@[FFF::12]:345/testIP6"));

A test for the regex pattern here on regex101

Java datasource connection URL pattern sample if needed:

^(?:(?:(jdbc)\:{1})?(?:(\w+):/{2})?(?:([^@\/?!\"':#\s]+(?::\w+)?)@)?)?(?:([^@\/?!\"':#\s]+(?::\d+)?)(?=(?:$)|(?:/)))?(?:/([^@?!\"':#\s]*)(?=(?:$)|(?:\?)))?(?:[?]([^#?!\s]+))?\S*$

Online Demo

I have two regexes, one without capturing and one with capturing. I was looking at the connection URI for Postgres, which also allows multiple hosts separated by commas. So the connection string postgresql://host1:123,host2:456/somedb?target_session_attrs=any&application_name=myapp is valid. With the lookahead, IPv6 addresses are also handled.

/^[^:/?#\s]+:\/\/(?:[^@/?#:\s]+(?::[^@/?#\s]+)?@)?(?:[^/?#\s]+)?(?:\/[^?#\s]+)?(?:[?][^#\s]+)?$/
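
As a minimal sketch of how the non-capturing pattern could be used for whole-string validation, the snippet below wraps it in a function. The name `isValidConnectionString` is hypothetical and only serves this example; the pattern itself is copied unchanged from the line above.

```javascript
// Pattern copied verbatim from the answer; it only validates, it captures nothing.
const VALIDATION_PATTERN = /^[^:/?#\s]+:\/\/(?:[^@/?#:\s]+(?::[^@/?#\s]+)?@)?(?:[^/?#\s]+)?(?:\/[^?#\s]+)?(?:[?][^#\s]+)?$/;

// Hypothetical wrapper: true when the whole string matches the pattern.
function isValidConnectionString(str) {
    return VALIDATION_PATTERN.test(str);
}

console.log(isValidConnectionString('postgresql://host1:123,host2:456/somedb?target_session_attrs=any&application_name=myapp'));
console.log(isValidConnectionString('protocol://user:password@hostname:12345/segment1/segment2?p1=val1&p2=val2'));
console.log(isValidConnectionString('server:12345'));
```

Note that this pattern requires the leading `protocol://` part, so the protocol-less examples from the question, such as `server:12345`, do not pass it.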

This regex captures a single host into hostname and port, multiple hosts into hostnamesAndPorts.

/^(?<protocol>[^:/?#\s]+):\/\/(?:(?<user>[^@/?#:\s]+)(?::(?<password>[^@/?#\s]+))?@)?(?:(?<hostname>[^/?,#\s]+?(?=(:\d+|\/|$)))?(?::(?<port>\d+))?|(?<hostnamesAndPorts>[^/?#\s]+)?)(?:\/(?<segments>[^?#\s]+))?(?:[?](?<parameters>[^#\s]+))?$/
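
The named groups can then be mapped onto an object similar to the one in the question. The following is only a sketch under a few assumptions: `parseConnectionString` and `parseParams` are hypothetical names, the port is converted to a number, parameters are not percent-decoded, and named capture groups need an ES2018-capable engine, so this would not run in older browsers.

```javascript
// Pattern copied verbatim from the answer above.
const CAPTURE_PATTERN = /^(?<protocol>[^:/?#\s]+):\/\/(?:(?<user>[^@/?#:\s]+)(?::(?<password>[^@/?#\s]+))?@)?(?:(?<hostname>[^/?,#\s]+?(?=(:\d+|\/|$)))?(?::(?<port>\d+))?|(?<hostnamesAndPorts>[^/?#\s]+)?)(?:\/(?<segments>[^?#\s]+))?(?:[?](?<parameters>[^#\s]+))?$/;

// Hypothetical helper: turn "a=1&b=2" into { a: '1', b: '2' } (no decoding).
function parseParams(query) {
    const params = {};
    if (!query) {
        return params;
    }
    for (const pair of query.split('&')) {
        const idx = pair.indexOf('=');
        const key = idx === -1 ? pair : pair.slice(0, idx);
        params[key] = idx === -1 ? '' : pair.slice(idx + 1);
    }
    return params;
}

// Hypothetical wrapper: returns null when the whole string does not match.
function parseConnectionString(str) {
    const match = CAPTURE_PATTERN.exec(str);
    if (match === null) {
        return null;
    }
    const g = match.groups;
    return {
        protocol: g.protocol,
        user: g.user,
        password: g.password,
        hostname: g.hostname,
        port: g.port === undefined ? undefined : Number(g.port),
        // Only set when the multi-host alternative matched.
        hosts: g.hostnamesAndPorts ? g.hostnamesAndPorts.split(',') : undefined,
        segments: g.segments ? g.segments.split('/') : [],
        params: parseParams(g.parameters)
    };
}

console.log(parseConnectionString('protocol://user:password@hostname:12345/segment1/segment2?p1=val1&p2=val2'));
console.log(parseConnectionString('postgresql://host1:123,host2:456/somedb?target_session_attrs=any'));
```

Because the function returns `null` for anything the pattern rejects, the same call covers both validation and extraction.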