
javascript - Extract links in a string and return an array of objects - Stack Overflow


I receive a string from a server and this string contains text and links (mainly starting with http://, https:// and www., very rarely different but if they are different they don't matter).

Example:

"simple text simple text simple text domain.ext/subdir again text text text youbank./transfertomealltheirmoney/witharegex text text text and again text"

I need a JS function that does the following:

- finds all the links (no matter if there are duplicates);
- returns an array of objects, each representing a link, together with keys that say where the link starts in the text and where it ends, something like:

[{link:"http://www.dom.ext/dir",startsAt:25,endsAt:47},
{link:"https://www.dom2.ext/dir/subdir",startsAt:57,endsAt:88},
{link:"www.dom.ext/dir",startsAt:176,endsAt:192}]

Is this possible? How?

EDIT: @Touffy: I tried this, but I could only get the starting index of each match, not its length. Moreover, it does not detect links starting with www:

    var str = "string with many links (SO does not let me post them)";
    var regex = /(\b(https?|ftp|file|www):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    var result, indices = [];
    while ( (result = regex.exec(str)) ) {
        indices.push({startsAt:result.index});
    };
    console.log(indices[0].link);
    console.log(indices[1].link);

asked Apr 18, 2015 at 14:58 by user1094081, edited Apr 18, 2015 at 16:25
  • 2 Yes, it's possible ! – adeneo Commented Apr 18, 2015 at 15:01
  • Please show an example of the input – mplungjan Commented Apr 18, 2015 at 15:19
  • Now that you know it's possible, it would be nice if you tried it. If it doesn't work when you do, you can post your code and we'll explain why. (hint: use indexOf or match string methods) – Touffy Commented Apr 18, 2015 at 15:22
  • @Touffy: you're right, I'll do it as soon as possible – user1094081 Commented Apr 18, 2015 at 15:24
  • @mplungjan: "simple text simple text simple text domain.ext/subdir again text text text yourbank./transfertomealltheirmoney/witharegex text text text and again text". SO eliminates h t t p: / / from my strings – user1094081 Commented Apr 18, 2015 at 15:27

2 Answers


One way to approach this is with regular expressions. Assuming the string is stored in a variable named input, you can do something like:

 var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
 var matches = input.match(expression);

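Then you can iterate through the matches to find their starting and ending points with indexOf. Note that indexOf returns the first occurrence at or after the position it is given, so a duplicate link only gets its own position if each search starts where the previous match ended: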

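    var results = [];
    var searchFrom = 0;
    // match() returns null when nothing is found, so fall back to an empty array
    (matches || []).forEach(function (link) {
        // Search from the end of the previous match so duplicates get their own position
        var startsAt = input.indexOf(link, searchFrom);
        searchFrom = startsAt + link.length;
        results.push({ link: link, startsAt: startsAt, endsAt: searchFrom });
    });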

Of course, you may have to tinker with the regular expression itself to suit your specific needs.

You can see the results logged to the console in this fiddle.
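The asker's EDIT mentions getting only the starting index from regex.exec. As a minimal sketch, assuming the same pattern as in this answer, the length is also available from the matched text itself (element 0 of the exec result), so the index lookup with indexOf can be skipped. The function name extractLinks is hypothetical, and endsAt is treated here as an exclusive end index, which is an assumption about what the asker wants.

```javascript
// Hypothetical helper; the pattern is copied from the answer above.
function extractLinks(input) {
    const expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
    const results = [];
    let match;
    // With the g flag, each exec() call resumes from expression.lastIndex,
    // so duplicate links are reported at their own positions.
    while ((match = expression.exec(input)) !== null) {
        results.push({
            link: match[0],                        // the full matched text
            startsAt: match.index,                 // index of the first character
            endsAt: match.index + match[0].length  // index just past the last character
        });
    }
    return results;
}

const sample = "see http://www.dom.ext/dir and www.dom.ext/dir and http://www.dom.ext/dir end";
console.log(extractLinks(sample));
```

Because the positions come straight from the regex engine, the two occurrences of the same link in the sample string are reported separately. The pattern itself may still need tuning, for instance if trailing punctuation should not count as part of a link.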

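const getLinksPool = (links) => {
    // Insert a space before every "https:" so that links glued to the preceding text are separated.
    // Replace "https:" with "http:" or "www." to handle other prefixes.
    const linksplit = links.replace(/https:/g, " https:");
    // Split on spaces; note that every space-separated token is returned, not only links
    const linksarray = linksplit.split(" ");
    // Drop the empty strings produced by consecutive spaces
    return linksarray.filter((item) => item !== "");
};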
