I receive a string from a server. It contains text and links, mostly starting with http://, https:// or www.; other forms are very rare and can be ignored.
Example:
"simple text simple text simple text domain.ext/subdir again text text text youbank./transfertomealltheirmoney/witharegex text text text and again text"
I need a JS function that does the following:
- finds all the links (duplicates included);
- returns an array of objects, one per link, with keys giving where the link starts in the text and where it ends, something like:
[{link:"http://www.dom.ext/dir",startsAt:25,endsAt:47},
{link:"https://www.dom2.ext/dir/subdir",startsAt:57,endsAt:88},
{link:"www.dom.ext/dir",startsAt:176,endsAt:192}]
Is this possible? How?
EDIT: @Touffy: I tried the code below, but I only get the starting index of each match, not its length. It also does not detect links starting with www.
var str = "string with many links (SO does not let me post them)";
var regex = /(\b(https?|ftp|file|www):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
var result, indices = [];
while ((result = regex.exec(str))) {
    indices.push({startsAt: result.index});
}
console.log(indices[0].link);
console.log(indices[1].link);
- Yes, it's possible! – adeneo Commented Apr 18, 2015 at 15:01
- Please show an example of the input – mplungjan Commented Apr 18, 2015 at 15:19
- Now that you know it's possible, it would be nice if you tried it. If it doesn't work when you do, you can post your code and we'll explain why. (hint: use the indexOf or match string methods) – Touffy Commented Apr 18, 2015 at 15:22
- @Touffy: you're right, I'll do it as soon as possible – user1094081 Commented Apr 18, 2015 at 15:24
- @mplungjan: "simple text simple text simple text domain.ext/subdir again text text text yourbank./transfertomealltheirmoney/witharegex text text text and again text". SO eliminates h t t p: / / from my strings – user1094081 Commented Apr 18, 2015 at 15:27
2 Answers
One way to approach this is with regular expressions. Assuming the string is held in a variable named input, you can do something like
var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
var matches = input.match(expression);
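Then you can iterate through the matches and use indexOf to find their starting and ending points, collecting the results in an array:
var results = [];
var searchFrom = 0; // resume after the previous match so duplicate links get their own positions
(matches || []).forEach(function (match) { // match() returns null when nothing is found
    var start = input.indexOf(match, searchFrom);
    // endsAt is the index just past the last character of the link
    results.push({link: match, startsAt: start, endsAt: start + match.length});
    searchFrom = start + match.length;
});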
Of course, you may have to tinker with the regular expression itself to suit your specific needs.
You can see the results logged by console in this fiddle
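As an alternative sketch, not taken from the answer above: String.prototype.matchAll reports the index of every match directly, so no second search with indexOf is needed and duplicate links keep their own positions. The helper name findLinks, the sample string and the simplified pattern are assumptions made for illustration; the pattern only looks for the http://, https:// and www. prefixes and does not strip trailing punctuation. Here endsAt is exclusive, so text.slice(startsAt, endsAt) gives back the link.

```javascript
// Hypothetical helper (not from the original answer).
// Assumption: a link is any run of non-whitespace characters that starts
// with http://, https:// or www.
function findLinks(text) {
  const pattern = /(?:https?:\/\/|www\.)[^\s]+/gi;
  const links = [];
  // matchAll requires the g flag and yields one match object per occurrence
  for (const match of text.matchAll(pattern)) {
    links.push({
      link: match[0],
      startsAt: match.index,
      endsAt: match.index + match[0].length, // exclusive end index
    });
  }
  return links;
}

// Made-up sample input containing a duplicated link
const sample = "see http://a.ext/x and www.b.ext/y and http://a.ext/x again";
console.log(findLinks(sample));
```

If an inclusive end index is preferred, subtract 1 from endsAt.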
// You can replace https with any other prefix, such as http or www
const getLinksPool = (links) => {
    const linksplit = links.replace(/https:/g, " https:");
    let linksarray = linksplit.split(" ");
    let linkspools = linksarray.filter((array) => {
        return array !== "";
    });
    return linkspools;
};
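A short usage sketch of this function, with made-up input strings as assumptions. It shows that the function separates links that run together, but also that it returns every space-separated piece, so plain words have to be filtered out afterwards when the input mixes text and links. It does not report positions.

```javascript
// Same logic as the function above, condensed so the example runs on its own
const getLinksPool = (links) => {
  const separated = links.replace(/https:/g, " https:");
  return separated.split(" ").filter((part) => part !== "");
};

// Hypothetical input: two links glued together, then a third after a space
const glued = "https://a.ext/onehttps://b.ext/two https://c.ext/three";
console.log(getLinksPool(glued));

// With mixed text, ordinary words come back too, so keep only the https pieces
const mixed = "text https://a.ext/one more text";
const onlyLinks = getLinksPool(mixed).filter((part) => part.startsWith("https:"));
console.log(onlyLinks);
```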