I have searched high and low but cannot seem to find a definitive answer to this, as is often the case with regexps, so I thought I'd ask here.
I'm trying to put together a regular expression I can use in JavaScript to replace all instances of URLs and email addresses (it doesn't need to be especially strict) with anchor tags pointing to them.
Obviously this is usually done very simply on the server side, but in this case it is necessary to work with plain text, so an elegant JavaScript solution that performs the replacements at runtime would be perfect.
The only problem is, as I've stated before, I have a huge regular-expression-shaped gaping hole in my skill set :(
I know that one of you has the answer at your fingertips, though :)
Asked Feb 23, 2009 at 21:00 by jthompson; edited Apr 3, 2009 at 22:07 by Chad Birch.
5 Answers
Well, blindly using the regexps from http://www.osix/modules/article/?id=586
// Backslashes are doubled because the patterns are built from string
// literals; a single "\." in a string would reach RegExp as a bare ".".
var emailRegex =
  new RegExp(
    '([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}' +
    '\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.' +
    ')+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)',
    "gi");
var urlRegex =
  new RegExp(
    '((https?://)' +
    '?(([0-9a-z_!~*\'().&=+$%-]+:)?[0-9a-z_!~*\'().&=+$%-]+@)?' + // optional user:password@
    '(([0-9]{1,3}\\.){3}[0-9]{1,3}' + // IP address, e.g. 199.194.52.184
    '|' + // allows either IP or domain
    '([0-9a-z_!~*\'()-]+\\.)*' + // tertiary domain(s), e.g. www.
    '([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\\.' + // second-level domain
    '[a-z]{2,6})' + // top-level domain, e.g. .museum
    '(:[0-9]{1,5})?' + // port number, e.g. :80
    '((/?)|' + // a slash isn't required if there is no file name
    '(/[0-9a-z_!~*\'().;?:@&=+$,%#-]+)+/?))',
    "gi");
then
text = text.replace(emailRegex, "<a href='mailto:$&'>$&</a>");
and
text = text.replace(urlRegex, "<a href='$1'>$1</a>");
might work. ($& is the whole match; $1 in the email pattern is only the part before the @.)
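One thing to watch when running the two replaces back to back: the URL pattern can match again inside the mailto links the first pass just produced. A rough sketch of a single-pass alternative follows, assuming the input is plain text. The names escapeHtml, linkPattern and linkify are made up for illustration, and the patterns are much looser than the ones above (the URL branch only picks up addresses with an explicit http:// or https:// scheme).

```javascript
// Hypothetical helper: escape the characters that matter in HTML text
// and in double-quoted attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Group 1: URL with an explicit scheme, not ending in common punctuation.
// Group 2: e-mail address. Both are intentionally loose (assumption).
var linkPattern =
  /(\bhttps?:\/\/[^\s<>"]*[^\s<>".,;:!?)])|([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/gi;

function linkify(plainText) {
  var out = "";
  var last = 0;
  var m;
  linkPattern.lastIndex = 0; // the regex is global, so reset its state
  while ((m = linkPattern.exec(plainText)) !== null) {
    // Escape the untouched text between the previous match and this one.
    out += escapeHtml(plainText.slice(last, m.index));
    var shown = escapeHtml(m[0]);
    var href = m[1] ? shown : "mailto:" + shown;
    out += '<a href="' + href + '">' + shown + "</a>";
    last = m.index + m[0].length;
  }
  return out + escapeHtml(plainText.slice(last));
}
```

Because every address is found in one scan over the original text, nothing that has already been wrapped gets matched a second time.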
Not a canned solution, but this will point you in the right direction.
I use Regex Coach to build and test my regexes. You can find plenty of example regexes for URLs and email addresses online.
Here's a good article on URLs:
https://blog.codinghorror./the-problem-with-urls/
Emails are more straightforward, since they have to end in a TLD. You don't need to get fancy with that one, since you're only matching, not validating. So, off the top of my head:
[^\s]+@\w[\w.-]*\.[a-zA-Z]+
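To see how a loose pattern like this behaves, here is a small sketch. It uses a variant with the dot before the TLD escaped and the hyphen moved to the end of the character class; the sample string and the variable names are invented for the example.

```javascript
// Loose e-mail matcher: anything non-space, an @, then a dotted host name
// ending in letters. It is meant for matching, not validation.
var emailPattern = /[^\s]+@\w[\w.-]*\.[a-zA-Z]+/g;

var sample = "Write to bob@example.com or alice@mail.example.org today.";

var found = sample.match(emailPattern);
// found → ["bob@example.com", "alice@mail.example.org"]

// "$&" in the replacement string stands for the whole match.
var linked = sample.replace(emailPattern, '<a href="mailto:$&">$&</a>');
```

Note that [^\s]+ will happily swallow leading punctuation such as an opening bracket, which may or may not matter for your text.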
As always, this ("this" being "processing HTML with regex") is going to be difficult and error-prone. The following will work on reasonably well-formed input only, but here's what I would do:
1. Find the element you want to process and take its innerHTML property value.
2. Iteratively find everything that already is a link (/<a\b.+?<\/a>/ig).
3. Based on that, cut your string into "this isn't a link" and "this is a link" bits, appending all of them to a neatly ordered array.
4. Process the "non-link" bits only (those that don't begin with "<a "), looking for URL or e-mail address patterns.
5. Wrap every address you find in <a> tags.
6. join() the array back into a string.
7. Set the innerHTML property to your new value.
I am sure you will find regular expression examples that match e-mail addresses and URLs. Take the ones that suit you best and use them in step 4.
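The approach above could be sketched roughly like this. The function names (wrapAddresses, linkifyHtml) and both patterns are placeholders I picked for illustration, not a tested production solution; swap in whichever address patterns suit you.

```javascript
// Matches an existing link, including its content. The capture group makes
// String.prototype.split keep the links in the resulting array.
var existingLink = /(<a\b[\s\S]*?<\/a>)/gi;

// Loose placeholder patterns: group 1 is a URL with an explicit scheme,
// group 2 is an e-mail address.
var address =
  /(\bhttps?:\/\/[^\s<>"]*[^\s<>".,;:!?)])|([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/gi;

// Wrap every address found in a non-link fragment.
function wrapAddresses(fragment) {
  return fragment.replace(address, function (match, url) {
    var href = url ? match : "mailto:" + match;
    return '<a href="' + href + '">' + match + "</a>";
  });
}

function linkifyHtml(html) {
  // Alternating "not a link" / "is a link" pieces, in document order.
  var pieces = html.split(existingLink);
  for (var i = 0; i < pieces.length; i++) {
    if (!/^<a\b/i.test(pieces[i])) {
      pieces[i] = wrapAddresses(pieces[i]);
    }
  }
  return pieces.join("");
}

// In a page this would be used along the lines of:
//   element.innerHTML = linkifyHtml(element.innerHTML);
```

This only protects existing <a> elements. URLs sitting in other attributes (an img src, for instance) would still be wrapped, which is one reason the input has to be reasonably well-formed and simple.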
Just adding a bit of information on email regexps: most of them seem to ignore that domain names can have the characters 'åäö' in them. So if you care about that, make sure that the solution you are using has åäöÅÄÖ in the domain part of the regexp.
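A quick illustration of the difference, using a loose matching pattern and a made-up address. In JavaScript, \w only covers ASCII letters, digits and underscore, so the extra characters have to be listed explicitly in the domain part.

```javascript
// ASCII-only domain part: \w does not include å, ä or ö.
var asciiOnly = /[^\s@]+@\w[\w.-]*\.[a-zA-Z]+/;

// Same pattern with åäöÅÄÖ added to the domain character classes.
var withNordic = /[^\s@]+@[\wåäöÅÄÖ][\wåäöÅÄÖ.-]*\.[a-zA-Z]+/;

var sampleAddress = "info@blåbär.se"; // hypothetical address

var plainResult = asciiOnly.test(sampleAddress);
// plainResult → false

var nordicResult = withNordic.exec(sampleAddress)[0];
// nordicResult → "info@blåbär.se"
```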