regex - Regular Expression to find URLs in block of Text (Javascript)

I need a JavaScript regular expression that scans a block of plain text and returns the text with the URLs as links.

This is what I have:

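findLinks: function(s) {
    var hlink = /\s(ht|f)tp:\/\/([^ \,\;\:\!\)\(\"\'\<\>\f\n\r\t\v])+/g;
    return (s.replace(hlink, function($0, $1, $2) {
        // Drop the leading whitespace character matched by \s.
        s = $0.substring(1, $0.length);
        // Strip trailing periods so sentence punctuation is not part of the link.
        while (s.length > 0 && s.charAt(s.length - 1) == '.') s = s.substring(0, s.length - 1);

        return ' <a href="' + s + '">' + s + '</a>';
    }));
}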

The problem is that it will only match http://www.google.com and NOT google.com/adsense

How could I accomplish both?

asked Nov 18, 2009 at 14:51 by Theofanis Pantelides, edited Nov 18, 2009 at 14:57 by FrustratedWithFormsDesigner

4 Answers

I use this as a reference all the time. This guy has 8 regexes you should know.

http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/

Here is what he uses to look for URLs:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

He also breaks down what each part does. Very useful for learning regexes and not just getting an answer that works for reasons you don't understand.
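One thing worth noticing about that pattern: the `^` and `$` anchor it to the whole string, so as written it checks whether a single candidate is a URL rather than finding URLs inside a larger block of text. A minimal sketch of that behaviour (the helper name `looksLikeUrl` is made up for illustration):

```javascript
// Pattern quoted in the answer above; ^ and $ tie it to the entire input string.
var urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical helper, used here only for illustration.
function looksLikeUrl(candidate) {
  return urlPattern.test(candidate);
}

console.log(looksLikeUrl('http://www.google.com'));       // true
console.log(looksLikeUrl('google.com/adsense'));          // true (the protocol is optional)
console.log(looksLikeUrl('Visit http://www.google.com')); // false (surrounding text breaks the ^...$ match)
```

So for the question as asked, you would probably split the text into candidate tokens first and test each one, or drop the anchors and add the `g` flag, which changes what the pattern accepts.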

This is a non-trivial task. To match any URI that is valid according to the relevant RFCs you need a monumentally complex regular expression, and even then it won't filter out URIs with invalid top-level domains (e.g. http://brussels.sprout/). So you have to compromise. Determine what's important to you. For example: are false positives or false negatives more acceptable? Do you want to limit top-level domains to only those that currently exist? Do you allow non-Latin characters in matched URIs? Decide what you need your regular expression to do and design it accordingly, rather than blindly copying and pasting an example from the web.
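As one illustration of such a trade-off, here is a rough sketch of limiting matches to known top-level domains as a separate step after the regex has found a candidate. The three-entry allow-list and the name `hasKnownTld` are assumptions for the example only; a real list would be far longer, and this sketch does not handle ports or user info in the host.

```javascript
// Assumption: a tiny hand-picked allow-list, purely for illustration.
var knownTlds = ['com', 'org', 'net'];

// Hypothetical helper: returns true if the candidate's host ends in an allowed TLD.
function hasKnownTld(candidate) {
  // Remove an optional scheme, then keep everything before the first "/", "?" or "#".
  var host = candidate.replace(/^[a-z]+:\/\//i, '').split(/[\/?#]/)[0];
  // The TLD is whatever follows the last dot in the host.
  var tld = host.slice(host.lastIndexOf('.') + 1).toLowerCase();
  return knownTlds.indexOf(tld) !== -1;
}

console.log(hasKnownTld('http://brussels.sprout/')); // false
console.log(hasKnownTld('google.com/adsense'));      // true
```

Keeping the TLD check out of the regex makes the list easy to change, at the cost of a second pass over each match.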

You could make the protocol part optional:

/\s((ht|f)tp:\/\/)?([^ \,\;\:\!\)\(\"\'\<\>\f\n\r\t\v])+/g
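Be aware of what the optional protocol costs you, though. Once the protocol is no longer required, nothing in the pattern is specific to URLs, so any word preceded by whitespace matches. A quick check, assuming `<` and `>` are in the excluded character set and using a made-up sample string:

```javascript
// Protocol made optional; the character class assumes < and > are excluded characters.
var hlink = /\s((ht|f)tp:\/\/)?([^ \,\;\:\!\)\(\"\'\<\>\f\n\r\t\v])+/g;

// Made-up sample text for illustration.
var sample = 'see http://www.google.com and google.com/adsense';

var matches = sample.match(hlink);
console.log(matches);
// matches: ' http://www.google.com', ' and', ' google.com/adsense'
```

Both URLs are found, but so is the plain word " and", so you would still need some extra condition (for example, requiring a dot inside the match) before turning matches into links.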

Try this (works with your sample text)

\S+\.\S+
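A minimal sketch of how that pattern could be plugged into a replace callback like the one in the question. The function name `linkify` and the choice to prefix schemeless matches with `http://` are assumptions, not part of the original answer, and the sketch does no HTML escaping.

```javascript
// Hypothetical helper: wraps every "something.something" token in an anchor tag.
function linkify(text) {
  return text.replace(/\S+\.\S+/g, function (match) {
    // Split off trailing periods so sentence punctuation stays outside the link.
    var url = match.replace(/\.+$/, '');
    var trailing = match.slice(url.length);
    // Assumption: schemeless matches such as google.com/adsense should get http://.
    var href = /^(ht|f)tp:\/\//.test(url) ? url : 'http://' + url;
    return '<a href="' + href + '">' + url + '</a>' + trailing;
  });
}

console.log(linkify('See http://www.google.com and google.com/adsense.'));
// See <a href="http://www.google.com">http://www.google.com</a> and <a href="http://google.com/adsense">google.com/adsense</a>.
```

The pattern is very permissive: it also matches tokens like 3.14 or e.g., so it trades false positives for simplicity.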
