
regex - Regular Expression to find URLs in block of Text (Javascript) - Stack Overflow


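I need a JavaScript regular expression that scans a block of plain text and returns the text with the URLs as links.

This is what I have:

findLinks: function(s) {
          // Whitespace, then http:// or ftp://, then a run of characters that are not delimiters
          var hlink = /\s(ht|f)tp:\/\/([^ \,\;\:\!\)\(\"\'\f\n\r\t\v])+/g;
          return s.replace(hlink, function($0) {
              // Drop the leading whitespace character matched by \s
              var url = $0.substring(1);
              // Strip trailing periods so sentence punctuation stays outside the link
              while (url.length > 0 && url.charAt(url.length - 1) == '.') url = url.substring(0, url.length - 1);

              return ' <a href="' + url + '">' + url + '</a>';
          });
      }

The problem is that it will only match http://www.google.com and NOT google.com/adsense.

How could I accomplish both?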

asked Nov 18, 2009 at 14:51 by Theofanis Pantelides; edited Nov 18, 2009 at 14:57 by FrustratedWithFormsDesigner

4 Answers

I use this as a reference all the time. The author covers 8 regexes you should know.

http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/

Here is what he uses to look for URLs:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/ 

He also breaks down what each part does, which is very useful for learning regexes rather than just getting an answer that works for reasons you don't understand.
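One thing worth checking before reusing that pattern for this question: it is anchored with ^ and $, so it tests whether a whole string is a URL rather than finding URLs inside a longer text. A minimal sketch of that behaviour follows; isUrl is a hypothetical helper name used only for illustration.

```javascript
// The pattern quoted above, unchanged. The ^ and $ anchors make it a whole-string test.
var urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// isUrl is a hypothetical helper, not part of the linked tutorial.
function isUrl(candidate) {
    return urlPattern.test(candidate);
}

// A string that is nothing but a URL passes the test.
var wholeString = isUrl('http://www.google.com/adsense');

// The same URL embedded in a sentence does not, because of the anchors.
var insideSentence = isUrl('see http://www.google.com/adsense for details');

console.log(wholeString, insideSentence);
```

For scanning a block of text, the anchors would presumably have to be removed or replaced with boundaries such as whitespace.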

This is a non-trivial task. To match any URI that is valid according to the relevant RFCs you need a monumentally complex regular expression, and even then it won't filter out URIs with invalid top-level domains (e.g. http://brussels.sprout/). So you have to compromise. Determine what's important to you. For example: are false positives or false negatives more acceptable? Do you want to limit top-level domains to only those that currently exist? Do you allow non-Latin characters in matched URIs? Decide what you need your regular expression to do and design it accordingly, rather than blindly copying and pasting an example from the web.
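As one illustration of such a design decision, here is a rough sketch of limiting matches to a set of top-level domains after the regex has done its work. The allowedTlds list and the hasAllowedTld helper are assumptions made up for this example; a real list would be much longer and would need maintenance.

```javascript
// Hypothetical allow-list: an assumption for this sketch, not a complete list of TLDs.
var allowedTlds = ['com', 'org', 'net'];

// hasAllowedTld is a hypothetical helper name used only for illustration.
function hasAllowedTld(url) {
    // Drop any protocol, then keep everything before the first "/", "?" or "#".
    var host = url.replace(/^[a-z]+:\/\//i, '').split(/[\/?#]/)[0];
    // The TLD is whatever follows the last dot in the host.
    var tld = host.substring(host.lastIndexOf('.') + 1).toLowerCase();
    return allowedTlds.indexOf(tld) !== -1;
}

console.log(hasAllowedTld('http://www.google.com/adsense'));
console.log(hasAllowedTld('http://brussels.sprout/'));
```

This trades false positives for false negatives: a valid URL under a TLD missing from the list is silently skipped, and hosts with a port number are not handled here.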

You could make the protocol part optional:

/\s((ht|f)tp:\/\/)?([^ \,\;\:\!\)\(\"\'\f\n\r\t\v])+/g
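A quick experiment suggests a side effect worth knowing about: once the protocol is optional, any word preceded by whitespace satisfies the pattern, not just URLs. The sketch below uses the same optional-protocol idea on a made-up sample sentence.

```javascript
// Same idea as above: the (ht|f)tp:// prefix is wrapped in an optional group.
var optionalProtocol = /\s((ht|f)tp:\/\/)?([^ \,\;\:\!\)\(\"\'\f\n\r\t\v])+/g;

// Made-up sample text for this illustration.
var sample = 'Visit http://www.google.com or google.com/adsense today.';

// Every match starts with the whitespace character consumed by \s, so trim it off.
var matches = (sample.match(optionalProtocol) || []).map(function (m) {
    return m.substring(1);
});

console.log(matches);
// [ 'http://www.google.com', 'or', 'google.com/adsense', 'today.' ]
```

Both URLs are found, but so are "or" and "today.", so an extra condition (for example, requiring a dot inside the match) would probably be needed before turning matches into links.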

Try this (it works with your sample text):

\S+\.\S+
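To see what this pattern does and does not catch, here is a small sketch on two made-up sample strings. It matches any run of non-whitespace characters with a dot somewhere in the middle, which covers both URL forms from the question but also other dotted tokens.

```javascript
// The pattern from this answer, with the g flag added so match() returns every hit.
var dotted = /\S+\.\S+/g;

// Made-up sample: both URL forms from the question are found,
// and the sentence-final "today." is skipped because nothing follows its dot.
var urls = 'Visit http://www.google.com or google.com/adsense today.'.match(dotted);

// Made-up sample: dotted tokens that are not URLs match as well.
var falsePositives = 'Pi is 3.14, e.g. roughly.'.match(dotted);

console.log(urls);           // [ 'http://www.google.com', 'google.com/adsense' ]
console.log(falsePositives); // [ '3.14,', 'e.g.' ]
```

Whether those false positives are acceptable depends on the text being scanned.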