Parse text with multiple links using regex in JavaScript

Developer (开发者) https://www.devze.com 2023-02-01 13:21, source: internet

Hi, I have some text with multiple links embedded in it.

I want a regex (I'm using JavaScript) that can parse the text and return an array of the links.

For example, for the text:

http://www.youtube.com/watch?v=-LiPMxFBLZY
testing
http://www.youtube.com/watch?v=Q3-l22b_Qg8&feature=related

the regex should parse the text and return an array of the links:

arr[0] = "http://www.youtube.com/watch?v=-LiPMxFBLZY"
arr[1] = "http://www.youtube.com/watch?v=Q3-l22b_Qg8&feature=related"

I'm trying to do it with this code:

var ytre = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
var matches = new Array();

matches = ytre.exec(text);
var jm;
if (matches != null)
{
    for (jm = 0; jm < matches.length; jm++)
    {
        console.log(matches[jm]);
    }
}

But it doesn't return the results I expect.

Please help.

Thanks.


How about:

var text = 'http://www.youtube.com/watch?v=-LiPMxFBLZY testing http://www.youtube.com/watch?v=Q3-l22b_Qg8&feature=related http://yahoo.com';

var ytre = /(\b(https?|ftp|file):\/\/[\-A-Z0-9+&@#\/%?=~_|!:,.;]*[\-A-Z0-9+&@#\/%=~_|])/ig;

var resultArray = text.match(ytre);

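The difference between the question's code and this answer is the method, not the pattern. With the `g` flag, `String.prototype.match` returns an array of every full match, whereas `RegExp.prototype.exec` returns one match per call: the full match followed by its capture groups. That is why the loop in the question prints the first URL, then the same URL again (group 1), then `http` (group 2). A minimal sketch of both approaches, reusing the answer's pattern; the function names `findLinksWithMatch` and `findLinksWithExec` are hypothetical:

```javascript
// The pattern from the answer above. The `g` flag matters: it makes match()
// return all full matches and makes exec() advance through the text.
const urlRe = /(\b(https?|ftp|file):\/\/[\-A-Z0-9+&@#\/%?=~_|!:,.;]*[\-A-Z0-9+&@#\/%=~_|])/ig;

// Option 1: one call; match() returns null (not []) when nothing matches.
function findLinksWithMatch(text) {
  return text.match(urlRe) || [];
}

// Option 2: call exec() until it returns null. Each call starts at
// urlRe.lastIndex and returns [fullMatch, group1, group2].
function findLinksWithExec(text) {
  const links = [];
  urlRe.lastIndex = 0; // the regex object is shared, so start from the beginning
  let m;
  while ((m = urlRe.exec(text)) !== null) {
    links.push(m[0]); // keep only the full match, not the capture groups
  }
  return links;
}

const sample = "http://www.youtube.com/watch?v=-LiPMxFBLZY testing " +
  "http://www.youtube.com/watch?v=Q3-l22b_Qg8&feature=related";
console.log(findLinksWithMatch(sample));
console.log(findLinksWithExec(sample));
```

Both calls should log the two YouTube URLs from the question. In ES2020+ engines, `Array.from(text.matchAll(urlRe), m => m[0])` is a third option; `matchAll` throws if the regex lacks the `g` flag.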


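To parse URLs with regexes, look at the RFC that defines URI syntax.

So to find URLs in text, use a variant that makes the protocol (scheme) and authority non-optional and keeps whitespace out of every component, like /\b(([^:\/?#\s]+):)(\/\/([^\/?#\s]*))([^?#\s]*)(\?([^#\s]*))?(#(\S*))?/gi. Without the \s exclusions, the path and query classes would run across spaces and swallow the rest of the text into a single match.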
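As a rough illustration of this variant, the sketch below collects each URL found in running text together with its RFC 3986 components. It assumes the whitespace-excluding form of the pattern (`\s` added to each negated character class, `\S*` for the fragment) so that a match stops at the next space; `findUris` is a hypothetical name, and `matchAll` needs an ES2020-capable engine.

```javascript
// Scheme and authority are required (no "?" after their groups), so plain
// words without "scheme://" are not matched. Assumption: \s is excluded from
// every component so a match cannot run past the end of a URL in prose.
const uriInText = /\b(([^:\/?#\s]+):)(\/\/([^\/?#\s]*))([^?#\s]*)(\?([^#\s]*))?(#(\S*))?/gi;

function findUris(text) {
  const found = [];
  // matchAll yields one match array per URL; unmatched optional groups are undefined.
  for (const m of text.matchAll(uriInText)) {
    found.push({
      uri: m[0],
      scheme: m[2],     // RFC 3986 Appendix B: scheme = $2
      authority: m[4],  // authority = $4
      path: m[5],       // path = $5
      query: m[7],      // query = $7 (undefined if there is no "?")
      fragment: m[9],   // fragment = $9 (undefined if there is no "#")
    });
  }
  return found;
}

console.log(findUris("see http://www.ics.uci.edu/pub/ietf/uri/#Related and http://yahoo.com?q=1"));
```

Words such as "see" and "and" are skipped because no ":" follows them. Note that the pattern still accepts any scheme, so strings like `mailto:` without `//` are not found and trailing punctuation (for example a sentence-ending period) would be kept in the path.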

RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt) says:

Appendix B. Parsing a URI Reference with a Regular Expression

As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each
paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to

  http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

  $1 = http:
  $2 = http
  $3 = //www.ics.uci.edu
  $4 = www.ics.uci.edu
  $5 = /pub/ietf/uri/
  $6 = <undefined>
  $7 = <undefined>
  $8 = #Related
  $9 = Related

where <undefined> indicates that the component is not present, as is
the case for the query component in the above example. Therefore, we
can determine the value of the five components as

  scheme    = $2
  authority = $4
  path      = $5
  query     = $7
  fragment  = $9
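Transcribed into a JavaScript regex literal (forward slashes escaped), the Appendix B expression can be checked against the RFC's own example. Because every group is optional, this anchored pattern matches at the start of any string, so it splits a URI that has already been isolated rather than finding URIs inside text; that is why the answer above makes the scheme and authority required. `splitUri` is a hypothetical helper name.

```javascript
// RFC 3986 Appendix B, with "/" escaped for a JavaScript regex literal.
const rfc3986 = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split one already-isolated URI reference into the five components.
// exec() cannot return null here because every group is optional.
function splitUri(ref) {
  const m = rfc3986.exec(ref);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],     // undefined when the component is not present
    fragment: m[9],
  };
}

const parts = splitUri("http://www.ics.uci.edu/pub/ietf/uri/#Related");
console.log(parts.scheme, parts.authority, parts.path, parts.query, parts.fragment);
// http www.ics.uci.edu /pub/ietf/uri/ undefined Related
```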
