Rubular/Ruby discrepancy in captured text

Developer https://www.devze.com 2023-03-20 23:56 Source: Internet
I've carefully cut and pasted from this Rubular window http://rubular.com/r/YH8Qj2EY9j to my code, yet I get different results. The Rubular match capture is what I want. Yet

desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
    puts description = $1 
end

only gets me the first line, i.e.

<DD>@mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />

I don't think it's my test data, but that's possible. What am I missing?

(Ruby 1.9 on Ubuntu 10.10)


Paste your test data into an editor that is able to display control characters and verify your line break characters. Normally it should be only \n on a Linux system as in your regex. (I had unusual linebreaks a few weeks ago and don't know why.)
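If no such editor is at hand, the same check can be done from Ruby itself. The following is only a sketch: the sample string is made up for illustration (it is not the asker's data) and deliberately uses Windows-style line endings so that there is something to find.

```ruby
# Hypothetical sample with \r\n line endings, for illustration only.
sample = "<DD>first line<br />\r\nsecond line\r\n<DT>next entry"

# `p` prints String#inspect, which shows control characters as escapes,
# so a stray "\r" becomes visible instead of silently moving the cursor.
p sample

# Count each kind of line break separately.
crlf_count = sample.scan(/\r\n/).length
bare_lf_count = sample.scan(/(?<!\r)\n/).length
puts "CRLF: #{crlf_count}, bare LF: #{bare_lf_count}"
```

Running `p desc` on the real data right before the match should show whether any `\r` characters are present.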

Another check you can do: split the capture into separate groups and print each one, so that you can see which part of your regex matches what.

/^<DD>(.*)\n?(.*)\n/
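A minimal sketch of printing the groups, assuming a shortened stand-in for the real input (the `desc` string here is hypothetical):

```ruby
# Hypothetical two-line entry followed by a line that should not match.
desc = "<DD>line one<br />\nline two\n<DT>not matched"

match = desc.match(/^<DD>(.*)\n?(.*)\n/)
if match
  # inspect makes empty groups and hidden control characters visible.
  match.captures.each_with_index do |text, i|
    puts "group #{i + 1}: #{text.inspect}"
  end
else
  puts "no match"
end
```

If the second group comes back empty, or a group ends in `\r`, that points at the line breaks rather than at the pattern.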

Another idea to get this to work is to replace the .* parts: instead of matching any character, explicitly match anything except \n.

^<DD>([^\n]*\n?[^\n]*)\n
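For comparison, here is a small sketch that runs both forms against a made-up sample (the `desc` string is an assumption, not the asker's data). In Ruby the dot does not match `\n` unless `/m` is given, so on clean `\n`-terminated input the two patterns should capture the same text; the negated class mainly makes the intent explicit.

```ruby
# Hypothetical sample with plain \n line endings.
desc = "<DD>line one<br />\nline two\n<DT>not matched"

# String#[] with a regex and a group index returns that capture (or nil).
explicit = desc[/^<DD>([^\n]*\n?[^\n]*)\n/, 1]
dotted   = desc[/^<DD>(.*\n?.*)\n/, 1]

p explicit
puts "same capture: #{explicit == dotted}"
```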


I believe you need the multiline modifier in your code:

/m Multiline mode: the dot also matches newlines. (In Ruby, ^ and $ match at line starts and ends with or without this modifier.)
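It may be worth testing what `/m` does to this particular pattern before relying on it. The sketch below uses a made-up sample string (an assumption, with a trailing newline added to the last line). With `/m` the greedy `.*` is free to run across line breaks, so the capture can extend past the two wanted lines.

```ruby
# Hypothetical sample; note the trailing newline after the <DT> line.
desc = "<DD>line one<br />\nline two\n<DT>not matched\n"

without_m = desc[/^<DD>(.*\n?.*)\n/, 1]
with_m    = desc[/^<DD>(.*\n?.*)\n/m, 1]

p without_m  # stops after the second line
p with_m     # greedy dot also swallows the <DT> line
```

On this sample the version without `/m` is the one that returns only the two description lines.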


The following:

#!/usr/bin/env ruby

desc= '<DD>@mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377
<DT>la la this should not be matched oh good'
desc_pattern = /^<DD>(.*\n?.*)\n/
if desc =~ desc_pattern
    puts description = $1 
end

prints

@mathpunk Griefing (i.e. trolling) as Play: http://t.co/LwOH1Vb<br />
– Johnny Badhair (8spiders) http://twitter.com/8spiders/status/92876473853157377

on my system (Linux, Ruby 1.8.7).

Perhaps your line breaks are really \r\n (Windows style)? What if you try:

desc_pattern = /^<DD>(.*\r?\n?.*)\r?\n/
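A quick way to try this, sketched with a hypothetical `\r\n` sample (not the asker's data). One thing to watch: the dot matches `\r`, so the greedy `.*` may keep the carriage returns inside the capture. An alternative, shown second, is to normalize the line endings first and keep the original pattern.

```ruby
# Hypothetical sample with Windows-style line endings.
crlf_desc = "<DD>line one<br />\r\nline two\r\n<DT>not matched"

# The \r-tolerant pattern matches, but the capture still carries "\r".
captured = crlf_desc[/^<DD>(.*\r?\n?.*)\r?\n/, 1]
p captured

# Alternative: convert \r\n (and lone \r) to \n, then use the original pattern.
normalized  = crlf_desc.gsub(/\r\n?/, "\n")
description = normalized[/^<DD>(.*\n?.*)\n/, 1]
p description
```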