In Yahoo-Pipes, how to use regex when you can't see non-printable characters and html tags?

https://www.devze.com 2022-12-20 05:44 Source: web
I keep running into the same problem when extracting data with regex: the result is not what I wanted because there might be newlines, spaces, HTML tags, etc. in the string. Is there any way to actually see what is in the string? The debugger seems to show only the visible text. How do you deal with this?


If the content of the string is HTML, the debugger gives you a choice of viewing "HTML" or "Source". Source should show you any HTML tags that are there.

However if your concern is white space, this may not be enough. Your only option is to "view source" on the original page.
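If you can copy the raw text out of the page source, one way to make the whitespace visible is to print its escaped form. This is only a minimal Python sketch run outside of Pipes, and the sample string is invented for illustration:

```python
# Hypothetical snippet pasted from the original page's source.
raw = "Price:\t 42 \r\n</a>\n"

# repr() renders tabs, carriage returns and newlines as escape sequences,
# so characters that are normally invisible show up as \t, \r and \n.
print(repr(raw))  # 'Price:\t 42 \r\n</a>\n'
```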

The best course of action is to explicitly handle these possibilities in your regex. For example, if you think you might be getting white space in your target string, use the \s* pattern in the critical positions. That will match zero or more spaces, tabs, and newlines. (If you rely on . rather than \s to cross line breaks, you must also have the "s" option checked in the regex panel, since in most regex engines that option is what lets . match newlines.)
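As a rough illustration of the \s* approach, here is a minimal Python sketch. The sample string and both patterns are made up for the example, and Python's re module is only a stand-in for whatever engine Pipes uses, so treat the behaviour as indicative rather than exact.

```python
import re

# Hypothetical input: a link followed by a line break and indentation
# that a debugger showing only the rendered text would hide.
sample = '<a href="/item">Title</a>\r\n   <span>42</span>'

# Without \s*, the pattern fails because of the hidden "\r\n   ".
strict = re.search(r'</a><span>(\d+)</span>', sample)

# With \s* in the critical position, the hidden whitespace is tolerated.
tolerant = re.search(r'</a>\s*<span>(\d+)</span>', sample)

print(strict)             # None
print(tolerant.group(1))  # 42
```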

However, without specific examples of the source text and the regex you are using, advice can only be generic.


What I do is use a regex tester (one that uses the same regex engine you are using) and test my pattern in it. I've tried text editors that display invisible characters, but to me they only add to the confusion.

So I just go by trial and error. For instance, if a line ends in:

</a>

Then I'll try the following patterns on the regex tester until I find one that works:

</a>.
</a>..
</a>\s
</a>\s*
</a>\n
</a>\r
</a>\r\n

Etc.
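The same trial-and-error loop can be scripted if you have a copy of the text. This is a sketch only: the sample line ending is hypothetical, try_patterns is a helper name invented here, and Python's re module may not behave exactly like the engine Pipes uses.

```python
import re

# Hypothetical line ending; the real one is whatever your source contains.
sample = 'Some link</a>\r\n<p>next</p>'

# The candidate patterns listed above.
candidates = [
    r'</a>.', r'</a>..', r'</a>\s', r'</a>\s*',
    r'</a>\n', r'</a>\r', r'</a>\r\n',
]

def try_patterns(text, patterns):
    """Return {pattern: matched text, or None if the pattern did not match}."""
    results = {}
    for pattern in patterns:
        match = re.search(pattern, text)
        results[pattern] = match.group(0) if match else None
    return results

# Print each pattern next to the escaped form of what it matched.
for pattern, matched in try_patterns(sample, candidates).items():
    print(f'{pattern!r:14} -> {matched!r}')
```

Printing the match with repr shows exactly which invisible characters each pattern consumed, which makes it easier to tell a \n ending from a \r\n one.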
