
Need assistance with Regular Expressions in Qt (QRegExp) [bad repetition syntax?]

开发者 https://www.devze.com 2023-01-31 22:59 Source: Internet
void MainWindow::whatever(){
    QRegExp rx ("<span(.*?)>");
    //QString line = ui->txtNet1->toHtml();
    QString line = "<span>Bar</span><span style='baz'>foo</span>";
    while(line.contains(rx)){
        qDebug()<<"Found rx!";
        line.remove (rx);
    }
}

I've tested the regular expression online using this tool. With the given regex string and a sample text of <span style="foo">Bar</span>, the tool says that the regular expression should be found in the string. In my Qt code, however, I never get into my while loop.

I've really never used regex before, in Qt or any other language. Can someone provide some help? Thanks!

[edit] So I just found that QRegExp has a function errorString() to use if the regex is invalid. I output this and see: "bad repetition syntax". Not really sure what this means. Of course, googling for "bad repetition syntax" brings up... this post. Damn google, you fast.


The problem is that QRegExp has no per-quantifier laziness. A pattern's quantifiers are either all greedy or all reluctant, never a mix. Thus, <span(.*?)> is invalid, since there is no *? operator. Instead, you can use

QRegExp rx("<span(.*)>");
rx.setMinimal(true);

This will give every *, +, and ? in the QRegExp the behavior of *?, +?, and ??, respectively, rather than their default behavior. The difference, as you may or may not be aware, is that the minimal versions match as few characters as possible, rather than as many.
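To see the difference concretely, here is a minimal sketch in standard C++ rather than Qt. It assumes std::regex with its default ECMAScript grammar, which (unlike QRegExp) accepts .*? directly. The helper name firstMatch is made up for illustration. The greedy pattern should behave like a default QRegExp, and the lazy one like a QRegExp after setMinimal(true).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// in `input`, or an empty string when nothing matches.
std::string firstMatch(const std::string& input, const std::string& pattern) {
    const std::regex re(pattern);  // default grammar: ECMAScript
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str(0) : std::string();
}
```

On the sample line from the question, firstMatch(line, "<span(.*)>") runs from the first <span to the last > in the string, while firstMatch(line, "<span(.*?)>") stops at the first >.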

In this case, you can also write

QRegExp rx("<span([^>]*)>");

This is probably what I would do, since it has the same effect: match until you see a >. Yours is more general, yes (if you have a multi-character ending token), but I think this is slightly nicer in the simple case. Either will work, of course.

Also, be very, very careful about parsing HTML with regular expressions. You can't actually do it, and recognizing tags is—while (I believe) possible—much harder than just this. (Comments, CDATA blocks, and processing instructions throw a wrench in the works.) If you know the sort of data you're looking at, this can be an acceptable solution; even so, I'd look into an HTML parser instead.
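As a small illustration of how tag matching can go wrong, the sketch below applies <span[^>]*> to a tag whose attribute value contains a >. It uses std::regex from standard C++ instead of Qt, and firstOpeningSpan is a hypothetical helper, not part of any library. The match ends at the first >, which here sits in the middle of the attribute.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring matched by <span[^>]*>,
// or an empty string if there is no match.
std::string firstOpeningSpan(const std::string& html) {
    static const std::regex openingSpan("<span[^>]*>");
    std::smatch m;
    return std::regex_search(html, m, openingSpan) ? m.str(0) : std::string();
}
```

For the input <span title="a > b">x</span>, the helper returns <span title="a > rather than the whole opening tag, so removing the match would leave broken markup behind.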


What are you trying to achieve? If you want to remove the opening tag and its attributes, then the pattern

<span[^>]*>

is probably the simplest.
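A rough sketch of that removal, written with std::regex from standard C++ so that it runs without Qt. The function name removeOpeningSpans is an assumption for illustration. In Qt, the same pattern could be given to QRegExp and passed to QString::remove, as in the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: deletes every opening <span ...> tag from `input`.
// Closing </span> tags and the text between tags are left untouched.
std::string removeOpeningSpans(const std::string& input) {
    static const std::regex openingSpan("<span[^>]*>");
    return std::regex_replace(input, openingSpan, "");
}
```

Applied to the sample line from the question, this leaves Bar</span>foo</span>, since the pattern only targets opening tags.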

The syntax .*? means a non-greedy match. It is widely supported elsewhere, but the Qt regex engine (QRegExp) does not appear to accept it, which would explain the "bad repetition syntax" error.
