Can I use variables in a Regex pattern (C#)?

Developer https://www.devze.com 2023-01-23 07:33 Source: web
I have some HTML text in which I need to replace certain words with links to them. For example, the text contains the word "PHP", and I want to replace it with <a href="glossary.html#php">PHP</a>. There are many words that I need to replace this way.

My code:

public struct GlossaryReplace
{
    public string word; // here the words, e.g. PHP
    public string link; // here the links to replace, e.g. glossary.html#php
}
public static GlossaryReplace[] Replaces = null;    

IHTMLDocument2 html_doc = webBrowser1.Document.DomDocument as IHTMLDocument2;
string html_content = html_doc.body.outerHTML;

for (int i = 0; i < Replaces.Length; i++)
{
    String substitution = "<a class=\"glossary\" href=\"" + Replaces[i].link + "\">" + Replaces[i].word + "</a>";
    html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + "\b", substitution);
}
html_doc.body.innerHTML = html_content;

The trouble is that this does not work :( However,

html_content = Regex.Replace(html_content, @"\bPHP\b", "some replacement");

this code works fine! I can't see where my error is.


The @ prefix applies only to the string literal that immediately follows it, so when you concatenate string literals you may have to put it on each one.

Change this:

html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + "\b", substitution);

to:

html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + @"\b", substitution);

In a regular expression \b means a word boundary, but in a regular (non-verbatim) string literal it means a backspace character (ASCII 8). You get a compiler error if you use an escape code that doesn't exist in string literals (e.g. \s), but not in this case, because \b exists both in string literals and in regular expressions.
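To make the difference concrete, here is a minimal sketch that compares the two kinds of literal. The class name `EscapeDemo` and its members are only illustrative and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Regular literal: the compiler turns \b into a single backspace character (code 8).
    public static readonly string Regular = "\b";

    // Verbatim literal: the backslash is kept, so the regex engine receives \b.
    public static readonly string Verbatim = @"\b";

    // Pattern built as in the question: the trailing part is a backspace character,
    // which the regex engine then tries to match literally.
    public static bool MatchesMixed(string text, string word)
    {
        return Regex.IsMatch(text, @"\b" + word + "\b");
    }

    // Pattern with both parts verbatim: a word boundary on each side of the word.
    public static bool MatchesVerbatim(string text, string word)
    {
        return Regex.IsMatch(text, @"\b" + word + @"\b");
    }
}
```

With the input "I like PHP." and the word "PHP", `MatchesMixed` finds nothing because the text contains no backspace character, while `MatchesVerbatim` finds the word.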

On a side note, a method that is useful when building regular expression patterns dynamically is Regex.Escape. It escapes the characters in a string so that they are matched literally in a pattern, so @"\b" + Regex.Escape(Replaces[i].word) + @"\b" would make the pattern work even if the word contains characters that have a special meaning in a regular expression.
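As a rough sketch of how Regex.Escape fits into the replacement, the helper below builds the pattern for one glossary entry and applies it. The names `GlossaryLinker`, `BuildPattern` and `Link` are hypothetical, and the term "ASP.NET" in the note that follows is an assumed example, not one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GlossaryLinker
{
    // Builds a whole-word pattern. Regex.Escape makes characters such as '.'
    // match literally instead of acting as regex metacharacters.
    public static string BuildPattern(string word)
    {
        return @"\b" + Regex.Escape(word) + @"\b";
    }

    // Wraps each whole-word occurrence of word in a glossary link,
    // using the same substitution format as the question's loop.
    public static string Link(string text, string word, string link)
    {
        string substitution = "<a class=\"glossary\" href=\"" + link + "\">" + word + "</a>";
        return Regex.Replace(text, BuildPattern(word), substitution);
    }
}
```

For instance, `GlossaryLinker.Link("ASP.NET and ASPxNET", "ASP.NET", "glossary.html#aspnet")` links only the first term. Without Regex.Escape, the unescaped `.` would also match the `x` in "ASPxNET". Note that \b only matches between a word character and a non-word character, so a term that ends in punctuation, such as "C++", would need a different boundary check.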


You forgot a @ here:

@"\b" + Replaces[i].word + "\b"

Should be:

@"\b" + Replaces[i].word + @"\b"

I'd also recommend that you use an HTML parser if you are modifying HTML. HTML Agility Pack is a useful library for this purpose.
