C#: fast way to replace text in an HTML file

Developer https://www.devze.com 2023-01-23 16:57 Source: Internet
I want to replace the text in a certain range of my HTML file (say, from position 1000 to 200000) with text from another HTML file. Can someone recommend the best way to do this?


Pieter's way will work, but it does involve loading the whole file into memory. That may well be okay, but if you've got particularly large files you may want to consider an alternative:

  • Open a TextReader on the original file
  • Open a TextWriter for the target file
  • Copy blocks of text by calling Read/Write repeatedly, with a buffer of, say, 8K characters, until you've read the initial amount (1000 characters in your example)
  • Write the replacement text out to the target writer by again opening a reader and copying blocks
  • Skip the text you want to ignore in the original file, by repeatedly reading into a buffer and just ignoring it (incrementing a counter so you know how much you've skipped, of course)
  • Copy the rest of the text from the original file in the same way.

Basically it's just lots of copying operations, including one "copy" which doesn't go anywhere (for skipping the text in the original file).
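The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the names RangeReplacer, CopyChars and ReplaceRange are hypothetical, and the "copy which doesn't go anywhere" is done here by writing to TextWriter.Null.

```csharp
using System;
using System.IO;

static class RangeReplacer
{
    // Copies up to `count` characters from reader to writer in buffer-sized
    // blocks. Stops early at end of input. Returns the number of chars copied.
    static long CopyChars(TextReader reader, TextWriter writer, long count, char[] buffer)
    {
        long copied = 0;
        while (copied < count)
        {
            int want = (int)Math.Min(buffer.Length, count - copied);
            int read = reader.Read(buffer, 0, want);
            if (read == 0) break; // end of input
            writer.Write(buffer, 0, read);
            copied += read;
        }
        return copied;
    }

    // Replaces the characters in [startIndex, endIndex) of `source` with the
    // whole content of `replacement`, writing the result to `target`.
    public static void ReplaceRange(TextReader source, TextReader replacement,
                                    TextWriter target, long startIndex, long endIndex)
    {
        var buffer = new char[8192]; // 8K characters, as suggested above

        CopyChars(source, target, startIndex, buffer);                     // text before the range
        CopyChars(replacement, target, long.MaxValue, buffer);             // the replacement text
        CopyChars(source, TextWriter.Null, endIndex - startIndex, buffer); // skip the old range
        CopyChars(source, target, long.MaxValue, buffer);                  // the rest of the file
    }
}
```

To use it on files, you would pass a StreamReader for each input and a StreamWriter for the output, each wrapped in a using statement. Note that the positions are counted in characters after decoding, not in bytes, so they match the indices you would use on a string.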


Try this:

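using System.IO;
using System.Text;

string input = File.ReadAllText("<< input HTML file >>");
string replacement = File.ReadAllText("<< replacement HTML file >>");

// Character range to replace: startIndex is inclusive, endIndex is exclusive.
int startIndex = 1000;
int endIndex = 200000;

// Pre-size the builder to the exact final length so it never has to grow.
var sb = new StringBuilder(
    input.Length - (endIndex - startIndex) + replacement.Length
);

// Append(string, int, int) copies straight from input, avoiding the
// temporary strings that Substring would allocate.
sb.Append(input, 0, startIndex);                      // text before the range
sb.Append(replacement);                               // the new content
sb.Append(input, endIndex, input.Length - endIndex);  // text after the range

string output = sb.ToString();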


The replacement code Pieter posted does the job, and creating the StringBuilder with the known final length is a neat way to avoid reallocations as it fills.

That should do what you asked, but when working with structured data like HTML, it is sometimes preferable to parse it into a document tree, much like loading XML (I have used the HtmlAgilityPack for that). You can then use XPath to find the node you want to replace and work with it directly. It might be slower, but as I said, you then get to work with the structure rather than with raw character positions.
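As a rough sketch of that idea, assuming the HtmlAgilityPack NuGet package is referenced: the NodeReplacer class, the ReplaceInnerHtml method and the XPath expression in the usage note are made up for illustration, and your own document will need its own XPath.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class NodeReplacer
{
    // Parses `html`, finds the first node matching `xpath`, swaps its inner
    // markup for `replacementHtml`, and returns the whole document as a string.
    public static string ReplaceInnerHtml(string html, string xpath, string replacementHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
            throw new ArgumentException("No node matches: " + xpath, nameof(xpath));

        node.InnerHtml = replacementHtml;
        return doc.DocumentNode.OuterHtml;
    }
}
```

For example, calling NodeReplacer.ReplaceInnerHtml(input, "//div[@id='content']", replacement) would replace the contents of a div with id "content", if the page has one. Unlike the position-based approaches, this keeps working when the surrounding markup shifts by a few characters.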
