Large string data parsing causing high CPU usage

开发者 (https://www.devze.com), 2023-02-18 21:58, source: web
My application needs to parse some large string data, which means I am making heavy use of the Split, IndexOf and Substring methods of the string class. I use the StringBuilder class wherever I have to do any concatenation. However, while the application is doing this parsing, its CPU usage goes high (60-70%). I am guessing that calling these string APIs is what causes the high CPU usage, especially since the data is big (the typical Length of a string is 400K). Any ideas on how I can verify what is causing the CPU usage to go that high, and any suggestions on how to bring it down?


One thing to check is that you are passing the StringBuilder around as much as possible, rather than creating a new one and then returning its ToString() needlessly.

A much bigger gain though can be made if you process the data as smaller strings, read from a stream. Of course, this depends on just what sort of manipulation you are doing, but if at all possible, read your data from a StreamReader (or similar depending on the source) in small chunks, and then write it to a StreamWriter.

Often changes are only applicable within a given line of text, which makes the following pattern immediately useful:

using(StreamReader sr = new StreamReader(sourceInfo))
using(StreamWriter sw = new StreamWriter(destInfo))
  for(string line = sr.ReadLine(); line != null; line = sr.ReadLine())
    sw.WriteLine(ManipulateString(line));

In other cases where this doesn't apply, there are still ways to break the string into chunks for processing.
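
When the data has no useful line structure, one way to chunk it is to read fixed-size blocks into a reusable char buffer. The following is only a minimal sketch: the ChunkedCopy class, the Transform method and the per-character map delegate are hypothetical names chosen for illustration, and it assumes the manipulation can be done one character at a time.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Hypothetical helper: copies input to output in fixed-size chunks,
    // applying 'map' to every character. Only one small buffer is allocated,
    // no matter how large the input is.
    public static void Transform(TextReader input, TextWriter output,
                                 Func<char, char> map, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        int read;
        // Read returns the number of chars actually read, or 0 at end of input.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
                buffer[i] = map(buffer[i]);
            output.Write(buffer, 0, read);
        }
    }
}
```

If a token can straddle a chunk boundary, the unfinished tail of one chunk has to be carried over to the next; this sketch does not handle that case.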


To find out where the CPU usage is coming from: see the question "What Are Some Good .NET Profilers?"
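
Before reaching for a full profiler, a rough first check is to time the suspected parsing step with System.Diagnostics.Stopwatch. This is a swapped-in, much cruder technique than profiling, shown only as a sketch; the Timing class and Measure method are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs 'action' the given number of times and
    // returns the total wall-clock time spent.
    public static TimeSpan Measure(Action action, int iterations = 1)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Wrapping the Split-heavy and the Substring-heavy parts of the parser in separate Measure calls gives a first hint of which one dominates. It measures elapsed time rather than CPU time, so a profiler is still the more reliable tool.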

To reduce CPU usage: it depends, of course, on what's actually taking the time. You might, for instance, consider working not with actual substrings but with little objects encoding where they are within the big strings they came from. (There is no guarantee that this will actually be an improvement.) Very likely, when you profile your code there will be a few things that jump out at you as problems; they may well be things you'd never have guessed, and they may be very easy to fix as soon as you know they need fixing.
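
The "little objects" idea above could look roughly like the sketch below. Segment and SegmentSplitter are hypothetical names, not part of the .NET class library; the sketch assumes a single-character delimiter and, as noted above, is not guaranteed to be faster until measured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: records where a field sits inside the source string
// instead of copying it out with Substring.
public readonly struct Segment
{
    public readonly int Start;
    public readonly int Length;

    public Segment(int start, int length)
    {
        Start = start;
        Length = length;
    }

    // Materialize the text only when it is actually needed.
    public string Extract(string source) => source.Substring(Start, Length);
}

public static class SegmentSplitter
{
    // Finds the same field boundaries as source.Split(delimiter), but yields
    // offsets lazily instead of allocating one string per field.
    public static IEnumerable<Segment> Split(string source, char delimiter)
    {
        int start = 0;
        while (true)
        {
            int end = source.IndexOf(delimiter, start);
            if (end < 0)
            {
                // Last field: everything from 'start' to the end of the string.
                yield return new Segment(start, source.Length - start);
                yield break;
            }
            yield return new Segment(start, end - start);
            start = end + 1;
        }
    }
}
```

A caller can inspect Start and Length to skip uninteresting fields and call Extract only for the few fields it really needs as strings.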


Further to Jon's answer: if your parser does not need to backtrack (i.e. it always reads through the string in a forward direction), and the source of the string is not a file or network stream that you could use a StreamReader with, just wrap your String in a StringReader instead, e.g.

//Create a StringReader using the String variable data which has your String in it
//A StringReader is just a TextReader implementation for Strings
StringReader reader = new StringReader(data);

//Now do whatever manipulation on the string you want...
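
As one possible shape for that manipulation, the sketch below makes a single forward pass over the string with StringReader.ReadLine. The "key=value" line format and the ForwardParser and ParseKeyValues names are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ForwardParser
{
    // Hypothetical example: parses "key=value" lines in one forward pass.
    public static Dictionary<string, string> ParseKeyValues(string data)
    {
        var result = new Dictionary<string, string>();
        using (var reader = new StringReader(data))
        {
            string line;
            // ReadLine returns null once the end of the string is reached.
            while ((line = reader.ReadLine()) != null)
            {
                int eq = line.IndexOf('=');
                if (eq <= 0) continue; // skip lines with no '=' or an empty key
                result[line.Substring(0, eq)] = line.Substring(eq + 1);
            }
        }
        return result;
    }
}
```

Each IndexOf and Substring call here works on one short line rather than on the whole 400K string.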


In your case you are typically working with very large strings (a Length of about 400K). For operations on large strings you can use the rope data structure, which can be considerably more efficient than flat strings for concatenation and substring operations.

See the links below for more information:

https://iq.opengenus.org/rope-data-structure/

https://www.geeksforgeeks.org/ropes-data-structure-fast-string-concatenation/

STL ropes in C++: https://www.geeksforgeeks.org/stl-ropes-in-c/
