
Text reformatter gradually slows with each iteration

Source: https://www.devze.com, 2023-02-09 16:19 (from the web)

EDIT 2

Okay, I've posted a copy of my source at gist.github and I only have one lingering problem that I can't resolve.

FindLine() always returns -1. I've narrowed the cause down to the if statement, but I can't figure out why. I know that symbol and symbolList are both getting passed good data.

/EDIT 2

I have a fairly simple C# program that looks for a .csv file, reads the text in that file, reformats it (and includes some information from a SQL query loaded into a DataTable), and saves it to a .tsv file for later use by another program.

My problem is that the source .csv file is sometimes over 10,000 lines, and the program slows gradually as it iterates through them. If the .csv file is ~500 lines it takes about 45 seconds to complete, and this time gets exponentially worse as the .csv file grows.

The SQL query returns 37,000+ rows, but it is only run once and is sorted the same way the .csv file is, so normally I won't notice the program working through that result set unless it can't find the corresponding data, in which case it scans all the way through and returns the appropriate error text. I'm 99% sure it's not the cause of the slowdown.

The y and z for loops need to be exactly as long as they are.

If it's absolutely necessary I can scrub some data from the initial .csv file and post an example, but I'm really hoping I'm just missing something really obvious.

Thanks in advance guys!

Here's my source:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;
using System.Text;

namespace MoxySectorFormatter
{
    class Program
    {
        static void Main(string[] args)
        {
            DataTable resultTable = new DataTable();
            double curLine = 1;
            double numLines = 0;
            string ExportPath = @"***PATH***\***OUTFILE***.tsv";
            string ImportPath = @"***PATH***\***INFILE***.csv";
            string NewText = "SECURITY\r\n";
            string OrigText = "";
            string QueryString = "SELECT DISTINCT UPPER(MP.Symbol) AS Symbol, LOWER(MP.SecType) AS SecType, MBI.Status FROM MoxySecMaster AS MP LEFT JOIN MoxyBondInfo AS MBI ON MP.Symbol = MBI.Symbol AND MP.SecType = MBI.SecType WHERE MP.SecType <> 'caus' AND MP.SecType IS NOT NULL AND MP.Symbol IS NOT NULL ORDER BY Symbol ASC;";
            SqlConnection MoxyConn = new SqlConnection("server=***;database=***;user id=***;password=***");
            SqlDataAdapter adapter = new SqlDataAdapter(QueryString, MoxyConn);

            MoxyConn.Open();
            Console.Write("Importing source file from \"{0}\".", ImportPath);
            OrigText = File.ReadAllText(ImportPath);
            OrigText = OrigText.Substring(OrigText.IndexOf("\r\n", 0) + 2);
            Console.WriteLine("\rImporting source file from \"{0}\".  Done!", ImportPath);
            Console.Write("Scanning source report.");
            for (int loop = 0; loop < OrigText.Length; loop++)
            {
                if (OrigText[loop] == '\r')
                    numLines++;
            }
            Console.WriteLine("\rScanning source report.  Done!");
            Console.Write("Downloading SecType information.");
            resultTable = new DataTable();
            adapter.Fill(resultTable);
            MoxyConn.Close();
            Console.WriteLine("\rDownloading SecType information.  Done!");

            for (int lcv = 0; lcv < numLines; lcv++)
            {
                int foundSpot = -1;
                int nextStart = 0;
                Console.Write("\rGenerating new file... {0} / {1} ({2}%)  ", curLine, numLines, System.Math.Round(((curLine / numLines) * 100), 2));
                for (int vcl = 0; vcl < resultTable.Rows.Count; vcl++)
                {
                    if (resultTable.Rows[vcl][0].ToString() == OrigText.Substring(0, OrigText.IndexOf(",", 0)).ToUpper() && resultTable.Rows[vcl][1].ToString().Length > 0)
                    {
                        foundSpot = vcl;
                        break;
                    }
                }
                if (foundSpot != -1 && foundSpot < resultTable.Rows.Count)
                {
                    NewText += resultTable.Rows[foundSpot][1].ToString();
                    NewText += "\t";
                    NewText += OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart));
                    NewText += "\t";
                    nextStart = OrigText.IndexOf(",", nextStart) + 1;
                    for (int y = 0; y < 142; y++)
                        NewText += "\t";
                    if(resultTable.Rows[foundSpot][2].ToString() == "r")
                        NewText += @"PRE/ETM";
                    else if (OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart)) == "Municipals")
                    {
                        NewText += "Muni - ";
                        nextStart = OrigText.IndexOf(",", nextStart) + 1;
                        if (OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart)).Length > 0)
                            NewText += OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart));
                        else
                            NewText += "(Orphan)";
                    }
                    else if (OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart)) == "Corporates")
                    {
                        NewText += "Corporate - ";
                        nextStart = OrigText.IndexOf(",", nextStart) + 1;
                        nextStart = OrigText.IndexOf(",", nextStart) + 1;
                        if (OrigText.Substring(nextStart, (OrigText.IndexOf("\r\n", nextStart) - nextStart)).Length > 0)
                            NewText += OrigText.Substring(nextStart, (OrigText.IndexOf("\r\n", nextStart) - nextStart));
                        else
                            NewText += "(Unknown)";
                    }
                    else
                        NewText += OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart));
                    for (int z = 0; z < 17; z++)
                        NewText += "\t";
                    NewText += "\r\n";
                    resultTable.Rows.RemoveAt(foundSpot);
                }
                else
                    Console.WriteLine("\r  Omitting {0}: Missing Symbol or SecType.", OrigText.Substring(nextStart, (OrigText.IndexOf(",", nextStart) - nextStart)));
                OrigText = OrigText.Substring(OrigText.IndexOf("\r\n", 0) + 2);
                curLine++;
            }
            Console.Write("Exporting file to \"{0}\".", ExportPath);
            File.WriteAllText(ExportPath, NewText);
            Console.WriteLine("\rExporting file to \"{0}\".  Done!\nPress any key to exit.", ExportPath);
            Console.ReadLine();
        }
    }
}


Instead of using the += operator for concatenation, use a System.Text.StringBuilder object and its Append() and AppendLine() methods.

Strings are immutable in C#, so every time you use += in your loop, a new string is created in memory, likely causing the eventual slowdown.
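A minimal sketch of the StringBuilder version, using hypothetical field values just to show the shape:

```csharp
using System;
using System.Text;

class StringBuilderSketch
{
    static void Main()
    {
        // StringBuilder grows an internal buffer and copies only
        // occasionally, so appending 10,000 lines stays roughly linear
        // instead of re-copying the whole string on every +=.
        var newText = new StringBuilder("SECURITY\r\n");
        for (int line = 0; line < 10000; line++)
        {
            // "symbol" / "sectype" are placeholder field values.
            newText.Append("symbol").Append('\t')
                   .Append("sectype").Append("\r\n");
        }
        string result = newText.ToString();
        Console.WriteLine(result.Length);
    }
}
```

A single ToString() call at the end produces the final string to hand to File.WriteAllText.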


While this cries out for StringBuilder I don't believe that's the primary culprit here. Rather, we have a piece of code with an exponential runtime.

The culprit I have in mind is the code that calculates foundSpot. If I'm reading the code correctly this is O(n^2) while everything else is O(n).

Three pieces of advice:

1) Refactor! This routine is WAY too long. I shouldn't have to refer to "the code that calculates foundSpot"; that should be a routine with a name. I see a minimum of 4 routines here, maybe more.

2) StringBuilder.

3) That search routine has to be cleaned up. You're doing a lot of repeated calculations each time around the loop and unless there's some reason against it (I'm not going to try to figure out the tests you are applying) it needs to be done with something with search performance better than O(n).
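One way to get that lookup below O(n) is a Dictionary keyed on symbol, sketched here with hypothetical rows (the symbol/sectype column layout is assumed from the question's query):

```csharp
using System;
using System.Collections.Generic;

class SymbolLookupSketch
{
    static void Main()
    {
        // Build the map once from the query results (O(n)); after that
        // each .csv line is a constant-time lookup instead of a scan
        // over all 37,000+ rows.
        var secTypeBySymbol = new Dictionary<string, string>
        {
            ["IBM"] = "corp",      // hypothetical rows
            ["NYC2030"] = "muni"
        };

        string symbol = "IBM";
        if (secTypeBySymbol.TryGetValue(symbol, out string secType))
            Console.WriteLine(secType);   // corp
        else
            Console.WriteLine("Omitting {0}: Missing Symbol or SecType.", symbol);
    }
}
```

If the same symbol can appear with more than one SecType, the dictionary needs a policy for duplicates when it is filled (e.g. keep the first row seen).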


You should write each line to the output file as you create it instead of appending all lines to the end of your output string (NewText) and writing them out at the end.

Every time that the program appends something to the end of the output string, C# creates a new string that's big enough for the old string plus the appended text, copies the old contents into the target string, then appends the new text to the end.

Assuming 40 characters per line and 500 lines, the final string is around 20K characters, and repeatedly re-copying a string that grows toward that size is what slows the program WAY down.
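A sketch of streaming the output as it is generated instead of accumulating it (the path and field values are placeholders):

```csharp
using System;
using System.IO;

class StreamingWriteSketch
{
    static void Main()
    {
        string exportPath = "output.tsv";   // placeholder path
        // The using block disposes the writer, flushing and closing
        // the file even if an exception is thrown mid-loop.
        using (var writer = new StreamWriter(exportPath))
        {
            writer.Write("SECURITY\r\n");
            for (int line = 0; line < 3; line++)
                writer.Write("field1\tfield2\r\n");
        }
        Console.WriteLine(new FileInfo(exportPath).Length);
    }
}
```

Each line is written and forgotten, so memory use stays flat no matter how large the input file gets.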


NewText is only appended to, right? So why not just write out to the file stream? Also don't forget a try/finally (or a using block) around it, so if your app blows up, you can still close the file stream.

Also, the second loop would probably be faster if you pulled out the SubString call. There is no reason to be doing that over and over again.

string txt = OrigText.Substring(0, OrigText.IndexOf(",", 0)).ToUpper();
for (int vcl = 0; vcl < resultTable.Rows.Count; vcl++)
{
  if (resultTable.Rows[vcl][0].ToString() == txt && resultTable.Rows[vcl][1].ToString().Length > 0)
  {
      foundSpot = vcl;
      break;
  }
}

These tab loops are ridiculous. They build what are essentially constant strings on every iteration. Replace them with formatting vars declared once at the start of your app:

string tab17 = new string('\t', 17);
string tab142 = new string('\t', 142);

//bad: allocates a new string 17 times per line
for (int z = 0; z < 17; z++)
  NewText += "\t";
