CSV Validation in C# - Making sure each row has the same number of commas

Developer (https://www.devze.com), 2023-02-16 03:32, source: web
I wish to implement a fairly simple CSV checker in my C#/ASP.NET application. My project automatically generates CSVs from GridViews for users, but I want to be able to quickly run through each line, check that every line has the same number of commas, and throw an exception if any differences occur. So far I have this, which does work, but there are some issues I'll describe below:

int? CommaCount = null;

StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
String Str = null;

//This loops through all the header row cells and writes them to the StringBuilder
for (int k = 0; k <= (grd.Columns.Count - 1); k++)
{
    sw.Write(grd.HeaderRow.Cells[k].Text + ",");
}

//End the header line (writing another "," here would give the header one more comma than each data row)
sw.WriteLine();


//This loops through all the main rows and writes them to the StringBuilder
for (int i = 0; i <= grd.Rows.Count - 1; i++)
{
    StringBuilder RowString = new StringBuilder();
    for (int j = 0; j <= grd.Columns.Count - 1; j++)
    {
        //We'll need to strip meaningless junk such as <br /> and &nbsp;
        Str = grd.Rows[i].Cells[j].Text.Replace("<br />", "");
        if (Str == "&nbsp;")
        {
            Str = "";
        }

        //Wrap each value in quotes and follow it with the delimiter
        Str = "\"" + Str + "\"" + ",";

        RowString.Append(Str);
        sw.Write(Str);
    }
    sw.WriteLine();

    //The below code block ensures that each row contains the same number of commas, which is crucial
    int RowCommaCount = CheckChar(RowString.ToString(), ',');
    if (CommaCount == null)
    {
        CommaCount = RowCommaCount;
    }
    else
    {
        if (CommaCount != RowCommaCount)
        {
            throw new Exception("CSV generated is corrupt - line " + i + " has " + RowCommaCount + " commas when it should have " + CommaCount);
        }
    }
}

sw.Close();

And my CheckChar method:

protected static int CheckChar(string Input, char CharToCheck)
{
    int Counter = 0;
    foreach (char StringChar in Input)
    {
        if (StringChar == CharToCheck)
        {
            Counter++;
        }
    }
    return Counter;
}

Now my problem is that if a cell in the grid contains a comma, my CheckChar method still counts it as a delimiter, so the check throws an error. As you can see in the code, I wrap every value in " characters to 'escape' it. How simple would it be to make my method ignore commas inside values? I assume I'll need to rewrite the method quite a lot.


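A single regular expression can do the whole check. With RegexOptions.Multiline each match is one line, and every comma outside a "..." field is captured into group 1, so it is enough to compare Groups[1].Captures.Count between consecutive matches:

using System;
using System.Text.RegularExpressions;

// Each match is one line; every comma outside a "..." field is captured into group 1.
var rx = new Regex("^  (  ( \"[^\"]*\" )  |  (  (?!$)[^\",]  )+  |  (?<1>,)  )*  $", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
var matches = rx.Matches("Hello,World,How,Are\nYou,Today,This,Is,\"A beautiful, world\",Hi!");

// Compare each line's delimiter count with the previous line's.
// The sample input throws: line 0 has 3 delimiter commas, line 1 has 5.
for (int i = 1; i < matches.Count; i++) {
    if (matches[i].Groups[1].Captures.Count != matches[i - 1].Groups[1].Captures.Count) {
        throw new Exception("Line " + i + " has a different number of delimiters than line " + (i - 1));
    }
}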
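As a sketch, the same pattern can be wrapped in a helper that reports which line is inconsistent instead of throwing a bare Exception. The class and method names (`CsvRegexCheck`, `FindInconsistentLine`) are hypothetical. It assumes every line matches the pattern: a quoted field containing a line break is matched across lines, and a line with an unbalanced quote produces no match, so in those cases the returned index counts matches rather than physical lines.

```csharp
using System.Text.RegularExpressions;

public static class CsvRegexCheck
{
    // Same pattern as in the answer: each match is one line, and every comma
    // outside a "..." field is captured into group 1.
    private static readonly Regex LineRegex = new Regex(
        "^  (  ( \"[^\"]*\" )  |  (  (?!$)[^\",]  )+  |  (?<1>,)  )*  $",
        RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

    // Hypothetical helper: returns the 0-based index of the first match (line) whose
    // delimiter count differs from the first line's, or -1 if all counts agree.
    public static int FindInconsistentLine(string csv)
    {
        MatchCollection matches = LineRegex.Matches(csv);
        for (int i = 1; i < matches.Count; i++)
        {
            if (matches[i].Groups[1].Captures.Count != matches[0].Groups[1].Captures.Count)
                return i;
        }
        return -1;
    }
}
```

For the sample string from the answer, `FindInconsistentLine` returns 1, because line 0 has three delimiter commas and line 1 has five.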


You could just use a regular expression that matches one item and count the number of matches in your line. An example of such a regex is the following:

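// separator is the field delimiter, e.g. ','. In a verbatim string the backslash is literal,
// so @"...[\" + separator produces an escaped separator such as [\, inside each character class.
var itemsRegex =
    new Regex(@"(?<=(^|[\" + separator + @"]))((?<item>[^""\" + separator +
        @"\n]*)|(?<item>""([^""]|"""")*""))(?=($|[\" + separator + @"]))");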
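A minimal usage sketch for the item regex, assuming `separator` is a `char` and that no quoted field spans a line break: trim the trailing newline, split the text into lines, and count the item matches on each line; a valid file has the same count on every line. The names `CsvItemCount`, `BuildItemsRegex` and `FieldCounts` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CsvItemCount
{
    // Builds the item regex from the answer for a given separator character.
    // The "\" + separator escaping works for ',', ';' or '|', but would misbehave
    // for letters such as 'n' or 't' (they would become \n or \t).
    public static Regex BuildItemsRegex(char separator)
    {
        return new Regex(@"(?<=(^|[\" + separator + @"]))((?<item>[^""\" + separator +
            @"\n]*)|(?<item>""([^""]|"""")*""))(?=($|[\" + separator + @"]))");
    }

    // Returns the number of fields found on each line of the CSV text.
    public static int[] FieldCounts(string csv, char separator)
    {
        Regex itemsRegex = BuildItemsRegex(separator);
        // Drop the final line break so it does not produce an extra empty line.
        string[] lines = csv.TrimEnd('\r', '\n').Split('\n');
        var counts = new int[lines.Length];
        for (int i = 0; i < lines.Length; i++)
        {
            // Remove the '\r' of a "\r\n" line ending before matching.
            counts[i] = itemsRegex.Matches(lines[i].TrimEnd('\r')).Count;
        }
        return counts;
    }
}
```

Counting fields rather than commas means a quoted value such as "x,y" counts as one item, so the comparison between lines is unaffected by commas inside values.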


Something like the following works, assuming your fields never contain a literal " (otherwise those need some extra handling):

protected static int CheckChar(string Input, char CharToCheck, char fieldDelimiter)
{
    int Counter = 0;
    bool inValue = false;
    foreach (char StringChar in Input)
    {
        if (StringChar == fieldDelimiter)
            inValue = !inValue;
        else if (!inValue && StringChar == CharToCheck)
            Counter++;
    }
    return Counter;
}

This keeps inValue true while inside a field; for example, pass '"' as fieldDelimiter to ignore everything between "...". A doubled quote ("", the usual CSV escape) flips the flag twice and is therefore harmless, but a backslash-escaped quote (\") is not handled. You'd have to add such handling yourself.
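Applied to the question's scenario, the quote-aware counter can validate a whole generated CSV line by line. This is a sketch that assumes no field contains a line break; `CsvCommaValidator` and `ValidateCommaCounts` are hypothetical names, and the helper throws `System.IO.InvalidDataException` with a 1-based line number.

```csharp
using System.IO;

public static class CsvCommaValidator
{
    // The quote-aware counter from the answer: commas are only counted while
    // outside a "..." field. A doubled quote ("") flips the state twice.
    public static int CheckChar(string input, char charToCheck, char fieldDelimiter)
    {
        int counter = 0;
        bool inValue = false;
        foreach (char c in input)
        {
            if (c == fieldDelimiter)
                inValue = !inValue;
            else if (!inValue && c == charToCheck)
                counter++;
        }
        return counter;
    }

    // Hypothetical helper: throws if any line's unquoted comma count differs
    // from the first line's.
    public static void ValidateCommaCounts(string csv)
    {
        using (var reader = new StringReader(csv))
        {
            int? expected = null;
            int lineNumber = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                int count = CheckChar(line, ',', '"');
                if (expected == null)
                    expected = count;
                else if (count != expected)
                    throw new InvalidDataException(
                        "Line " + lineNumber + " has " + count + " commas; expected " + expected + ".");
            }
        }
    }
}
```

Using StringReader.ReadLine means both \n and \r\n line endings work, and a trailing newline does not produce an extra empty line.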


Instead of checking the resulting string (the cake), you should check the fields (the ingredients) before you concatenate (mix) them. That would give you the chance to do something constructive (escaping or replacing) and to throw an exception only as a last resort.

In general, commas are legal in CSV fields as long as the fields are quoted. So internal commas should not be a problem, but quote characters inside a field may well be.
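Following that advice, each field can be escaped before it is concatenated. This sketch uses the RFC 4180 convention (wrap the field in double quotes and double any embedded quote); the question's code already wraps values in quotes but does not double embedded ones. `CsvFieldEscaper`, `EscapeField` and `BuildLine` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvFieldEscaper
{
    // Quotes one field RFC 4180 style: surround it with double quotes and
    // double every quote character inside it. A null value becomes "".
    public static string EscapeField(string value)
    {
        string v = value ?? string.Empty;
        return "\"" + v.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-cleaned cell values into one CSV line. A line built this way
    // always has exactly one delimiter comma fewer than it has fields.
    public static string BuildLine(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => EscapeField(f)));
    }
}
```

With every field escaped at the ingredient stage, the per-line comma check becomes a sanity check rather than the main defense.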
