I am not good with regex. Can someone help me write a regex for this?
I may have values like this when reading a CSV file:
"Artist,Name",Album,12-SCS
"val""u,e1",value2,value3
Output:
Artist,Name
Album
12-SCS
Val"u,e1
Value2
Value3
Update: I like the idea of using the OLEDB provider. We have a file upload control on the web page, and I read the content of the uploaded file using a StreamReader without actually saving the file to the file system. Is there any way I can use the OLEDB provider? It needs the file name in the connection string, and in my case the file is not saved on the file system.
Just adding the solution I worked on this morning.
var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");
foreach (Match m in regex.Matches("<-- input line -->"))
{
    var s = m.Value;
}
As you can see, you need to call regex.Matches() once per line. It returns a MatchCollection with one item per column. The Value property of each match is the raw field text, so quoted fields still include their enclosing quotes and any doubled quotes.
This is still a work in progress, but it happily parses CSV strings like:
2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
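To get clean values out of those matches, the enclosing quotes have to be stripped and doubled quotes collapsed. Here is a minimal sketch of that step, assuming the same pattern as above. The class name CsvRegexSketch and the ParseLine helper are made up for illustration and are not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexSketch
{
    // Same pattern as in the snippet above.
    private static readonly Regex Field =
        new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    // Hypothetical helper: splits one CSV line and unescapes quoted fields.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Field.Matches(line))
        {
            var s = m.Value;
            if (s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"')
            {
                // Drop the enclosing quotes, then collapse "" to ".
                s = s.Substring(1, s.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(s);
        }
        return fields;
    }
}
```

Called on the sample line above, ParseLine gives nine fields, the third being Hello, my name is "Joshua", with the two empty columns kept as empty strings.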
Actually, it's pretty easy to match CSV lines with a regex. Try this one out:
using System.Collections.Specialized;
using System.Text.RegularExpressions;

StringCollection resultList = new StringCollection();
try {
    Regex pattern = new Regex(@"
        # Parse CSV line. Capture next value in named group: 'val'
        (?<=^|,)                  # Only start at line start or after a comma (avoids a spurious empty match at EOS).
        \s*                       # Ignore leading whitespace.
        (?:                       # Group of value alternatives.
          ""                      # Either a double quoted string,
          (?<val>                 # Capture contents between quotes.
            [^""]*(""""[^""]*)*   # Zero or more non-quotes, allowing
          )                       # doubled "" quotes within string.
          ""\s*                   # Ignore whitespace following quote.
        | (?<val>[^,]*)           # Or... zero or more non-commas.
        )                         # End value alternatives group.
        (?:,|$)                   # Match end is comma or EOS",
        RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    Match matchResult = pattern.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Groups["val"].Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Disclaimer: The regex has been tested in RegexBuddy, (which generated this snippet), and it correctly matches the OP test data, but the C# code logic is untested. (I don't have access to C# tools.)
Regex is not a suitable tool for this. Use a CSV parser, either the built-in one or a third-party one.
Give the TextFieldParser class a look. It's in the Microsoft.VisualBasic assembly and does delimited and fixed width parsing.
Give CsvHelper a try (a library I maintain). It's available via NuGet.
You can easily read a CSV file into a custom class collection. It's also very fast.
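using CsvHelper; using System.Globalization; using System.IO;
// "file.csv" is a placeholder path; any StreamReader over your CSV data works.
var streamReader = new StreamReader("file.csv");
// Newer CsvHelper versions require a CultureInfo argument; older ones took the reader alone.
var csvReader = new CsvReader(streamReader, CultureInfo.InvariantCulture);
var myObjects = csvReader.GetRecords<MyObject>();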
Regex might get overly complex here. Split the line on commas, and then iterate over the resultant bits and concatenate them where "the number of double quotes in the concatenated string" is not even.
"hello,this",is,"a ""test"""
...split...
"hello | this" | is | "a ""test"""
...iterate and merge until you have an even number of double quotes...
"hello,this" - even number of quotes (note that the comma removed by the split is re-inserted between the bits)
is - even number of quotes
"a ""test""" - even number of quotes
...then strip off the leading and trailing quote, if present, and replace "" with ".
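The steps above could be sketched roughly like this in C#. CsvSplitMergeSketch, ParseLine and Unquote are hypothetical names, and the sketch assumes one record per line (no line breaks inside quoted fields):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvSplitMergeSketch
{
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = "";   // field being assembled
        var open = false;   // true while current has an odd number of quotes

        foreach (var piece in line.Split(','))
        {
            // Re-insert the comma that Split removed when continuing a field.
            current = open ? current + "," + piece : piece;

            // An odd quote count means the quoted field is not closed yet.
            open = current.Count(c => c == '"') % 2 != 0;
            if (!open)
            {
                fields.Add(Unquote(current));
            }
        }

        if (open)
        {
            throw new FormatException("Unbalanced quotes in CSV line.");
        }
        return fields;
    }

    // Strips one enclosing pair of quotes, if present, and collapses "" to ".
    private static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
        {
            field = field.Substring(1, field.Length - 2);
        }
        return field.Replace("\"\"", "\"");
    }
}
```

For the example line above this yields three fields: hello,this, then is, then a "test". Counting quotes on every merge is quadratic in the worst case, which should not matter for ordinary line lengths.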
It can be done using the code below:
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Quotes inside a quoted field are doubled so that the sample is valid CSV.
string csv = "1,2,3,\"4,3\",\"a,\"\"b\"\",c\",end";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// To read from a file instead:
// TextFieldParser parser = new TextFieldParser("csvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields = null;
while (!parser.EndOfData)
{
    // ReadFields returns the field values of the next line.
    fields = parser.ReadFields();
}
parser.Close();