Please help me find a regular expression to parse the data like this:
'EBB112' '0 23923 12272 7' Carrots 'C' 'O' 'A' 'B' 'C' '12/128ml' '$9.65' '$0.80'
'EBB211' '0 23923 12266 6' 'Vegetables & Turkey' 'C' 'O' 'A' 'B' 'C' '12/128ml' '$9.65' '$0.80'
I have these 11 fields (shown in single quotes), and I need to parse them field by field and save them into a .csv file. I have more than 3000 such lines.
Any help would be highly appreciated. Thanks
I would recommend not trying to figure out a regex yourself - try to use an appropriate library to handle stuff like that.
Take a good look at FileHelpers - it's a great, free C# library to handle any kind of delimited (e.g. CSV, tab-delimited) or fixed-width import files.
You basically define the structure of your import file in a class that represents the data (something like this; I don't know what your fields are called, so I'm just guessing :-)):
using System;
using FileHelpers;

namespace ReadDataFromFile
{
    [DelimitedRecord(" ")]
    public class DataClass
    {
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string EbbField;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string CompoundField;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string VegiField;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string C1Field;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string O1Field;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string A1Field;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string B1Field;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string C2Field;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string MlField;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string DollarField1;
        [FieldQuoted('\'', QuoteMode.OptionalForBoth)]
        public string DollarField2;
    }
}
and then the FileHelpers library handles all the rest for you:
using FileHelpers;
...
FileHelperEngine engine = new FileHelperEngine(typeof(DataClass));
DataClass[] res = engine.ReadFile(@"D:\test.data") as DataClass[];
Now your array res contains one entry for each line in your data file - pretty slick!
No fuss, no muss, no regexes.
Use text2re, a free web-based "regex by example" generator. It will help you generate a variety of regular expressions to test.
You can split the string on "' '" instead.
var array = Regex.Split(line.Substring(1, line.Length - 2), "' '");
I've removed the first and last characters (the outer quotes) because the split would not remove them.
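To make the split idea concrete, here is a minimal sketch. The class and method names (QuoteSplitter, SplitFields, ToCsvRow) are made up for illustration, and it assumes every field on the line is wrapped in single quotes and separated by exactly one space, which is not true of the "Carrots" line in the question.

```csharp
using System;
using System.Linq;

public static class QuoteSplitter
{
    // Splits a line such as "'A' 'B C' 'D'" into its fields.
    // Assumption: every field is quoted and fields are separated by one space.
    public static string[] SplitFields(string line)
    {
        string trimmed = line.Trim();
        if (trimmed.Length < 2)
        {
            return new string[0];
        }
        // Drop the outer quotes, then split on the "' '" separator.
        string inner = trimmed.Substring(1, trimmed.Length - 2);
        return inner.Split(new[] { "' '" }, StringSplitOptions.None);
    }

    // Builds one CSV row: each field is wrapped in double quotes and
    // any embedded double quote is doubled.
    public static string ToCsvRow(string[] fields)
    {
        return string.Join(",", fields.Select(f => "\"" + f.Replace("\"", "\"\"") + "\""));
    }
}
```

Each row returned by ToCsvRow could then be written out with something like File.AppendAllText or a StreamWriter. Lines that do not yield 11 fields are worth logging rather than writing blindly.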
Start with this
System.Text.RegularExpressions.Regex.Matches(Str, @"'(.*?)'")
That will grab everything between single quotes, but you need to look at each match yourself, and it won't handle weird cases with nested delimiters or other such nonsense.
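As a rough sketch of "looking at each match", the loop below collects capture group 1 from every match. QuotedFieldReader and ReadQuoted are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFieldReader
{
    // Returns the text found between each pair of single quotes.
    // Anything outside quotes (for example a bare word) is skipped.
    public static List<string> ReadQuoted(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(line, @"'(.*?)'"))
        {
            // Group 1 holds the text without the surrounding quotes.
            fields.Add(m.Groups[1].Value);
        }
        return fields;
    }
}
```

Note that an unquoted field such as Carrots in the first sample line is silently dropped, so checking that the result holds 11 entries is a cheap way to flag such lines.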
('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)\s('?.*'?)
I think ('?.*'?) basically means a group of characters that might start with ' and end with ', and each \s between the groups matches a single whitespace character.
I might be wrong, though.
Try this (it won't handle single quotes within the data):
string[] entries =
{
    "'EBB112' '0 23923 12272 7' 'Carrots' 'C' 'O' 'A' 'B' 'C' '12/128ml' '$9.65' '$0.80'",
    "'EBB211' '0 23923 12266 6' 'Vegetables & Turkey' 'C' 'O' 'A' 'B' 'C' '12/128ml' '$9.65' '$0.80' "
};
var newEntries = entries.Select(a => Regex.Replace(a, "'\\s+'", "','")).ToList();
newEntries.ForEach(
    a => <YOUR_FILE_STREAM>.WriteLine(a)
);
you don't need regex for that as far as i can see ...
split into lines ... strip the first, and the last single quote, and split by this string "' '"
//edit:
ah... those whitespaces weren't there some time ago ;-)
the line with "Carrots" (the field without single quotes) makes this a bit painful
let's try this regex as a split token on all lines:
/'\W+'/
ignore empty fields in the results, and mark all lines that don't produce 11 non-empty fields for further processing ... you will need another regex for those ...
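One possible way to deal with lines like the "Carrots" one, offered only as a sketch: instead of splitting, match tokens with an alternation that accepts either a quoted field or a bare run of non-whitespace. This is a different technique from the split token above, and FieldTokenizer and Tokenize are made-up names. It assumes an unquoted field never contains spaces; an unquoted multi-word field would come back as several tokens.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldTokenizer
{
    // Group 1: text inside single quotes. Group 2: a bare, unquoted token.
    private static readonly Regex Token = new Regex(@"'([^']*)'|(\S+)");

    public static List<string> Tokenize(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Token.Matches(line))
        {
            // Prefer the quoted capture; fall back to the bare token.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields;
    }
}
```

Lines for which Tokenize does not return exactly 11 fields can still be set aside for manual review, as suggested above.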