A regex to match a tab that isn't surrounded by quotes

开发者 (https://www.devze.com), 2023-01-29 19:03, source: web
I have the following string:

ID Table 1 Table 2
1 "Column 1 Column 2 Column 3
1 2 3
4 5 6
7 8 9" "Column A Column B Column C
a b c
d e f
g h i"

The first row contains the column headers (ID, Table 1, Table 2). The second row contains the data.

The string is copied via the clipboard from this Excel sheet: http://i.stack.imgur.com/5lwaT.png

Columns are separated by \t, lines by \r\n. B2 and C2 are tables. Their columns and rows are separated by \t and \r\n, too. Each table is surrounded by quotes.

Now I split the rows:

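Imports System.Text.RegularExpressions ' at the top of the file

' Split the clipboard text into rows on CR LF
Dim rowSplitter As New Regex("\r\n")
Dim rows() As String = rowSplitter.Split(MyString)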

That returns:

ID Table 1 Table 2

and

1 "Column 1 Column 2 Column 3
1 2 3
4 5 6
7 8 9" "Column A Column B Column C
a b c
d e f
g h i"

Now I need to split the lines into columns, so I need a pattern that matches every tab that isn't surrounded by quotes.

Can anybody help me with the regex?

Thanks :)


Because I'm too tired to think of a good answer, here's a hack one instead. If you can be sure that the quotes are paired, you could hack this easily in three steps:

  1. Find the tabs that ARE in the quotes and swap them out.
  2. Split on tabs
  3. Put the real tabs back in.

Like so:

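// JS pseudo-code
var PLACEHOLDER = 'ëïÒ'; // hoped never to occur in the data
// Swap out every tab inside each quoted section; the callback
// handles sections that contain more than one tab.
str = str.replace( /"[^"]*"/g, function (quoted) {
  return quoted.replace( /\t/g, PLACEHOLDER );
} );
pieces = str.split( /\t/ );
for (var i=0,len=pieces.length;i<len;++i){
  pieces[i] = pieces[i].replace( /ëïÒ/g, "\t" );
}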

The horrible hack part of this is using a replacement string that you can hope will never occur naturally.
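
If the placeholder worries you, the split can also be done in one pass with a lookahead: a tab is outside the quotes when an even number of quotes follows it. This is only a minimal sketch, assuming the quotes are paired and never escaped; the names TAB_OUTSIDE_QUOTES and splitOutsideQuotes are made up for illustration.

```javascript
// Matches a tab only when the rest of the string contains an even
// number of double quotes, i.e. the tab is not inside a quoted section.
// Assumption: quotes are balanced and there are no escaped quotes.
const TAB_OUTSIDE_QUOTES = /\t(?=(?:[^"]*"[^"]*")*[^"]*$)/;

function splitOutsideQuotes(line) {
  return line.split(TAB_OUTSIDE_QUOTES);
}

// Shortened version of the second row from the question.
const sampleRow = '1\t"Column 1\tColumn 2\r\n1\t2"\t"Column A\tColumn B\r\na\tb"';
const columns = splitOutsideQuotes(sampleRow);
console.log(columns.length); // 3
```

The lookahead rescans the remainder of the string for every tab, so this is fine for clipboard-sized input but probably not what you want for very large text.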


What you are trying to do is write your own CSV parser (with tabs instead of commas in your case). There is a great article about why you should not do this (http://secretgeek.net/csv_trouble.asp). I once tried to write my own parser but then stopped because it is really not that easy. Check this free one. It saved me a couple of hours.
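
To give an idea of what such a parser has to keep track of, here is a rough sketch of a character-by-character reader for tab-delimited text. It is not the free parser mentioned above. parseTabDelimited is a hypothetical helper, and it assumes that a field starting with a quote runs to the matching quote and that a doubled quote inside it stands for a literal quote.

```javascript
// Minimal tab-delimited reader (illustrative sketch, not a library API).
// Assumptions: fields are separated by \t, rows by \r\n or \n; a field
// that starts with " runs to the matching "; "" inside it is a literal ".
function parseTabDelimited(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"';      // doubled quote becomes one literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false;  // closing quote
      } else {
        field += ch;       // tabs and line breaks are kept as data
      }
    } else if (ch === '"' && field === '') {
      inQuotes = true;     // opening quote at the start of a field
    } else if (ch === '\t') {
      row.push(field);     // end of field
      field = '';
    } else if (ch === '\r' || ch === '\n') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // \r\n is one break
      row.push(field);     // end of field and end of row
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const clipboardText =
  'ID\tTable 1\tTable 2\r\n1\t"Column 1\tColumn 2\r\n1\t2"\t"say ""hi"""';
const table = parseTabDelimited(clipboardText);
console.log(table.length, table[1].length); // 2 3
```

Even this small version needs a special case for doubled quotes, one for \r\n, and one for the final row, which is roughly the point the article makes.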


I use this for my CSV files; with some minor tweaking, it should work with tab-delimited data as well:

Regex rExp = new Regex(@"(?:^|\x09)(\""(?:[^\""]+|\""\"")*\""|[^\x09]*)");

And for reference, CSV Regex:

Regex rExp = new Regex(@"(?:^|,)(\""(?:[^\""]+|\""\"")*\""|[^,]*)");

Please note that this will capture the surrounding quotes as well.

EDIT

Maybe I'm presuming too much, but it seems like you're trying to get the values and are getting caught up on the delimiter. This will capture the values within the delimiters.

EDITv2

Used verbatim strings
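
For anyone who wants to try the idea outside .NET, here is a rough JavaScript rendering of the tab-delimited pattern that also strips the surrounding quotes. It is an adaptation rather than the original C#: FIELD and extractValues are names invented for this sketch, the inner group is simplified to one character at a time, and it assumes the first field of the line is not empty (true for the sample rows, which start with an ID).

```javascript
// Each match is one field: either a quoted run (with "" as an escaped
// quote) or a run of non-tab characters, preceded by line start or a tab.
// Assumption: the first field of the line is not empty.
const FIELD = /(?:^|\t)("(?:[^"]|"")*"|[^\t]*)/g;

function extractValues(line) {
  const values = [];
  for (const match of line.matchAll(FIELD)) {
    let value = match[1];
    // The pattern captures the surrounding quotes, so remove them here
    // and turn doubled quotes back into single ones.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1).replace(/""/g, '"');
    }
    values.push(value);
  }
  return values;
}

const dataRow = '1\t"Column 1\tColumn 2\r\n1\t2"\t"a ""b"""';
const values = extractValues(dataRow);
console.log(values.length); // 3
```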
