Regular Expression to break row with comma separated values into distinct rows

开发者 https://www.devze.com 2022-12-29 21:46 Source: web

I have a file with many rows. Each row has a column which may contain comma-separated values. I need each row to be distinct (i.e. no comma-separated values).

Related topics: csv, regex

Here is an example row:

AB  AB10,AB11,AB12,AB15,AB16,AB21,AB22,AB23,AB24,AB25,AB99  ABERDEEN    Aberdeenshire

The columns are tab separated (Postcode area, Postcode districts, Post town, Former postal county).

So the above row would get turned into:

AB  AB10    ABERDEEN    Aberdeenshire
AB  AB11    ABERDEEN    Aberdeenshire
AB  AB12    ABERDEEN    Aberdeenshire
...
...

I tried the following but it didn't work...

(.+)\t(([0-9A-Z]+),)+\t(.+)\t(.+)

I agree that regexes are not the best way, but this should hopefully work if that's all you have available to you. (Repeat the find and replace until there are no more matches.)

Edit

Updated with the OP's final solution from the comments.

Find: (.+)\t([^,\s]+),([^\t]+)\t(.+)
Replace: \1\t\2\t\4\r\1\t\3\t\4
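
For anyone without an editor that supports this kind of replace, the "repeat until there are no more matches" step could be scripted. The following is only a rough Python sketch under a few assumptions: the Find pattern is used unchanged, `\n` stands in for the `\r` of the Replace field, and the name `expand_rows` is made up for illustration. It works row by row so that `[^\t]+` can never run across a line break.

```python
import re

# The Find pattern from above, unchanged.
PATTERN = re.compile(r"(.+)\t([^,\s]+),([^\t]+)\t(.+)")
# Same as the Replace string, but with \n instead of \r (assumption).
REPLACEMENT = r"\1\t\2\t\4\n\1\t\3\t\4"


def expand_rows(text):
    """Repeat the find/replace on every row until nothing matches any more."""
    rows = text.splitlines()
    while True:
        expanded = []
        changed = False
        for row in rows:
            new_row, count = PATTERN.subn(REPLACEMENT, row)
            changed = changed or count > 0
            # A successful replacement turns one row into two.
            expanded.extend(new_row.split("\n"))
        rows = expanded
        if not changed:
            return "\n".join(rows)


if __name__ == "__main__":
    print(expand_rows("AB\tAB10,AB11,AB12\tABERDEEN\tAberdeenshire"))
```

Each pass peels the first value off the comma-separated column, so a row with n values needs n - 1 passes before it is fully split.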

I agree with stakx that this doesn't sound like a good place for regexes.

I would instead write a small program that reads each line, splits the line into columns, splits each relevant column into a list of values, and then iterates over all combinations of those, outputting one line per combination.

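Assuming it's only that one column which can have multiple tokens, it would basically look like this in Python (0-based indexes, reading from standard input and writing to standard output):

import sys

for line in sys.stdin:
    # Split the row into its four tab-separated columns (indexes 0-3)
    columns = line.rstrip('\n').split('\t')
    # Only the second column may hold several comma-separated values
    for value in columns[1].split(','):
        # Emit one output row per value, copying the other columns unchanged
        print('\t'.join([columns[0], value, columns[2], columns[3]]))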

If multiple columns can have multiple values, simply put another loop inside the for each.
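
The nested loops can also be expressed as a Cartesian product, which avoids writing one loop per column. This is just a sketch in Python, assuming tab-separated columns; `expand_row` and its `multi_value_columns` parameter are hypothetical names chosen for illustration.

```python
from itertools import product


def expand_row(line, multi_value_columns=(1,)):
    """Return one output row per combination of the comma-separated values."""
    columns = line.rstrip("\n").split("\t")
    # Every column becomes a list of choices; single-valued columns
    # contribute exactly one choice.
    choices = [
        column.split(",") if index in multi_value_columns else [column]
        for index, column in enumerate(columns)
    ]
    # product() iterates over every combination, like nested for loops.
    return ["\t".join(combination) for combination in product(*choices)]


if __name__ == "__main__":
    row = "AB\tAB10,AB11\tABERDEEN,DYCE\tAberdeenshire"
    for out in expand_row(row, multi_value_columns=(1, 2)):
        print(out)
```

Listing the multi-valued column indexes explicitly keeps a comma inside, say, a county name from being split by accident.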