Reading in a CSV file with multi-line records in Delphi

开发者 (https://www.devze.com), 2022-12-30 01:19. Source: the web
Normally I just use TStringList.CommaText, but this won't work when a given field spans multiple lines. Basically I need a CSV processor that conforms to RFC 4180. I'd rather not have to implement the RFC myself.


Do you really need full RFC support? I can't count the number of times I've written a "CSV parser" in Perl or something similar. Split on commas and be done. The only problem comes when you need to respect quotes. If you do, write a "quotesplit" routine that looks for quotes and ensures they're balanced. Unless this CSV processor is the meat and potatoes of some application, I'm not sure it'll really be a problem.

On the other hand, I really don't think fully implementing the RFC is that complex. It's a relatively short RFC compared to things like HTTP, SMTP, IMAP, ...
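To give a feel for how little code the quoting rules of RFC 4180 need, here is a minimal Delphi sketch of a character-by-character reader. It is an illustration under assumptions, not a complete RFC implementation: the name ReadCsvRecord and its signature are hypothetical, the whole file is assumed to be in one string already, and the delimiter is fixed to a comma. Inside quotes, commas and line breaks are kept as field data, and a doubled quote becomes one literal quote.

```pascal
program Rfc4180Sketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

{ Hypothetical helper (not part of the RTL): reads one CSV record from S
  starting at Idx, stores its fields in Fields, and advances Idx past the
  record's line break. Returns False once the input is exhausted. }
function ReadCsvRecord(const S: string; var Idx: Integer;
  Fields: TStrings): Boolean;
var
  Field: string;
  InQuotes: Boolean;
  C: Char;
begin
  Fields.Clear;
  Result := Idx <= Length(S);
  if not Result then
    Exit;
  Field := '';
  InQuotes := False;
  while Idx <= Length(S) do
  begin
    C := S[Idx];
    Inc(Idx);
    if InQuotes then
    begin
      if C <> '"' then
        Field := Field + C          // commas and line breaks are data here
      else if (Idx <= Length(S)) and (S[Idx] = '"') then
      begin
        Field := Field + '"';       // "" inside quotes is one literal quote
        Inc(Idx);
      end
      else
        InQuotes := False;          // closing quote
    end
    else
      case C of
        '"': InQuotes := True;      // opening quote
        ',':
          begin
            Fields.Add(Field);      // field separator
            Field := '';
          end;
        #13, #10:
          begin
            if (C = #13) and (Idx <= Length(S)) and (S[Idx] = #10) then
              Inc(Idx);             // treat CRLF as a single line break
            Break;                  // end of record
          end;
      else
        Field := Field + C;
      end;
  end;
  Fields.Add(Field);                // last field of the record
end;

var
  Data: string;
  Position, I: Integer;
  Fields: TStringList;
begin
  // Two records; the second has a quoted field spanning two lines.
  Data := 'id,note'#13#10'1,"first line'#13#10'second, with ""quotes"""'#13#10;
  Fields := TStringList.Create;
  try
    Position := 1;
    while ReadCsvRecord(Data, Position, Fields) do
    begin
      for I := 0 to Fields.Count - 1 do
        WriteLn('[', Fields[I], ']');
      WriteLn('--');
    end;
  finally
    Fields.Free;
  end;
end.
```

Building fields by repeated concatenation keeps the sketch short; for large files, copying slices of the input would likely be faster.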

In Perl, a decent quotesplit() I wrote is:

sub quotesplit {
    my ($regex, $s, $maxsplits) = @_;
    my @split;
    my $quotes = "\"'";
    die("usage: quotesplit(qr/.../,'string...'), // instead of qr//?\n")
        if scalar(@_) < 2;

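    my $lastpos = 0;    # start of the not-yet-consumed part of $s
    while (1) {
        my $pos = pos($s) // 0;    # pos() is undef before the first match (// needs Perl 5.10+)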

        while ($s =~ m/($regex|(?<!\\)[$quotes])/g) {
            if ($1 =~ m/[$quotes]/) {
                $s =~ m/[^$quotes]*/g;
                $s =~ m/(?<!\\)[$quotes]/g;
            }
            else {
                push @split, substr($s,$pos,pos($s) - $pos - length($1));
                last;
            }
        }

        if (defined(pos($s)) and $lastpos > pos($s)) {
            # Scanning must never move backwards; abort if it does.
            die(sprintf("quotesplit() issue: lastpos %s > pos %s\n",
                $lastpos, pos($s)
            ));
        }
        if ((defined($maxsplits) && scalar(@split) == ($maxsplits - 1))) {
            push @split, substr($s,pos($s));
            last;
        }
        elsif (not defined(pos($s))) {
            push @split, substr($s,$lastpos);
            last;
        }

        $lastpos = pos($s);
    }

    return @split;
}


Did you try setting Delimiter := ';' and assigning DelimitedText instead of CommaText?
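For reference, a small sketch of what that suggestion might look like. Delimiter, QuoteChar, StrictDelimiter and DelimitedText are TStrings properties; the sample input string is made up for illustration. Note the assumption that the full text of one record is handed to DelimitedText: loading a file with LoadFromFile first splits it at every line break, quoted or not, so a multi-line record would already be broken apart before this code sees it. Whether line breaks inside a quoted field survive DelimitedText itself is worth checking on your RTL version.

```pascal
program DelimitedTextSketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

var
  Fields: TStringList;
  I: Integer;
begin
  Fields := TStringList.Create;
  try
    Fields.Delimiter := ';';
    Fields.QuoteChar := '"';
    // Without StrictDelimiter, spaces outside quotes also separate fields.
    Fields.StrictDelimiter := True;
    // Made-up sample record: three fields, two of them quoted.
    Fields.DelimitedText := 'a;"b;c";"d ""e"""';
    for I := 0 to Fields.Count - 1 do
      WriteLn('[', Fields[I], ']');
  finally
    Fields.Free;
  end;
end.
```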

By the way, that RFC makes no sense at all... it's absurd to issue a Request for Comments on CSV.


Here is my CSV parser (maybe not to the RFC, but it works fine). Keep calling it on a supplied string; each call returns the next CSV field. I don't believe it has any problems with multi-line fields.

function CSVFieldToStr(
           var AStr : string;
               ADelimChar : char = Comma ) : string;
{ Returns the next CSV field str from AStr, deleting it from AStr,
  with delimiter }
var
  bHasQuotes : boolean;

  function HandleQuotes( const AStr : string ) : string;
  begin
    Result := Trim(AStr);
    If bHasQuotes then
      begin
      Result := StripQuotes( Result );
      ReplaceAllSubStrs( '""', '"', Result );
      end;
  end;

var
  bInQuote    : boolean;
  I           : integer;
  C           : char;
begin
  bInQuote   := False;
  bHasQuotes := False;
  For I := 1 to Length( AStr ) do
    begin
    C := AStr[I];
    If C = '"' then
      begin
      bHasQuotes := True;
      bInQuote := not bInQuote;
      end
     else
      If not bInQuote then
       If C = ADelimChar then
          begin
          Result := HandleQuotes( Copy( AStr, 1, I-1 ));
          AStr   := Trim(Copy( AStr, I+1, MaxStrLEn ));
          Exit;
          end;
    end;
  Result := HandleQuotes(AStr);
  AStr := '';
end;
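The function above depends on several names it does not define: Comma, MaxStrLEn, StripQuotes and ReplaceAllSubStrs. Their original definitions are not shown, so the following is only a guess at what they might look like, inferred from how they are called; declared ahead of CSVFieldToStr, something along these lines should let it compile.

```pascal
program CsvHelpersSketch;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  Comma = ',';           // assumed default delimiter
  MaxStrLen = MaxInt;    // assumed "copy to end of string" length

{ Assumed behaviour: removes one pair of enclosing double quotes, if present. }
function StripQuotes(const S: string): string;
begin
  if (Length(S) >= 2) and (S[1] = '"') and (S[Length(S)] = '"') then
    Result := Copy(S, 2, Length(S) - 2)
  else
    Result := S;
end;

{ Assumed behaviour: replaces every OldSub in S with NewSub, in place. }
procedure ReplaceAllSubStrs(const OldSub, NewSub: string; var S: string);
begin
  S := StringReplace(S, OldSub, NewSub, [rfReplaceAll]);
end;

var
  S: string;
begin
  // Mirrors what HandleQuotes does to a quoted field.
  S := StripQuotes('"say ""hi"""');
  ReplaceAllSubStrs('""', '"', S);
  WriteLn(S);
end.
```

Since the function trims each field and the remaining input, leading and trailing whitespace (including line breaks at field edges) is not preserved, which is one place where it departs from RFC 4180.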