Databricks Autoloader - dealing with combined files

Developer https://www.devze.com 2022-12-07 22:08 Source: web

Related topics: databricks databricks-autoloader

I'm working with some files that have some complexities:

multiple tab files concatenated into 1
csv files with some metadata prior to the csv data
csv files with an extra row after the header that should be ignored
csv files with log information interspersed into the file

My question is whether Autoloader can split the stream (i.e. 1 input file to 2 or more output files) based on pattern matching, or whether it has some other mechanism for dealing with these scenarios.
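
To illustrate the kind of split I mean, here is a minimal sketch in plain Python, not Autoloader code. The two patterns (a `#` prefix for metadata, a `[INFO]`-style prefix for log lines), the sample lines, and the names `classify` and `split_lines` are all hypothetical, made up for the example; the real files would need their own rules.

```python
import re
from collections import defaultdict

# Hypothetical patterns, assumed for illustration only.
LOG_LINE = re.compile(r"^\[(INFO|WARN|ERROR)\]")  # assumed log-line prefix
METADATA_LINE = re.compile(r"^#")                 # assumed metadata prefix


def classify(line: str) -> str:
    """Return the name of the output a line should be routed to."""
    if LOG_LINE.match(line):
        return "log"
    if METADATA_LINE.match(line):
        return "metadata"
    return "data"


def split_lines(lines):
    """Group the lines of one input file into one list per output."""
    outputs = defaultdict(list)
    for line in lines:
        outputs[classify(line)].append(line)
    return dict(outputs)


# Made-up sample of a combined file: metadata, csv rows, and a log line.
sample = [
    "# source: sensor-a",
    "id,value",
    "1,10",
    "[INFO] flushed buffer",
    "2,20",
]
result = split_lines(sample)
```

Here one input yields three separate outputs (`metadata`, `data`, `log`), which is the behaviour I am hoping Autoloader can provide, or approximate, on the stream.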

Ignoring the metadata using skipRows isn't an option, as I want to retain the metadata in a separate output file. The rescuedDataColumn option doesn't appear to be a valid approach either, as the data doesn't fall into the 3 identified scenarios (from the docs), i.e.