I use SQL Server to locally store data for statistical analysis. I create my tables from csv files that typically have hundreds of columns. Manually prescribing column names and types would be tedious, so I use "suggest types" in the Import Wizard to pick the correct type.
This works most of the time, but frequently my csv files have a "ragged top edge" (i.e., many of the columns are empty for the first thousand or even million lines -- a lot of these files are 1+ GB, hence the need for SQL Server to facilitate subsetting and joining). Because the Import Wizard only samples the first 1000 rows, it can suggest the wrong types and the import fails. Is there a clever workaround?
The solution I can think of is to move the last thousand or so rows to the top of the file, just below the header row. But since some of these files are 1+ GB with millions of rows, I can't do this in a text editor. Is there a way that I can outsmart the Import Wizard? Or read the file backwards? Or move the last thousand or so lines?
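For what it's worth, moving (or rather copying) the last N rows up to just below the header can be done in constant memory by streaming the file twice. Here is a minimal sketch in Python; `promote_tail_rows` is a hypothetical helper name, and it assumes no CSV field contains an embedded newline. Note it duplicates the tail rows rather than truly moving them, which is enough to steer the wizard's type sampling (the duplicates can be deleted after import):

```python
from collections import deque

def promote_tail_rows(src_path, dst_path, n=1000):
    """Copy src_path to dst_path, repeating the last n data rows
    right after the header so the Import Wizard's 1000-row sample
    sees fully populated columns. Streams line by line, so it
    works on multi-GB files without loading them into memory."""
    # First pass: keep only the last n lines in a bounded deque.
    with open(src_path, "r", encoding="utf-8") as f:
        f.readline()                 # skip header
        tail = deque(f, maxlen=n)    # retains the final n lines, in order
    # Second pass: write header, then the promoted tail, then the body.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(src.readline())    # header row
        dst.writelines(tail)         # last n rows, promoted to the top
        dst.writelines(src)          # original body (tail rows now duplicated)
```

The two-pass approach costs an extra read of the file, but avoids ever holding more than n lines in memory.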
Thanks!
It looks like you can't set the sample size in the wizard, but you can if you build it as a full SSIS solution. And/or the 1000-row limit is a bug in SQL Server 2008 (not sure about R2).
See this for more
I have set this sample row count myself at some point, but can't remember when or what version/SP level I used.
You can specify the number of header rows to skip until you get the expected result.
I'd increment "Header rows to skip" by 1000 or so, then reduce it to make sure I haven't missed anything, until I get a decent preview of the data. I haven't tried this on such a large file, but I can't see why it wouldn't work on a decent machine.
If you want something less tedious, a simple batch file or console app could strip the empty rows from these files. Just make sure it uses a streaming text reader that reads one line at a time instead of trying to load the whole file into memory.
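As a sketch of that console-app idea (shown here in Python rather than a .NET TextReader, same principle), this streams the file and drops rows where every field is empty. The naive `split(",")` ignores quoted commas; for real CSVs with quoting, use the standard `csv` module instead:

```python
def drop_empty_rows(src_path, dst_path):
    """Stream a CSV from src_path to dst_path, keeping only rows
    that have at least one non-empty field. Reads one line at a
    time, so memory use is constant regardless of file size."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(src.readline())  # keep the header as-is
        for line in src:
            # A row like ",,,," (or whitespace-only fields) carries no data.
            if any(field.strip() for field in line.split(",")):
                dst.write(line)
```

Run against a 1+ GB file this only ever holds one line in memory, so it should finish in roughly the time it takes to read the file once.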