I've inherited a .NET app to support / enhance which reads in a couple of files of high hundreds of thousands of rows, and one of millions of row.
The original developer left me code like :-
For Each ModelListRow As String In ModelListDataArray
If ModelListRow.Trim.Length = 0 Or ModelListRow.Contains(",") = False Then
GoTo SKIP_ROW
End If
Dim ModelInfo = ModelListRow.Split(",")
Dim ModelLocation As String = UCase(ModelInfo(0))
Dim ModelCustomer As String = UCase(ModelInfo(1))
Dim ModelNumber As String = UCase(ModelInfo(2))
If ModelLocation = "LOCATION" Or Mod开发者_如何学编程elNumber = "MODEL" Then
GoTo SKIP_ROW
End If
Dim MyDataRow As DataRow = dsModels.Tables(0).NewRow
MyDataRow.Item("location") = ModelLocation.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
MyDataRow.Item("model") = ModelNumber.Replace(vbCr, "").Replace(vbLf, "").Replace(vbCrLf, "")
dsModels.Tables(0).Rows.Add(MyDataRow)
SKIP_ROW:
Next
and it takes an age (well, nearly half an hour) to import these files.
I suspect there's a MUCH better way to do it. I'm looking for suggestions.
Thanks in advance.
Take a look at BULK INSERT.
http://msdn.microsoft.com/en-us/library/ms188365(v=SQL.90).aspx
Basically you point SQL Server at a text file in CSV format and it does all the logic of pulling the data into a table. If you need to massage it more than that, you can pull the text file into a staging location in SQL Server, and then run a stored proc to massage it into the format you are looking for.
The main options (apart from writing your own code from scratch) are:
- BULK INSERT or bcp.exe, which work well if your data is cleanly formatted
- SSIS, if you need workflow, data type transformations, data cleansing etc.
- .NET SqlBulkCopy API
jkohlhepp's suggestion about pulling data into a staging table then cleaning it is a good one and a very common pattern in ETL processes. But if your "massaging" isn't easy to do in TSQL then you will probably need some .NET code anyway, whether it's in SSIS or in a CLR procedure.
Personally I would use SSIS in your case, because it looks like the data is not cleanly formatted so you will probably need some custom code to clean/re-format the data on its way to the database. However it does depend on what you're most comfortable/productive with and what existing tools and standards you have in place.
Dim ExcelConnection As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcelSpreadsheet.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=Yes""")
ExcelConnection.Open()
精彩评论