开发者

Import huge 550000+ row CSV file into Access

开发者 https://www.devze.com 2022-12-18 18:34 出处:网络
I have a CSV file with 550,000+ 开发者_如何学Crows. I need to import this data into Access, but when I try it throws an error that the file is too large (1.7GB).

I have a CSV file with 550,000+ 开发者_如何学Crows.

I need to import this data into Access, but when I try it throws an error that the file is too large (1.7GB).

Can you recommend a way to get this file into Access?


Try linking instead of importing ("get external data" -> "link table" in 2003), that leaves the data in the CSV-file and reads from the file directly and in-place. It doesn't limit size (at least not anywhere near 1.7 GB). It may limit some of your read/update operations, but it will at least get you started.


I'd either try the CSV ODBC connector, or otherwise import it first in a less limited database (MySQL, SQL Server) and import it from there.

It seems that some versions of access have a hard 2GB limit on MDB files so you might get into trouble with that anyway.

Good luck.


You can also use an ETL tool. Kettle is an open source one (http://kettle.pentaho.org/) and really quite easy to use. To import a file into a database requires a single transformation with 2 steps: CSV Text Input and Table Output.


why do you using access for huge files ? use sqlexpress or firebird instead


I remember that Access has some size limitation around 2 Go. Going to free SQLExpress (limited to 4 Go) or free MySQL (no size limitation) could be easier.


Another option would be to do away with the standard import functions and write your own. I have done this one time before when some specific logic needed to be applied to the data before import. The basic structure is……

Open then file Get the first line Loop through until the end of the line If we find a comma then move onto the next field Put record into database Get the next line repeat etc

I wrapped it up into a transaction that committed every 100 rows as I found that improved performance in my case but it would depend on your data if that helped.

However I would say that linking the data as others have said is the best solution, this is just an option if you absolutely have to have the data in access


Access creates a lot of overhead so even relatively small data sets can bloat the file to 2GB, and then it will shut down. Here are a couple of straightforward ways of doing the import. I didn't test this on huge files, but these concepts will definitely work on regular files.

Import data from a closed workbook (ADO)

If you want to import a lot of data from a closed workbook you can do this with ADO and the macro below. If you want to retrieve data from another worksheet than the first worksheet in the closed workbook, you have to refer to a user defined named range. The macro below can be used like this (in Excel 2000 or later):
GetDataFromClosedWorkbook "C:\FolderName\WorkbookName.xls", "A1:B21", ActiveCell, False
GetDataFromClosedWorkbook "C:\FolderName\WorkbookName.xls", "MyDataRange", Range ("B3"), True

Sub GetDataFromClosedWorkbook(SourceFile As String, SourceRange As String, _
    TargetRange As Range, IncludeFieldNames As Boolean)
' requires a reference to the Microsoft ActiveX Data Objects library
' if SourceRange is a range reference:
'   this will return data from the first worksheet in SourceFile
' if SourceRange is a defined name reference:
'   this will return data from any worksheet in SourceFile
' SourceRange must include the range headers
'
Dim dbConnection As ADODB.Connection, rs As ADODB.Recordset
Dim dbConnectionString As String
Dim TargetCell As Range, i As Integer
    dbConnectionString = "DRIVER={Microsoft Excel Driver (*.xls)};" & _
        "ReadOnly=1;DBQ=" & SourceFile
    Set dbConnection = New ADODB.Connection
    On Error GoTo InvalidInput
    dbConnection.Open dbConnectionString ' open the database connection
    Set rs = dbConnection.Execute("[" & SourceRange & "]")
    Set TargetCell = TargetRange.Cells(1, 1)
    If IncludeFieldNames Then
        For i = 0 To rs.Fields.Count - 1
            TargetCell.Offset(0, i).Formula = rs.Fields(i).Name
        Next i
        Set TargetCell = TargetCell.Offset(1, 0)
    End If
    TargetCell.CopyFromRecordset rs
    rs.Close
    dbConnection.Close ' close the database connection
    Set TargetCell = Nothing
    Set rs = Nothing
    Set dbConnection = Nothing
    On Error GoTo 0
    Exit Sub
InvalidInput:
    MsgBox "The source file or source range is invalid!", _
        vbExclamation, "Get data from closed workbook"
End Sub


Another method that doesn't use the CopyFromRecordSet-method

With the macro below you can perform the import and have better control over the results returned from the RecordSet.

Sub TestReadDataFromWorkbook()
' fills data from a closed workbook in at the active cell
Dim tArray As Variant, r As Long, c As Long
    tArray = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "A1:B21")
    ' without using the transpose function
    For r = LBound(tArray, 2) To UBound(tArray, 2)
        For c = LBound(tArray, 1) To UBound(tArray, 1)
            ActiveCell.Offset(r, c).Formula = tArray(c, r)
        Next c
    Next r
    ' using the transpose function (has limitations)
'    tArray = Application.WorksheetFunction.Transpose(tArray)
'    For r = LBound(tArray, 1) To UBound(tArray, 1)
'        For c = LBound(tArray, 2) To UBound(tArray, 2)
'            ActiveCell.Offset(r - 1, c - 1).Formula = tArray(r, c)
'        Next c
'    Next r
End Sub

Private Function ReadDataFromWorkbook(SourceFile As String, SourceRange As String) As Variant
' requires a reference to the Microsoft ActiveX Data Objects library
' if SourceRange is a range reference:
'   this function can only return data from the first worksheet in SourceFile
' if SourceRange is a defined name reference:
'   this function can return data from any worksheet in SourceFile
' SourceRange must include the range headers
' examples:
' varRecordSetData = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "A1:A21")
' varRecordSetData = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "A1:B21")
' varRecordSetData = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "DefinedRangeName")
Dim dbConnection As ADODB.Connection, rs As ADODB.Recordset
Dim dbConnectionString As String
    dbConnectionString = "DRIVER={Microsoft Excel Driver (*.xls)};ReadOnly=1;DBQ=" & SourceFile
    Set dbConnection = New ADODB.Connection
    On Error GoTo InvalidInput
    dbConnection.Open dbConnectionString ' open the database connection
    Set rs = dbConnection.Execute("[" & SourceRange & "]")
    On Error GoTo 0
    ReadDataFromWorkbook = rs.GetRows ' returns a two dim array with all records in rs
    rs.Close
    dbConnection.Close ' close the database connection
    Set rs = Nothing
    Set dbConnection = Nothing
    On Error GoTo 0
    Exit Function
InvalidInput:
    MsgBox "The source file or source range is invalid!", vbExclamation, "Get data from closed workbook"
    Set rs = Nothing
    Set dbConnection = Nothing
End Function

For really large files, you can try something like this . . .

INSERT INTO [Table] (Column1, Column2)
SELECT *
FROM [Excel 12.0 Xml;HDR=No;Database=C:\your_path\excel.xlsx].[SHEET1$];

OR

SELECT * INTO [NewTable]
FROM [Excel 12.0 Xml;HDR=No;Database=C:\your_path\excel.xlsx].[SHEET1$];
0

精彩评论

暂无评论...
验证码 换一张
取 消