开发者

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

开发者 https://www.devze.com 2023-03-10 20:36 出处:网络
I am trying to import data from a utf-8 enc开发者_运维百科oded flat file into SQL Server 2008 using SSIS. This is what the end of the row data looks like in Notepad++:

I am trying to import data from a utf-8 enc开发者_运维百科oded flat file into SQL Server 2008 using SSIS. This is what the end of the row data looks like in Notepad++:

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

I have a couple more images showing what the file connection manager looks like:

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

You can see that the data shows correctly in the file connection manager preview. When I try to import this data, no rows import. I get an error message indicating that the row delimiter was not found. You can see in the file connection manager images that the header row delimiter and the row delimiter are both set to {LF}. This was sufficient to generate the correct preview, so I am lost to why it did not work to import. I have tried a number of things that have brought zero results:

  • Tried using the Wizard import in SSMS...same results
  • Tried using data conversion, no impact
  • Tried setting the row delimiter to (0a), same results

[Flat File Source [582]] Warning: The end of the data file was reached while reading header rows. Make sure the header row delimiter and the number of header rows to skip are correct.

Thanks for looking at this and I really appreciate any help you can offer.


Cause:

SSIS fails to read the file and displays the below warning due to the column delimiter Ç ("c" with cedilla) and not due to the line delimiter {LF} (Line Feed).

[Read flat file [1]] Warning: The end of the data file was reached while 
reading header rows. Make sure the header row delimiter and the number of 
header rows to skip are correct.

Here is a sample SSIS package that shows how to resolve the issue using Script Component and at the end there is another example that simulates your issue.

Resolution:

Below sample package is written in SSIS 2008 R2. It reads a flat file with row delimiter {LF} as a single column value; then splits the data using Script Component to insert the information into a table in SQL Server 2008 R2 database.

Use Notepad++ to create a simple flat file with few rows. The below sample file has Product Id and List Price information on each row separated by Ç as column delimiter and each row ends with {LF} delimiter.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Notepad++, click Encoding and then click Encoding in UTF-8 to save the flat file in UTF-8 encoding.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

The sample will use an SQL Server 2008 R2 database named Sora. Create a new table named dbo.ProductListPrice using the below given script. SSIS will insert the flat file data into this table.

USE Sora;
GO

CREATE TABLE dbo.ProductListPrice
(
        ProductId   nvarchar(30)    NOT NULL
    ,   ListPrice   numeric(12,2)   NOT NULL
);
GO

Create an SSIS package using Business Intelligence Development Studio (BIDS) 2008 R2. Name the package as SO_6268205.dtsx. Create a data source named Sora.ds to connect to the database Sora in SQL Server 2008 R2.

Right-click anywhere inside the package and then click Variables to view the variables pane. Create a new variable named ColumnDelimiter of data type String in the package scope SO_6268205 and set the variable with the value Ç

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Right-click on the Connection Managers and click New Flat File Connection... to create a connection to read the flat file.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the General page of the Flat File Connection Manager Editor, perform the following actions:

  • Set Connection manager name to ProductListPrice
  • Set Description to Flat file connection manager to read product list price information.
  • Select the flat file path. I have the file in the path C:\Siva\StackOverflow\Files\6268205\ProductListPrice.txt
  • Select {LF} from Header Row Delimiter
  • Check Column names in the first data row
  • Click Columns page

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Columns page of the Flat File Connection Manager Editor, verify that the Column delimiter is blank and disabled. Click Advanced page.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Advanced page of the Flat File Connection Manager Editor, perform the following actions.

  • Set the Name to LineData
  • Verify that the Column delimiter is set to {LF}
  • Set the DataType to Unicode string [DT_WSTR]
  • Set the OutputColumnWidth to 255
  • Click the Preview page.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Preview page of the Flat File Connection Manager Editor, verify that the displayed data looks correct and click OK.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

You will see the data source Sora and the flat file connection manager ProductListPrice on the Connection Managers tab at the bottom of the package.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Drag and drop Data Flow Task onto the Control Flow tab of the package and name it as File to database - Without Cedilla delimiter

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Double-click the Data Flow Task to switch the view to the Data Flow tab on the package. Drag and drop a Flat File Source on the Data Flow tab. Double-click the Flat File Source to open Flat File Source Editor.

On the Connection Manager page of the Flat File Source Editor, select the Flat File Connection Manager ProductListPrice and click Columns page.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Columns page of the Flat File Source Editor, check the column LineData and click OK.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Drag and drop a Script Component onto the Data Flow tab below the Flat File Source, select Transformation and click OK. Connect the green arrow from Flat File Source to Script Component. Double-click Script Component to open Script Transformation Editor.

Click Input Columns on Script Transformation Editor and select LineData column. Click Inputs and Outputs page.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Inputs and Outputs page of the Script Transformation Editor, perform the following actions.

  • Change the inputs name to FlatFileInput
  • Change the outputs name to SplitDataOutput
  • Select Output Columns and click Add Column. Repeat this again to add another column.
  • Name the first column ProductId
  • Set the DataType of column ProductId to Unicode string [DT_WSTR]
  • Set the Length to 30

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Inputs and Outputs page of the Script Transformation Editor, perform the following actions.

  • Name the second column ListPrice
  • Set the DataType of column ListPrice to numeric [DT_NUMERIC]
  • Set the Precision to 12
  • Set the Scale to 2
  • Click Script page to modify the script

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Script page of the Script Transformation Editor, perform the following actions.

  • Click the ellipsis button against ReadOnlyVariables and select the variable User::ColumnDelimiter
  • Click Edit Script...

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Paste the below C# in the Script Editor. The script performs the following tasks.

  • Using the column delimiter value Ç defined in the variable User::ColumnDelimiter, the method FlatFileInput_ProcessInputRow splits the incoming value and assigns it to the two output columns defined in the Script Component transformation.

Script component code in C#

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void PreExecute()
    {
        base.PreExecute();
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void FlatFileInput_ProcessInputRow(FlatFileInputBuffer Row)
    {
        const int COL_PRODUCT = 0;
        const int COL_PRICE = 1;

        char delimiter = Convert.ToChar(this.Variables.ColumnDelimiter);
        string[] lineData = Row.LineData.ToString().Split(delimiter);

        Row.ProductId = String.IsNullOrEmpty(lineData[COL_PRODUCT]) 
                            ? String.Empty 
                            : lineData[COL_PRODUCT];

        Row.ListPrice = String.IsNullOrEmpty(lineData[COL_PRICE]) 
                            ? 0 
                            : Convert.ToDecimal(lineData[COL_PRICE]);
    }
}

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Drag and drop OLE DB Destination onto the Data Flow tab. Connect the green arrow from Script Component to OLE DB Destination. Double-click OLE DB Destination to open OLE DB Destination Editor.

On the Connection Manager page of the OLE DB Destination Editor, perform the following actions.

  • Select Sora from OLE DB Connection Manager
  • Select Table or view - fast load from Data access mode
  • Select [dbo].[ProductListPrice] from Name of the table or the view
  • Click Mappings page

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Click Mappings page on the OLE DB Destination Editor would automatically map the columns if the input and output column names are same. Click OK.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Data Flow tab should look something like this after configuring all the components.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Execute the query select * from dbo.ProductListPrice in the SQL Server Management Studio (SSMS) to find the number of rows in the table. It should be empty before executing the package.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Execute the package. You will notice that the package successfully processed 9 rows. The flat file contains 10 lines but the first row is header with column names.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Execute the query select * from dbo.ProductListPrice in the SQL Server Management Studio (SSMS) to find the 9 rows successfully inserted into the table. The data should match with flat file data.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

The above example illustrated how to manually split the data using Script Component because the Flat File Connection Manager encounters error when configured the column delimiter Ç

Issue Simulation:

This example shows a separate Flat File Connection Manager configured with column delimiter Ç, which executes but encounters a warning and does not process any lines.

Right-click on the Connection Managers and click New Flat File Connection... to create a connection to read the flat file. On the General page of the Flat File Connection Manager Editor, perform the following actions:

  • Set Connection manager name to ProductListPrice_Cedilla
  • Set Description to Flat file connection manager with Cedilla column delimiter.
  • I have the file in the path C:\Siva\StackOverflow\Files\6268205\ProductListPrice.txt Select the flat file path.
  • Select {LF} from Header Row Delimiter
  • Check Column names in the first data row
  • Click Columns page

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Columns page of the Flat File Connection Manager Editor, perform the following actions:

  • Set Row delimiter to {LF}
  • The column delimiter field may be disabled. Click Reset Columns
  • Set Column delimiter to Ç
  • Click Advanced page

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Advanced page of the Flat File Connection Manager Editor, perform the following actions:

  • Set the Name to ProductId
  • Set the ColumnDelimiter to Ç
  • Set the DataType to Unicode string [DT_WSTR]
  • Set the Length to 30
  • Click column ListPrice

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

On the Advanced page of the Flat File Connection Manager Editor, perform the following actions:

  • Set the Name to ListPrice
  • Set the ColumnDelimiter to {LF}
  • Set the DataType to numeric [DT_NUMERIC]
  • Set the DataPrecision to 12
  • Set the DataScale to 2
  • Click OK

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Drag and drop a Data Flow task onto the Control Flow tab and name it as File to database - With Cedilla delimiter. Disable the first data flow task.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Configure the second data flow task with Flat File Source and OLE DB Destination

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Double-click the Flat File Source to open Flat File Source Editor. On the Connection Manager page of the Flat File Source Editor, select the Flat File Connection Manager ProductListPrice_Cedilla and click Columns page to configure the columns. Click OK.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Execute the package. All the components will display green color to indicate that the process was success but no rows will be processed. You can see that there are no rows numbers indication between the Flat File Source and OLE DB Destination

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?

Click the Progress tab and you will notice the following warning message.

[Read flat file [1]] Warning: The end of the data file was reached while 
reading header rows. Make sure the header row delimiter and the number of 
header rows to skip are correct.

Why doesn't SSIS recognize line feed {LF} row delimiter while importing UTF-8 flat file?


Answer above seems awfully complicated, just convert the line endings in the file

Dim FileContents As String = My.Computer.FileSystem.ReadAllText("c:\Temp\UnixFile.csv")

Dim NewFileContents As String = FileContents.Replace(vbLf, vbCrLf)

My.Computer.FileSystem.WriteAllText("c:\temp\WindowsFile.csv", NewFileContents, False, New System.Text.UnicodeEncoding)

Rehashed from here


This issue also arises if you are trying to consume FlatFile generated on a different platform like Unix, Mac etc via SSIS on windows

In such a scenario all you need to do is convert the file format from say UNIX to DOS with unix2dos command

unix2dos file-to-convert
0

精彩评论

暂无评论...
验证码 换一张
取 消