I have 1M HTML files that I need to parse and then insert the extracted information into SQL Server. The information parsed from each file ends up in multiple tables because of relationships among the parsed objects.
I am using Entity Framework right now, but adding each piece of information to the proper object on the EF context takes a long time and is not efficient, especially since I have so many files to process.
What is the fastest way to parse a lot of files in parallel and insert the results into SQL Server when the items being added have relationships?
Also, is there a better technology for this, like Informatica?
I think the SqlBulkCopy class will be the best option in this case.
You can write a generic wrapper around SqlBulkCopy that lets you bulk-insert any entity. Below is such a wrapper for LINQ-to-SQL, but the same idea works with Entity Framework, assuming your entities map to tables one-to-one.
public void BulkInsert<TBusinessObject>(IEnumerable<TBusinessObject> entities, int timeoutInSeconds)
    where TBusinessObject : class, IBusinessObject
{
    AssertUtilities.ArgumentAllNotNull(entities, "entities");
    AssertUtilities.ArgumentNotNegative(timeoutInSeconds, "timeoutInSeconds");

    // Resolve the mapping between the entity type and its destination table.
    var metaTable = Mapping.GetTable(typeof(TBusinessObject));
    if (metaTable == null)
        throw new DataAccessException("MetaTable is not found.");

    // Skip database-generated columns (e.g. identity columns); insert the rest in ordinal order.
    var insertDataMembers = metaTable.RowType.PersistentDataMembers
        .Where(arg => !arg.IsDbGenerated)
        .OrderBy(arg => arg.Ordinal)
        .ToList();

    using (var dataTable = new DataTable())
    {
        dataTable.Locale = CultureInfo.InvariantCulture;

        // Build an in-memory DataTable whose columns mirror the destination table.
        var dataColumns = insertDataMembers
            .Select(arg => new DataColumn(arg.MappedName))
            .ToArray();
        dataTable.Columns.AddRange(dataColumns);

        // Copy each entity's column values into a row of the DataTable.
        foreach (var entity in entities)
        {
            var itemArray = insertDataMembers
                .Select(arg => arg.StorageAccessor.GetBoxedValue(entity))
                .ToArray();
            dataTable.Rows.Add(itemArray);
        }

        try
        {
            if (Connection.State != ConnectionState.Open)
                Connection.Open();

            var sqlConnection = (SqlConnection)Connection;
            var sqlTransaction = (SqlTransaction)Transaction;

            using (var bulkCopy = new SqlBulkCopy(sqlConnection, SqlBulkCopyOptions.Default, sqlTransaction))
            {
                bulkCopy.BulkCopyTimeout = timeoutInSeconds;
                bulkCopy.DestinationTableName = metaTable.TableName;

                // Map source columns to destination columns by name.
                foreach (var dataColumn in dataColumns)
                    bulkCopy.ColumnMappings.Add(dataColumn.ColumnName, dataColumn.ColumnName);

                bulkCopy.WriteToServer(dataTable);
            }
        }
        catch (Exception exception)
        {
            throw DataAccessExceptionTranslator.Translate(exception);
        }
    }
}
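As a rough sketch of how you might drive such a wrapper: parse the files in parallel into in-memory collections first, then bulk-insert one entity type at a time. Everything named below (ParsedPage, ParsedLink, ParseHtml, the repository exposing BulkInsert) is only illustrative and assumes those entity types implement IBusinessObject; the sketch also assumes you assign primary keys on the client (for example GUIDs), so related rows in different tables can be bulk-copied without waiting for database-generated identities.

// Illustrative sketch; requires System.Collections.Concurrent, System.IO, System.Threading.Tasks.
var pages = new ConcurrentBag<ParsedPage>();
var links = new ConcurrentBag<ParsedLink>();

// Parse the HTML files on multiple threads; build only in-memory objects here.
Parallel.ForEach(Directory.EnumerateFiles(@"C:\html", "*.html"), file =>
{
    ParsedPage page = ParseHtml(file);     // your parsing logic (placeholder)
    page.Id = Guid.NewGuid();              // client-assigned key keeps relationships intact
    foreach (ParsedLink link in page.Links)
    {
        link.PageId = page.Id;             // foreign key is known before any insert happens
        links.Add(link);
    }
    pages.Add(page);
});

// Bulk-insert parents first, then children, so foreign key constraints are satisfied.
repository.BulkInsert(pages, timeoutInSeconds: 600);
repository.BulkInsert(links, timeoutInSeconds: 600);

Keeping the parsing stage purely in memory and doing the database work in a few large SqlBulkCopy batches avoids the per-entity change-tracking overhead that makes the EF context slow for this volume of data.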