Load SQL Server from .NET Much Faster Than With Insert Statements

SQL Server is a developer's friend--it is everywhere, it is easy to use, there are tons of resources available for it, and you can distribute a free version with your apps. Odds are SQL Server is either your primary database platform, or you have some data in it and work with it fairly regularly.

One of the most common questions I hear is: "How can I get data into SQL Server?"

Most of the time the answer lies somewhere between creating insert statements for a few records and developing a DTS package to handle a file import. But every client has different needs, and sometimes you run across a situation that calls for a different tool in the toolbox.

A client I work for needed a robust solution to get a lot of data into SQL Server while providing detailed reporting on the process, high availability of the database, and a good deal of fault tolerance. Due to some environmental issues, DTS was not feasible. As a result, I needed a fast way to get a lot of data (hundreds of thousands of rows per load) into the database with essentially just .NET.

Luckily, I happened on the SqlBulkCopy class in System.Data.SqlClient. SqlBulkCopy abstracts the BCP (bulk copy) facility that has been in SQL Server since version 7. The best part--because it goes through BCP, this class can bypass standard SQL Server overhead such as full logging and trigger execution. When loading the data without enforcing constraints is acceptable--i.e. when you really just want to bulk load the table--SqlBulkCopy is unbeatable.

The process couldn't be simpler. You just need to load a DataTable with data, create a SqlBulkCopy instance, set up the column mappings, and execute the copy. Starting from a DataTable is a great approach because .NET has many, many ways to get data into one natively, and you can also populate it by hand if necessary (manually adding rows), as in the sketch below.
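For instance, here is a minimal sketch of building a DataTable by hand; the table name, column names, and values are placeholders, not anything specific to my client's load:

' Build a two-column table and add rows manually
Dim MyTable As New DataTable("CustomerLoad")
MyTable.Columns.Add("CustomerID", GetType(Integer))
MyTable.Columns.Add("CustomerName", GetType(String))
MyTable.Rows.Add(1, "Acme Corp")
MyTable.Rows.Add(2, "Widget Inc")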

Here is some example code. Pass this method a DataTable, set up the connection string, and you will get great performance!


' Requires Imports System.Data and Imports System.Data.SqlClient.
' SQLConnectionString is assumed to be a field holding your connection string.
Private Sub BcpRecordsToSqlServer(ByVal MyTable As DataTable)

    ' Using guarantees the bulk copy's connection is released, even on error
    Using s As New SqlBulkCopy(SQLConnectionString, SqlBulkCopyOptions.Default)

        s.DestinationTableName = "MyDestinationTableName"
        s.BulkCopyTimeout = 36000 ' seconds (ten hours); a generous ceiling for big loads

        ' Map each source column to the destination column of the same name
        For Each field As DataColumn In MyTable.Columns
            s.ColumnMappings.Add(field.ColumnName, field.ColumnName)
        Next

        s.WriteToServer(MyTable)
    End Using

End Sub
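Calling it is just as simple, assuming the DataTable's columns are named to match the destination table (GetRecordsToLoad here is a hypothetical method that fills a DataTable from your source):

Dim LoadTable As DataTable = GetRecordsToLoad()
BcpRecordsToSqlServer(LoadTable)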



A few notes about this code. First, the options value you pass to the constructor determines whether triggers fire, constraints are checked, and so on. Second, you can set a timeout on the copy. Third, I am simply looping through the columns in the DataTable and mapping each one to the column of the same name in the target table--which assumes you named the input columns to match the target table's columns. This is nice because you only have to populate the fields you actually want filled in the destination table.
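For example, if you do want triggers to fire and constraints to be checked during the load (at some cost in speed), you can combine flags from the SqlBulkCopyOptions enumeration when constructing the class:

' Enforce constraints and fire insert triggers during the bulk load
Dim opts As SqlBulkCopyOptions = SqlBulkCopyOptions.CheckConstraints Or SqlBulkCopyOptions.FireTriggers
Dim s As New SqlBulkCopy(SQLConnectionString, opts)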

This process is great for reading chunks of data and uploading them to SQL Server. I originally ran a process that pulled data and uploaded it via individual insert statements; I stopped that process after six hours. In contrast, using BCP I was able to upload 100,000 rows in under 10 minutes!
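If you are loading in chunks, SqlBulkCopy can also commit in batches and report progress as it goes, which helps with a detailed-reporting requirement like the one I mentioned above. A minimal sketch of additions to BcpRecordsToSqlServer:

' Inside the Using block, before s.WriteToServer(MyTable):
s.BatchSize = 10000   ' commit every 10,000 rows instead of one giant batch
s.NotifyAfter = 10000 ' raise SqlRowsCopied after every 10,000 rows
AddHandler s.SqlRowsCopied, AddressOf OnRowsCopied

' Elsewhere in the class, a handler that logs progress as the copy runs
Private Sub OnRowsCopied(ByVal sender As Object, ByVal e As SqlRowsCopiedEventArgs)
    Console.WriteLine(e.RowsCopied & " rows copied so far")
End Sub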

Clearly, insert statements versus BCP is an apples-to-oranges comparison. If data integrity is not certain from your data source, you may find BCP is the wrong choice, because constraints are bypassed by default. The inserts are only minimally logged, so you cannot count on recovering them in a partial back-up or log shipping situation. Also, a plain BCP load does not participate in your transaction, though you can opt in via SqlBulkCopyOptions.UseInternalTransaction or by passing a SqlTransaction to the constructor.

But, if you are able to scrub the data first, or even scrub the data as you load the DataTable, or if you are certain the data is acceptable, you cannot beat the raw speed of this process for inserting into SQL Server from .NET.
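As one hypothetical example of scrubbing as you load, you might skip bad rows and normalize values on the way into the DataTable (parsedRows and the checks shown are placeholders for whatever your own rules are):

' Skip rows with a missing key; trim strings before they hit the table
For Each parsedRow As String() In parsedRows
    If parsedRow(0) Is Nothing OrElse parsedRow(0).Trim() = "" Then Continue For
    MyTable.Rows.Add(CInt(parsedRow(0)), parsedRow(1).Trim())
Next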

Under certain circumstances, this is an invaluable tool to have in the toolbox!

Good luck!
John Tabernik
