본문 바로가기
MS-SQL

마이그레이션 대용량 sqlbulkcopy

by 캡틴노랑이 2015. 7. 27.
반응형

http://www.sqlteam.com/article/use-sqlbulkcopy-to-quickly-load-data-from-your-client-to-sql-server

 

The .NET Framework 2.0 introduces a very handy new class in the System.Data.SqlClient namespace called SqlBulkCopy that makes it very easy and efficient to copy large amounts of data from your .NET applications to a SQL Server database. You can even use this class to write a short .NET application that can serve as a "middleman" to move data between database servers.

 

Let's take a look at some examples using the AdventureWorks database. (Download AdventureWorks)

The Destination Database

First, choose or create a SQL database to which we will bulk copy our results. We'll call that database the "destination database" from now on. In the destination database, create the following table which will be the recipient of our bulk copy operations:

create table SalesCopy
    (
            SalesOrderID int not null,
            SalesOrderDetailID int not null,
            CarrierTrackingNumber nvarchar(25) null,
            OrderQty smallint not null,
            ProductID int not null,
            SpecialOfferID int not null,
            UnitPrice money not null,
            UnitPriceDiscount money not null,
            LineTotal money not null,
            ModifiedDate datetime not null,
            primary key (SalesOrderID, SalesOrderDetailID)
     )

And, for comparison purposes, create the following stored procedure which we will use to copy data using standard INSERT statements:

create proc SalesCopyInsert
        @SalesOrderID int, 
 @SalesOrderDetailID int,
        @CarrierTrackingNumber nvarchar(25), 
 @OrderQty smallint, 
        @ProductID int,
        @SpecialOfferID int, 
        @UnitPrice money, 
        @UnitPriceDiscount money, 
        @LineTotal money, 
        @ModifiedDate datetime
    as
insert into SalesCopy (SalesOrderID, SalesOrderDetailID,
         CarrierTrackingNumber, OrderQty, ProductID,
         SpecialOfferID, UnitPrice, UnitPriceDiscount, 
         LineTotal, ModifiedDate)
values (@SalesOrderID, @SalesOrderDetailID,
        @CarrierTrackingNumber, @OrderQty, @ProductID,
        @SpecialOfferID, @UnitPrice, @UnitPriceDiscount, 
        @LineTotal, @ModifiedDate)

The Sample Application

Next, create a new C# console application and add "using" references to the System.Data and System.Data.SqlClient namespaces. (You can also use VB.NET, if you prefer, with only minor modifications to the sample code - left as an exercise for the reader, of course!)

Our goal will be to first fill up a local DataTable with 50,000 rows from the SalesOrderDetail table in the AdventureWorks database, and then we'll use some different methods to copy those rows to our destination database. Remember that where the data comes from is really irrelevant to this topic; we are focusing on exporting data from a fully populated .NET DataTable to a SQL Server database, not on filling up that DataTable.

Let's start in your application's Main() method. Add in the following code, substituting in your actual connection strings where indicated:

static void Main(string[] args)
        {
            // establish our connections:

            SqlConnection source = new SqlConnection( <source connect="" string="">);
            SqlConnection dest = new SqlConnection( <destination connect="" string=""> );
            
            source.Open();
            dest.Open();

            // fill up a data table with 50,000 rows from AdventureWorks:

            DataTable tmp = new DataTable();
            SqlCommand cm = new SqlCommand(@"
                select top 50000
                    SalesOrderID, SalesOrderDetailID,
                 CarrierTrackingNumber, OrderQty, ProductID,
                 SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, ModifiedDate 
               from 
                   Sales.SalesOrderDetail", source);

            Console.Write("Loading sql data from AdventureWorks...");
            SqlDataAdapter da = new SqlDataAdapter(cm);
            da.Fill(tmp);
            Console.WriteLine("done.");

            // ensure that our destination table is empty:

            new SqlCommand("delete from SalesCopy", dest).ExecuteNonQuery();

            DateTime start = DateTime.Now;
            Console.WriteLine("Beginning Copy ....");

            // copy the data:
            CopyData(tmp, dest);

            Console.WriteLine("Copy complete in {0}  seconds.", DateTime.Now.Subtract(start).Seconds);
            Console.ReadLine();

            // close connections:

            source.Close();
            dest.Close();
     }</destination></source>

In the code above, we simply open our database connections, create a sqlCommand that pulls 50,000 rows from the SalesOrderDetail table, and use a SqlDataAdapter to execute the command and put the results into a DataTable. Once all of this is executed, we have things "ready to go" and we can call variations of a CopyData() method (shown below) to demonstrate different ways of bulk copying data to our destination database.

Using row-by-row INSERTS to copy data

Before .NET 2.0 and SqlBulkCopy, to move rows from a local DataTable to a SQL database you'd need to copy the rows one by one by executing an INSERT statement for each. Let's write some code to do that, using a parameterized SqlCommand that calls the SalesCopyInsert stored procedure that we created earlier for each row in our DataTable. In addition, for every 10,000 rows copied we will report our progress by displaying the total number of rows copied.

Add the following static method to your application:

static void CopyData(DataTable sourceTable, SqlConnection destConnection)
{
    // old method: Lots of INSERT statements

    int rowscopied = 0;
            
    // first, create the insert command that we will call over and over:
    using (SqlCommand ins = new SqlCommand("SalesCopyInsert", destConnection))
    {
        ins.CommandType = CommandType.StoredProcedure;
        ins.Parameters.Add("@SalesOrderID", SqlDbType.Int);
        ins.Parameters.Add("@SalesOrderDetailID", SqlDbType.Int);
        ins.Parameters.Add("@CarrierTrackingNumber", SqlDbType.NVarChar);
        ins.Parameters.Add("@OrderQty", SqlDbType.Int);
        ins.Parameters.Add("@ProductID", SqlDbType.Int);
        ins.Parameters.Add("@SpecialOfferID", SqlDbType.Int);
        ins.Parameters.Add("@UnitPrice", SqlDbType.Money);
        ins.Parameters.Add("@UnitPriceDiscount", SqlDbType.Money);
        ins.Parameters.Add("@LineTotal", SqlDbType.Money);
        ins.Parameters.Add("@ModifiedDate", SqlDbType.DateTime);

        // and now, do the work:
        foreach (DataRow r in sourceTable.Rows)
        {
            for (int i = 0; i < 10; i++)
                ins.Parameters[i].Value = r[i];

            ins.ExecuteNonQuery();
            if (++rowscopied % 10000 == 0)
                Console.WriteLine("-- copied {0} rows.", rowscopied);

        }
    }
}

As you can see, there's nothing special here, it's a lot of code and a lot of work. Running this on my PC, it took 17 seconds to copy 50,000 rows from the DataTable to the destination table.

Next, we will compare both the code and performance of that technique with using SqlBulkCopy.

Using SQLBulkCopy

Let's replace the previous CopyData() method by building a new one that uses the SqlBulkCopy class:

static void CopyData(DataTable sourceTable, SqlConnection destConnection)
{
    // new method: SQLBulkCopy:
    using (SqlBulkCopy s = new SqlBulkCopy(destConnection))
    {
        s.DestinationTableName = "SalesCopy";
        s.NotifyAfter = 10000;
        s.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);
        s.WriteToServer(sourceTable);
        s.Close();
    }
}

The code is very short and easy to follow. Let's examine it, step by step.

First, we simply create a new instance of a SqlBulkCopy object and specify via the constructor the destination connection to use. Then, we set the DestinationTableName property equal to the table that we will be bulk copying into.

Since we want to be notified every 10,000 rows as the copy progresses, we simply set the NotifyAfter property equal to 10000 and assign an event handler to the SqlRowsCopied event.

Now that everything is all set up, we simply call the WriteToServer() method, specifying the DataTable to copy, which begins the actual bulk copy process.

Here's the code for the s_SqlRowsCopied method called by the SqlRowsCopied event, it's nothing fancy, we just output the number of rows copied so far which is conveniently provided as the RowsCopied property of the SqlRowsCopiedEventArgs variable.

static void s_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
{
    Console.WriteLine("-- Copied {0} rows.", e.RowsCopied);
}

Note that you have access to an "Abort" property in the SqlRowsCopiedEventArgs object that allows you to abort the bulk copy any time this event is raised.

Running the application now resulted in an execution time of only 4 seconds to copy all 50,000 rows! That's more than 4 times as fast as individual INSERTs, and the code is much shorter and cleaner! That's a great improvement on all counts.

Mapping Columns

By default, the SqlBulkCopy class maps matching column names from the Source table to the Destination table, and the copy will not succeed if there are any mismatched columns between the two. Thus, you may encounter errors or problems if your source DataTable is sufficiently different from the destination table. To accommodate this, another option provided by SqlBulkCopy is to specify ColumnMappings to use when copying the data. Note that you must map out all of the columns, not just those that differ, when using this feature; once you indicate at least one ColumnMapping, any columns not explicitly mapped will be ignored.

Let's create an alternate destination table with differing columns from our DataTable:

create table SalesCopy2
(
 ID int not null,
 TrackingNumber nvarchar(25) null,
)

To bulk copy our DataTable to the SalesCopy2 table, we would use the ColumnMappings property of the SqlBulkCopy object like this:

static void CopyData(DataTable sourceTable, SqlConnection destConnection)
{
    // column mappings example:

    using (SqlBulkCopy s = new SqlBulkCopy(destConnection))
    {
        s.DestinationTableName = "SalesCopy2";
        s.NotifyAfter = 10000;
        s.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);

        s.ColumnMappings.Add("SalesOrderID", "ID");
        s.ColumnMappings.Add("CarrierTrackingNumber", "TrackingNumber");

        s.WriteToServer(sourceTable);
        s.Close();
    }
}

There are various overrides of the Add() method available, but the one shown here allows us to specify the source column name and the matching destination column name. In this example, we are mapping our source DataTable's "SalesOrderID" column to the destination table's "ID" column, and also the "CarrierTrackingNumber" column to "TrackingNumber" column. All other columns in our DataTable are ignored and not exported.

Copying only updated rows

Another very useful feature is that we can bulk copy rows from a DataTable based on their rowState, such as rows that have been modified or not modified since the DataTable was first populated. To do this, there is an override of the WriteToServer() method that accepts both a DataTable and a rowState enumeration. Thus, if we were altering the contents of our DataTable in our application, and then wanted to only output rows that have been modified, we could simply call the WriteToServer() method like this:

s.WriteToServer(sourceTable, DataRowState.Modified);

You can also specifically pass in an array of DataRow objects to explicitly indicate a set of rows to copy.

Using a DataReader to copy rows Server-to-Server

In addition to bulk copying DataTables, we can also efficiently and easily move data directly from one database to another by opening up a DataReader at the source database passing the reader directly to a SqlBulkCopy's WriteToServer() method, bypassing the need for a DataTable completely.

Here's an example application that bypasses the DataTable completely to copy 50,000 rows from the SalesOrderDetail in AdventureWorks directly to our destination table:

static void Main(string[] args)
{           
    SqlConnection source = new SqlConnection(<source connect="" string="">);
    SqlConnection dest = new SqlConnection(<dest connect="" string="">);
            
    source.Open();
    dest.Open();

    SqlCommand sourceCommand = new SqlCommand(@"
          select top 50000
              SalesOrderID, SalesOrderDetailID,
              CarrierTrackingNumber, OrderQty, ProductID,
              SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, ModifiedDate 
          from 
              Sales.SalesOrderDetail", source);

    // ensure that our destination table is empty:
    new SqlCommand("delete from SalesCopy", dest).ExecuteNonQuery();

    DateTime start = DateTime.Now;
    Console.WriteLine("Beginning Copy ....");

    // using SqlDataReader to copy the rows:
    using (SqlDataReader dr = sourceCommand.ExecuteReader())
    {
        using (SqlBulkCopy s = new SqlBulkCopy(dest))
        {
            s.DestinationTableName = "SalesCopy";
            s.NotifyAfter = 10000;
            s.SqlRowsCopied += new SqlRowsCopiedEventHandler(s_SqlRowsCopied);
            s.WriteToServer(dr);
            s.Close();
        }
    }

    Console.WriteLine("Copy complete in {0}  seconds.", DateTime.Now.Subtract(start).Seconds);
    Console.ReadLine();

    // close connections:

    source.Close();
    dest.Close();
}</dest></source>

If there is no need to store the data locally in a DataTable, this is the most efficient way to bulk copy data between SQL Servers using .NET that I have seen so far.

Note that the source DataReader does not have to be a SqlDataReader; it can be any class that supports the IDataReader interface. This means that you can use SqlBulkCopy to efficiently move data from Oracle, Access, or any other OLEDB/ODBC datasource directly to a SQL Server table.

Conclusion

Well, I hope that has given you a nice overview of the SqlBulkCopy class. It is very easy to use, quite flexible, and it operates very efficiently. It is a great way to quickly populate a SQL Server table from your .NET applications.

- See more at: http://www.sqlteam.com/article/use-sqlbulkcopy-to-quickly-load-data-from-your-client-to-sql-server#sthash.P39bkH1i.dpuf 

 

반응형

댓글