Performance of the SQL MERGE vs. INSERT/UPDATE
MERGE is designed to apply both UPDATE and
INSERTs into a target table from a source table. The statement can do
both at once, or simply do INSERTs or only UPDATEs. One might even get
the impression that INSERT and UPDATE are no longer needed. Why not
always use MERGE?
MERGE can also do DELETEs from the target table,
but for today we won’t explore that case.
Sample
Data and Some Basic Examples
To illustrate our case, let’s set up some very
simplistic source and target tables, and populate them with some data that we
can demonstrate with.
CREATE
TABLE #Target
(
ID BIGINT PRIMARY KEY
, Value INT
);
CREATE
TABLE #Source
(
ID BIGINT PRIMARY KEY
, Value INT
);
INSERT
INTO #Target
VALUES ( 1, 2342 ),
( 2, 345 );
INSERT
INTO #Source
VALUES ( 1, 975 ),
( 3, 683 );
When we MERGE into #Target, our matching
criteria will be the ID field, so the normal case is to UPDATE like IDs and
INSERT any new ones like this:
--
Standard MERGE of all #Source rows into #Target
MERGE
#Target t
USING #Source s
ON s.ID = t.ID
WHEN MATCHED
THEN UPDATE
SET Value = s.Value
WHEN NOT MATCHED
THEN INSERT
( ID, Value )
VALUES
( s.ID
, s.Value
);
SELECT
*
FROM #Target;
This produces quite predictable results that
look like this:
ID Value
1 975
2 345
3 683
Let’s change the values in our #Source table,
and then use MERGE to only do an UPDATE.
--
Change the values of our source rows
UPDATE
#Source
SET Value = CASE ID
WHEN 1 THEN 555
WHEN 3 THEN 999
END
--
MERGE that only updates
MERGE
#Target t
USING #Source s
ON s.ID = t.ID
WHEN MATCHED
THEN UPDATE
SET Value = s.Value;
SELECT
*
FROM #Target;
The results now in #Target are:
ID Value
1 555
2 345
3 999
Finally, we know we can also use MERGE to
replace INSERT by omitting the MATCHED clause. Let’s INSERT a new row
into #Source and do this.
--
Insert a new row into our source
INSERT
#Source
VALUES ( 4, 242 );
--
MERGE that only inserts
MERGE
#Target t
USING #Source s
ON s.ID = t.ID
WHEN NOT MATCHED
THEN INSERT
( ID, Value )
VALUES
( s.ID
, s.Value
);
SELECT
*
FROM #Target;
Unsurprisingly, the results now in #Target are
these:
ID Value
1 555
2 345
3 999
4 242
Sorry if you know all this stuff and I’ve bored
you, but we needed to get these basics out of the way.
Exploring
the Performance of MERGE
To effectively test the performance of our
alternatives, we’ll need to set up a test harness with a non-trivial number of
rows in our #Source and #Target tables. You can open the Test
Harness 1.sql file in the resources section of this article and follow
along. The basic set-up data is as follows.
TRUNCATE
TABLE #Source;
TRUNCATE
TABLE #Target;
WITH
Tally ( n )
AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER
BY ( SELECT NULL
) )
FROM sys.all_columns a
CROSS JOIN
sys.all_columns b
)
INSERT INTO #Target
SELECT 2 * n
, 1 +
ABS(CHECKSUM(NEWID())) % 1000
FROM Tally;
WITH
Tally ( n )
AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER
BY ( SELECT NULL
) )
FROM sys.all_columns a
CROSS JOIN
sys.all_columns b
)
INSERT INTO #Source
SELECT CASE WHEN n <= 500000 THEN 2 * n -
1
ELSE 2 * n
END
, 1 +
ABS(CHECKSUM(NEWID())) % 1000
FROM Tall
We’ve purposely set up our source table so that
the INSERTs it will do when merged with the target are interleaved with
existing records for the first 500,000 rows.
Using SQL Profiler, we’ll compare two identical
query scripts:
--
MERGE
MERGE
#Target t
USING #Source s
ON s.ID = t.ID
WHEN MATCHED
THEN UPDATE
SET Value = s.Value
WHEN NOT MATCHED
THEN INSERT
( ID, Value )
VALUES
( s.ID
, s.Value
);
--
TRUNCATE and re-populate Source and Target tables
--
UPDATE/INSERT
BEGIN
TRANSACTION T1;
UPDATE
t
SET Value = s.Value
FROM #Target t
JOIN #Source s
ON s.ID = t.ID;
INSERT
INTO #Target
SELECT s.ID
, s.Value
FROM #Source s
LEFT JOIN #Target t
ON s.ID = t.ID
WHERE t.ID IS NULL;
COMMIT
TRANSACTION T1;
Both of these INSERT 500,000 rows and UPDATE
500,000 rows, the latter enclosed in a TRANSACTION. We omit error
handling and possible ROLLBACK of the transaction for simplicity. The
results that we obtained running the test harness in SQL Profiler (5 runs) are
as follows:
Query
CPU Reads Writes Duration
MERGE
4492 4513786 2578 4864
INSERT/UPDATE 3588 3072489
5496 3847
MERGE
4820 4514303 2899 5253
INSERT/UPDATE 3572 3072979
4007 4035
MERGE
4462 4513793 2571 4770
INSERT/UPDATE 3635 3072463
5497 3829
MERGE
4524 4513795 2931 4800
INSERT/UPDATE 3588 3072474
5505 3665
MERGE
4648 4513814 2939 4955
INSERT/UPDATE 3479 3072491
5522 3716
These indicate that MERGE took about 28% more
CPU and 29% more elapsed time than the equivalent INSERT/UPDATE. Not
surprising considering all the complexity that MERGE must handle, but possibly
forgivable for the additional safety, convenience and simplicity it represents
(no need for a TRANSACTION with error handling).
Using
MERGE as a Substitute for INSERT
To test the performance of MERGE as a substitute
for INSERT, the same test harness that set up the #Target table can be used,
but we’ll change the set up for the #Source table as follows.
WITH Tally (n) AS
(
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER
BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns
b
)
INSERT INTO #Source
SELECT CASE WHEN n <= 500000 THEN 2*n-1 ELSE 2000000+n END
,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;
A quick check of the row counts generated from Test
Harness #2.sql confirms that both MERGE and INSERT insert exactly
1,000,000 rows. The results from a Profiler trace using these queries
give us this:
Query
CPU Reads Writes Duration
MERGE
5054 6024150 2397 5576
INSERT 4992
6248001 7563 5507
MERGE
5226 6024165 3868 5529
INSERT 5383
6248005 7571 6298
MERGE
5257 6023557 3689 5473
INSERT 4851
6247403 7431 5546
MERGE
5273 6023589 2167 5662
INSERT 4914
6247440 7427 5281
MERGE
5179 6024619 1426 5476
INSERT 5039
6248483 6211 5954
In this case, CPU and elapsed time are probably
too close to call. CPU usage was about 3% more for the MERGE and elapsed
time was about 3% less for the MERGE.
Using
MERGE as a Substitute for UPDATE
Once again, to test this we’ll change the test
harness only for the #Source table so that it creates the rows with exactly the
same IDs as the #Target table.
WITH Tally (n) AS
(
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER
BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns
b
)
INSERT INTO #Source
SELECT 2*n
,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;
After confirming that both MERGE and UPDATE
update 1,000,000 rows, our Profile trace from running Test Harness #3.sql
gave us these results.
Query
CPU Reads Writes Duration
MERGE
1903 7982 2568 2010
UPDATE 1763
7954 2568 1840
MERGE
1904 7986 2576 2303
UPDATE 1809
7955 2560 1974
MERGE
1903 7968 2576 1951
UPDATE 1763
7940 2568 2005
MERGE
1918 7957 2568 2009
UPDATE 1731
7464 2584 1809
MERGE
1903 8005 2560 2023
UPDATE 1732
7977 2584 2063
Our MERGE was just about 8% more costly in CPU
and 6% more in elapsed time. The advantage probably goes to the UPDATE
for this case.
Conclusions
While the MERGE statement is a wonderfully
useful query syntax that does offer some safety features over the traditional
approach of UPDATE then INSERT, it does come with a slight performance
penalty. Certainly the performance characteristics may vary depending on
your special circumstances, so this article should only be treated as a
guideline. While I will probably continue to use it simply for the
convenience it offers in the INSERT/UPDATE case, it is unlikely I’d consider
using it as a replacement either for INSERT or for UPDATE.
No comments:
Post a Comment