This is one of my favorite SQL tricks.

Maybe you’re building an ETL process, like loading a file, or need to compare two tables? How would you write that update?

One of the daunting parts of writing updates, especially with a large number of columns, is figuring out which records actually changed, and only updating those records.

Of course, there’s always more than one way to bake a cake.


One method is to compare each column in the WHERE clause separating each comparison with an OR

UPDATE c
    SET c.FirstName   = u.FirstName,
        c.LastName    = u.LastName,
        c.MiddleName  = u.MiddleName,
        c.DateOfBirth = u.DateOfBirth
FROM #Customer c
    JOIN #Updates u ON u.CustomerID = c.CustomerID
WHERE c.FirstName    <> u.FirstName
    OR c.LastName    <> u.LastName
    OR c.MiddleName  <> u.MiddleName
    OR c.DateOfBirth <> u.DateOfBirth;

This works fine, as long as every column isn’t nullable. But what if MiddleName and DateOfBirth allows NULLs?

You could do something like this…

UPDATE c
    SET c.FirstName   = u.FirstName,
        c.LastName    = u.LastName,
        c.MiddleName  = u.MiddleName,
        c.DateOfBirth = u.DateOfBirth
FROM #Customer c
    JOIN #Updates u ON u.CustomerID = c.CustomerID
WHERE c.FirstName <> u.FirstName
    OR c.LastName <> u.LastName
    OR CASE WHEN c.MiddleName = u.MiddleName                     THEN 0
            WHEN c.MiddleName  IS NULL AND u.MiddleName  IS NULL THEN 0
            ELSE 1 END = 1
    OR CASE WHEN c.DateOfBirth = u.DateOfBirth                   THEN 0
            WHEN c.DateOfBirth IS NULL AND u.DateOfBirth IS NULL THEN 0
            ELSE 1 END = 1;

This works…but it is hard to read, and now you need to keep track of which columns are nullable and which ones aren’t. What happens when LastName is changed to allow NULL? The update is no longer correct, and needs to be fixed.

This is where my favorite trick comes in; Using the EXISTS operator and the EXCEPT set operator to identify changed rows.


The Basics - How EXCEPT works

The EXCEPT set operator compares two sets of records, and returns all of the records from the first set that don’t have a matching record in the second set.

The most basic examples would be:

-- Returns nothing
SELECT 1, NULL
EXCEPT
SELECT 1, NULL;

-- Returns NULL
SELECT NULL
EXCEPT
SELECT 1;

-- Returns 1
SELECT 1
EXCEPT
SELECT NULL;

The first example returns nothing because the two sets match, but the following two examples return records from the first set because it was unable to find any matching records in the second set.

The other thing to note is that the EXCEPT operator treats the comparison of NULL values as equal. Unlike standard comparison operators. It’s this difference that helps us use it to find changed rows.


Let’s set up some sample data:

IF OBJECT_ID('tempdb..#Customer','U') IS NOT NULL DROP TABLE #Customer; --SELECT * FROM #Customer
CREATE TABLE #Customer (
    CustomerID  int         NOT NULL PRIMARY KEY,
    FirstName   varchar(50) NOT NULL,
    MiddleName  varchar(50)     NULL,
    LastName    varchar(50) NOT NULL,
    DateOfBirth date            NULL,
);

INSERT INTO #Customer (CustomerID, FirstName, MiddleName, LastName, DateOfBirth)
VALUES ( 1, 'Sheldon'   , 'Dennis'  ,'Saunders'     , '2019-12-10')
    ,  ( 2, 'Barry'     , NULL      ,'Richardson'   , '1990-09-29')
    ,  ( 3, 'Rosa'      , 'Evelyn'  ,'Rodriquez'    , '1974-09-11')
    ,  ( 4, 'Dwayne'    , NULL      ,'Neal'         , '1997-01-26')
    ,  ( 5, 'Jane'      , NULL      ,'Green'        , '1977-01-13')
    ,  ( 6, 'Margaret'  , NULL      ,'Rodriguez'    , '1991-06-08')
    ,  ( 7, 'Chris'     , 'Stephen' ,'King'         , '1982-11-15')
    ,  ( 8, 'Joe'       , NULL      ,'Smith'        , '1972-09-18')
    ,  ( 9, 'Paul'      , NULL      ,'Ramirez'      , '1971-02-20')
    ,  (10, 'Amanda'    , 'Beverly' ,'White'        , '2013-04-28');

Here we’ve got some sample data…We have a customer table, where we store the customers first, middle and last name, and their birth date. Note that MiddleName and DateOfBirth allow NULL.

Now lets create a new table where we can make modifications to the data for us to sync back to the original #Customer table:

IF OBJECT_ID('tempdb..#Updates','U') IS NOT NULL DROP TABLE #Updates; --SELECT * FROM #Updates
SELECT c.CustomerID, c.FirstName, c.MiddleName, c.LastName, c.DateOfBirth
INTO #Updates
FROM #Customer c;

UPDATE #Updates SET LastName    = 'Brown'      WHERE CustomerID = 5; -- Change Last Name
UPDATE #Updates SET MiddleName  = 'John'       WHERE CustomerID = 9; -- Add Middle Name
UPDATE #Updates SET MiddleName  = NULL         WHERE CustomerID = 3; -- Remove Middle Name
UPDATE #Updates SET DateOfBirth = '1990-09-22' WHERE CustomerID = 2; -- Change DateOfBirth

-- Add new Customer
INSERT INTO #Updates (CustomerID, FirstName, MiddleName, LastName, DateOfBirth)
VALUES (11, 'Chad', NULL, 'Baldwin', '1990-01-12');

Now we have a copy of the #Customer table named #Updates, and we’ve made a few changes to the data.


Let’s use EXISTS and EXCEPT to find all records which changed…

SELECT *
FROM #Customer c
    JOIN #Updates u ON u.CustomerID = c.CustomerID
WHERE EXISTS (
    SELECT c.FirstName, c.MiddleName, c.LastName, c.DateOfBirth
    EXCEPT
    SELECT u.FirstName, u.MiddleName, u.LastName, u.DateOfBirth
);

Cool right? This is giving you all records in #Customer which do not have a matching record in #Updates.

To go from that to an update or a merge statement, is fairly simple…

Update

UPDATE c
    SET c.FirstName   = u.FirstName,
        c.MiddleName  = u.MiddleName,
        c.LastName    = u.LastName,
        c.DateOfBirth = u.DateOfBirth
FROM #Customer c
    JOIN #Updates u ON u.CustomerID = c.CustomerID
WHERE EXISTS (
    SELECT c.FirstName, c.MiddleName, c.LastName, c.DateOfBirth
    EXCEPT
    SELECT u.FirstName, u.MiddleName, u.LastName, u.DateOfBirth
);

Merge

MERGE INTO #Customer c
USING #Updates u ON u.CustomerID = c.CustomerID
WHEN MATCHED AND EXISTS (
                    SELECT c.FirstName, c.MiddleName, c.LastName, c.DateOfBirth
                    EXCEPT
                    SELECT u.FirstName, u.MiddleName, u.LastName, u.DateOfBirth
                )
THEN
    UPDATE SET c.FirstName    = u.FirstName,
                c.MiddleName  = u.MiddleName,
                c.LastName    = u.LastName,
                c.DateOfBirth = u.DateOfBirth
WHEN NOT MATCHED BY TARGET
THEN
    INSERT (CustomerID, FirstName, MiddleName, LastName, DateOfBirth)
    VALUES (u.CustomerID, u.FirstName, u.MiddleName, u.LastName, u.DateOfBirth);

What about performance?

If you were to compare query plans between the first method and the EXISTS/EXCEPT method, it appears the latter will generate a slightly more complicated execution plan.

However, I have found that despite this, the EXISTS/EXCEPT method almost always performs better, even with very large workloads. Not only do I consistently see it run faster, but it also requires significantly less reads on the dependent tables.