Performance Problem When Using OR In A JOIN

I recently had a developer come to me with a poorly performing query.  As you will see below, it’s a fairly straight forward query, consisting of one table with one join and a simple WHERE clause.  The problem was that this query was taking over 11 minutes to run, and only returned about 40,000 results.  Let’s take a look at the query.

Note: The queries and execution plans below have been run through the “Anonymize” feature of Sentry One Plan Explorer, and then tweaked a little by me for better readability.  I don’t really name my tables things like “object2”.

A quick look at the execution plan in Plan Explorer shows us where the problem is.

Bad Execution Plan

Wait, what?!?  SQL Server is taking 628,000 rows and putting them into a spool of over 7 BILLION rows!  And when I hovered over that data stream it showed that we had an actual data size of 112GB!  Houston, I think we found the problem…something in the JOIN between our two tables is causing SQL Server to have to do a ton of work.  So, let’s take another look at that query.

Looking at our JOIN, we can see there is an OR statement in the join condition.  This smells funny.  That condition is basically telling SQL Server to do try to do two separate things, something along the lines of:

This is confusing to SQL Server.  What could be looked at as 2 separate pieces of logic is being crammed into one, which causes it to create a crazy 7 BILLION row, 112GB table spool.

The fix for this would be to separate that JOIN into two joins, and since this was a small query I decided to do that with a UNION.  Here is my updated query:

Both queries in the UNION are the same, except for the JOIN statements, which are just the two parts of the original JOIN now being run separately.  As an added bonus, since I used a UNION we no longer need DISTINCT in the query. (I hate DISTINCT!)  Now when we check the execution plan, we get the following:

Better Execution Plan

MUCH BETTER!  We no longer have that crazy table spool, and our query execution time dropped from 11 minutes to 2oo milliseconds.  I would call that a win!  Even though there are still some indexing improvements that could be made to the query, and a couple of warnings to look into, we now get a much simpler, more efficient execution plan by removing the OR from the JOIN condition.

 

(Visited 26,520 times, 5 visits today)