MySQL Left Join
Even if you think you know everything about LEFT JOIN, I bet you will learn something or two in this post!
- The difference between the ON clause and the WHERE clause.
- A simple way to better understand a complex Matching-Condition with WHERE … IS NULL clause.
- The difference between the Matching-Conditions and the Where-conditions.
A reminder about “A LEFT JOIN B ON conditional_expr”
The ON condition (in the expression “A LEFT JOIN B ON conditional_expr”) is used to decide how to retrieve rows from table B (Matching-Stage).
If there is no row in B that matches the ON condition, an extra B row is generated with all columns set to NULL.
In the Matching-Stage any condition in the WHERE clause is not used. Only after the Matching-Stage, the condition in the WHERE clause will be used. It will filter out rows retrieved from the Matching-Stage.
Lets see a LEFT JOIN example:
mysql> CREATE TABLE `product` ( `id` int(10) unsigned NOT NULL auto_increment, `amount` int(10) unsigned default NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 mysql> CREATE TABLE `product_details` ( `id` int(10) unsigned NOT NULL, `weight` int(10) unsigned default NULL, `exist` int(10) unsigned default NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 mysql> INSERT INTO product (id,amount) VALUES (1,100),(2,200),(3,300),(4,400); Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> INSERT INTO product_details (id,weight,exist) VALUES (2,22,0),(4,44,1),(5,55,0),(6,66,1); Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM product; +----+--------+ | id | amount | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | | 4 | 400 | +----+--------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM product_details; +----+--------+-------+ | id | weight | exist | +----+--------+-------+ | 2 | 22 | 0 | | 4 | 44 | 1 | | 5 | 55 | 0 | | 6 | 66 | 1 | +----+--------+-------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM product LEFT JOIN product_details ON (product.id = product_details.id); +----+--------+------+--------+-------+ | id | amount | id | weight | exist | +----+--------+------+--------+-------+ | 1 | 100 | NULL | NULL | NULL | | 2 | 200 | 2 | 22 | 0 | | 3 | 300 | NULL | NULL | NULL | | 4 | 400 | 4 | 44 | 1 | +----+--------+------+--------+-------+ 4 rows in set (0.00 sec) |
Is there a difference between the ON clause and the WHERE clause?
A question: Is there a difference in the result set of the following two queries?
1. SELECT * FROM product LEFT JOIN product_details ON (product.id = product_details.id) AND product_details.id=2; 2. SELECT * FROM product LEFT JOIN product_details ON (product.id = product_details.id) WHERE product_details.id=2; |
It is best to understand by example:
mysql> SELECT * FROM product LEFT JOIN product_details ON (product.id = product_details.id) AND product_details.id=2; +----+--------+------+--------+-------+ | id | amount | id | weight | exist | +----+--------+------+--------+-------+ | 1 | 100 | NULL | NULL | NULL | | 2 | 200 | 2 | 22 | 0 | | 3 | 300 | NULL | NULL | NULL | | 4 | 400 | NULL | NULL | NULL | +----+--------+------+--------+-------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM product LEFT JOIN product_details ON (product.id = product_details.id) WHERE product_details.id=2; +----+--------+----+--------+-------+ | id | amount | id | weight | exist | +----+--------+----+--------+-------+ | 2 | 200 | 2 | 22 | 0 | +----+--------+----+--------+-------+ 1 row in set (0.01 sec) |
The first query retrieve all rows from product table while using the ON condition to decide which rows to retrieve from the Left joined product_details table.
The second query is doing simple Left-Join. It filters out rows not matching the WHERE clause conditions.
See more examples:
mysql> mysql> SELECT * FROM product LEFT JOIN product_details ON product.id = product_details.id AND product.amount=100; +----+--------+------+--------+-------+ | id | amount | id | weight | exist | +----+--------+------+--------+-------+ | 1 | 100 | NULL | NULL | NULL | | 2 | 200 | NULL | NULL | NULL | | 3 | 300 | NULL | NULL | NULL | | 4 | 400 | NULL | NULL | NULL | +----+--------+------+--------+-------+ 4 rows in set (0.00 sec) |
All the rows from the product table are retrieved. However, no matching found at product_details table (there is no row that matches the condition product.id = product_details.id AND product.amount=100).
mysql> SELECT * FROM product LEFT JOIN product_details ON (product.id = product_details.id) AND product.amount=200; +----+--------+------+--------+-------+ | id | amount | id | weight | exist | +----+--------+------+--------+-------+ | 1 | 100 | NULL | NULL | NULL | | 2 | 200 | 2 | 22 | 0 | | 3 | 300 | NULL | NULL | NULL | | 4 | 400 | NULL | NULL | NULL | +----+--------+------+--------+-------+ 4 rows in set (0.01 sec) |
All the rows from the product table are retrieved. However, only one matching found at product_details table.
LEFT JOIN with WHERE … IS NULL clause
What is happening if you use the WHERE …. IS NULL clause?
As stated before, the Where-condition stage is happening after the Matching-Stage. This means that the WHERE IS NULL clause will filter out, from the Matching-Stage result, rows that didn’t satisfy the Matching-Condition.
All that is fine on the paper but if you are using more than one condition in the ON clause it is starting to be confusing.
I have developed a simple way to better understand a complex Matching-Condition with WHERE … IS NULL clause by:
- looking at the IS NULL clause as a negation of the Matching-Condition.
- using the Logical rule: !(A and B) == !A OR !B
Look at the following example:
mysql> SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id AND b.weight!=44 AND b.exist=0 WHERE b.id IS NULL; +----+--------+ | id | amount | +----+--------+ | 1 | 100 | | 3 | 300 | | 4 | 400 | +----+--------+ 3 rows in set (0.00 sec) |
Lets examine the Matching clause (ON clause):
(a.id=b.id) AND (b.weight!=44) AND (b.exist=0) |
Remember that we can think of the IS NULL clause as a negation of the Matching-Condition.
This means we will retrieve the following rows:
!( exist(b.id that equals to a.id) AND b.weight !=44 AND b.exist=0 ) !exist(b.id that equals to a.id) || !(b.weight !=44) || !(b.exist=0) !exist(b.id that equals to a.id) || b.weight =44 || b.exist=1 |
Like in a C programing language, the operands of logical-AND and logical-OR expressions are evaluated from left to right. If the value of the first operand is sufficient to determine the result of the operation, the second operand is not evaluated (short-circuit evaluation).
In our case, this means, retrieve all rows in A that don’t have matching id in B PLUS, for the rows that do have matching id in B, retrieve only the ones that have b.weight =44 OR b.exist=1
See another example:
mysql> SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id AND b.weight!=44 AND b.exist=1 WHERE b.id IS NULL; +----+--------+ | id | amount | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | | 4 | 400 | +----+--------+ 4 rows in set (0.00 sec) |
Explanation:
! ( exist(bid that equals to aid) AND b.weight !=44 AND b.exist=1 ) !exist(bid that equals to aid) || !(b.weight !=44) || !(b.exist=1) !exist(bid that equals to aid) || b.weight =44 || b.exist=0 |
The battle between the Matching-Conditions and the Where-conditions
You can get the same results (A.*) if you put only the basic matching condition in the ON clause and the negation of the rest in the Where condition clause.
For example,
Instead of writing:
SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id AND b.weight!=44 AND b.exist=0 WHERE b.id IS NULL; |
You can write:
SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id WHERE b.id is null OR b.weight=44 OR b.exist=1; |
mysql> SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id WHERE b.id is null OR b.weight=44 OR b.exist=1; +----+--------+ | id | amount | +----+--------+ | 1 | 100 | | 3 | 300 | | 4 | 400 | +----+--------+ 3 rows in set (0.00 sec) |
Instead of writing:
SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id AND b.weight!=44 AND b.exist!=0 WHERE b.id IS NULL; |
You can write:
SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id WHERE b.id is null OR b.weight=44 OR b.exist=0; |
mysql> SELECT a.* FROM product a LEFT JOIN product_details b ON a.id=b.id WHERE b.id is null OR b.weight=44 OR b.exist=0; +----+--------+ | id | amount | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | | 4 | 400 | +----+--------+ 4 rows in set (0.00 sec) |
Does these queries really the same?
These queries retrieve the same result set as long as you need only the values from the first table (e.g. A.*). In a case that you are retrieving values from the LEFT JOINed table, the results values are not the same.
As stated before, the condition in the WHERE clause filters out rows retrieved from the Matching-Stage.
For example:
mysql> SELECT * FROM product a LEFT JOIN product_details b ON a.id=b.id AND b.weight!=44 AND b.exist=1 WHERE b.id is null; +----+--------+------+--------+-------+ | id | amount | id | weight | exist | +----+--------+------+--------+-------+ | 1 | 100 | NULL | NULL | NULL | | 2 | 200 | NULL | NULL | NULL | | 3 | 300 | NULL | NULL | NULL | | 4 | 400 | NULL | NULL | NULL | +----+--------+------+--------+-------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM product a LEFT JOIN product_details b ON a.id=b.id WHERE b.id IS NULL OR b.weight=44 OR b.exist=0; +----+--------+------+--------+-------+ | id | amount | id | weight | exist | +----+--------+------+--------+-------+ | 1 | 100 | NULL | NULL | NULL | | 2 | 200 | 2 | 22 | 0 | | 3 | 300 | NULL | NULL | NULL | | 4 | 400 | 4 | 44 | 1 | +----+--------+------+--------+-------+ 4 rows in set (0.00 sec) |
General Note: If you use LEFT JOIN to find rows that do not exist in some table and you have the following test: col_name IS NULL in the WHERE part,
where col_name is a column that is declared as NOT NULL, MySQL stops searching for more rows (for a particular key combination) after
it has found one row that matches the LEFT JOIN condition.
Great post! Any info on how the placement of the conditions (in the ON vs in the WHERE) affect the performance of the query?
Great post Hazan.
The question, I think, is that the ON clause should be used for stating how both table relate and not to filter records.
A slightly different way to look at your first SELECT that might enlighten the problem (maybe :)) is:
SELECT * FROM product LEFT JOIN
(select * from product_details
WHERE product_details.id=2) product_details
ON (product.id = product_details.id);
And we can see more clearly the effect of the ‘product_details.id=2’ predicate.
Again: great post.
[ ]
david
Great post! You explained some very good points on the LEFT JOIN. Well worth the read ?
Hi Michael,
I did not benchmark it but it make sense that it is better to filter at early stages. I think in most cases, it is better to filter in the Matching stage (ON clause) if you can and if it is relevant.
However, be careful, it can make your query become not readable.
Great post!
Just wanted to note that all this applies to any relational database, not just MySQL.
— Peter.
thanks a lot for your great article on left join. I guess, learning to structure correct queries is one of the toughest arts.
Hello,
I have done trivial measurement of performance comparing the use of filtering condition in ON caluse or WHERE clause and the results might surprise you. Query runs approximately 4x faster when the filtering condition is in where clause, even though the expected filtering ratio is about 1/20.
Stats: MySQL 5.1
Cardinality c_m_members: 33 283
Cardinality c_m_payments: 30 218
Each member pays once a year, some pay for several years in row, some (75 %) never pay. Payments are spread over 20 years.
Average runtime for 5 queries: 0.0026 s
select
*
from
c_m_members m
left outer join c_m_payments platba_minuly on (m.c_id_m = platba_minuly.c_id_m)
left outer join c_m_payments platba_tento on (m.c_id_m = platba_tento.c_id_m)
where
(
platba_minuly.year = (year(now())-1) and
platba_tento.year = year(now())
)
AND
…
Average runtime for 5 queries: 0.0095 s
select
*
from
c_m_members m
left outer join c_m_payments platba_minuly
on (m.c_id_m = platba_minuly.c_id_m and platba_minuly.year = (year(now())-1))
left outer join c_m_payments platba_tento
on (m.c_id_m = platba_tento.c_id_m and platba_tento.year = year(now()))
where
…
The reason is most likely the optimizer which can handle WHERE caluse much better.
The first query retrieve all rows from first table while using the ON condition to decide which rows info to retrieve from the Left joined table.
The second query is doing simple Left-Join and then it filters out rows not matching the WHERE clause conditions.
This means the number of rows retrieved from your first query is much larger then the second.
I just want to comment to THANK YOU for the post!
I was struggling to make a left join with additional conditions to work.
I looked into the official MySQL documentation, and they simply do not explain the difference of using ON and the difference of using AND or WHERE, not to mention the “Matching stage” and the filtering of those results afterwards (maybe they explain it but the manual is ridicullously complicated for what must be simple to explain, as you did, so I just gave up)
Other pages just failed to mention that you could filter results using WHERE, or that you can broaden the matching condition inside the ON part.
your examples are just perfect!
for people interested, what i wanted is to get all the events a person has been called to (‘summons’ table) joined with the table that registers attendance to such events (‘attenances’ table).
so my query ended like this:
SELECT *
FROM summons s
LEFT JOIN attendances a
ON (s.event_id = a.event_id
AND s.employee_id=a.employee_id)
WHERE s.employee_id=15
in the ON part (matching stage) I wanted that the events and employees where the same, and then i just needed to filter the specific employee i wanted (15 in this case).
So that way i get all summons for that employee, and get NULL rows for the ones to which they didn’t assist.
what i am afraid of, is that when i get to have hundreds of rows in each table, the query could take a while to run. (right now i just have like 20 summon rows, and like 100 attendances.
Any ideas of optimizing the query and getting the same result? BTW, both tables are independent to each other, none of either has a FK pointing to the other. I don’t know it that could matter.
thank you very much!!! m(_ _)m
You are welcome!
For your question: if you have a compound index consist from (employee_id,event_id) on each table, your query should run fast even if you will have hundreds of rows.
If you will need help you can feel free to ask me.
This was a wonderful explanation and very timely for me, as I have been struggling with a complex LEFT JOIN that I couldn’t figure out. I had been wondering if there is any difference to the result based on the order of the criteria in the ON clause and you explained that also.
Thank you very much!
You provided some excellent information that helps people understand outer joins better, but you neglected to mention the effect of adding selection criteria in the join clause when the join is not fully covered. “Weight = 44” is a selection criteria, not a join criteria so that the engine must access both the index and the data space to satisfy the the join the criteria unless Weight is included in the same index with id. Properly segregate the join and selection criteria and insure that the join criteria is fully covered (index) for best performance. Remember that selection criteria is much more volatile that join criteria which is fundamental to data design.
Since the original article is a few years old, there’s no need to post my reply. I just ran across it during learning the variations in mySQL syntax after years of Sybase, Oracle and SQLServer. It was a good article – thanks for helping the community.
great post.Good explanation of difference between conditon with on in left join & where clause.