<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>swaits.com: Understanding JOINs</title>
    <link>http://swaits.com/articles/2006/03/29/understanding-joins</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>A Blog by Stephen Waits</description>
    <item>
      <title>Understanding JOINs</title>
      <description>&lt;h1&gt;Understanding JOINs&lt;/h1&gt;

&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;As my wife was taking a class on SQL, it occurred to me that some people may use JOINs without understanding how they work.  When she began her class, I told her the toughest concept to grasp would be JOINs.  Then one day she came home and said that they&amp;#8217;d done JOINs that afternoon. They&amp;#8217;d spent an hour or two out of a full five day course on one of the most conceptually difficult areas within all of SQL.&lt;/p&gt;

&lt;p&gt;I believe understanding how something works makes it much easier to use.  So, to help her and others, I decided to write this little tutorial.  It&amp;#8217;s written the way I came to understand JOINs.&lt;/p&gt;

&lt;h2&gt;The Cartesian Product&lt;/h2&gt;

&lt;p&gt;Forget about JOINs for now.  We&amp;#8217;re going to learn something else first.&lt;/p&gt;

&lt;p&gt;It really all boils down to this.  Understand what&amp;#8217;s going on here, combined with your knowledge of SELECT statements and WHERE clauses, and you&amp;#8217;ll have it.  Don&amp;#8217;t make it too difficult.&lt;/p&gt;

&lt;p&gt;What is the Cartesian Product?  It&amp;#8217;s every possible combination of each of the rows in two or more tables.  For example, let&amp;#8217;s say we have two tables, T1, and T2.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;T1   T2
==   ==
 a    1
 b    2
 c    3
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Given that, the Cartesian Product of T1 and T2 is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;T1,T2
=====
 a  1
 a  2
 a  3
 b  1
 b  2
 b  3
 c  1
 c  2
 c  3
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now go back to our definition of the Cartesian Product: every possible combination of the rows in two or more tables.  See how the rows in T2 (1, 2, 3) are repeated once for each of the rows in T1 (a, b, c)? That&amp;#8217;s the Cartesian Product.&lt;/p&gt;

&lt;p&gt;Another example that may be more familiar to you:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Rank   Suit
====   ====
  A    Spades
  K    Hearts
  Q    Diamonds
  J    Clubs
  10
  9
  8
  7
  6
  5
  4
  3
  2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the Cartesian Product?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Rank, Suit
==========
 A Spades
 A Hearts
 A Diamonds
 A Clubs
 K Spades
 K Hearts
 K Diamonds
 K Clubs
 ... (removed for brevity)
 3 Spades
 3 Hearts
 3 Diamonds
 3 Clubs
 2 Spades
 2 Hearts
 2 Diamonds
 2 Clubs
 ... hint, hint, it's a full deck of 52 cards!
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice the size of the Cartesian Product?  52.  How is that?  13 Ranks, 4 Suits.  13 * 4 = 52.  That&amp;#8217;s right, it&amp;#8217;s just the product of the number of rows in each of the tables involved.  The exact count isn&amp;#8217;t that important; just notice how quickly the result grows as the tables get bigger.&lt;/p&gt;

&lt;p&gt;A more complex example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;T1   T2   T3
==   ==   ==
 a    1    +
 b    2    $
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The Cartesian Product of more than two tables works just the same way as for two tables.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;T1,T2,T3
========
 a  1  +
 a  1  $
 a  2  +
 a  2  $
 b  1  +
 b  1  $
 b  2  +
 b  2  $
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Again, it&amp;#8217;s just every possible combination of the rows in all of the tables involved.  This time it was 8 rows because each of the three tables had only two rows in it.  2 * 2 * 2 = 8.&lt;/p&gt;

&lt;h2&gt;A Slightly More Useful Cartesian Product&lt;/h2&gt;

&lt;p&gt;Now let&amp;#8217;s look at the Cartesian Product of two tables that might be a little more useful:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;People   id  name
======   --  ----
          0  Susan
          1  Frank

Phones   person_id  phone
======   ---------  -----
                 0  222-3456
                 1  777-8989
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we&amp;#8217;ve laid out two tables with a single relation.  Clearly, we&amp;#8217;re trying to represent two people stored in the People table, each with a phone number stored in the Phones table.  Susan&amp;#8217;s number is 222-3456 and Frank&amp;#8217;s number is 777-8989.&lt;/p&gt;
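
&lt;p&gt;If you&amp;#8217;d like to follow along on a real database, here is one way these two tables might be created and filled.  This is only a sketch: the column types and the foreign key are my assumptions, since all we really know is the column names and the four rows shown above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Column types are assumed; adjust them for your database server.
CREATE TABLE people (
  id   INTEGER PRIMARY KEY,
  name VARCHAR(50)
);

-- person_id points back at people.id (the "single relation").
CREATE TABLE phones (
  person_id INTEGER REFERENCES people(id),
  phone     VARCHAR(20)
);

INSERT INTO people (id, name) VALUES (0, 'Susan');
INSERT INTO people (id, name) VALUES (1, 'Frank');

INSERT INTO phones (person_id, phone) VALUES (0, '222-3456');
INSERT INTO phones (person_id, phone) VALUES (1, '777-8989');
&lt;/code&gt;&lt;/pre&gt;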

&lt;p&gt;And what happens when we mix these two tables up into a Cartesian Product?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;people.id  people.name  phones.person_id  phones.phone
---------  -----------  ----------------  ------------
        0        Susan                 0      222-3456
        0        Susan                 1      777-8989
        1        Frank                 0      222-3456
        1        Frank                 1      777-8989
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What do we have here?  Just another Cartesian Product.  That&amp;#8217;s each row from People combined with each row from Phones.  Specifically, it&amp;#8217;s the 1st row from People with the 1st row from Phones, the 1st row from People with the 2nd row from Phones, the 2nd row from People with the 1st row from Phones, and the 2nd row from People with the 2nd row from Phones.  Study this table until it makes sense to you!&lt;/p&gt;

&lt;h2&gt;Now do a JOIN&lt;/h2&gt;

&lt;p&gt;Here&amp;#8217;s the thing.  I told you not to think about JOINs, but we&amp;#8217;ve actually been doing a JOIN this whole time.  That&amp;#8217;s right.  The Cartesian Product is the result of JOINing two or more tables together.  It&amp;#8217;s actually a &amp;#8220;join&amp;#8221; in the most literal sense of the word.  In fact, we got that combination (Cartesian Product) of People and Phones earlier by using this very SQL statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM people,phones;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice how we selected everything from two tables?  Side note: this raw Cartesian Product is technically a &amp;#8220;Cross Join&amp;#8221;, but don&amp;#8217;t worry about that for now.  As we&amp;#8217;ll see later, there are several types of joins.&lt;/p&gt;
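
&lt;p&gt;For what it&amp;#8217;s worth, the Cross Join can also be spelled out explicitly.  A minimal sketch, assuming your database server accepts the CROSS JOIN keyword (it is standard SQL, but check your server&amp;#8217;s documentation):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Same Cartesian Product as the comma version, just written out.
SELECT * FROM people CROSS JOIN phones;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This should give back the same four rows as the comma-separated version.&lt;/p&gt;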

&lt;p&gt;Unfortunately, this JOIN is not that useful.  It&amp;#8217;s clear by looking at the People and Phones tables that Susan&amp;#8217;s phone number is 222-3456 and Frank&amp;#8217;s phone number is 777-8989.  But since the raw Cartesian Product (or a &amp;#8220;Cross Join&amp;#8221;) combines &lt;em&gt;every&lt;/em&gt; row in one table with &lt;em&gt;every&lt;/em&gt; row in the other table, we&amp;#8217;re getting all kinds of bogus &lt;em&gt;unrelated&lt;/em&gt; rows in our select.&lt;/p&gt;

&lt;p&gt;How can we make it more useful?  Let&amp;#8217;s look at the data again.  Try to pick out the &lt;em&gt;related&lt;/em&gt; rows when you look at the data.  In other words, which rows do we care about?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM people,phones;

people.id  people.name  phones.person_id  phones.phone
---------  -----------  ----------------  ------------
        0        Susan                 0      222-3456
        0        Susan                 1      777-8989
        1        Frank                 0      222-3456
        1        Frank                 1      777-8989
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;How will we tell the database server to keep the &lt;em&gt;related&lt;/em&gt; rows and throw away the &lt;em&gt;unrelated&lt;/em&gt; rows for us?&lt;/p&gt;

&lt;p&gt;Well, just like most other SELECT statements, we need to add a WHERE clause to tell it exactly which rows we care about.  Which rows do we care about in this case?  Take a look at the result; we want the rows where &amp;#8220;people.id&amp;#8221; is the same as &amp;#8220;phones.person_id&amp;#8221;.  In other words, just the rows that make sense.  Let&amp;#8217;s add that to our statement and run it again:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM people,phones 
WHERE people.id=phones.person_id;

people.id  people.name  phones.person_id  phones.phone
---------  -----------  ----------------  ------------
        0        Susan                 0      222-3456
        1        Frank                 1      777-8989
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Do you see what we did?  We picked a few rows out of the (potentially huge) Cartesian Product.  That&amp;#8217;s all there is to it.  Important Note: This is an INNER JOIN.  It&amp;#8217;s the alternate syntax and it&amp;#8217;s the easiest to understand.&lt;/p&gt;
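
&lt;p&gt;One way to see how much the WHERE clause trimmed away is to count the rows before and after.  A small sketch, assuming People and Phones hold exactly the rows shown above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- The raw Cartesian Product: 2 people * 2 phones = 4 rows.
SELECT COUNT(*) FROM people,phones;

-- Only the related rows survive the WHERE clause: 2 rows.
SELECT COUNT(*) FROM people,phones
WHERE people.id=phones.person_id;
&lt;/code&gt;&lt;/pre&gt;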

&lt;p&gt;Now let&amp;#8217;s make our INNER JOIN even more useful by only specifying the columns we care about in our SELECT statement:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT people.name,phones.phone FROM people,phones
WHERE people.id=phones.person_id;

people.name  phones.phone
-----------  ------------
      Susan      222-3456
      Frank      777-8989
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we&amp;#8217;re getting somewhere!  All we did this time was choose to only select the columns &amp;#8220;people.name&amp;#8221; and &amp;#8220;phones.phone&amp;#8221; instead of &amp;#8220;*&amp;#8221;.&lt;/p&gt;

&lt;p&gt;Before we move on, we can do one more thing to this SELECT to clean it up a bit.  In this case, we&amp;#8217;ll use some aliases to shorten up our table names and make it a little easier to type:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ppl.name,ph.phone 
FROM people AS ppl,phones AS ph 
WHERE ppl.id=ph.person_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, we simply aliased &amp;#8220;people&amp;#8221; to &amp;#8220;ppl&amp;#8221; and &amp;#8220;phones&amp;#8221; to &amp;#8220;ph&amp;#8221;.  This just makes it a little easier to read and edit.&lt;/p&gt;

&lt;h2&gt;Syntax&lt;/h2&gt;

&lt;p&gt;Most SQL programmers like to format their complex SELECT statements a little more nicely than all strung out on a single line.  Our SELECT statement from above, when formatted nicely, ends up looking like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ppl.name, ph.phone
  FROM people AS ppl, phones AS ph 
  WHERE ppl.id=ph.person_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The INNER JOIN syntax we used here is implicit.  See how the words INNER and JOIN never appear in this statement?  That&amp;#8217;s because this is an alternate syntax.  In fact, it&amp;#8217;s important to realize that the INNER JOIN above is &lt;em&gt;exactly the same&lt;/em&gt; as the following INNER JOIN:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT ppl.name, ph.phone
  FROM people AS ppl
  INNER JOIN
  phones AS ph
    ON ppl.id=ph.person_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This explicit JOIN ... ON form is the syntax standardized in SQL-92; however, the first, comma-separated syntax is older and still widely accepted.  I personally prefer the first, simpler syntax for INNER JOINs.&lt;/p&gt;

&lt;p&gt;The most important thing to remember, though, is that these two are identical statements.&lt;/p&gt;

&lt;h2&gt;More JOINs&lt;/h2&gt;

&lt;p&gt;There&amp;#8217;s actually a little more to JOINs than the basic INNER JOIN. However, it&amp;#8217;s &lt;em&gt;very important&lt;/em&gt; to remember that they&amp;#8217;re all basically variations on this simple JOIN (aka Cartesian Product).  In general, the JOIN builds up a huge wad of combined rows, and the WHERE trims it down to just the few rows you&amp;#8217;re interested in.&lt;/p&gt;

&lt;p&gt;This is the exact same process we followed earlier.  Use the JOIN to build up a bunch of data, and use the WHERE to pull out only the useful data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;INNER - Only the combinations of rows that satisfy the join condition; a &amp;#8220;default&amp;#8221; JOIN.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LEFT OUTER - Every row in the &amp;#8220;left&amp;#8221; table, combined with its matching rows from the &amp;#8220;right&amp;#8221; table.  When a left row has no match on the right, the right-hand columns are filled in with NULLs.  This goes beyond the Cartesian Product because it will join in NULL rows that don&amp;#8217;t really exist.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RIGHT OUTER - Like LEFT OUTER, but every row on the &amp;#8220;right&amp;#8221;, filling in NULLs on the &amp;#8220;left&amp;#8221;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
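
&lt;p&gt;To make the LEFT OUTER case a little more concrete, here is a rough sketch.  It assumes the People and Phones tables from earlier, plus a third person, Maria, who has no row in Phones.  Maria is made up purely for this illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hypothetical extra row: a person with no phone number.
INSERT INTO people (id, name) VALUES (2, 'Maria');

SELECT ppl.name, ph.phone
  FROM people AS ppl
  LEFT OUTER JOIN
  phones AS ph
    ON ppl.id=ph.person_id
  ORDER BY ppl.id;

ppl.name  ph.phone
--------  --------
   Susan  222-3456
   Frank  777-8989
   Maria      NULL
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With an INNER JOIN, Maria would simply drop out of the result, because no row in Phones has a person_id of 2.&lt;/p&gt;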

&lt;p&gt;For more details on each JOIN type, please see the &lt;a href="http://en.wikipedia.org/wiki/Join_(SQL)"&gt;JOIN entry at Wikipedia&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;JOINs are pretty simple.  The actual JOIN itself builds up a large chunk of rows.  Many of those rows won&amp;#8217;t be useful at all.  You use the WHERE clause to pull out only the rows that make sense and that you&amp;#8217;re interested in.&lt;/p&gt;

&lt;p&gt;The INNER JOIN, the most common, is always a subset of the Cartesian Product.  The LEFT OUTER and RIGHT OUTER JOINs take that a little further, but not much.&lt;/p&gt;

&lt;p&gt;I hope I&amp;#8217;ve helped you better understand JOINs.&lt;/p&gt;</description>
      <pubDate>Wed, 29 Mar 2006 00:31:55 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:1b701669-bab1-4daa-861d-ed8f9631b45b</guid>
      <author>steve@waits.net (Stephen Waits)</author>
      <link>http://swaits.com/articles/2006/03/29/understanding-joins</link>
      <category>programming</category>
      <trackback:ping>http://swaits.com/articles/trackback/212</trackback:ping>
    </item>
  </channel>
</rss>
