How to Query Multiple Tables with JOINs
SQL databases are amazing tools that help programmers store tons of data that is easy to retrieve in a very structured way. To properly use databases, though, you need to understand how to retrieve data from them. After all, that's one of the primary functions of a SQL database: to retrieve information at lightning speed. So, let's discuss how to cross-reference data in a SQL database using Join commands.
Note that we aren't going to reference code snippets in this article. Instead, we are going to discuss what a Join statement is and how they work. Join statements are easy to construct after you understand how and why they are used, so you need to learn that nugget first.
What is a SQL Join Statement?
SQL is a structured data storage system. That means that any of the data you are storing in a SQL database needs to correlate in some way. For instance, you could have a database that stores customer information, product information, and which customers purchased which products.
Do you see how that data all correlates somehow? Customers purchase products, so you need to know who that customer is, what product they purchased, and how much you need to charge for that product.
That data is stored in different tables in a database, though. One table stores information about your customers, another stores information about the products you sell, and another stores the information about which customer purchased which products.
That's what a Join statement is used for in a SQL query. Join statements combine related data between tables in a database so you can reference that data.
For instance, if you need to know all the products that Johnny B. purchased last year, you need to merge the information about Johnny B. in the customer information table along with the sales data from the table that holds that data. Likewise, you may need to know how much money Johnny B. spent, so you need to merge data from the table holding sales information with the table that stores information about the products you sell.
There are multiple types of Join statements depending on how you need to merge data, too:
Of course, being able to use the Join commands in a SQL query depends on having a well-organized and well-normalized database.
An Overview of Querying Multiple Tables with JOINs [VIDEO]
In this video, CBT Nuggets trainer Garth Schulte covers querying multiple tables with the many join operators available in SQL, which provides you with many different approaches to querying multiple tables. Garth covers inner joins, three kinds of outer joins, cross-joins, and self-joins. After all that, he’ll give a few tips for making your multi-table queries efficient and more useful.
What Does it Mean to Normalize a Database?
Relational databases, like a SQL database, store data that correlates to each other. Though data might be 'relational', it might not be the same type of data or data about the same things. So, there is the concept of normalizing a database.
Let's say that you have a simple e-commerce application that sells products to consumers. You need to store information about your customers, information about the products you sell, and information about any sales you made.
In this example, you would have three tables in your database. One table will hold customer information like a customer's name, phone number, and street address. Another table will hold product information like the name of a product, that product's SKU, and how much that product costs to customers. The last table will hold information about which customer purchased which products and when.
You wouldn't want to store all of that information in one table. Otherwise, each record in that table would store duplicate information like a customer's phone number or a product SKU multiple times. We don't need to know Johnny B.'s address every single time he makes a purchase at our store. That's a bit obnoxious.
Separating that information into different tables so we don't have duplicate information is called normalizing a database.
You have noticed that the table in the database that stores sales information could have duplicate data, though. What if Johnny B. always purchases milk and eggs every Friday afternoon? Well, in this case, each record is a different event or object. Likewise, Johnny B. is a unique object, too. We aren't getting a new Johnny B. every Friday when they purchase milk and eggs. So keeping Johnny B.'s personal information and his sales records in different tables is appropriate.
What are Primary and Foreign Keys?
How do we match data from different SQL tables together? We use things called keys. There are different types of keys, but for simplicity's sake, we are only going to discuss primary and foreign keys. So, what is a primary key?
If you own a business that sells gizmos, you might have a database that stores customer information, sales information, and product information. That data would be stored in different SQL tables. So, the customer information table should not have multiple records for a customer named Johnny B. unless you physically had different customers with the same name. If there is only one Johnny B. there should only be one record in that customer information table.
But, we just said you could have different customers named Johnny B. How does this work? Each entry in the customer information table in your database should have a primary key. A primary key is a unique identifier and cannot belong to different records in that table. So, each customer named Johnny B. would have a different primary key. This is also the reason that repeatable things like a person's name shouldn't be used as a primary key.
Foreign keys are nothing more than keys that point to the primary key of data in another table. So, using the example above, each sales record in your database would have a foreign key that points to the primary key of a customer record in the customer information table. By doing this, we can pull the correct sales data for the correct Johnny B. since we have multiple customers named Johnny B. That way we can send a 'Thank You' card to the correct customer for spending so much money at our fictional store.
What are Inner Joins?
Inner Joins are used in SQL statements when you need to merge data and that data must match. For example, if you need to find all the customers that should be mailed a thank you card for spending a lot of money with your business, you need to know how much those customers spent along with their mailing address. If a customer doesn't have a mailing address, then you don't want their data, though.
Sometimes SQL tables are designed to store data that is nullable. That means a record can be stored without any value in the database. Joining tables requires data to exist to function properly, though. If a record doesn't have any data, the join operation can't be performed because there isn't any data to match together. So, data such as primary keys and foreign keys cannot have a null value and function with an inner join operation properly. If data is missing from that record, the join command will simply ignore that data. If the table you are trying to join does have nullable data, it may be a better candidate for an outer join instead of an inner join.
What are Outer Joins?
Outer Joins can match data from multiple tables together even if specific data you want to match is null.
One of the ways Outer Joins can be useful is if a data column can be nullable. For example, let's say that you have three tables. One of those tables stores customer information, another stores customer's primary contact information, and the last table stores alternate contact information for customers. Customers will always have primary contact information, but they may not have alternate contact information.
If you performed an inner join with these three tables, any customers without alternate contact information would be ignored. On the other hand, if you performed an Outer Join with the alternate contact table, your query would still display records of customers without that data.
What are Cross Joins?
A Cross Join is different from an Inner or Outer Join. A Cross Join is used when a SQL query result would require a record from one table to match multiple times with a record from a different table causing data in a query to be duplicated multiple times.
Let's say that you have multiple products in your store that come in different colors. You want to pull all the information for your database for each of the products that come in various colors. For instance, you sell a bicycle and water bottle for that bike. Both the bicycle and water bottle come in five different colors. In this example, your SQL query that uses a Cross Join to find that information would list each bicycle in each color and each water bottle in each color as well. Assuming you only had one bike type and one water bottle type, this SQL query with a Cross Join would return ten results.
What are Self Joins?
What is a self-join in SQL, and why would you use a self-join? A self-join is when a SQL query makes a table reference to itself.
That might seem confusing at first, but let's consider this example. Suppose you have an e-commerce app, and in that app, you need to let customers bill sales to another party. That secondary party would still be documented as a customer in the database, though. That customer wouldn't be stored on a different table. So, you need to join the same table together to cross-reference two different customer records together. That way if Johnny B. is making Sally P. pay for their order, you can pull the appropriate information from the database to ship Johnny B.'s order to them while sending the bill to Sally P. at a different address.
We covered a lot of information in this article, but there is still a lot to learn about SQL joins. You are in luck, though. For some people, videos can be a lot easier to understand. So, if you want to review the information we discussed in this article as well as learn a couple of new things about SQL joins, check out this YouTube video on SQL joins. If you would prefer a much deeper dive into the world of SQL and data management, check out CBT Nuggets' T-SQL course.