How to join two or more tables using SQL INNER JOIN

This article describes what a SQL INNER JOIN join is, why it is used, and how to write a SELECT statement that uses a join. It also covers some of the basics of relational database design, including the most commonly used join, the equi-join (also known as the inner join).

1. Connection

One of the most powerful features of SQL is the ability to join tables in the execution of data queries. Joins are the most important operation you can perform with SQL's SELECT, and a good understanding of joins and their syntax is an extremely important part of learning SQL.

Before you can use joins effectively, you must understand relational tables and some basics of relational database design. The following introduction doesn't cover everything on this topic, but it's enough to get you started.

1.1 Relationship table

To understand relational tables, it is best to look at an example.

There is a database table that contains product catalogs, with one row for each type of item. For each item, the information to store includes the product description, price, and the supplier that produced the product.

Now there are multiple items produced by the same supplier, so where to store supplier information such as supplier name, address, contact method, etc.? The rationale for storing this data separately from product information is:

  • Each product produced by the same supplier has the same supplier information, and repeating this information for each product is a waste of time and storage space;
  • If the supplier information changes, such as supplier relocation or phone number change, it only needs to be modified once;
  • If there is duplicate data (i.e. each product stores supplier information), it is difficult to guarantee that the data is entered the same way each time. Inconsistent data is difficult to exploit in reports.

The point is, it's never a good thing to have the same data appear multiple times, which is fundamental to the design of relational databases.

The design of relational tables is to decompose information into multiple tables, one for each type of data. Tables are related to each other by some common values ​​(hence the name relational database).

In this example, two tables can be created: one to store supplier information and the other to store product information. The Vendors table contains all vendor information, one row per vendor with a unique ID.

This identification is called the primary key and can be a vendor ID or any other unique value.

The Products table only stores product information, it doesn't store information about vendors other than vendor ID s (the primary key of the Vendors table).

The primary key of the Vendors table associates the Vendors table with the Products table, and the vendor ID can be used to find out the details of the corresponding vendor from the Vendors table.

The benefits of doing this are:

  • Supplier information is not repeated, and time and space are not wasted;
  • If the supplier information changes, only a single record in the Vendors table can be updated, and the data in the related table does not need to be changed;
  • Since the data is not duplicated, the data is clearly consistent, making it easier to process the data and generate reports.

In conclusion, relational data can be stored efficiently and processed easily. Therefore, relational databases are far more scalable than non-relational databases.

Scalable (scale)

Ability to adapt to increasing workloads without failing. A well-designed database or application is said to scale well.

1.2 Why use joins

As mentioned earlier, breaking data into multiple tables is more efficient to store, easier to process, and more scalable. But these benefits come at a price.

If the data is stored in multiple tables, how can the data be retrieved with a single SELECT statement?

The answer is to use joins. Simply put, a join is a mechanism used to associate tables in a SELECT statement, hence the name join. Using a special syntax, it is possible to join multiple tables to return a set of outputs, joining the correct rows in the associated tables at runtime.

Description: Use an interactive DBMS tool

It is important to understand that connections are not physical entities. In other words, it doesn't exist in the actual database table. The DBMS makes joins as needed, and it persists for the duration of query execution.

Many DBMS s provide a graphical interface for interactively defining table relationships. These tools are extremely helpful in maintaining referential integrity. When working with relational tables, it is very important to insert legitimate data only in relational columns.

Returning to the example here, if an invalid supplier ID is stored in the Products table, the corresponding products are not accessible because they are not associated with a supplier.

To avoid this, the database can be instructed to only allow legal values ​​in the Vendors ID column of the Products table (that is, vendors that appear in the Vendors table).

Referential integrity means that the DBMS enforces data integrity rules. These rules are generally managed by the DBMS that provides the interface.

2. Create a connection

Creating a join is as simple as specifying all the tables to join and how to relate them. See the example below:

SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;

output:

vend_name                prod_name                prod_price
--------------------     --------------------     ----------
Doll House Inc.          Fish bean bag toy        3.4900
Doll House Inc.          Bird bean bag toy        3.4900
Doll House Inc.          Rabbit bean bag toy      3.4900
Bears R Us               8 inch teddy bear        5.9900
Bears R Us               12 inch teddy bear       8.9900
Bears R Us               18 inch teddy bear       11.9900
Doll House Inc.          Raggedy Ann              4.9900
Fun and Games            King doll                9.4900
Fun and Games            Queen doll               9.4900

Let's look at this code.

The SELECT statement, like all previous statements, specifies the columns to retrieve. The biggest difference here is that the two columns specified (prod_name and prod_price) are in one table, while the first column (vend_name) is in another table.

Now look at the FROM clause.

Unlike the previous SELECT statement, the FROM clause of this statement lists two tables: Vendors and Products.

They are the names of the two tables joined by this SELECT statement. The two tables are properly joined with a WHERE clause that instructs the DBMS to match the vend_id in the Vendors table with the vend_id in the Products table.

As you can see, the two columns to match are specified as Vendors.vend_id and Products.vend_id. This fully qualified column name is needed here, and if only the vend_id is given, the DBMS doesn't know which one it is referring to (one per table).

As you can see from the previous output, a single SELECT statement returned data from two different tables.

Warning: Fully Qualified Column Names

like How to use subqueries in SQL As mentioned, fully qualified column names must be used (separate table and column names with a period) when ambiguity in the referenced columns is possible.

Most DBMS s return an error if an ambiguous column name is referenced that is not qualified by a table name.

2.1 Importance of the WHERE clause

It may seem odd to use the WHERE clause to create joins, but there's actually a very good reason.

Remember that when joining several tables in one SELECT statement, the corresponding relationships are constructed on the fly. There is nothing in the database table definition that tells the DBMS how to join the tables. You have to do this yourself.

When joining two tables, what is actually done is pairing each row in the first table with each row in the second table.

The WHERE clause acts as a filter condition to include only those rows that match the given condition (here, the join condition). Without the WHERE clause, every row in the first table will be paired with every row in the second table, regardless of whether they logically fit together.

cartesian product

The result returned by a table relationship without a join condition is a Cartesian product. The number of rows retrieved will be the number of rows in the first table multiplied by the number of rows in the second table.

To understand this, look at the following SELECT statement and its output:

SELECT vend_name, prod_name, prod_price
FROM Vendors, Products;

output:

vend_name           prod_name                       prod_price
----------------    ----------------------------    ----------
Bears R Us          8 inch teddy bear               5.99
Bears R Us          12 inch teddy bear              8.99
Bears R Us          18 inch teddy bear              11.99
Bears R Us          Fish bean bag toy               3.49
Bears R Us          Bird bean bag toy               3.49
Bears R Us          Rabbit bean bag toy             3.49
Bears R Us          Raggedy Ann                     4.99
Bears R Us          King doll                       9.49
Bears R Us          Queen doll                      9.49
Bear Emporium       8 inch teddy bear               5.99
Bear Emporium       12 inch teddy bear              8.99
Bear Emporium       18 inch teddy bear              11.99
Bear Emporium       Fish bean bag toy               3.49
Bear Emporium       Bird bean bag toy               3.49
Bear Emporium       Rabbit bean bag toy             3.49
Bear Emporium       Raggedy Ann                     4.99
Bear Emporium       King doll                       9.49
Bear Emporium       Queen doll                      9.49
Doll House Inc.     8 inch teddy bear               5.99
Doll House Inc.     12 inch teddy bear              8.99
Doll House Inc.     18 inch teddy bear              11.99
Doll House Inc.     Fish bean bag toy               3.49
Doll House Inc.     Bird bean bag toy               3.49
Doll House Inc.     Rabbit bean bag toy             3.49
Doll House Inc.     Raggedy Ann                     4.99
Doll House Inc.     King doll                       9.49
Doll House Inc.     Queen doll                      9.49
Furball Inc.        8 inch teddy bear               5.99
Furball Inc.        12 inch teddy bear              8.99
Furball Inc.        18 inch teddy bear              11.99
Furball Inc.        Fish bean bag toy               3.49
Furball Inc.        Bird bean bag toy               3.49
Furball Inc.        Rabbit bean bag toy             3.49
Furball Inc.        Raggedy Ann                     4.99
Furball Inc.        King doll                       9.49
Furball Inc.        Queen doll                      9.49
Fun and Games       8 inch teddy bear               5.99
Fun and Games       12 inch teddy bear              8.99
Fun and Games       18 inch teddy bear              11.99
Fun and Games       Fish bean bag toy               3.49
Fun and Games       Bird bean bag toy               3.49
Fun and Games       Rabbit bean bag toy             3.49
Fun and Games       Raggedy Ann                     4.99
Fun and Games       King doll                       9.49
Fun and Games       Queen doll                      9.49
Jouets et ours      8 inch teddy bear               5.99
Jouets et ours      12 inch teddy bear              8.99
Jouets et ours      18 inch teddy bear              11.99
Jouets et ours      Fish bean bag toy               3.49
Jouets et ours      Bird bean bag toy               3.49
Jouets et ours      Rabbit bean bag toy             3.49
Jouets et ours      Raggedy Ann                     4.99
Jouets et ours      King doll                       9.49
Jouets et ours      Queen doll                      9.49

As you can see from the above output, the corresponding Cartesian product is not what we want. The data returned here matches every product with every supplier, including products for which the supplier is incorrect (even if the supplier has no product at all).

Note: Don't forget the WHERE clause

Make sure all joins have a WHERE clause, otherwise the DBMS will return much more data than it wants.

Similarly, to ensure the correctness of the WHERE clause. Incorrect filter conditions can cause the DBMS to return incorrect data.

Hint: Fork joins

Sometimes, a join that returns a Cartesian product is also called a cross join.

2.2 Inner joins

The join used so far is called an equijoin and it is based on an equality test between two tables. This join is also called an inner join. In fact, you can use a slightly different syntax for this join, specifying the join type explicitly.

The following SELECT statement returns exactly the same data as the previous example:

SELECT vend_name, prod_name, prod_price
FROM Vendors
INNER JOIN Products ON Vendors.vend_id = Products.vend_id;

The SELECT in this statement is the same as the previous SELECT statement, but the FROM clause is different.

Here, the relationship between the two tables is a partial FROM clause specified by an INNER JOIN.

When using this syntax, the join condition is given with a specific ON clause instead of a WHERE clause. The actual condition passed to ON is the same as that passed to WHERE.

Refer to the specific DBMS documentation for which syntax to use.

Explanation: "correct" syntax

The ANSI SQL specification prefers the INNER JOIN syntax, which previously used the simple equivalence syntax.

In fact, SQL purists treat simple syntax with contempt.

That is to say, DBMS does support simple format and standard format, I suggest you understand both formats, and it depends on which one is more comfortable for you to use.

2.3 Joining multiple tables

SQL does not limit the number of tables that can be joined in a SELECT statement. The basic rules for creating joins are also the same. List all the tables first, then define the relationships between the tables. E.g:

SELECT prod_name, vend_name, prod_price, quantity
FROM OrderItems, Products, Vendors
WHERE Products.vend_id = Vendors.vend_id
 AND OrderItems.prod_id = Products.prod_id
 AND order_num = 20007;

output:

prod_name           vend_name         prod_price     quantity
---------------     -------------     ----------     --------
18 inch teddy bear  Bears R Us        11.9900        50
Fish bean bag toy   Doll House Inc.   3.4900         100
Bird bean bag toy   Doll House Inc.   3.4900         100
Rabbit bean bag toy Doll House Inc.   3.4900         100
Raggedy Ann         Doll House Inc.   4.9900         50

This example shows the items in order 20007. Order items are stored in the OrderItems table. Each product is stored by its product ID, which references the product in the Products table.

These products are linked to the corresponding vendors in the Vendors table by vendor ID s, which are stored in each product's record.

Here the FROM clause lists three tables, the WHERE clause defines these two join conditions, and the third join condition is used to filter out the items in order 20007.

Note: Performance Considerations

The DBMS associates each table specified at runtime to handle joins. This processing can be very resource intensive, so care should be taken not to join unnecessary tables. The more tables that are joined, the worse the performance degradation.

Note: Maximum number of tables in a join

While SQL itself does not limit the number of tables in each join constraint, in practice many DBMS s do. See the specific DBMS documentation for its limitations.

Now to review How to use subqueries in SQL For example, the following SELECT statement returns a list of customers who ordered the product RGAN01:

SELECT cust_name, cust_contact
FROM Customers
WHERE cust_id IN (SELECT cust_id
                  FROM Orders
                  WHERE order_num IN (SELECT order_num
                                      FROM OrderItems
                                      WHERE prod_id = 'RGAN01'));

Such as How to use subqueries in SQL As mentioned, subqueries are not always the most efficient way to perform complex SELECT operations, here is the same query using joins:

SELECT cust_name, cust_contact
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
 AND OrderItems.order_num = Orders.order_num
 AND prod_id = 'RGAN01';

output:

cust_name                         cust_contact
-----------------------------     --------------------
Fun4All                           Denise L. Stephens
The Toy Store                     Kim Howard

Such as How to use subqueries in SQL As stated, the data returned in this query requires the use of 3 tables.

But here, instead of using them in nested subqueries, we are using two joins to join the tables. There are three WHERE clause conditions here. The tables in the first two associative joins, the latter one filters the data for the product RGAN01.

Tip: Do more experiments

As you can see, there is generally more than one way to perform any given SQL operation. There is rarely an absolutely right or absolutely wrong approach.

Performance can be affected by conditions such as the type of operation, the DBMS used, the amount of data in the table, the presence of indexes or keys, etc.

Therefore, it is necessary to experiment with different selection mechanisms to find out what works best for a particular situation.

Description: The column name of the join

In all of the above examples, the names of the columns in the join are the same (for example, the columns in the Customers and Orders tables are both called cust_id).

It is not required that the column names be the same, and you will often encounter databases with different naming conventions. I built the table like this just for simplicity.

3. Summary

Joins are one of the most important and powerful features in SQL, and their effective use requires a basic understanding of relational database design.

In this article's introduction to joins, I covered some basics of relational database design, including the most commonly used join, the equi-join (also known as the inner join).

How SQL uses self-join, natural join and outer join Describes how to create other types of joins.

Original link: https://www.developerastrid.com/sql/sql-inner-join/

(Finish)

Tags: SQL

Posted by shack on Tue, 17 May 2022 04:26:18 +0300