Using the JOIN function to combine tables
Right now the information in the three tables from the sample e-commerce data is quite dispersed. For example, the order table contains no other information about who made the order than the customer id. So if you need to contact a certain customer, you would have to take the customer id from the order table and then manually look that up in the customer table to find the corresponding email address. It would be better to have one combined table with all of this information. Here’s how this could be done in SQL:
As you see, this looks quite similar to a usual SELECT statement. The main difference in this query is the left join statement after the FROM statement. The left join statement is structured into the clause LEFT JOIN followed by the destination of the table being combined with the primary table. After this is an ON clause, which defines the common identifier in the two tables. This is what’s required for a left join to work.
Another new feature in the query above is the table alias preceding the selected columns. This gives an alias or nickname for the whole table, using AS to define the destination of the table. As you’ve learned, you can use AS to give aliases to individual columns (SELECT x_column AS x_alias). In a similar way, you can give tables an alias by adding the AS statement after the table destination in the FROM and JOIN statements. You can then refer to these table aliases when picking the columns in the SELECT statement by adding the alias and a . before the column name.
Table aliases are not a requirement for all columns when using joins, but if there are columns in the two tables that have the same header, you need to specify which of the tables you’re referring to. It can also be helpful to easily go back and see which columns belong to which tables, so it’s a good idea to add aliases if you start to combine many tables and columns.
You can try to type this query in the editor on your own:
1select
2 orders.id as order_id
3 , customers.first_name
4 , customers.last_name
5 , customers.email
6 , customers.phone
7 , orders.shipping_address_country
8 , orders.shipping_address_city
9 , orders.shipping_address_street
10 , orders.shipping_address_postal_code
11 , orders.billing_country
12 , orders.fulfillment_status
13 , orders.received_at
14 , orders.shipped_at
15 , orders.fulfilled_at
16from
17 {{raw.e_commerce_sample.webshop_order}} as orders
18 left join {{raw.e_commerce_sample.webshop_customer}} as customers on customers.id = customer_id
The result should look something like this:
In this example, columns from two different tables were selected and displayed together. But as you’ve probably figured out, left joins can be combined with all the other functions you’ve now learned to use in SQL.
For example: Say you want to know your revenue for every month. The sales numbers are stored in the Order line table, while the information about when the order was received is in the Order table. You need to join these two tables to get this view, then sum the quantity times the price, and finally group by the year and month of when the order was received. You can also order it by year and month. Here’s what the query looks like:
1select
2 extract(
3 year
4 from
5 (parse_date('%d / %m / %Y', orders.received_at))
6 ) as recived_at_year
7 , extract(
8 month
9 from
10 (parse_date('%d / %m / %Y', orders.received_at))
11 ) as recived_at_month
12 , sum(order_line.price * order_line.quantity) as total_revenue
13from
14 {{raw.e_commerce_sample.webshop_order}} orders
15 left join {{raw.e_commerce_sample.webshop_order_line}} order_line on orders.id = order_id
16group by
17 recived_at_year
18 , recived_at_month
19order by
20 recived_at_year
21 , recived_at_month
If you try on your own, the resulting table should look like this:
Exercise 11: Revenue per order
Say you want a table showing when each order was received and the revenue for each order. This would require joining the order line table with the order table, and then selecting the order id, the date when the order was received, and the sum of each quantity times the price. Remember that each order can have several order lines, so you’ll need to sum these up to get this for each order. Try to write a query that returns these columns in a table.
If you run your query, it should give a result like this:
In SQL there are often many means to an end, so you’ll usually be able to get the required result by writing different variations of the same query. A simple way to solve the exercise above would be to write a query like this:
1select
2 orders.id as order_id
3 , orders.received_at
4 , sum(order_line.price * order_line.quantity) as revenue_per_order
5from
6 {{raw.e_commerce_sample.webshop_order}} orders
7 left join {{raw.e_commerce_sample.webshop_order_line}} order_line on orders.id = order_id
8group by
9 order_id
10 , orders.received_at