LIMITing the amount of data with SQL
SQL can handle very large amounts of data. But if you’re working with very large data sets, this requires a lot of computing power, which can take up resources both in terms of time and money. So it’s a good idea to limit the amount of data you work with in SQL. Especially when you’re developing your models, it’s often enough to work with a smaller sample of the full data set, and then you can apply the final model on the full table. You can use the LIMIT function for this. The order line table contains 1000 rows. If you wanted to develop a model for this table by only experimenting with the first 5 rows, it could be done like this:
1select
2 *
3from
4 {{raw.e_commerce_sample.webshop_order_line}}
5limit
6 5
This returns this table:
The order in which you place your ‘limit’ statement in your query is important. For example, if you want to add an ‘order by’ statement to the query above and order the query as:
- SELECT
- FROM
- LIMIT
- ORDER BY
The result would be to first select the 5 top rows in the raw table, and then order those 5 rows. If you change the order of the statements to:
- SELECT
- FROM
- ORDER BY
- LIMIT
The resulting table would instead be the 5 top rows after you’ve ordered them. You can try this on your own.