Subqueries

60 min12 pages

What is Subqueries?

Concept from package: Course on SQL and relational databases covering querying, data retrieval, joins, and database design

~60 min12 pages
SQLsubqueriesdatabase

Subqueries are queries nested inside another SQL query and are evaluated to produce a result that the outer query can use. They are powerful tools for isolating a set of values before applying a broader operation, such as filtering, aggregation, or comparison. A subquery can appear in several places: in the WHERE clause, as a scalar value in the SELECT list, in the FROM clause as a derived table, or in EXISTS checks. Understanding subqueries begins with recognizing how the inner query is executed first and how its results influence the outer query. For example, you might want to find all employees whose salary is above the average salary across the company. Here, the inner query would compute the average salary, and the outer query would compare each employee's salary to that value. Subqueries can be correlated or uncorrelated: an uncorrelated subquery can be run independently and returns a fixed result, while a correlated subquery references columns from the outer query and is evaluated once for each row of the outer query. This distinction matters for performance and readability. In practice, subqueries enable concise, expressive data retrieval without requiring multiple separate statements or temporary tables. They are a step toward more advanced patterns like common table expressions (CTEs) and window functions, but they also come with caveats: they can be less performant if not written carefully, and incorrect correlation can lead to unexpected results or multiple-row scalars where a single value is expected. Start by warming up with an uncorrelated subquery that filters rows based on a fixed value from the inner query, then explore correlated forms that dynamically depend on the outer query’s current row.

Which statement best describes a subquery in SQL?

A subquery is a separate query that runs after the outer query completes, used for logging.
A subquery is a query nested inside another query, whose result or existence can influence the outer query.
A subquery is a stored procedure invoked from within the outer query.
A subquery is only allowed in the FROM clause as a temporary table.

Sign up to unlock quizzes

Example: Simple Uncorrelated Subquery

SELECT employee_id, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

Example: Subquery in FROM (Derived Table)

SELECT d.dept_id, d.avg_salary FROM (SELECT dept_id, AVG(salary) AS avg_salary FROM employees GROUP BY dept_id) AS d WHERE d.avg_salary > 60000;

Example: Correlated Subquery

SELECT e.employee_id, e.name FROM employees e WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);

Uncorrelated subqueries run independently of the outer query and return a single value or a set of values that do not reference columns from the outer query. They are often easier to optimize because the inner query can be executed once. A classic case is filtering by a constant derived from another query, such as retrieving products with prices above the maximum price found in a different category. Uncorrelated subqueries can be used in the WHERE, HAVING, or IN clauses. When used in IN, the inner query must return a single column, which the outer query compares against. In contrast, correlated subqueries reference outer query columns, making them dependent on the outer row. This dependence means the inner query is typically executed once per outer row, which can be expensive if not indexed properly. A common strategy is to rewrite a correlated subquery as a JOIN or use a CTE to improve readability and performance. As you gain comfort, practice identifying whether a subquery references outer columns and consider the performance implications of repeated execution.

In which scenario is a subquery considered uncorrelated?

When the inner query references a column from the outer query
When the inner query can be executed independently of the outer query
When the outer query depends on the inner query's values for each row
When the inner query returns multiple columns

Sign up to unlock quizzes

Example: Subquery in IN

SELECT product_id, product_name FROM products WHERE category_id IN (SELECT category_id FROM categories WHERE is_active = TRUE);

Example: Scalar Subquery in SELECT

SELECT employee_id, (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e1.department_id) AS avg_department_salary FROM employees e1;

Which clause can host a scalar subquery returning a single value?

SELECT
WHERE
FROM
SELECT (subquery) AS value

Sign up to unlock quizzes

Practice: NOT EXISTS

SELECT e.employee_id, e.name FROM employees e WHERE NOT EXISTS (SELECT 1 FROM payroll p WHERE p.employee_id = e.employee_id AND p.paid = FALSE);

EXISTS and NOT EXISTS are special operators that test for the presence of rows returned by a subquery. EXISTS returns true if the subquery yields at least one row; NOT EXISTS returns true if the subquery yields no rows. These operators are optimized by the database engine to stop scanning as soon as a match is found, which can be efficient for large datasets. A typical use case is identifying customers who have placed orders: SELECT customer_id FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id); Here, the inner query looks for matching orders, and the outer query returns customers who have at least one order. NOT EXISTS, on the other hand, helps find records without a related entry, such as customers who never placed an order. Both forms support correlated inner queries and are powerful for anti-joins and verification checks without needing explicit joins.

What does EXISTS check in a subquery?

It returns all matching rows from the inner query
It returns true if the inner query produces at least one row
It deletes rows found by the inner query
It sorts the results of the inner query

Sign up to unlock quizzes

Example: NOT EXISTS

SELECT customer_id, name FROM customers c WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

Example: EXISTS with Correlation

SELECT department_id, dept_name FROM departments d WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.department_id AND e.salary > 100000);

Which statement about NOT EXISTS is true?

It returns results when the inner subquery finds at least one match
It returns results when the inner subquery yields no rows
It always requires a JOIN to function
It can only be used in the SELECT clause

Sign up to unlock quizzes

Practice: Correlated EXISTS

SELECT c.customer_id, c.name FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.order_date >= '2024-01-01');

Correlated subqueries reference the outer query’s columns, making the inner query depend on the current row. This means the inner query is evaluated for each row produced by the outer query. The classic example is calculating an attribute per group, such as listing departments with an average salary above a threshold defined by the outer query. While powerful, correlated subqueries can be less efficient because the inner query runs repeatedly. A common optimization is to rewrite the logic using a join to avoid repeating the inner computation for every row, or to use a CTE to materialize the inner result once and then reference it. Understanding when to use a correlated subquery versus a JOIN is a key skill in SQL performance tuning.

What characterizes a correlated subquery?

The inner query references outer query columns
The inner query can be executed without the outer query
The inner query returns multiple outer columns
The inner query is always in the FROM clause

Sign up to unlock quizzes

Example: Correlated Subquery in WHERE

SELECT e.employee_id, e.name FROM employees e WHERE e.salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id);

Example: Moving Correlation to JOIN

SELECT d.dept_id, d.dept_name, AVG(e.salary) AS avg_salary FROM departments d JOIN employees e ON e.department_id = d.dept_id GROUP BY d.dept_id, d.dept_name HAVING AVG(e.salary) > (SELECT AVG(salary) FROM employees);

Which approach often improves performance when using correlated logic?

Convert to a correlated subquery in the SELECT list
Rewrite using a JOIN with aggregation when appropriate
Avoid any subqueries and always fetch all data
Use a subquery in the ORDER BY clause

Sign up to unlock quizzes

Practice: Subquery in HAVING

SELECT dept_id, AVG(salary) AS avg_salary FROM employees GROUP BY dept_id HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

Scalar subqueries return a single value used in expressions. They are common in the SELECT clause to compute derived metrics per row or in the WHERE clause to compare a field to a computed value. A scalar subquery must return exactly one value; if it returns more than one, the query fails with an error unless used with an operator like IN. Practice with an example: find each employee’s salary as a percentage of the company’s total payroll. The inner subquery returns the total payroll, and the outer query calculates the percentage for each employee. Be mindful if the inner subquery can return NULL values, and consider how NULLs affect comparisons and arithmetic. Also, ensure that scalar subqueries are efficient by avoiding expensive repeated scans; use proper indexing or materialized results when possible.

What happens if a scalar subquery in the SELECT list returns more than one value?

SQL automatically picks the first value
The query returns multiple rows for that outer row
The query results in an error
The inner query is ignored and outer row is returned as-is

Sign up to unlock quizzes

Example: Scalar Subquery in SELECT

SELECT e.employee_id, e.name, (SELECT SUM(salary) FROM employees) AS total_salary FROM employees e;

Example: Subquery with IN

SELECT product_id, product_name FROM products WHERE category_id IN (SELECT category_id FROM categories WHERE is_active);

Which clause commonly hosts a scalar subquery in this context?

WHERE
SELECT
ORDER BY
FROM

Sign up to unlock quizzes

Practice: Scalar Subquery in WHERE

SELECT e.employee_id, e.name FROM employees e WHERE e.salary > (SELECT AVG(salary) FROM employees);

Subqueries in the FROM clause create derived tables, which are temporary result sets you can query against in the outer query. This pattern can simplify complex logic by isolating an expensive calculation first, then reusing its results. Examples include computing per-category aggregates and joining that result to the main table. Derived tables must have valid column names; you can alias them for clarity. When using derived tables, ensure the inner query returns a single column or provide named columns for multiple columns. Performance considerations include materializing the derived table and indexing the outer join on the derived content. This technique is often interchangeable with common table expressions (CTEs) for readability and maintainability.

What is a derived table in SQL terms?

A temporary table produced by a subquery in the FROM clause that outer queries can reference
A permanent table created by a CREATE TABLE AS statement
A table created by joining two tables directly without a subquery
A table that stores SQL execution plans

Sign up to unlock quizzes

Example: Derived Table

SELECT dt.category_id, dt.total_sales FROM (SELECT category_id, SUM(sales) AS total_sales FROM orders GROUP BY category_id) AS dt WHERE dt.total_sales > 10000;

Example: Multiple Columns in Derived Table

SELECT d.category_id, d.total_qty, p.product_name FROM (SELECT category_id, SUM(quantity) AS total_qty FROM order_items GROUP BY category_id) AS d JOIN products p ON p.category_id = d.category_id;

When using a derived table, what must you provide so the outer query can reference its columns?

An alias for the subquery in the FROM clause
A WHERE clause in the inner query
A GROUP BY in the outer query
A JOIN condition without an alias

Sign up to unlock quizzes

Practice: Complex Derived Table

SELECT dt.category_id, dt.avg_price FROM (SELECT category_id, AVG(price) AS avg_price FROM items GROUP BY category_id) AS dt ORDER BY dt.avg_price DESC;

You've previewed 6 of 12 pages

Sign up free to unlock the rest of this lesson and start tracking your progress.

6 more pages waiting:

  • Page 7
  • Page 8
  • Page 9
  • Page 10
  • +2 more...

Related Topics