This post on acing a SQL interview originally appeared on the Deskbright blog.
Background on SQL
SQL, a programming language used to communicate with databases, is one of the most useful skills to know for recruiting in the tech industry. Pronounced "ess-cue-ell" and short for Structured Query Language, this incredible tool is a must-have for analyzing large data sets. In particular, it shines when applied to relational databases — unique tables of data that are all related to each other in some way.
Because SQL is so ubiquitous in the tech world, many companies conduct SQL interviews before extending job offers. This helps ensure that job applicants — particularly for roles in project management, analytics, business intelligence, and software engineering — are comfortable using SQL on the job.
If you’ve got an upcoming SQL interview, you’re probably wondering what sorts of questions you might get. Recruiters are generally vague on details, and it can be scary to walk into a conversation like this one blind.
So, we’ve provided our list of the 7 most common SQL interview questions so that you can get some practice in before you exam. With a little bit of advance preparation, you’ll feel prepared and confident on interview day.
1. What is a relational database, and what is SQL? Please provide examples as part of your explanation.
Even though you probably know what relational databases are — and what SQL itself is — it can be tough to come up with coherent, simple explanations during a live interview. To that end, be sure that you've prepared in advance to answer this simple question. Here are some tips:
A relational database is a set of data tables that are somehow linked to — or related to — each other. It is used to store different types of information that can be pulled together to answer specific analytical questions. It's a useful way to minimize the amount of data stored on a server without losing any critical information.
That's a bit of a vague definition, so let's take a look at a relational database in practice. A simple version of a relational database for an online retailer might contain two separate data tables:
- Customers. A list of customer information, including customer names, contact information, and shipping preferences. Each record in this database contains a unique
customer_idfield by which the customer can be identified.
- Orders. A list of orders purchased at the retailer's website. Each order listing also contains a
customer_idfield, which is used to link that order's details with the specific customer who placed the order.
Of course, we wouldn't need a multi-table database if we simply included customer information on the
Orders table. But that wouldn't be particularly efficient: if a single customer placed multiple orders, his or her name, contact information, and shipping preferences would be listed on multiple lines of the
Orders table — leading to unnecessary duplication and an unmanageably large database. Instead, we create a relational database to save space and show how different pieces of data are linked together.
SQL, then, is simply the language used to communicate with this relational database. Databases don't yet understand human languages like English — it's simply too syntactically complex — so we use a standardized language to communicate with them that we know will be understood.
2. What are the different types of SQL
JOIN clauses, and how are they used?
In SQL, a
JOIN clause is used to return a table that merges the contents of two or more other tables together. For example, if we had two tables — one containing information on
Customers and another containing information on the
Orders various customers have placed — we could use a
JOIN clause to bring them together to create a new table: a complete list of orders by customer, with all necessary information to make shipments.
There are multiple types of
JOIN clauses, and they all serve slightly different functions:
INNER JOINreturns a list of rows for which there is a match in both tables specified. It's the default join type, so if you just type
JOINwithout specifying any other conditions, an
INNER JOINwill be used.
LEFT JOINwill return all results from the left table in your statement, matched against rows in the right table when possible. If a row in the left table does not contain a corresponding match in the right table, it will still be listed — with
NULLvalues in columns for the right table.
RIGHT JOINwill return all results from the right table in your statement, matched against rows in the left table when possible. If a row in the right table does not contain a corresponding match in the left table, it will still be listed — with
NULLvalues in columns for the left table.
FULL JOINwill return all results from both the left and the right tables in your statement. If there are instances in which rows from the left table do not match the right table or vice versa, all data will still be pulled in — but SQL will output
NULLvalues in all columns that are not matched.
CROSS JOINreturns the
Cartesian productof two tables — in other words, each individual row of the left table matched with each individual row of the right table.
3. Why is this query not returning the expected results?
We have 1000 total rows in the
SELECT * FROM orders; -- 1000 rows in set (0.05 sec)
And 23 of those orders are from the user with
customer_id = 45:
SELECT * FROM orders WHERE customer_id = 45; -- 23 rows in set (0.10 sec)
Yet, when we
SELECT the number of orders that are not from
customer_id = 45, we only get 973 results:
SELECT * FROM orders WHERE customer_id <> 45; -- 973 rows in set (0.11 sec)
973 + 23 = 996. But shouldn't the number of orders with
customer_id equal to 45 plus the number of orders with
customer_id not equal to 45 equal 1000? Why is this query not returning the expected results?
The answer: this data set most likely contains
order values with a
customer_id. When using the
SELECT clause with conditions, rows with the
NULL value will not match against either the = or the <> operator.
Our second query above could be modified as follows to produce the expected results:
SELECT * FROM orders WHERE (customer_id <> 45 OR customer_id IS NULL); -- 977 rows in set (0.11 sec)
4. Why does one of these queries work while the other does not?
Consider the following query, which returns the expected results:
SELECT CASE WHEN (3 IN (1, 2, 3, NULL)) THEN 'Three is here!' ELSE "Three isn't here!" END AS result; /* +----------------+ | result | +----------------+ | Three is here! | +----------------+ 1 row in set (0.00 sec) */
"Three is here!" is shown, because the value 3 is included in the
IN clause. But what about the following query?
SELECT CASE WHEN (3 NOT IN (1, 2, NULL)) THEN "Three isn't here!" ELSE 'Three is here!' END AS result; /* +----------------+ | result | +----------------+ | Three is here! | +----------------+ 1 row in set (0.00 sec) */
Three is not included in the second set — so why does our query mistakenly deliver the output,
"Three is here!"?
The answer, once again, has to do with the way MYSQL handles
NULL values. Let's take a closer look. In our first query, we ask whether the value
3 is included in the set
(1, 2, 3, NULL). Our statement is functionally equivalent to the following:
SELECT CASE WHEN ((3 = 1) OR (3 = 2) OR (3 = 3) OR (3 = NULL)) THEN 'Three is here!' ELSE "Three isn't here!" END AS result; /* +----------------+ | result | +----------------+ | Three is here! | +----------------+ 1 row in set (0.00 sec) */
Since 3 is definitely equal to 3, one of our
OR conditions is met, and the statement outputs,
"Three is here!". Our second statement, on the other hand, asks whether the value
NOT included in the set
(1, 2, NULL). This statement is functionally equivalent to the following:
SELECT CASE WHEN ((3 <> 1) AND (3 <> 2) AND (3 <> NULL)) THEN "Three isn't here!" ELSE "Three is here!" END AS result; /* +----------------+ | result | +----------------+ | Three is here! | +----------------+ 1 row in set (0.00 sec) */
In this case, the conditional check
3 <> NULL fails, because in ANSI-standard SQL, we need to use the
IS NULL statement rather than the
5. Can you construct a basic
orders tables, with the following respective schema:
CREATE TABLE `customers` ( `customer_id` int(11) NOT NULL AUTO_INCREMENT, `first_name` varchar(255) NOT NULL, `last_name` varchar(255) NOT NULL, `email` varchar(255) NOT NULL, `address` varchar(255) DEFAULT NULL, `city` varchar(255) DEFAULT NULL, `state` varchar(2) DEFAULT NULL, `zip_code` varchar(5) DEFAULT NULL, PRIMARY KEY (`customer_id`) ); CREATE TABLE `orders` ( `order_id` int(11) NOT NULL AUTO_INCREMENT, `customer_id` int(11) NOT NULL, `order_placed_date` date NOT NULL, PRIMARY KEY (`order_id`), KEY `customer_id` (`customer_id`), FOREIGN KEY (`customer_id`) REFERENCES `customers` (`customer_id`) );
Can you construct a simple
SELECT statement that uses an
INNER JOIN to combine all information from both the
The answer here is really simple. Here's how we'd do it:
SELECT * FROM orders INNER JOIN customers on orders.customer_id = customers.customer_id;
6. Working with the
We've written a query based on the
orders table above to select all orders from the year 2016. But something is wrong with our query. Can you figure out what it is?
SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year FROM orders WHERE order_year = 2016;
Here's the answer:
order_year is an
alias, meaning that it's being used as another name for a more complex reference:
YEAR(order_placed_date). It turns out that in SQL, aliases can only be referenced in
ORDER BY, and
HAVING clauses — they can't be used in
WHERE clauses. Running the above code will produce the following result:
--ERROR 1054 (42S22): Unknown column 'order_year' in 'where clause'
To fix this problem, we need to reiterate the definition of the
order_year alias in the
WHERE clause like so:
SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year FROM orders WHERE YEAR(order_placed_date) = 2016; --498 rows in set (0.00 sec)
7. Using the
Consider the following database schema:
CREATE TABLE `products` ( `product_id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, `price` decimal(19,4) NOT NULL, PRIMARY KEY (`product_id`) ); CREATE TABLE `order_products` ( `order_product_id` int(11) NOT NULL AUTO_INCREMENT, `order_id` int(11) NOT NULL, `product_id` int(11) NOT NULL, PRIMARY KEY (`order_product_id`), KEY `order_id` (`order_id`), KEY `product_id` (`product_id`), FOREIGN KEY (`order_id`) REFERENCES `orders` (`order_id`), FOREIGN KEY (`product_id`) REFERENCES `products` (`product_id`) )
Can you write a query that finds the total order price (e.g., the sum of
product.price from each order) for all
This question is a bit tough, as we'll have to use both the
SUM function and the
GROUP BY clause to aggregate orders by
order_id. Here's how we do it:
SELECT order_id, SUM(price) AS total_order_price FROM order_products INNER JOIN products ON order_products.product_id = products.product_id GROUP BY order_id; --1000 rows in set (0.01 sec)
Looking for more SQL prep? Don't forget to check out Deskbright's resources page!