IN THIS CHAPTER
Pulling data from multiple tables with a single SQL statement
Comparing a value from one table with a set of values from another table
Using the SELECT statement to compare a value from one table with a single value from another table
Comparing a value from one table with all the corresponding values in another table
Making queries that correlate rows in one table with corresponding rows in another table
Determining which rows to update, delete, or insert by using a subquery
One of the best ways to protect your data’s integrity is to avoid modification anomalies (see Chapter 5 for the gory details of those) by normalizing your database. Normalization involves breaking up a single table into multiple tables, each of which has a single theme. You don’t want product information in the same table with customer information, for example, even if the customers have bought products.
If you normalize a database properly, the data is scattered across multiple tables. Most queries that you want to make need to pull data from two or more tables. One way to do this is to use a join operator or one of the other relational operators (UNION
, INTERSECT
, or EXCEPT
). The relational operators take information from multiple tables and combine it all into a single result set. Different operators combine the data in different ways.
Another way to pull data from two or more tables is to use a nested query. In SQL, a nested query is one in which an outer enclosing statement contains within it a subquery. That subquery may serve as an enclosing statement for a lower-level subquery that is nested within it. There are no theoretical limits to the number of nesting levels a nested query may have, but you do face some practical limits that depend on your SQL implementation.
Subqueries are invariably SELECT
statements, but the outermost enclosing statement may also be an INSERT
, UPDATE
, or DELETE
statement.
A subquery can operate on a table other than the table that its enclosing statement operates on, so nested queries give you another way to extract information from multiple tables.
For example, suppose that you want to query your corporate database to find all department managers who are more than 50 years old. With the joins I discuss in Chapter 11, you can use a query like this:
SELECT D.Deptno, D.Name, E.Name, E.Age
FROM DEPT D, EMPLOYEE E
WHERE D.ManagerID = E.ID AND E.Age >50 ;
D
is the alias for the DEPT table, and E
is the alias for the EMPLOYEE table. The EMPLOYEE table has an ID
column that is the primary key, and the DEPT table has a ManagerID
column that is the ID
value of the employee who is the department’s manager. A simple join (the list of tables in the FROM
clause) pairs the related tables, and a WHERE
clause filters out all rows except those that meet the criteria. Note that the SELECT
statement’s parameter list includes the Deptno
and Name
columns from the DEPT table and the Name
and Age
columns from the EMPLOYEE table.
Next, suppose that you’re interested in the same set of rows but you want only the columns from the DEPT table. In other words, you’re interested in the departments whose managers are 50 or older, but you don’t care who those managers are or exactly how old they are. You could then write the query with a subquery rather than a join:
SELECT D.Deptno, D.Name
FROM DEPT D
WHERE EXISTS (SELECT * FROM EMPLOYEE E
WHERE E.ID = D.ManagerID AND E.Age > 50) ;
This query has two new elements: the EXISTS
keyword and the SELECT *
in the WHERE
clause of the inner SELECT
. The inner SELECT
is a subquery (or subselect), and the EXISTS
keyword is one of several tools for use with a subquery that is described in this chapter.
Subqueries are located within the WHERE
clause of their enclosing statement. Their function is to set the search conditions for the WHERE
clause. Each kind of subquery produces a different result. Some subqueries produce a list of values that is then used as input by the enclosing statement. Other subqueries produce a single value that the enclosing statement then evaluates with a comparison operator. A third kind of subquery returns a value of True or False.
To illustrate how a nested query returns a set of rows, imagine that you work for a systems integrator of computer equipment. Your company, Zetec Corporation, assembles systems from components that you buy, and then it sells them to companies and government agencies. You keep track of your business with a relational database. The database consists of many tables, but right now you’re concerned with only three of them: the PRODUCT table, the COMP_USED table, and the COMPONENT table. The PRODUCT table (shown in Table 12-1) contains a list of all your standard products. The COMPONENT table (shown in Table 12-2) lists components that go into your products, and the COMP_USED table (shown in Table 12-3) tracks which components go into each product.
TABLE 12-1 PRODUCT Table
Column |
Type |
Constraints |
|
|
|
|
|
|
|
|
|
|
|
|
TABLE 12-2 COMPONENT Table
Column |
Type |
Constraints |
|
|
|
|
|
|
|
|
|
TABLE 12-3 COMP_USED Table
Column |
Type |
Constraints |
|
|
|
|
|
|
A component may be used in multiple products, and a product can contain multiple components (a many-to-many relationship). This situation can cause integrity problems. To circumvent the problems, create the linking table COMP_ USED to relate COMPONENT to PRODUCT. A component may appear in many rows in the COMP_USED table, but each of those rows will reference only one component (a one-to-many relationship). Similarly, a product may appear in many rows in COMP_USED, but each row references only one product (another one-to-many relationship). Adding the linking table transforms a troublesome many-to-many relationship into two relatively simple one-to-many relationships. This process of reducing the complexity of relationships is one example of normalization.
One form of a nested query compares a single value with the set of values returned by a SELECT
statement. It uses the IN
predicate with the following syntax:
SELECT column_list
FROM table
WHERE expression IN (subquery) ;
The expression in the WHERE
clause evaluates to a value. If that value is IN
the list returned by the subquery, then the WHERE
clause returns a True value. The specified columns from the table row being processed are added to the result table. The subquery may reference the same table referenced by the outer query, or it may reference a different table.
In the following example, I use Zetec’s database to demonstrate this type of query. Assume that there’s a shortage of computer monitors in the computer industry, so that when you run out of monitors, you can no longer deliver products that include them. You want to know which products are affected. Glancing gratefully at your own monitor, enter the following query:
SELECT Model
FROM COMP_USED
WHERE CompID IN
(SELECT CompID
FROM COMPONENT
WHERE CompType = 'Monitor') ;
SQL processes the innermost query first, so it processes the COMPONENT table, returning the value of CompID
for every row where CompType
is 'Monitor'
. The result is a list of the ID
numbers of all monitors. The outer query then compares the value of CompID
in every row in the COMP_USED table against the list. If the comparison is successful, the value of the Model
column for that row is added to the outer SELECT
‘s result table. The result is a list of all product models that include a monitor. The following example shows what happens when you run the query:
Model
-----
CX3000
CX3010
CX3020
MB3030
MX3020
MX3030
You now know which products will soon be out of stock. It’s time to go to the sales force and tell them to slow down on promoting these products.
When you use this form of nested query, the subquery must specify a single column, and that column’s data type must match the data type of the argument preceding the IN
keyword.
I’m sure you remember the KISS principle. Keeping things simple is important when you’re dealing with software of any kind, but it is especially important when dealing with database software. Statements that include nested SELECT
s can be difficult to get right. One way to get them working the way they should is to run the inner SELECT
all by itself first and then verify that the result you get is the result you expect. When you’re sure the inner SELECT
is functioning properly, you can enclose it in the outer part of the statement and have a better chance that the whole thing will work as advertised.
Just as you can introduce a subquery with the IN
keyword, you can do the opposite and introduce it with the NOT IN
keywords. In fact, now is a great time for Zetec management to make such a query. By using the query in the preceding section, Zetec management found out what products not to sell. That is valuable information, but it doesn’t pay the rent. What Zetec management really wants to know is what products to sell. Management wants to emphasize the sale of products that don’t contain monitors. A nested query featuring a subquery introduced by the NOT IN
keywords provides the requested information:
SELECT Model
FROM COMP_USED
WHERE CompID NOT IN
(SELECT CompID
FROM COMPONENT
WHERE CompType = 'Monitor')) ;
This query produces the following result:
Model
-----
PX3040
PB3050
PX3040
PB3050
Worth noting is the fact that the result set contains duplicates. The duplication occurs because a product containing several components that are not monitors has a row in the COMP_USED table for each component. The query creates an entry in the result table for each of those rows.
In the example, the number of rows does not create a problem because the result table is short. In the real world, however, such a result table may have hundreds or thousands of rows. To avoid confusion, it’s best to eliminate the duplicates. You can do so easily by adding the DISTINCT
keyword to the query. Only rows that are distinct (different) from all previously retrieved rows are added to the result table:
SELECT DISTINCT Model
FROM COMP_USED
WHERE CompID NOT IN
(SELECT CompID
FROM COMPONENT
WHERE CompType = 'Monitor')) ;
As expected, the result is as follows:
Model
-----
PX3040
PB3050
Introducing a subquery with one of the six comparison operators (=
, <>
, <
,<=
, >
, >=
) is often useful. In such a case, the expression preceding the operator evaluates to a single value, and the subquery following the operator must also evaluate to a single value. An exception is the case of the quantified comparison operator, which is a comparison operator followed by a quantifier (ANY
, SOME
, or ALL
).
To illustrate a case in which a subquery returns a single value, look at another piece of Zetec Corporation’s database. It contains a CUSTOMER table that holds information about the companies that buy Zetec products. It also contains a CONTACT table that holds personal data about individuals at each of Zetec’s customer organizations. The tables are structured as shown in Tables 12-4 and 12-5.
TABLE 12-4 CUSTOMER Table
Column |
Type |
Constraints |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
TABLE 12-5 CONTACT Table
Column |
Type |
Constraints |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Say that you want to look at the contact information for Olympic Sales, but you don’t remember that company’s CustID
. Use a nested query like this one to recover the information you want:
SELECT *
FROM CONTACT
WHERE CustID =
(SELECT CustID
FROM CUSTOMER
WHERE Company = 'Olympic Sales') ;
The result looks something like this:
CustID ContFName ContLName ContPhone ContInfo
------ --------- --------- --------- --------
118 Jerry Attwater 505-876-3456 Will play
major role in
additive
manufacturing.
You can now call Jerry at Olympic and tell him about this month’s special sale on 3D printers.
When you use a subquery in an “=
” comparison, the subquery’s SELECT
list must specify a single column (CustID
in the example). When the subquery is executed, it must return a single row in order to have a single value for the comparison.
In this example, I assume that the CUSTOMER table has only one row with a Company
value of 'Olympic Sales'
. The CREATE TABLE
statement for CUSTOMER specifies a UNIQUE
constraint for Company
, and this statement guarantees that the subquery in the preceding example returns a single value (or no value). Subqueries like the one in this example, however, are commonly used on columns that are not specified to be UNIQUE
. In such cases, you must rely on prior knowledge of the database contents for believing that the column has no duplicates.
If more than one customer has a value of 'Olympic Sales'
in the Company
column (perhaps in different states), the subquery raises an error.
If no customer with such a company name exists, the subquery is treated as if it was null, and the comparison becomes unknown. In this case, the WHERE
clause returns no row (because it returns only rows with the condition True and filters rows with the condition False or unknown). This would probably happen, for example, if someone misspelled the Company
as 'Olumpic Sales'
.
Although the equal operator (=
) is the most common, you can use any of the other five comparison operators in a similar structure. For every row in the table specified in the enclosing statement’s FROM
clause, the single value returned by the subquery is compared with the expression in the enclosing statement’s WHERE
clause. If the comparison gives a True value, a row is added to the result table.
You can guarantee that a subquery will return a single value if you include an aggregate function in it. Aggregate functions always return a single value. (Aggregate functions are described in Chapter 3.) Of course, this way of returning a single value is helpful only if you want the result of an aggregate function.
Suppose you are a Zetec salesperson and you need to earn a big commission check to pay for some unexpected bills. You decide to concentrate on selling Zetec’s most expensive product. You can find out what that product is with a nested query:
SELECT Model, ProdName, ListPrice
FROM PRODUCT
WHERE ListPrice =
(SELECT MAX(ListPrice)
FROM PRODUCT) ;
In the preceding nested query, both the subquery and the enclosing statement operate on the same table. The subquery returns a single value: the maximum list price in the PRODUCT table. The outer query retrieves all rows from the PRODUCT table that have that list price.
The next example shows a comparison subquery that uses a comparison operator other than =
:
SELECT Model, ProdName, ListPrice
FROM PRODUCT
WHERE ListPrice <
(SELECT AVG(ListPrice)
FROM PRODUCT) ;
The subquery returns a single value: the average list price in the PRODUCT table. The outer query retrieves all rows from the PRODUCT table that have a lower list price than the average list price.
In the original SQL standard, a comparison could have only one subquery, and it had to be on the right side of the comparison. SQL:1999 allowed either or both operands of the comparison to be subqueries, and later versions of SQL retain that expansion of capability.
Another way to make sure that a subquery returns a single value is to introduce it with a quantified comparison operator. The universal quantifier ALL
, and the existential quantifiers SOME
and ANY
, when combined with a comparison operator, process the list returned by a subquery, reducing it to a single value.
You’ll see how these quantifiers affect a comparison by looking at the baseball pitchers’ complete game database from Chapter 11, which is listed next.
The contents of the two tables are given by the following two queries:
SELECT * FROM NATIONAL
FirstName LastName CompleteGames
--------- -------- -------------
Sal Maglie 11
Don Newcombe 9
Sandy Koufax 13
Don Drysdale 12
Bob Turley 8
SELECT * FROM AMERICAN
FirstName LastName CompleteGames
--------- -------- -------------
Whitey Ford 12
Don Larson 10
Bob Turley 8
Allie Reynolds 14
The presumption is that the pitchers with the most complete games should be in the American League because of the presence of designated hitters in that league. One way to verify this presumption is to build a query that returns all American League pitchers who have thrown more complete games than all the National League pitchers. The query can be formulated as follows:
SELECT *
FROM AMERICAN
WHERE CompleteGames > ALL
(SELECT CompleteGames FROM NATIONAL) ;
This is the result:
FirstName LastName CompleteGames
---------- --------- --------------
Allie Reynolds 14
The subquery (SELECT CompleteGames FROM NATIONAL
) returns the values in the CompleteGames
column for all National League pitchers. The > ALL
quantifier says to return only those values of CompleteGames
in the AMERICAN table that are greater than each of the values returned by the subquery. This condition translates into “greater than the highest value returned by the subquery.” In this case, the highest value returned by the subquery is 13 (Sandy Koufax). The only row in the AMERICAN table higher than that is Allie Reynolds’s record, with 14 complete games.
What if your initial presumption was wrong? What if the major-league leader in complete games was a National League pitcher, in spite of the fact that the National League has no designated hitter? If that was the case, the query
SELECT *
FROM AMERICAN
WHERE CompleteGames > ALL
(SELECT CompleteGames FROM NATIONAL) ;
would return a warning that no rows satisfy the query’s conditions — meaning that no American League pitcher has thrown more complete games than the pitcher who has thrown the most complete games in the National League.
A query returns data from all table rows that satisfy the query’s conditions. Sometimes many rows are returned; sometimes only one comes back. Sometimes none of the rows in the table satisfy the conditions, and no rows are returned. You can use the EXISTS
and NOT EXISTS
predicates to introduce a subquery. That structure tells you whether any rows in the table located in the subquery’s FROM
clause meet the conditions in its WHERE
clause.
Subqueries introduced with EXISTS
and NOT EXISTS
are fundamentally different from the other subqueries in this chapter so far. In all the previous cases, SQL first executes the subquery and then applies that operation’s result to the enclosing statement. EXISTS
and NOT EXISTS
subqueries, on the other hand, are examples of correlated subqueries.
A correlated subquery first finds the table and row specified by the enclosing statement and then executes the subquery on the row in the subquery’s table that correlates with the current row of the enclosing statement’s table.
The subquery either returns one or more rows or it returns none. If it returns at least one row, the EXISTS
predicate succeeds (see the following section), and the enclosing statement performs its action. In the same circumstances, the NOT EXISTS
predicate fails (see the section after that), and the enclosing statement does not perform its action. After one row of the enclosing statement’s table is processed, the same operation is performed on the next row. This action is repeated until every row in the enclosing statement’s table has been processed.
Suppose you are a salesperson for Zetec Corporation and you want to call your primary contact people at all of Zetec’s customer organizations in California. Try the following query:
SELECT *
FROM CONTACT
WHERE EXISTS
(SELECT *
FROM CUSTOMER
WHERE CustState = 'CA'
AND CONTACT.CustID = CUSTOMER.CustID) ;
Notice the reference to CONTACT.CustID
, which is referencing a column from the outer query and comparing it with another column, CUSTOMER.CustID
, from the inner query. For each candidate row of the outer query, you evaluate the inner query, using the CustID
value from the current CONTACT
row of the outer query in the WHERE
clause of the inner query.
Here’s what happens:
CustID
column links the CONTACT table to the CUSTOMER table.CustID
, and checks that row’s CustState
field.CUSTOMER.CustState
= 'CA'
, the current CONTACT
row is added to the result table.CONTACT
record is then processed in the same way, and so on, until the entire CONTACT table has been processed.SELECT * FROM CONTACT
, all the contact table’s fields are returned, including the contact’s name and phone number.In the previous example, the Zetec salesperson wants to know the names and numbers of the contact people of all the customers in California. Imagine that a second salesperson is responsible for all of the United States except California. She can retrieve her contact people by using NOT EXISTS
in a query similar to the preceding one:
SELECT *
FROM CONTACT
WHERE NOT EXISTS
(SELECT *
FROM CUSTOMER
WHERE CustState = 'CA'
AND CONTACT.CustID = CUSTOMER.CustID) ;
Every row in CONTACT for which the subquery does not return a row is added to the result table.
As noted in a previous section of this chapter, subqueries introduced by IN
or by a comparison operator need not be correlated queries, but they can be.
In the earlier section “Subqueries introduced by the keyword IN
,” I discuss how a noncorrelated subquery can be used with the IN
predicate. To show how a correlated subquery may use the IN
predicate, ask the same question that came up with the EXISTS
predicate: What are the names and phone numbers of the contacts at all of Zetec’s customers in California? You can answer this question with a correlated IN
subquery:
SELECT *
FROM CONTACT
WHERE 'CA' IN
(SELECT CustState
FROM CUSTOMER
WHERE CONTACT.CustID = CUSTOMER.CustID) ;
The statement is evaluated for each record in the CONTACT table. If, for that record, the CustID
numbers in CONTACT and CUSTOMER match, then the value of CUSTOMER.CustState
is compared to 'CA'
. The result of the subquery is a list that contains, at most, one element. If that one element is 'CA'
, the WHERE
clause of the enclosing statement is satisfied, and a row is added to the query’s result table.
A correlated subquery can also be introduced by one of the six comparison operators, as shown in the next example.
Zetec pays bonuses to its salespeople based on their total monthly sales volume. The higher the volume is, the higher the bonus percentage is. The bonus percentage list is kept in the BONUSRATE table:
MinAmount MaxAmount BonusPct
--------- --------- --------
0.00 24999.99 0.
25000.00 49999.99 0.1
50000.00 99999.99 0.2
100000.00 249999.99 0.3
250000.00 499999.99 0.4
500000.00 749999.99 0.5
750000.00 999999.99 0.6
If a person’s monthly sales are between $100,000.00 and $249,999.99, the bonus is 0.3 percent of sales.
Sales are recorded in a transaction master table named TRANSMASTER:
TRANSMASTER
-----------
Column Type Constraints
------ ---- -----------
TransID INTEGER PRIMARY KEY
CustID INTEGER FOREIGN KEY
EmpID INTEGER FOREIGN KEY
TransDate DATE
NetAmount NUMERIC
Freight NUMERIC
Tax NUMERIC
InvoiceTotal NUMERIC
Sales bonuses are based on the sum of the NetAmount
field for all of a person’s transactions in the month. You can find any person’s bonus rate with a correlated subquery that uses comparison operators:
SELECT BonusPct
FROM BONUSRATE
WHERE MinAmount <=
(SELECT SUM (NetAmount)
FROM TRANSMASTER
WHERE EmpID = 133)
AND MaxAmount >=
(SELECT SUM (NetAmount)
FROM TRANSMASTER
WHERE EmpID = 133) ;
This query is interesting in that it contains two subqueries, making use of the logical connective AND
. The subqueries use the SUM
aggregate operator, which returns a single value: the total monthly sales of employee number 133. That value is then compared against the MinAmount
and the MaxAmount
columns in the BONUSRATE table, producing the bonus rate for that employee.
If you had not known the EmpID
but had known the EmplName
, you could arrive at the same answer with a more complex query:
SELECT BonusPct
FROM BONUSRATE
WHERE MinAmount <=
(SELECT SUM (NetAmount)
FROM TRANSMASTER
WHERE EmpID =
(SELECT EmpID
FROM EMPLOYEE
WHERE EmplName = 'Coffin'))
AND MaxAmount >=
(SELECT SUM (NetAmount)
FROM TRANSMASTER
WHERE EmpID =
(SELECT EmpID
FROM EMPLOYEE
WHERE EmplName = 'Coffin'));
This example uses subqueries nested within subqueries, which, in turn, are nested within an enclosing query to arrive at the bonus rate for the employee named Coffin. This structure works only if you know for sure that the company has one, and only one, employee whose last name is Coffin. If you know that more than one employee has the same last name, you can add terms to the WHERE
clause of the innermost subquery until you’re sure that only one row of the EMPLOYEE table is selected.
You can have a correlated subquery in a HAVING
clause just as you can in a WHERE
clause. As I mention in Chapter 10, a HAVING
clause is usually preceded by a GROUP BY
clause. The HAVING
clause acts as a filter to restrict the groups created by the GROUP BY
clause. Groups that don’t satisfy the condition of the HAVING
clause are not included in the result. When used this way, the HAVING
clause is evaluated for each group created by the GROUP BY
clause.
In the absence of a GROUP BY
clause, the HAVING
clause is evaluated for the set of rows passed by the WHERE
clause — which is considered to be a single group. If neither a WHERE
clause nor a GROUP BY
clause is present, the HAVING
clause is evaluated for the entire table:
SELECT TM1.EmpID
FROM TRANSMASTER TM1
GROUP BY TM1.Department
HAVING MAX (TM1.NetAmount) >= ALL
(SELECT 2 * AVG (TM2.NetAmount)
FROM TRANSMASTER TM2
WHERE TM1.EmpID <> TM2.EmpID) ;
This query uses two aliases for the same table, enabling you to retrieve the EmpID
number of all salespeople who had a sale of at least twice the average sale of all the other salespeople. The query works as follows:
SELECT
, FROM
, and GROUP BY
clauses.HAVING
clause filters these groups. For each group, it calculates the MAX
of the NetAmount
column for the rows in that group.The inner query evaluates twice the average NetAmount
from all rows of TRANSMASTER whose EmpID
is different from the EmpID
of the current group of the outer query.
In the last line, you have to reference two different EmpID
values — so you use different aliases for TRANSMASTER in the FROM
clauses of the outer and inner queries.
EmpID
from the current row of the inner subquery (TM2.EmpID
) and the EmpID
from the current group of the outer subquery (TM1.EmpID
).In addition to SELECT
statements, UPDATE
, DELETE
, and INSERT
statements can also include WHERE
clauses. Those WHERE
clauses can contain subqueries in the same way that SELECT
statements’ WHERE
clauses do.
For example, Zetec has just made a volume purchase deal with Olympic Sales and wants to provide Olympic with a retroactive 10 percent credit for all its purchases in the last month. You can give this credit with an UPDATE
statement:
UPDATE TRANSMASTER
SET NetAmount = NetAmount * 0.9
WHERE SaleDate > (CurrentDate – 30) DAY AND CustID =
(SELECT CustID
FROM CUSTOMER
WHERE Company = 'Olympic Sales') ;
You can also have a correlated subquery in an UPDATE
statement. Suppose the CUSTOMER table has a column LastMonthsMax
, and Zetec wants to give such a credit for purchases that exceed LastMonthsMax
for the customer:
UPDATE TRANSMASTER TM
SET NetAmount = NetAmount * 0.9
WHERE NetAmount>
(SELECT LastMonthsMax
FROM CUSTOMER C
WHERE C.CustID = TM.CustID) ;
Note that this subquery is correlated: The WHERE
clause in the last line references both the CustID
of the CUSTOMER row from the subquery and the CustID
of the current TRANSMASTER row that is a candidate for updating.
A subquery in an UPDATE
statement can also reference the table that is being updated. Suppose that Zetec wants to give a 10 percent credit to customers whose purchases have exceeded $10,000:
UPDATE TRANSMASTER TM1
SET NetAmount = NetAmount * 0.9
WHERE 10000 < (SELECT SUM(NetAmount)
FROM TRANSMASTER TM2
WHERE TM1.CustID = TM2.CustID);
The inner subquery calculates the SUM
of the NetAmount
column for all TRANSMASTER rows for the same customer. What does this mean? Suppose the customer with CustID = 37
has four rows in TRANSMASTER with values for NetAmount
: 3000
, 5000
, 2000
, and 1000
. The SUM
of NetAmount
for this CustID
is 11000
.
The order in which the UPDATE
statement processes the rows is defined by your implementation and is generally not predictable. The order may differ depending on how the rows are arranged on the disk. Assume that the implementation processes the rows for this CustID
in this order: first the TRANSMASTER with a NetAmount
of 3000
, then the one with NetAmount
= 5000
, and so on. After the first three rows for CustID 37
have been updated, their NetAmount
values are 2700
(90 percent of $3,000), 4500
(90 percent of $5,000), and 1800
(90 percent of $2,000). Then, when you process the last TRANSMASTER row for CustID 37
(whose NetAmount
is 1000
), the SUM
returned by the subquery would seem to be 10000
— that is, the SUM
of the new NetAmount
values of the first three rows for CustID 37
, and the old NetAmount
value of the last row for CustID 37
. Thus it would seem that the last row for CustID 37
isn’t updated, because the comparison with that SUM
is not True — after all, 10000
is not less than 10000
. But that is not how the UPDATE
statement is defined when a subquery references the table that is being updated.
All evaluations of subqueries in an UPDATE
statement reference the old values of the table — the ones that are being updated. In the preceding UPDATE
for CustID 37
, the subquery returns 11000
— the original SUM
.
The subquery in a WHERE
clause operates the same as a SELECT
statement or an UPDATE
statement. The same is true for DELETE
and INSERT
. To delete all of Olympic’s transactions, use this statement:
DELETE FROM TRANSMASTER
WHERE CustID =
(SELECT CustID
FROM CUSTOMER
WHERE Company = 'Olympic Sales') ;
As with UPDATE
, DELETE
subqueries can also be correlated and can also reference the table being deleted. The rules are similar to the rules for UPDATE
subqueries. Suppose you want to delete all rows from TRANSMASTER for customers whose total NetAmount
is larger than $10,000:
DELETE FROM TRANSMASTER TM1
WHERE 10000 < (SELECT SUM(NetAmount)
FROM TRANSMASTER TM2
WHERE TM1.CustID = TM2.CustID) ;
This query deletes all rows from TRANSMASTER that have CustID 37
, as well as any other customers with purchases exceeding $10,000. All references to TRANSMASTER in the subquery denote the contents of TRANSMASTER before any deletions by the current statement. So even when you’re deleting the last TRANSMASTER row for CustID 37
, the subquery is evaluated on the original TRANSMASTER table and returns 11000
.
When you update, delete, or insert database records, you risk making a table’s data inconsistent with other tables in the database. Such an inconsistency is called a modification anomaly, discussed in Chapter 5. If you delete TRANSMASTER records and a TRANSDETAIL table depends on TRANSMASTER, you must delete the corresponding records from TRANSDETAIL, too. This operation is called a cascading delete, because the deletion of a parent record must cascade to its associated child records. Otherwise the undeleted child records become orphans. In this case, they would be invoice detail lines that are in limbo because they are no longer connected to an invoice record.
If your implementation of SQL doesn’t support cascading deletes, you must do the deletions yourself. In this case, delete the appropriate records from the child table before deleting the corresponding record from the parent. That way, you don’t have orphan records in the child table, even for a second.
In the preceding section, I show how an UPDATE
, DELETE
, or INSERT
statement can include a nested SELECT
statement within a WHERE
clause. SQL:2011 introduced a related capability, in which a data manipulation command (such as UPDATE
, INSERT
, DELETE
, or MERGE
statements) can be nested within a SELECT
statement. This capability is called pipelined DML.
One way to look at a data change operation is to envision a table before it is changed with a DELETE
, INSERT
, or UPDATE
operation. You could call the table before the change the old table and the table after the change, the new table. During the data change operation, auxiliary tables, called delta tables, are created. A DELETE
operation creates an old delta table, which contains the rows to be deleted. An INSERT
operation creates a new delta table, which contains the rows to be inserted. An UPDATE
operation would create both an old and a new delta table, the old for the rows being replaced and the new for the rows replacing them.
With pipelined DML, you can retrieve the information in the delta tables. Suppose you want to delete from your product line all products with ProductIDs between 1000 and 1399, and you want a record of exactly which products in that range are deleted. You could use the following code:
SELECT Oldtable.ProductID
FROM OLD TABLE (DELETE FROM Product
WHERE ProductID BETWEEN 1000 AND 1399)
AS Oldtable ;
In this example, the keywords OLD TABLE
specify that the result of the SELECT
is coming from the old delta table. The result is the list of ProductID numbers for the products that are being deleted.
Similarly, you could retrieve a list from the new delta table by using the NEW TABLE
keywords, which displays the Product ID numbers of rows inserted by an INSERT
operation or updated by an UPDATE
operation. Because an UPDATE
operation created both an old delta table and a new delta table, you can retrieve the contents of either or both by using pipelined DML.