IN THIS CHAPTER
Using the CASE conditional expressions
Converting a data item from one data type to another
Saving data-entry time by using row value expressions
SQL is described in Chapter 2 as a data sublanguage. In fact, the sole function of SQL is to operate on data in a database. SQL lacks many of the features of a conventional procedural language. As a result, developers who use SQL must switch back and forth between SQL and its host language to control the flow of execution. This repeated switching complicates matters at development time and negatively affects performance at run time.
The performance penalty exacted by SQL’s limitations prompts the addition of new features to SQL every time a new version of the international specification is released. One of those added features, the CASE
expression, provides a long-sought conditional structure. A second feature, the CAST
expression, facilitates data conversion in a table from one type of data to another. A third feature, the row value expression, enables you to operate on a list of values where previously you could operate only on a single value. For example, if your list of values is a list of columns in a table, you can now perform an operation on all those columns by using a very simple syntax.
Every complete computer language has some sort of conditional statement or command. In fact, most have several. Probably the most common conditional statement or command is the IF…THEN…ELSE…ENDIF
structure. If the condition following the IF
keyword evaluates to True, the block of commands following the THEN
keyword executes. If the condition doesn’t evaluate to True, the block of commands after the ELSE
keyword executes. The ENDIF
keyword signals the end of the structure. This structure is great for any decision that goes one of two ways. The structure doesn’t work as well for decisions that can have more than two possible outcomes.
Most complete languages have a CASE
statement that handles situations in which you may want to perform more than two tasks based on more than two possible values of a condition.
SQL has a CASE
statement and a CASE
expression. A CASE
expression is only part of a statement — not a statement in its own right. In SQL, you can place a CASE
expression almost anywhere a value is legal. At run time, a CASE
expression evaluates to a value. SQL’s CASE
statement doesn’t evaluate to a value; rather, it causes a block of statements to be executed.
The CASE
expression searches a table, one row at a time, taking on the value of a specified result whenever one of a list of conditions is True. If the first condition is not satisfied for a row, the second condition is tested — and if it is True, the result specified for it is given to the expression, and so on until all conditions are processed. If no match is found, the expression takes on a NULL value. Processing then moves to the next row.
You can use the CASE
expression in the following two ways:
CASE
searches for rows in a table where specified conditions are True. If CASE
finds a search condition to be True for a table row, the statement containing the CASE
expression makes a specified change to that row.CASE
expression depends on which of several specified values in the table field is equal to each table row.The next two sections, “Using CASE
with search conditions” and “Using CASE
with values,” help clarify these concepts. In the first section, two examples use CASE
with search conditions. One example searches a table and makes changes to table values, based on a condition. The second section explores two examples of the value form of CASE
.
One powerful way to use the CASE
expression is to search a table for rows in which a specified search condition is True. If you use CASE
this way, the expression uses the following syntax:
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
…
WHEN conditionn THEN resultn
ELSE resultx
END
CASE
examines the first qualifying row (the first row that meets the conditions of the enclosing WHERE
clause, if any) to see whether condition1
is True. If it is, the CASE
expression receives a value of result1
. If condition1
is not True, CASE
evaluates the row for condition2
. If condition2
is True, the CASE
expression receives the value of result2
, and so on. If none of the stated conditions are True, the CASE
expression receives the value of result
x
. The ELSE
clause is optional. If the expression has no ELSE
clause and none of the specified conditions are True, the expression receives a null value. After the SQL statement containing the CASE
expression applies itself to the first qualifying row in a table and takes the appropriate action, it processes the next row. This sequence continues until the SQL statement finishes processing the entire table.
Because you can embed a CASE
expression within an SQL statement almost anywhere a value is possible, this expression gives you tremendous flexibility. You can use CASE
within an UPDATE
statement, for example, to make changes to table values — based on certain conditions. Consider the following example:
UPDATE FOODS
SET RATING = CASE
WHEN FAT < 1
THEN 'very low fat'
WHEN FAT < 5
THEN 'low fat'
WHEN FAT < 20
THEN 'moderate fat'
WHEN FAT < 50
THEN 'high fat'
ELSE 'heart attack city'
END ;
This statement evaluates the WHEN
conditions in order until the first True value is returned, after which the statement ignores the rest of the conditions.
Table 8-2 in Chapter 8 shows the fat content of 100 grams of certain foods. A database table holding this information can contain a RATING
column that gives a quick assessment of the fat content’s meaning. If you run the preceding UPDATE
on the FOODS table in Chapter 8, the statement assigns asparagus a value of very low fat
, gives chicken a value of low fat
, and puts butter in the heart attack city
category.
Another valuable use of CASE
is exception avoidance — checking for conditions that cause errors.
Consider a case that determines compensation for salespeople. Companies that compensate their salespeople by straight commission often pay their new employees by giving them a draw against the future commissions they’re expected to earn. In the following example, new salespeople receive a draw against commission; the draw is phased out gradually as their commissions rise:
UPDATE SALES_COMP
SET COMP = COMMISSION + CASE
WHEN COMMISSION > DRAW
THEN 0
WHEN COMMISSION < DRAW
THEN DRAW
END ;
If the salesperson’s commission is zero, the structure in this example avoids a division-by-zero operation, which would cause an error if allowed to happen. If the salesperson has a nonzero commission, the total compensation is the commission plus a draw that’s reduced in proportion to the size of the commission.
All the THEN
expressions in a CASE
expression must be of the same type — all numeric, all character, or all date. The result of the CASE
expression is also of the same type.
You can use a more compact form of the CASE
expression if you’re comparing a test value for equality with a series of other values. This form is useful within a SELECT
or UPDATE
statement if a table contains a limited number of values in a column and you want to associate a corresponding result value to each of those column values. If you use CASE
in this way, the expression has the following syntax:
CASE test_value
WHEN value1 THEN result1
WHEN value2 THEN result2
…
WHEN valuen THEN resultn
ELSE resultx
END
If the test value (test_value
) is equal to value1
, then the expression takes on the value result1
. If test_value
is not equal to value1
but is equal to value2
, then the expression takes on the value result2
. The expression tries each comparison value in turn, all the way down to value
n
, until it achieves a match. If none of the comparison values equal the test value, then the expression takes on the value result
x
. Again, if the optional ELSE
clause isn’t present and none of the comparison values match the test value, the expression receives a null value.
To understand how the value form works, consider a case in which you have a table containing the names and ranks of various military officers. You want to list the names preceded by the correct abbreviation for each rank. The following statement does the job:
SELECT CASE RANK
WHEN 'general' THEN 'Gen.'
WHEN 'colonel' THEN 'Col.'
WHEN 'lieutenant colonel' THEN 'Lt. Col.'
WHEN 'major' THEN 'Maj.'
WHEN 'captain' THEN 'Capt.'
WHEN 'first lieutenant' THEN '1st. Lt.'
WHEN 'second lieutenant' THEN '2nd. Lt.'
ELSE NULL
END,
LAST_NAME
FROM OFFICERS ;
The result is a list similar to the following example:
Capt. Midnight
Col. Sanders
Gen. Washington
Maj. Disaster
Nimitz
Chester Nimitz was an admiral in the United States Navy during World War II. Because his rank isn’t listed in the CASE
expression, the ELSE
clause doesn’t give him a title.
For another example, suppose Captain Midnight gets a promotion to major and you want to update the OFFICERS database accordingly. Assume that the variable officer_last_name
contains the value 'Midnight'
and that the variable new_rank
contains an integer (4
) that corresponds to Midnight’s new rank, according to the following table.
new_rank |
Rank |
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
You can record the promotion by using the following SQL code:
UPDATE OFFICERS
SET RANK = CASE :new_rank
WHEN 1 THEN 'general'
WHEN 2 THEN 'colonel'
WHEN 3 THEN 'lieutenant colonel'
WHEN 4 THEN 'major'
WHEN 5 THEN 'captain'
WHEN 6 THEN 'first lieutenant'
WHEN 7 THEN 'second lieutenant'
WHEN 8 THEN NULL
END
WHERE LAST_NAME = :officer_last_name ;
An alternative syntax for the CASE
expression with values is:
CASE
WHEN test_value = value1 THEN result1
WHEN test_value = value2 THEN result2
…
WHEN test_value = valuen THEN resultn
ELSE resultx
END
The one thing you can be sure of in this world is change. Sometimes things change from one known state to another. Other times, you think you know something but later you find out you didn’t know it after all. Classical thermodynamics and modern chaos theory both tell us that systems naturally migrate from a well-known, ordered state into a disordered state that no one can predict. Anyone who has ever monitored the status of a teenager’s room for a one-week period after the room is cleaned can vouch for the accuracy of these theories.
Database tables have definite values in fields containing known contents. Usually, if the value of a field is unknown, the field contains the null value. In SQL, you can use a CASE
expression to change the contents of a table field from a definite value to a null value. The null value indicates that you no longer know the field’s value. Consider the following example.
Imagine that you own a small airline that offers flights between Southern California and Washington state. Until recently, some of your flights stopped at San Jose International Airport to refuel before continuing. Unfortunately, you just lost your right to fly into San Jose. From now on, you must make your refueling stop at either San Francisco International Airport or Oakland International Airport. At this point, you don’t know which flights stop at which airport, but you do know that none of the flights are stopping at San Jose. You have a FLIGHT database that contains important information about your routes, and now you want to update the database to remove all references to San Jose. The following example shows one way to do this:
UPDATE FLIGHT
SET RefuelStop = CASE
WHEN RefuelStop = 'San Jose'
THEN NULL
ELSE RefuelStop
END ;
Because occasions like this one — in which you want to replace a known value with a null value — frequently arise, SQL offers a shorthand notation to accomplish this task. The preceding example, expressed in this shorthand form, looks like this:
UPDATE FLIGHT
SET RefuelStop = NULLIF(RefuelStop, 'San Jose') ;
You can translate this expression to English as, “Update the FLIGHT table by setting the RefuelStop
column to null if the existing value of RefuelStop
is 'San Jose'
. Otherwise make no change.”
NULLIF
is even handier if you’re converting data that you originally accumulated for use with a program written in a standard programming language such as C++ or Java. Standard programming languages don’t have nulls, so a common practice is to use special values to represent the concept of “not known” or “not applicable”. A numeric –1
may represent a not-known value for SALARY
, for example, and a character string "***"
may represent a not-known or not-applicable value for JOBCODE
. If you want to represent these not-known and not-applicable states in an SQL-compatible database by using nulls, you must convert the special values to nulls. The following example makes this conversion for an employee table, in which some salary values are unknown:
UPDATE EMP
SET Salary = CASE Salary
WHEN -1 THEN NULL
ELSE Salary
END ;
You can perform this conversion more conveniently by using NULLIF
, as follows:
UPDATE EMP
SET Salary = NULLIF(Salary, -1) ;
COALESCE
, like NULLIF
, is a shorthand form of a particular CASE
expression. COALESCE
deals with a list of values that may or may not be null. Here’s how it works:
COALESCE
expression takes on that value.A CASE
expression with this function has the following form:
CASE
WHEN value1 IS NOT NULL
THEN value1
WHEN value2 IS NOT NULL
THEN value2
…
WHEN valuen IS NOT NULL
THEN valuen
ELSE NULL
END
The corresponding COALESCE
shorthand looks like this:
COALESCE(value1, value2, …, valuen)
You may want to use a COALESCE
expression after you perform an OUTER JOIN
operation (discussed in Chapter 11). In such cases, COALESCE
can save you a lot of typing.
Chapter 2 covers the data types that SQL recognizes and supports. Ideally, each column in a database table has a perfect choice of data type. In this non-ideal world, however, exactly what that perfect choice may be isn’t always clear. In defining a database table, suppose you assign a data type to a column that works perfectly for your current application. Suppose that later on you want to expand your application’s scope — or write an entirely new application that uses the data differently. This new use could require a data type different from the one you originally chose.
You may want to compare a column of one type in one table with a column of a different type in a different table. For example, you could have dates stored as character data in one table and as date data in another table. Even if both columns contain the same sort of information (dates, for example), the fact that the types are different may prevent you from making the comparison. In the earliest SQL standards, SQL-86 and SQL-89, type incompatibility posed a big problem. SQL-92, however, introduced an easy-to-use solution in the CAST
expression.
The CAST
expression converts table data or host variables of one type to another type. After you make the conversion, you can proceed with the operation or analysis that you originally envisioned.
Naturally, you face some restrictions when using the CAST
expression. You can’t just indiscriminately convert data of any type into any other type. The data that you’re converting must be compatible with the new data type. You can, for example, use CAST
to convert the CHAR(10)
character string '201 7-04-26'
to the DATE
type. But you can’t use CAST
to convert the CHAR(10)
character string 'rhinoceros'
to the DATE
type. You can’t convert an INTEGER
to the SMALLINT
type if the former exceeds the maximum size of a SMALLINT
.
You can convert an item of any character type to any other type (such as numeric or date) provided the item’s value has the form of a literal of the new type. Conversely, you can convert an item of any type to any of the character types, provided the value of the item has the form of a literal of the original type.
The following list describes some additional conversions you can make:
INTERVAL DAY
or INTERVAL SECOND
.DATE
to a TIMESTAMP
. The time part of the TIMESTAMP
fills in with zeros.TIME
to a TIME
with a different fractional-seconds precision or a TIMESTAMP
. The date part of the TIMESTAMP
fills in with the current date.TIMESTAMP
to a DATE
, a TIME
, or a TIMESTAMP
with a different fractional-seconds precision.INTERVAL
to an exact numeric type or another year-month INTERVAL
with different leading-field precision.INTERVAL
to an exact numeric type or another day-time INTERVAL
with different leading-field precision.One of the conversions that comes up quite often is the conversion from a datetime to a character string type, or the inverse, converting a character string to a datetime type. Amazingly enough, the SQL standard did not specify how to do this until SQL:2016. By that time, all the implementations had come up with their own way of doing it. As a consequence, at the time of this writing, none of the popular implementations complies completely with the standard syntax.
The template placeholders specified in Chapter 7 in the section on formatting and parsing dates and times give the elements that can be combined into a format template such as 'YYYY-MM-DD'
for Year-Month-Day.
To convert a datetime to the character string type, use the following:
CAST (<datetime> AS <character string type> [FORMAT<template>])
An example of this might be:
CAST ('1969-07-20' AS CHAR)
This would not change the order of the components of the date, but would change its type.
Alternatively,
CAST ('1969-07-20' AS CHAR FORMAT 'MM-DD-YYYY')
would change both the order of the components and the type.
To convert a character string to the datetime type, the following syntax complies with SQL:2016:
CAST('1969-07-20' AS DATE FORMAT '07-20-1969')
This converts the string to a date and puts it into the data format most commonly used in the USA.
Suppose you work for a company that keeps track of prospective employees as well as the employees you’ve actually hired. You list the prospective employees in a table named PROSPECT
, and you distinguish them by their Social Security numbers, which you happen to store as a CHAR(9)
type. You list the employees in a table named EMPLOYEE
, and you distinguish them by their Social Security numbers, which are of the INTEGER
type. You now want to generate a list of all people who appear in both tables. You can use CAST
to perform the task:
SELECT * FROM EMPLOYEE
WHERE EMPLOYEE.SSN =
CAST(PROSPECT.SSN AS INTEGER) ;
The key use of CAST
is to deal with data types that are available in SQL but not in the host language that you use. The following list offers some examples of these data types:
DECIMAL
and NUMERIC
, but FORTRAN and Pascal don’t.FLOAT
and REAL
, but standard COBOL doesn’t.DATETIME
, which no other language has.Suppose you want to use FORTRAN or Pascal to access tables with DECIMAL(5,3)
columns, and you don’t want any inaccuracies to result from converting those values to the REAL
data type used by FORTRAN and Pascal. You can perform this task by using CAST
to move the data to and from character-string host variables. You retrieve a numeric salary of 198.37 as a CHAR(10)
value of '0000198.37'
. Then, if you want to update that salary to 203.74, you can place that value in a CHAR(10)
as '0000203.74'
. First you use CAST
to change the SQL DECIMAL(5,3)
data type to the CHAR(10)
type for the employee whose ID number you’re storing in the host variable :emp_id_var
, as follows:
SELECT CAST(Salary AS CHAR(10)) INTO :salary_var
FROM EMP
WHERE EmpID = :emp_id_var ;
The FORTRAN or Pascal application examines the resulting character-string value in :salary_var
, possibly sets the string to a new value of '000203.74'
, and then updates the database by calling the following SQL code:
UPDATE EMP
SET Salary = CAST(:salary_var AS DECIMAL(5,3))
WHERE EmpID = :emp_id_var ;
Dealing with character-string values such as '000198.37'
is awkward in FORTRAN or Pascal, but you can write a set of subroutines to do the necessary manipulations. You can then retrieve and update any SQL data from any host language, getting — and then setting — exact values.
The general idea is that CAST
is most valuable for converting between host types and the database rather than for converting within the database.
In the original SQL standards, SQL-86 and SQL-89, most operations dealt with a single value or a single column in a table row. To operate on multiple values, you had to build complex expressions by using logical connectives (which I discuss in Chapter 10).
SQL-92 introduced row value expressions, which operate on a list of values or columns rather than on a single value or column. A row value expression is a list of value expressions that you enclose in parentheses and separate by commas. You can code these expressions to operate on an entire row at once or on a selected subset of the row.
Chapter 6 covers how to use the INSERT
statement to add a new row to an existing table. To do so, the statement uses a row value expression. Consider the following example:
INSERT INTO FOODS
(FOODNAME, CALORIES, PROTEIN, FAT, CARBOHYDRATE)
VALUES
('Cheese, cheddar', 398, 25, 32.2, 2.1) ;
In this example, ('Cheese, cheddar’, 398, 25, 32.2, 2.1)
is a row value expression. If you use a row value expression in an INSERT
statement this way, it can contain null and default values. (A default value is the value that a table column assumes if you specify no other value.) The following line, for example, is a legal row value expression:
('Cheese, cheddar', 398, NULL, 32.2, DEFAULT)
You can add multiple rows to a table by putting multiple row value expressions in the VALUES
clause, as follows:
INSERT INTO FOODS
(FOODNAME, CALORIES, PROTEIN, FAT, CARBOHYDRATE)
VALUES
('Lettuce', 14, 1.2, 0.2, 2.5),
('Butter', 720, 0.6, 81.0, 0.4),
('Mustard', 75, 4.7, 4.4, 6.4),
('Spaghetti', 148, 5.0, 0.5, 30.1) ;
You can use row value expressions to save yourself from having to enter comparisons manually. Suppose you have two tables of nutritional values, one compiled in English and the other in Spanish. You want to find those rows in the English language table that correspond exactly to the rows in the Spanish language table. Without a row value expression, you may need to formulate something like the following example:
SELECT * FROM FOODS
WHERE FOODS.CALORIES = COMIDA.CALORIA
AND FOODS.PROTEIN = COMIDA.PROTEINAS
AND FOODS.FAT = COMIDA.GRASAS
AND FOODS.CARBOHYDRATE = COMIDA.CARBOHIDRATO ;
Row value expressions enable you to code the same logic, as follows:
SELECT * FROM FOODS
WHERE (FOODS.CALORIES, FOODS.PROTEIN, FOODS.FAT,
FOODS.CARBOHYDRATE)
=
(COMIDA.CALORIA, COMIDA.PROTEINAS, COMIDA.GRASAS,
COMIDA.CARBOHIDRATO) ;
In this example, you don’t save much typing. You would benefit slightly more if you were comparing more columns. In cases of marginal benefit like this example, you may be better off sticking with the older syntax because its meaning is clearer.
You gain one benefit by using a row value expression instead of its coded equivalent — the row value expression is processed much faster. In principle, a clever implementation can analyze the coded version and implement it as the row value version. In practice, this operation is a difficult optimization that no DBMS that I am aware of can perform.