IN THIS CHAPTER
Using variables to eliminate redundant coding
Extracting frequently required information from a database table field
Combining simple values to form complex expressions
This book emphasizes the importance of database structure for maintaining database integrity. Although the significance of database structure is often overlooked, you must never forget that the most important thing is the data itself. After all, the values held in the cells that form the intersections of the database table’s rows and columns are the raw materials from which you can derive meaningful relationships and trends.
You can represent values in several ways. You can represent them directly, or you can derive them with functions or expressions. This chapter describes the various kinds of values, as well as functions and expressions.
Functions examine data and calculate a value based on the data. Expressions are combinations of data items that SQL evaluates to produce a single value.
SQL recognizes several kinds of values:
The most visible values in a database are table row values. These are the values that each row of a database table contains. A row value is typically made up of multiple components because each column in a row contains a value. A field is the intersection of a single column with a single row. A field contains a scalar, or atomic, value. A value that’s scalar or atomic has only a single component.
In SQL, either a variable or a constant may represent a value. Logically enough, the value of a variable may change from time to time, but the value of a constant never changes. An important kind of constant is the literal value. The representation is itself the value.
Just as SQL has many data types, it also has many types of literals. Table 8-1 shows some examples of literals of the various data types.
TABLE 8-1 Example Literals of Various Data Types
Data Type |
Example Literal |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note: Fifteen total characters and spaces are between the quote marks above. |
|
|
|
|
|
Note: Fifteen total characters and spaces are between the quote marks above. |
|
|
|
|
(A really long character string) |
|
|
|
|
|
(A really long string of ones and zeros) |
|
|
|
|
|
|
|
|
|
|
|
|
1This term is the word that Greeks use to name their own country in their own language. (The English equivalent is Hellas.)
2This term is the word lepton in Greek national characters.
Notice that single quotes enclose the literals of the non-numeric types. These marks help to prevent confusion; they can, however, also cause problems, as you can see in Table 8-1.
What if a literal is a character string that itself contains a phrase in single quotes? In that case, you must type two single quotes to show that one of the quote marks that you’re typing is a part of the character string and not an indicator of the end of the string. You’d type 'Earth"s atmosphere'
, for example, to represent the character literal 'Earth's atmosphere'
.
Although being able to manipulate literals and other kinds of constants while dealing with a database gives you great power, having variables is helpful, too. In many cases, you’d need to do much more work if you didn’t have variables. A variable, by the way, is a quantity that has a value that can change. Look at the following example to see why variables are valuable.
Suppose that you’re a retailer who has several classes of customers. You give your high-volume customers the best price, your medium-volume customers the next best price, and your low-volume customers the highest price. You want to index all prices to your cost of goods. For your F-35 product, you decide to charge your high-volume customers (Class C) 1.4 times your cost of goods. You charge your medium-volume customers (Class B) 1.5 times your cost of goods, and you charge your low-volume customers (Class A) 1.6 times your cost of goods.
You store the cost of goods and the prices that you charge in a table named PRICING. To implement your new pricing structure, you issue the following SQL commands:
UPDATE PRICING
SET Price = Cost * 1.4
WHERE Product = 'F-35'
AND Class = 'C' ;
UPDATE PRICING
SET Price = Cost * 1.5
WHERE Product = 'F-35'
AND Class = 'B' ;
UPDATE PRICING
SET Price = Cost * 1.6
WHERE Product = 'F-35'
AND Class = 'A' ;
This code is fine and meets your needs — for now. But if aggressive competition begins to eat into your market share, you may need to reduce your margins to remain competitive. To change your margins, you need to enter code something like this:
UPDATE PRICING
SET Price = Cost * 1.25
WHERE Product = 'F-35'
AND Class = 'C' ;
UPDATE PRICING
SET Price = Cost * 1.35
WHERE Product = 'F-35'
AND Class = 'B' ;
UPDATE PRICING
SET Price = Cost * 1.45
WHERE Product = 'F-35'
AND Class = 'A' ;
If you’re in a volatile market, you may need to rewrite your SQL code repeatedly. This task can become tedious, particularly if prices appear in multiple places in your code. You can minimize your work by replacing literals (such as 1.45
) with variables (such as :multiplierA
). Then you can perform your updates as follows:
UPDATE PRICING
SET Price = Cost * :multiplierC
WHERE Product = 'F-35'
AND Class = 'C' ;
UPDATE PRICING
SET Price = Cost * :multiplierB
WHERE Product = 'F-35'
AND Class = 'B' ;
UPDATE PRICING
SET Price = Cost * :multiplierA
WHERE Product = 'F-35'
AND Class = 'A' ;
Now whenever market conditions force you to change your pricing, you need to change only the values of the variables :multiplierC
, :multiplierB
, and :multiplierA
. These variables are parameters that pass to the SQL code, which then uses the variables to compute new prices.
Sometimes variables used in this way are called parameters or host variables. Variables are called parameters if they appear in applications written in SQL module language. They’re called host variables when they’re used in embedded SQL.
Embedded SQL means that SQL statements are embedded into the code of an application written in a host language. Alternatively, you can use SQL module language to create an entire module of SQL code. The host language application then calls the module. Either method can give you the capabilities that you want. The approach that you use depends on your SQL implementation.
If a user on a client machine connects to a database on a server, this connection establishes a session. If the user connects to several databases, the session associated with the most recent connection is considered the current session; previous sessions are considered dormant. SQL defines several special variables that are valuable on multiuser systems. These variables keep track of the different users. Here’s a list of the special variables:
SESSION_USER
: The special variable SESSION_USER
holds a value that’s equal to the user authorization identifier of the current SQL session. If you write a program that performs a monitoring function, you can interrogate SESSION_USER
to find out who is executing SQL statements.CURRENT_USER
: An SQL module may have a user-specified authorization identifier associated with it. The CURRENT_USER
variable stores this value. If a module has no such identifier, CURRENT_USER
has the same value as SESSION_USER
.SYSTEM_USER
: The SYSTEM_USER
variable contains the operating system’s user identifier. This identifier may differ from that same user’s identifier in an SQL module. A user may log on to the system as LARRY
, for example, but identify himself to a module as PLANT_MGR
. The value in SESSION_USER
is PLANT_MGR
. If he makes no explicit specification of the module identifier, and CURRENT_USER
also contains PLANT_MGR
, SYSTEM_USER
holds the value LARRY
. The SYSTEM_USER
, SESSION_USER
, and CURRENT_USER
special variables track who is using the system. You can maintain a log table and periodically insert into that table the values that SYSTEM_USER
, SESSION_USER
, and CURRENT_USER
contain. The following example shows how:
INSERT INTO USAGELOG (SNAPSHOT)
VALUES ('User ' || SYSTEM_USER ||
' with ID ' || SESSION_USER ||
' active at ' || CURRENT_TIMESTAMP) ;
This statement produces log entries similar to the following example:
User LARRY with ID PLANT_MGR active at 2018-04-07-23.50.00
Every column contains one value for each row of a table. SQL statements often refer to such values. A fully qualified column reference consists of the table name, a period, and then the column name (for example, PRICING.Product
). Consider the following statement:
SELECT PRICING.Cost
FROM PRICING
WHERE PRICING.Product = 'F-35' ;
Here PRICING.Product
is a column reference. This reference contains the value 'F-35'
. PRICING.Cost
is also a column reference, but you don’t know its value until the preceding SELECT
statement executes.
Because it only makes sense to reference columns in the current table, you don’t generally need to use fully qualified column references. The following statement, for example, is equivalent to the previous one:
SELECT Cost
FROM PRICING
WHERE Product = 'F-35' ;
Sometimes you may be dealing with more than one table — say, when two tables in a database contain one or more columns with the same name. In such a case, you must fully qualify column references for those columns to guarantee that you get the column you want.
For example, suppose that your company maintains facilities in both Kingston and Jefferson, and you maintain separate employee records for each site. You name the Kingston employee table EMP_KINGSTON
, and you name the Jefferson employee table EMP_JEFFERSON
. You want a list of employees who work at both sites, so you need to find the employees whose names appear in both tables. The following SELECT
statement gives you what you want:
SELECT EMP_KINGSTON.FirstName, EMP_KINGSTON.LastName
FROM EMP_KINGSTON, EMP_JEFFERSON
WHERE EMP_KINGSTON.EmpID = EMP_JEFFERSON.EmpID ;
Because each employee’s ID number is unique and remains the same regardless of the work site, you can use this ID as a link between the two tables. This retrieval returns only the names of employees who appear in both tables.
An expression may be simple or complex. The expression can contain literal values, column names, parameters, host variables, subqueries, logical connectives, and arithmetic operators. Regardless of its complexity, an expression must reduce to a single value.
For this reason, SQL expressions are commonly known as value expressions. Combining multiple value expressions into a single expression is possible, as long as the component value expressions reduce to values that have compatible data types.
SQL has five kinds of value expressions:
The simplest string value expression specifies a single string value. Other possibilities include a column reference, a set function, a scalar subquery, a CASE
expression, a CAST
expression, or a complex string value expression. (I discuss CASE
and CAST
value expressions in Chapter 9; I get into subqueries in Chapter 12.)
Only one operator is possible in a string value expression: the concatenation operator. You may concatenate any of the value expressions I mention in the bulleted list in the previous section with another expression to create a more complex string value expression. A pair of vertical lines (||
) represents the concatenation operator. The following table shows some examples of string value expressions.
Expression |
Produces |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As the table shows, if you concatenate a string to a zero-length string, the result is the same as the original string.
In numeric value expressions, you can apply the addition, subtraction, multiplication, and division operators to numeric-type data. The expression must reduce to a numeric value. The components of a numeric value expression may be of different data types as long as all the data types are numeric. The data type of the result depends on the data types of the components from which you derive the result. Even so, the SQL standard doesn’t rigidly specify the type that results from any specific combination of source-expression components. That’s because of the differences among hardware platforms. Check the documentation for your specific platform when you’re mixing numeric data types.
Here are some examples of numeric value expressions:
–27
49 + 83
5 * (12 – 3)
PROTEIN + FAT + CARBOHYDRATE
FEET/5280
COST * :multiplierA
Datetime value expressions perform operations on data that deal with dates and times. These value expressions can contain components that are of the types DATE
, TIME
, TIMESTAMP
, or INTERVAL
. The result of a datetime value expression is always a datetime type (DATE
, TIME
, or TIMESTAMP
). The following expression, for example, gives the date one week from today:
CURRENT_DATE + INTERVAL '7' DAY
Times are maintained in Universal Time Coordinated (UTC) — known in the UK as Greenwich Mean Time — but you can specify an offset to make the time correct for any particular time zone. For your system’s local time zone, you can use the simple syntax given in the following example:
TIME '22:55:00' AT LOCAL
Alternatively, you can specify this value the long way:
TIME '22:55:00' AT TIME ZONE INTERVAL '-08.00' HOUR TO MINUTE
This expression defines the local time as the time zone for Portland, Oregon, which is eight hours earlier than that of Greenwich, England.
If you subtract one datetime from another, you get an interval. Adding one datetime to another makes no sense, so SQL doesn’t permit you to do so. If you add two intervals together or subtract one interval from another interval, the result is an interval. You can also either multiply or divide an interval by a numeric constant.
SQL has two types of intervals: year-month and day-time. To avoid ambiguities, you must specify which to use in an interval expression. The following expression, for example, gives the interval in years and months until you reach retirement age:
(BIRTHDAY_65 - CURRENT_DATE) YEAR TO MONTH
The following example gives an interval of 40 days:
INTERVAL '17' DAY + INTERVAL '23' DAY
The example that follows approximates the total number of months that a mother of five has been pregnant (assuming that she’s not currently expecting number six!):
INTERVAL '9' MONTH * 5
Intervals can be negative as well as positive and may consist of any value expression or combination of value expressions that evaluates to an interval.
The value of a conditional value expression depends on a condition. The conditional value expressions CASE
, NULLIF
, and COALESCE
are significantly more complex than the other kinds of value expressions. In fact, these three conditional value expressions are so complex that I don’t have enough room to talk about them here. (I give conditional value expressions extensive coverage in Chapter 9.)
A function is a simple (okay, no more than moderately complex) operation that the usual SQL commands don’t perform but that comes up often in practice. SQL provides functions that perform tasks that the application code in the host language (within which you embed your SQL statements) would otherwise need to perform. SQL has two main categories of functions: set (or aggregate) functions and value functions.
Set functions apply to sets of rows in a table rather than to a single row. These functions summarize some characteristic of the current set of rows. The set may include all the rows in the table or a subset of rows that are specified by a WHERE
clause. (I discuss WHERE
clauses extensively in Chapter 10.) Programmers sometimes call set functions aggregate functions because these functions take information from multiple rows, process that information in some way, and deliver a single-row answer. That answer is an aggregation of the information in the rows making up the set.
To illustrate the use of the set functions, consider Table 8-2, a list of nutrition facts for 100 grams of selected foods.
TABLE 8-2 Nutrition Facts for 100 Grams of Selected Foods
Food |
Calories |
Protein (grams) |
Fat (grams) |
Carbohydrate (grams) |
Almonds, roasted |
627 |
18.6 |
57.7 |
19.6 |
Asparagus |
20 |
2.2 |
0.2 |
3.6 |
Bananas, raw |
85 |
1.1 |
0.2 |
22.2 |
Beef, lean hamburger |
219 |
27.4 |
11.3 |
|
Chicken, light meat |
166 |
31.6 |
3.4 |
|
Opossum, roasted |
221 |
30.2 |
10.2 |
|
Pork, ham |
394 |
21.9 |
33.3 |
|
Beans, lima |
111 |
7.6 |
0.5 |
19.8 |
Cola |
39 |
|
|
10.0 |
Bread, white |
269 |
8.7 |
3.2 |
50.4 |
Bread, whole wheat |
243 |
10.5 |
3.0 |
47.7 |
Broccoli |
26 |
3.1 |
0.3 |
4.5 |
Butter |
716 |
0.6 |
81.0 |
0.4 |
Jelly beans |
367 |
|
0.5 |
93.1 |
Peanut brittle |
421 |
5.7 |
10.4 |
81.0 |
A database table named FOODS stores the information in Table 8-2. Blank fields contain the value NULL
. The set functions COUNT
, AVG
, MAX
, MIN
, and SUM
can tell you important facts about the data in this table.
The COUNT
function tells you how many rows are in the table or how many rows in the table meet certain conditions. The simplest usage of this function is as follows:
SELECT COUNT (*)
FROM FOODS ;
This function yields a result of 15, because it counts all rows in the FOODS table. The following statement produces the same result:
SELECT COUNT (Calories)
FROM FOODS ;
Because the Calories
column in every row of the table has an entry, the count is the same. If a column contains nulls, however, the function doesn’t count the rows corresponding to those nulls.
The following statement returns a value of 11 because 4 of the 15 rows in the table contain nulls in the Carbohydrate
column.
SELECT COUNT (Carbohydrate)
FROM FOODS ;
A field in a database table may contain a null value for a variety of reasons. One common reason is that the actual value is not known (or not yet known). Or the value may be known but not yet entered. Sometimes, if a value is known to be zero, the data-entry operator doesn’t bother entering anything in a field — leaving that field a null. This is not a good practice because zero is a definite value, and you can include it in computations. Null is not a definite value, and SQL doesn’t include null values in computations.
You can also use the COUNT
function, in combination with DISTINCT
, to determine how many distinct values exist in a column. Consider the following statement:
SELECT COUNT (DISTINCT Fat)
FROM FOODS ;
The answer that this statement returns is 12. You can see that a 100-gram serving of asparagus has the same fat content as 100 grams of bananas (0.2 grams) and that a 100-gram serving of lima beans has the same fat content as 100 grams of jelly beans (0.5 grams). Thus the table has a total of only 12 distinct fat values.
The AVG
function calculates and returns the average of the values in the specified column. Of course, you can use the AVG
function only on columns that contain numeric data, as in the following example:
SELECT AVG (Fat)
FROM FOODS ;
The result is 15.37. This number is so high primarily because of the presence of butter in the database. You may wonder what the average fat content may be if you didn’t include butter. To find out, you can add a WHERE
clause to your statement, as follows:
SELECT AVG (Fat)
FROM FOODS
WHERE Food <> 'Butter' ;
The average fat value drops down to 10.32 grams per 100 grams of food.
The MAX
function returns the maximum value found in the specified column. The following statement returns a value of 81 (the fat content in 100 grams of butter):
SELECT MAX (Fat)
FROM FOODS ;
The MIN
function returns the minimum value found in the specified column. The following statement returns a value of 0.4, because the function doesn’t treat the nulls as zeros:
SELECT MIN (Carbohydrate)
FROM FOODS ;
The SUM
function returns the sum of all the values found in the specified column. The following statement returns 3,924, which is the total caloric content of all 15 foods:
SELECT SUM (Calories)
FROM FOODS ;
SQL:2016 introduced a new set function, LISTAGG
, which aggregates the values of a group of table rows into a list of values separated by a delimiter, such as a comma, that you can specify. A common use of this capability is to transform the aggregation of table values into a string of comma-separated values (as in a CSV
file). Syntax for the LISTAGG
function follows this model:
LISTAGG (<expression>, <delimiter>) WITHIN GROUP (ORDER BY fieldname, …)
As an example, suppose your database has an EMPLOYEE
table that records the EmployeeID
, FirstName
, LastName
, and DepartmentID
of each of your employees. Suppose further that you want a listing of all your employees, grouped by department and listed in alphabetical order of each employee’s last name. You could build such a listing with the following query:
SELECT DepartmentID,
LISTAGG(LastName, ',') WITHIN GROUP (ORDER BY LastName)
AS Employees
FROM EMPLOYEE
GROUP BY DepartmentID ;
The result will be a tabular result set with a DepartmentID
column and an Employees
column. Each row of the result set will hold the DepartmentID
of a department followed by a comma-separated list of the employees in that department. The rows will be in alphanumeric order by DepartmentID
, and the employee names in each row will be ordered alphabetically.
A number of operations apply in a variety of contexts. Because you need to use these operations so often, incorporating them into SQL as value functions makes good sense. ISO/IEC standard SQL offers relatively few value functions compared with specific database management system implementations such as Access, Oracle, or SQL Server, but the few that standard SQL does have are probably the ones that you’ll use most often. SQL uses the following four types of value functions:
String value functions take one character string as an input and produce another character string as an output. SQL has ten such functions:
SUBSTRING
SUBSTRING SIMILAR
SUBSTRING_REGEX
TRANSLATE_REGEX
OVERLAY
UPPER
LOWER
TRIM
TRANSLATE
CONVERT
Use the SUBSTRING
function to extract a substring from a source string. The extracted substring is of the same type as the source string. If the source string is a CHARACTER VARYING
string, for example, the substring is also a CHARACTER VARYING
string. Following is the syntax of the SUBSTRING
function:
SUBSTRING (string_value FROM start [FOR length])
The clause in square brackets ([ ]
) is optional. The substring extracted from string_value
begins with the character that start
represents and continues for length
characters. If the FOR
clause is absent, the substring extracted extends from the start
character to the end of the string. Consider the following example:
SUBSTRING ('Bread, whole wheat' FROM 8 FOR 7)
The substring extracted is 'whole w'
. This substring starts with the eighth character of the source string and has a length of seven characters. On the surface, SUBSTRING
doesn’t seem like a very valuable function; if you have a literal like 'Bread, whole wheat'
, you don’t need a function to figure out characters 8 through 14. SUBSTRING
really is a valuable function, however, because the string value doesn’t need to be a literal. The value can be any expression that evaluates to a character string. Thus, you could have a variable named fooditem
that takes on different values at different times. The following expression would extract the desired substring regardless of what character string the fooditem
variable currently represents:
SUBSTRING (:fooditem FROM 8 FOR 7)
All the value functions are similar in that these functions can operate on expressions that evaluate to values as well as on the literal values themselves.
You need to watch out for a couple of things if you use the SUBSTRING
function. Make sure that the substring that you specify actually falls within the source string. If you ask for a substring that starts at (say) character eight but the source string is only four characters long, you get a null result. You must therefore have some idea of the form of your data before you specify a substring function. You also don’t want to specify a negative substring length, because the end of a string can’t precede the beginning.
If a column is of the VARCHAR
type, you may not know how far the field extends for a particular row. This lack of knowledge doesn’t present a problem for the SUBSTRING
function. If the length that you specify goes beyond the right edge of the field, SUBSTRING
returns whatever it finds. It doesn’t return an error.
Say that you have the following statement:
SELECT * FROM FOODS
WHERE SUBSTRING (Food FROM 8 FOR 7) = 'white' ;
This statement returns the row for white bread from the FOODS table, even though the value in the Food
column ('Bread, white'
) is less than 14 characters long.
If any operand (value from which an operator derives another value) in the substring function has a null value, SUBSTRING
returns a null result.
The regular expression substring function is a triadic function (meaning it operates on three parameters). The three parameters are a source character string, a pattern string, and an escape character. It then uses pattern matching (based on POSIX-based regular expressions) to extract and return a result string from the source character string.
Two instances of the escape character, each followed by the double-quote character, are used to partition the pattern string into three parts. Here’s an example:
Suppose the source character string S
is 'Four score and seven years ago, our fathers brought forth upon this continent, a new nation'
. Suppose further that the pattern string R
is 'and '/"'seven'/"' years'
, where the forward slash is the escape character.
Then
SUBSTRING S SIMILAR TO R ;
returns a result that is the middle piece of the pattern string, 'seven'
in this case.
SUBSTRING_REGEX
searches a string for an XQuery regular expression pattern and returns one occurrence of the matching substring.
According to the ISO/IEC international standard JTC 1/SC 32, the syntax of a substring regular expression is as follows:
SUBSTRING_REGEX <left paren>
<XQuery pattern> [ FLAG <XQuery option flag> ]
IN <regex subject string>
[ FROM <start position> ]
[ USING <char length units> ]
[ OCCURRENCE <regex occurrence> ]
[ GROUP <regex capture group> ] <right paren>
<XQuery pattern>
is a character string expression whose value is an XQuery regular expression.
<XQuery option flag>
is an optional character string, corresponding to the $flags
argument of the [XQuery F&O]
function fn:match
.
<regex subject string>
is the character string to be searched for matches to the <XQuery pattern>
.
<start position>
is an optional exact numeric value with scale 0, indicating the character position at which to start the search. (The default is 1.)
<char length units>
is CHARACTERS
or OCTETS
, indicating the unit in which <start position>
is measured. (The default is CHARACTERS
.)
<regex occurrence>
is an optional exact numeric value with scale 0, indicating which occurrence of a match is desired. (The default is 1.)
<regex capture group>
is an optional exact numeric value with scale 0 indicating which capture group of a match is desired. (The default is 0, indicating the entire occurrence.)
Here are some examples of the use of SUBSTRING_REGEX
:
SUBSTRING_REGEX ('\p{L}*' IN 'Just do it.')='Just'
SUBSTRING_REGEX ('\p{L}*' IN 'Just do it.' FROM 2)= 'ust'
SUBSTRING_REGEX ('\p{L}*' IN 'Just do it.' OCCURRENCE 2) = 'do'
SUBSTRING_REGEX ( '(do) (\p{L}*' IN 'Just do it.' GROUP 2) = 'it'
TRANSLATE_REGEX
searches a string for an XQuery regular expression pattern and returns the string with either one or every occurrence of the XQuery regular expression replaced by an XQuery replacement string.
According to the ISO/IEC international standard JTC 1/SC 32, the syntax of a regex transliteration is as follows:
TRANSLATE_REGEX <left paren>
<XQuery pattern> [ FLAG <XQuery option flag> ]
IN <regex subject string>
[ WITH <regex replacement string> ]
[ FROM <start position> ]
[ USING <char length units> ]
[ OCCURRENCE <regex transliteration occurrence> ] <right paren>
<regex transliteration occurrence> ::=
<regex occurrence>
| ALL
where:
<regex replacement string>
is a character string whose value is suitable for use as the $replacement
argument of the [XQuery F&O]
function fn:replace
. Default is the zero-length string.<regex transliteration occurrence>
is either the keyword ALL
, or an exact numeric value with scale 0, indicating which occurrence of a match is desired (default is ALL
).Here are some examples with no replacement string:
TRANSLATE_REGEX ('i' IN 'Bill did sit.') = 'Bll dd st.'
TRANSLATE_REGEX ('i' IN 'Bill did sit.' OCCURRENCE ALL) = 'Bll dd st.'
TRANSLATE_REGEX ('i' IN 'Bill did sit.' FROM 5) = 'Bill dd st.'
TRANSLATE_REGEX ('i' IN 'Bill did sit.' Occurrence 2) = 'Bill dd sit.'
Here are a few examples with replacement strings:
TRANSLATE_REGEX ('i' IN 'Bill did sit.' WITH 'a') = 'Ball dad sat. '
TRANSLATE_REGEX ('i' IN 'Bill did sit.' WITH 'a' OCCURRENCE ALL)= 'Ball dad sat.'
TRANSLATE_REGEX ('i' IN 'Bill did sit.' WITH 'a' OCCURRENCE 2) = 'Bill dad sit.'
TRANSLATE_REGEX ('i' IN 'Bill did sit.' WITH 'a' FROM 5) = 'Bill dad sat.'
OVERLAY
replaces a given substring of a string (specified by a given numeric starting position and a given length) with a replacement string. When the length specified for the substring is zero, nothing is removed from the original string, but the replacement string is inserted into the original string, starting at the specified starting position.
The UPPER
value function converts a character string to all-uppercase characters, as in the examples shown in the following table.
This Statement |
Returns |
|
|
|
|
The UPPER
function doesn’t affect a string that’s already in all-uppercase characters.
The LOWER
value function converts a character string to all-lowercase characters, as in the examples in the following table.
This Statement |
Returns |
|
|
|
|
The LOWER
function doesn’t affect a string that’s already in all-lowercase characters.
Use the TRIM
function to trim off leading or trailing blanks (or other characters) from a character string. The following examples show how to use TRIM
.
This Statement |
Returns |
|
|
|
|
|
|
|
|
The default trim character is the blank, so the following syntax also is legal:
TRIM (BOTH FROM ' treat ')
This syntax gives you the same result as the third example in the table — 'treat'
.
The TRANSLATE
and CONVERT
functions take a source string in one character set and transform the original string into a string in another character set. Examples might be English to Kanji or Hebrew to French. The conversion functions that specify these transformations are implementation-specific. Consult the documentation of your implementation for details.
If translating from one language to another were as easy as invoking an SQL TRANSLATE
function, that would be great. Unfortunately, it’s not that easy. All TRANSLATE
does is translate a character in the first character set to the corresponding character in the second character set. The function can, for example, translate 'Eλλασ'
to 'Ellas'
. But it can’t translate 'Eλλασ'
to 'Greece'
.
Numeric value functions can take a variety of data types as input, but the output is always a numeric value. SQL has 15 types of numeric value functions:
POSITION
)OCCURRENCES_REGEX
)POSITION_REGEX
)EXTRACT
)CHAR_LENGTH
, CHARACTER_LENGTH
, OCTET_LENGTH
)CARDINALITY
)ABS
)MOD
)LN
)EXP
)POWER
)SQRT
)FLOOR
)CEIL
, CEILING
)WIDTH_BUCKET
)POSITION
searches for a specified target string within a specified source string and returns the character position where the target string begins. For a character string, the syntax looks like this:
POSITION (target IN source [USING char length units])
You can optionally specify a character length unit other than CHARACTER
, but this is rare. If Unicode characters are in use, depending on the type, a character could be 8, 16, or 32 bits long. In cases where a character is 16 or 32 bits long, you can explicitly specify 8 bits with USING OCTETS
.
For a binary string, the syntax looks like this:
POSITION (target IN source)
If the value of the target is equal to an identical-length substring of contiguous octets in the source string, then the result is one greater than the number of octets preceding the start of the first such substring.
The following table shows a few examples.
This Statement |
Returns |
|
|
|
|
|
|
|
|
|
|
|
|
For both character strings and binary strings, if the function doesn’t find the target string, the POSITION
function returns a zero value. Also for both string types, if the target string has zero length (as in the last character example), the POSITION
function always returns a value of one. If any operand in the function has a null value, the result is a null value.
OCCURRENCES_REGEX
is a numeric function that returns the number of matches for a regular expression in a string. The syntax is as follows:
OCCURRENCES_REGEX <left paren>
<XQuery pattern> [ FLAG <XQuery option flag> ]
IN <regex subject string>
[ FROM <start position> ]
[ USING <char length units> ] <right paren>
Here are some examples:
OCCURRENCES_REGEX ( 'i' IN 'Bill did sit.' ) = 3
OCCURRENCES_REGEX ( 'i' IN 'Bill did sit.' FROM 5) = 2
OCCURRENCES_REGEX ( 'I' IN 'Bill did sit.' ) = 0
POSITION_REGEX
is a numeric function that returns the position of the start of a match, or one plus the end of a match, for a regular expression in a string. Here’s the syntax:
POSITION_REGEX <left paren> [ <regex position start or after> ]
<XQuery pattern> [ FLAG <XQuery option flag> ]
IN <regex subject string<
[ FROM <start position> ]
[ USING <char length units> ]
[ OCCURRENCE <regex occurrence> ]
[ GROUP <regex capture group> ] <right paren>
<regex position start or after> ::= START | AFTER
Perhaps some examples would make this clearer:
POSITION_REGEX ( 'i' IN 'Bill did sit.' ) = 2
POSITION_REGEX ( START 'i' IN 'Bill did sit.' ) = 2
POSITION_REGEX ( AFTER 'i' IN 'Bill did sit.' ) = 3
POSITION_REGEX ( 'i' IN 'Bill did sit.' FROM 5) = 7
POSITION_REGEX ( 'i' IN 'Bill did sit.' OCCURRENCE 2 ) = 7
POSITION_REGEX ( 'I' IN 'Bill did sit.' ) = 0
The EXTRACT
function extracts a single field from a datetime or an interval. The following statement, for example, returns 08:
EXTRACT (MONTH FROM DATE '2013-08-20')
The CHARACTER_LENGTH
function returns the number of characters in a character string. The following statement, for example, returns 16
:
CHARACTER_LENGTH ('Opossum, roasted')
As I note in regard to the SUBSTRING
function (in the “SUBSTRING” section, earlier in the chapter), this function is not particularly useful if its argument is a literal such as 'Opossum, roasted'
. I can just as easily write 16
as I can CHARACTER_LENGTH ('Opossum, roasted')
. In fact, writing 16
is easier. This function is more useful if its argument is an expression rather than a literal value.
In music, a vocal ensemble made up of eight singers is called an octet. Typically, the parts that the ensemble represents are first and second soprano, first and second alto, first and second tenor, and first and second bass. In computer terminology, an ensemble of eight data bits is called a byte. The word byte is clever in that the term clearly relates to bit but implies something larger than a bit. A nice wordplay — but (unfortunately) nothing in the word byte conveys the concept of “eightness.” By borrowing the musical term, a more apt description of a collection of eight bits becomes possible.
Practically all modern computers use eight bits to represent a single alphanumeric character. More complex character sets (such as Chinese) require 16 bits to represent a single character. The OCTET_LENGTH
function counts and returns the number of octets (bytes) in a string. If the string is a bit string, OCTET_LENGTH
returns the number of octets you need to hold that number of bits. If the string is an English-language character string (with one octet per character), the function returns the number of characters in the string. If the string is a Chinese character string, the function returns a number that is twice the number of Chinese characters. The following string is an example:
OCTET_LENGTH ('Beans, lima')
This function returns 11 because each character takes up one octet.
Some character sets use a variable number of octets for different characters. In particular, some character sets that support mixtures of Kanji and Latin characters use escape characters to switch between the two character sets. A string that contains both Latin and Kanji (for example) may have 30 characters and require 30 octets if all the characters are Latin; 62 characters if all the characters are Kanji (60 characters plus a leading and trailing shift character); and 150 characters if the characters alternate between Latin and Kanji (because each Kanji character needs two octets for the character and one octet each for the leading and trailing shift characters). The OCTET_LENGTH
function returns the number of octets you need for the current value of the string.
Cardinality deals with collections of elements such as arrays or multisets, where each element is a value of some data type. The cardinality of the collection is the number of elements that it contains. One use of the CARDINALITY
function might be this:
CARDINALITY (TeamRoster)
This function would return 12, for example, if there were 12 team members on the roster. TeamRoster
, a column in the TEAMS table, can be either an array or a multiset. An array is an ordered collection of elements, and a multiset is an unordered collection of elements. For a team roster, which changes frequently, multiset makes more sense.
The CARDINALITY
function returns the number of elements in the array or multiset that you specify. What it does not tell you is the maximum cardinality that was assigned to that array. There are occasions when you might want to know that.
As a result, SQL:2011 added a new function ARRAY_MAX_CARDINALITY
. As you might guess, it returns the maximum cardinality of the array that you specify. There is no declared maximum cardinality for a multiset.
Whereas the TRIM
function trims off the first or last character in a string, the TRIM_ARRAY
function trims off the last elements of an array.
To trim off the last three elements of the TeamRoster
array, use the following syntax:
TRIM_ARRAY (TeamRoster, 3)
The ABS
function returns the absolute value of a numeric value expression.
ABS (-273)
In this case, the function returns 273.
The MOD
function returns the modulus of two numeric value expressions.
MOD (3,2)
In this case, the function returns 1, the modulus of three divided by two.
The SIN
function returns the sine of a numeric value expression.
SIN (numeric value expression)
The COS
function returns the cosine of a numeric value expression.
COS (numeric value expression)
The TAN
function returns the tangent of a numeric value expression.
TAN (numeric value expression)
The ASIN
function returns the arcsine of a numeric value expression.
ASIN (numeric value expression)
The ACOS
function returns the arccosine of a numeric value expression.
ACOS (numeric value expression)
The ATAN
function returns the arctangent of a numeric value expression.
ATAN (numeric value expression)
The SINH
function returns the hyperbolic sine of a numeric value expression.
SINH (numeric value expression)
The COSH
function returns the hyperbolic cosine of a numeric value expression.
COSH (numeric value expression)
The TANH
function returns the hyperbolic tangent of a numeric value expression.
TANH (numeric value expression)
The LOG
function returns the logarithm to a specified base of a numeric value expression.
LOG (base, numeric value expression)
The LOG10
function returns the base-ten logarithm of a numeric value expression.
Log10 (numeric value expression)
The LN
function returns the natural logarithm of a numeric value expression.
LN (numeric value expression)
For LN (9)
, this function returns something like 2.197224577. The number of digits beyond the decimal point depends on the SQL implementation.
The EXP
function raises the base of the natural logarithms e to the power specified by a numeric value expression.
EXP (2)
Here the function returns something like 7.389056. The number of digits beyond the decimal point depends on the SQL implementation.
The POWER
function raises the value of the first numeric value expression to the power of the second numeric value expression.
POWER (2,8)
Here this function returns 256, which is 2 raised to the eighth power.
The SQRT
function returns the square root of the value of the numeric value expression.
SQRT (4)
In this case, the function returns 2, the square root of 4.
The FLOOR
function truncates the numeric value expression to the largest integer not greater than the expression.
FLOOR (3.141592)
This function returns 3.
The CEIL
or CEILING
function augments the numeric value expression to the smallest integer not less than the expression.
CEIL (3.141592)
This function returns 4.
The WIDTH_BUCKET
function, used in online application processing (OLAP), is a function of four arguments, returning an integer between 0 (zero) and the value of the fourth argument plus 1 (one). It assigns the first argument to an equiwidth partitioning of the range of numbers between the second and third arguments. Values outside this range are assigned to either 0 (zero) or the value of the fourth argument plus 1 (one).
For example:
WIDTH_BUCKET (PI, 0, 10, 5)
Suppose PI
is a numeric value expression with a value of 3.141592. The example partitions the interval from zero to 9.999999 … into five equal buckets, each with a width of two. The function returns a value of 2, because 3.141592 falls into the second bucket, which covers the range from 2 to 3.999999.
SQL includes three functions that return information about the current date, current time, or both. CURRENT_DATE
returns the current date; CURRENT_TIME
returns the current time; and CURRENT_TIMESTAMP
returns (surprise!) both the current date and the current time. CURRENT_DATE
doesn’t take an argument, but CURRENT_TIME
and CURRENT_TIMESTAMP
both take a single argument. The argument specifies the precision for the “seconds” part of the time value that the function returns. (Datetime data types and the precision concept are described in Chapter 2.)
The following table offers some examples of these datetime value functions.
This Statement |
Returns |
|
|
|
|
|
|
The date that CURRENT_DATE
returns is DATE
type data. The time that CURRENT_TIME (
p
)
returns is TIME
type data, and the timestamp that CURRENT_TIMESTAMP(
p
)
returns is TIMESTAMP
type data. Because SQL retrieves date and time information from your computer’s system clock, the information is correct for the time zone in which the computer resides.
In some applications, you may want to take advantage of functions that operate on character-type data; to do so, you convert dates, times, or timestamps to character strings. You can perform such a type conversion by using the CAST
expression, which I describe in Chapter 9.
An interval value function named ABS
was introduced in SQL:1999. It’s similar to the ABS
numeric value function, but operates on interval-type data rather than numeric-type data. ABS
takes a single operand and returns an interval of the identical precision that is guaranteed not to have a negative value. Here’s an example:
ABS ( TIME '11:31:00' – TIME '12:31:00' )
The result is
INTERVAL +'1:00:00' HOUR TO SECOND
Table functions return entire tables rather than just values. There are two types of table functions: ordinary table functions, and as of SQL:2016, polymorphic table functions.
An ordinary table function takes one or more tables as input, operates on them in some manner, and then outputs a result table. These functions must specify the names and types of the columns they return (the row-type) at the time of creation.
A polymorphic table function (PTF) returns a table whose row type is not declared when the function is created. The row type may depend on the arguments in the invocation of the PTF. A PTF may have generic table parameters, meaning that no row type is declared when the PTF is created. Furthermore, the row type of the result may depend on the row types of the input tables. At the time of this writing, polymorphic table functions are not yet fully supported by any popular DBMS.