IN THIS CHAPTER
Understanding recursive processing
Defining recursive queries
Finding ways to use recursive queries
One of the major criticisms of SQL, up through and including SQL-92, was its inability to implement recursive processing. Many important problems that are difficult to solve by other means yield readily to recursive solutions. Extensions included in SQL:1999 allow recursive queries — which greatly expand the language’s power. If your SQL implementation includes the recursion extensions, you can efficiently solve a large new class of problems. However, because recursion is not a part of core SQL, many implementations currently available do not include it.
Recursion is a feature that’s been around for years in programming languages such as Logo, LISP, and C++. In these languages, you can define a function (a set of one or more commands) that performs a specific operation. The main program invokes the function by issuing a command called a function call. If the function calls itself as a part of its operation, you have the simplest form of recursion.
A simple program that uses recursion in one of its functions provides an illustration of the joys and pitfalls of recursion. The following program, written in C++, draws a spiral on the computer screen. It assumes that the drawing tool is initially pointing toward the top of the screen, and it includes three functions:
line(
n
)
draws a line n units long.left_turn(
d
)
rotates the drawing tool d degrees counterclockwise.spiral(segment)
as follows:void spiral(int segment)
{
line(segment)
left_turn(90)
spiral(segment + 1)
} ;
If you call spiral(1)
from the main program, the following actions take place:
spiral(1)
draws a line one unit long toward the top of the screen.spiral(1)
turns left 90 degrees.spiral(1)
calls spiral(2)
.spiral(2)
draws a line two units long toward the left side of the screen.spiral(2)
turns left 90 degrees.spiral(2)
calls spiral(3)
.Eventually the program generates the spiral shown in Figure 13-1.
Well, okay, the situation here is not as serious as it was for Apollo 13 when the main oxygen tank exploded while the spacecraft was en route to the moon. Your problem is that the spiral-drawing program keeps calling itself and drawing longer and longer lines. It will continue to do that until the computer executing it runs out of resources and (if you’re lucky) puts an obnoxious error message on the screen. If you’re unlucky, the computer just crashes.
The scenario described in the previous section shows one of the dangers of using recursion. A program written to call itself invokes a new instance of itself — which in turn calls yet another instance, ad infinitum. This is generally not what you want. (Think of a certain cartoon mouse in a wizard’s hat trying to stop all those marching broomsticks… .)
To address this problem, programmers include a termination condition within the recursive function — a limit on how deep the recursion can go — so the program performs the desired action and then terminates gracefully. You can include a termination condition in your spiral-drawing program to save computer resources and prevent dizziness in programmers:
void spiral2(int segment)
{
if (segment <= 10)
{
line(segment)
left_turn(90)
spiral2(segment + 1)
}
} ;
When you call spiral2(1)
, it executes and then (recursively) calls itself until the value of segment
exceeds 10. At the point where segment
equals 11, the if (segment <=10)
construct returns a False value, and the code within the interior braces is skipped. Control returns to the previous invocation of spiral2
and, from there, returns all the way up to the first invocation, after which the program terminates. Figure 13-2 shows the sequence of calls and returns that occur.
Every time a function calls itself, it takes you one level farther away from the main program that was the starting point of the operation. For the main program to continue, the deepest iteration must return control to the iteration that called it. That iteration will have to do likewise, returning all the way back to the main program that made the first call to the recursive function.
Recursion is a powerful tool for repeatedly executing code when you don’t know at the outset how many times the code should be repeated. It’s ideal for searching through tree-shaped structures such as family trees, complex electronic circuits, or multilevel distribution networks.
A recursive query is a query that is functionally dependent upon itself. The simplest form of such functional dependence works like this: Query Q1 invokes itself in the body of the query expression. A more complex case is where query Q1 depends on query Q2, which in turn depends on query Q1. There is still a functional dependency, and recursion is still involved, no matter how many queries lie between the first and the second invocation of the same query. If that sounds weird, don’t worry: Here’s how it works …
Recursive queries may help save you time and frustration in dealing with various kinds of problems. Suppose, for example, that you have a pass that gives you free air travel on any flight of the (fictional) Vannevar Airlines. Way cool. The next question you ask is, “Where can I go for free?” The FLIGHT table contains all the flights that Vannevar runs. Table 13-1 shows the flight number and the source and destination of each flight.
TABLE 13-1 Flights Offered by Vannevar Airlines
Flight No. |
Source |
Destination |
3141 |
Portland |
Orange County |
2173 |
Portland |
Charlotte |
623 |
Portland |
Daytona Beach |
5440 |
Orange County |
Montgomery |
221 |
Charlotte |
Memphis |
32 |
Memphis |
Champaign |
981 |
Montgomery |
Memphis |
Figure 13-3 illustrates the routes on a map of the United States.
To get started on your vacation plan, create a database table for FLIGHT by using SQL as follows:
CREATE TABLE FLIGHT (
FlightNo INTEGER NOT NULL,
Source CHAR (30),
Destination CHAR (30) );
After the table is created, you can populate it with the data shown in Table 13-1.
Suppose you’re starting from Portland and you want to visit a friend in Montgomery. Naturally you wonder, “What cities can I reach via Vannevar if I start from Portland?” and “What cities can I reach via the same airline if I start from Montgomery?” Some cities are reachable in one hop; others are not. Some might require two or more hops. You can find all the cities that you can get to via Vannevar, starting from any given city on its route map — but if you do it one query at a time, you’re …
To find out what you want to know — provided you have the time and patience — you can make a series of queries, first using Portland
as the starting city:
SELECT Destination FROM FLIGHT WHERE Source = 'Portland';
The first query returns Orange County
, Charlotte
, and Daytona Beach
. Your second query uses the first of these results as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = 'Orange County';
The second query returns Montgomery
. Your third query returns to the results of the first query and uses the second result as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = 'Charlotte';
The third query returns Memphis
. Your fourth query goes back to the results of the first query and uses the remaining result as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = 'Daytona Beach';
Sorry, the fourth query returns a null result because Vannevar offers no outgoing flights from Daytona Beach. But the second query returned another city (Montgomery
) as a possible starting point, so your fifth query uses that result:
SELECT Destination FROM FLIGHT WHERE Source = 'Montgomery';
This query returns Memphis
, but you already know it’s among the cities you can get to (in this case, via Charlotte
). But you go ahead and try this latest result as a starting point for another query:
SELECT Destination FROM FLIGHT WHERE Source = 'Memphis';
The query returns Champaign
— which you can add to the list of reachable cities (even if you have to get there in two hops). As long as you’re considering multiple hops, you plug in Champaign
as a starting point:
SELECT Destination FROM FLIGHT WHERE Source = 'Champaign';
Oops. This query returns a null value; Vannevar offers no outgoing flights from Champaign. (Seven queries so far. Are you fidgeting yet?)
Vannevar doesn’t offer a flight out of Daytona Beach, either, so if you go there, you’re stuck — which might not be a hardship if it’s Spring Break week. (Of course, if you use up a week running individual queries to find out where to go next, you might get a worse headache than you’d get from a week of partying.) Or you might get stuck in Champaign — in which case, you could enroll in the University of Illinois and take a few database courses.
Granted, this method will (eventually) answer the question, “What cities are reachable from Portland?” But running one query after another, making each one dependent on the results of a previous query, is complicated, time-consuming, and fidgety.
A simpler way to get the info you need is to craft a single recursive query that does the entire job in one operation. Here’s the syntax for such a query:
WITH RECURSIVE
REACHABLEFROM (Source, Destination)
AS (SELECT Source, Destination
FROM FLIGHT
UNION
SELECT in.Source, out.Destination
FROM REACHABLEFROM in, FLIGHT out
WHERE in.Destination = out.Source
)
SELECT * FROM REACHABLEFROM
WHERE Source = 'Portland';
The first time through the recursion, FLIGHT has seven rows and REACHABLEFROM has none. The UNION
takes the seven rows from FLIGHT and copies them into REACHABLEFROM. At this point, REACHABLEFROM has the data shown in Table 13-2.
TABLE 13-2 REACHABLEFROM After One Pass through Recursion
Source |
Destination |
Portland |
Orange County |
Portland |
Charlotte |
Portland |
Daytona Beach |
Orange County |
Montgomery |
Charlotte |
Memphis |
Memphis |
Champaign |
Montgomery |
Memphis |
As I mention earlier, recursion is not a part of core SQL, and thus some implementations may not include it.
The second time through the recursion, things start to get interesting. The WHERE
clause (WHERE in.Destination = out.Source
) means that you’re looking only at rows where the Destination
field of the REACHABLEFROM table equals the Source
field of the FLIGHT table. For those rows, you’re taking two fields — the Source
field from REACHABLEFROM and the Destination
field from FLIGHT — and adding them to REACHABLEFROM as a new row. Table 13-3 shows the result of this iteration of the recursion.
TABLE 13-3 REACHABLEFROM After Two Passes through the Recursion
Source |
Destination |
Portland |
Orange County |
Portland |
Charlotte |
Portland |
Daytona Beach |
Orange County |
Montgomery |
Charlotte |
Memphis |
Memphis |
Champaign |
Montgomery |
Memphis |
Portland |
Montgomery |
Portland |
Memphis |
Orange County |
Memphis |
Charlotte |
Champaign |
The results are looking more useful. REACHABLEFROM now contains all the Destination
cities that are reachable from any Source
city in two hops or less. Next, the recursion processes three-hop trips, and so on, until all possible destination cities have been reached.
After the recursion is complete, the third and final SELECT
statement (which is outside the recursion) extracts from REACHABLEFROM only those cities you can reach from Portland by flying Vannevar. In this example, all six other cities are reachable from Portland — in few enough hops that you won’t feel like you’re traveling by pogo stick.
If you scrutinize the code in the recursive query, it doesn’t look any simpler than the seven individual queries it replaces. It does, however, have two advantages:
Imagine a real-world airline with many more cities on its route map. The more possible destinations that are available, the greater the advantage of using the recursive method.
What makes this query recursive? The fact that you’re defining REACHABLEFROM in terms of itself. The recursive part of the definition is the second SELECT
statement, the one just after the UNION
. REACHABLEFROM is a temporary table that fills with data progressively as the recursion proceeds. Processing continues until all possible destinations have been added to REACHABLEFROM. Any duplicates are eliminated, because the UNION
operator doesn’t add duplicates to the result table. After the recursion has finished running, REACHABLEFROM contains all the cities that are reachable from any starting city. The third and final SELECT
statement returns only those destination cities that you can reach from Portland. Bon voyage.
Any problem that you can lay out as a treelike structure can potentially be solved by using a recursive query. The classic industrial application is materials processing (the process of turning raw materials into finished goods). Suppose your company is building a new gasoline-electric hybrid car. Such a machine is built of subassemblies (engine, batteries, and so on), which are constructed from smaller subassemblies (crankshaft, electrodes, and so on), which are made of even smaller parts.
Keeping track of all the various parts can be difficult in a relational database that does not use recursion. Recursion enables you to start with the complete machine and ferret your way along any path to get to the smallest part. Want to find out the specs for the fastening screw that holds the clamp to the negative electrode of the auxiliary battery? The WITH RECURSIVE
structure gives SQL the capability to address such a brass-tacks-level problem.
Recursion is also a natural for what-if processing. In the Vannevar Airlines example, what if management discontinues service from Portland to Charlotte? How does that affect the cities that are reachable from Portland? A recursive query quickly gives you the answer.