Understanding
and Using APPLY (Part 1)
Introduction
This is the first of a two-part series of articles looking at
the APPLY operator. This part sets out to give the reader a good understanding
of how to design queries using the APPLY operator, using a detailed example to
cover the new ideas presented. There is also a brief look at user-defined
table-valued functions; since the concepts involved are central to using APPLY
effectively.
Next , the second part will compare APPLY with the familiar JOIN
operator, examine APPLY in more detail, and present more examples.
Understanding APPLY
Overview
APPLY encourages us to take a divide
and conquer
approach to query writing; it works by building a solution in stages. The
method might be familiar to those who have worked with procedural languages
like those in the .NET family. As we will see, APPLY works by calling a routine
for each member of an input set.
The details
APPLY is named after the process of applying a set of input rows to a
table-valued function. In this context, a table-valued function can refer to
just about anything that produces rows and columns as its output - so we are
not just referring to a user-defined table-valued function here.
APPLY calls the function once for each row from the input.
Column values from the input row are available for the function to use. The
function returns a table on each call, which always has the same set of
columns. The number of rows returned may vary between calls, however. The table
produced by the function is appended to the input row. If the function produces
more than one row, the input row is duplicated to match. Each member of the
input set produces part of the final output; these partial results are combined
by APPLY to produce the eventual full result.
If that seems all a bit complex and abstract, do not worry; it
is a lot simpler in practice, as shown later in an example.
Query design
APPLY encourages the designer to think in terms of applying a
common function to each row of an input set, with intermediate results
combining to form the final output. With one exception (which we will come to
next), this difference in logical design is largely a convenience feature. Most
queries written using APPLY can also be expressed using a JOIN. Part two of
this series will examine that statement in more detail.
The exception is where the function used is a user-defined table-valued function
(defined using the CREATE FUNCTION statement), which uses values from the input
set in its parameter list. This powerful ability is unique to the APPLY
operator.
In the next section, we will take a simple problem and solve
it using the divide and conquer approach promoted by APPLY. Incidentally, the
problem is easily solved using joins, but the point is to emphasise the design
approach taken when using APPLY.
Using APPLY
A simple problem
Imagine you are the person responsible for the database at a
school. You are asked to produce a report showing grade scores for each male
student, in each subject. The following illustration shows the three relevant
tables, together with a sample of the data in each.
Writing the function
Frequently, the natural-language description of the problem
makes it clear what the input set should be, and what the function should do.
In this case, the input set is the set of all male students.
The function is required to list the subjects and scores for one particular
student. By applying each member of the input set to the function, the desired
result will be built up in stages.
We start by writing a query to return subjects and scores for
one student. Notice that instead of embedding a particular value in the query,
a variable is used.
Though not strictly necessary (as we will see later on), our
next step is to encapsulate this logic in a user-defined table-valued function.
The parts carried over from the original query are highlighted in blue.
Notice how easy that is to do - the variable becomes the parameter, and the body of the SELECT statement becomes the definition of the function. We can use the following query to test our new function with a single student id.
This produces a table with three rows and two columns, as
shown below. This is one of the partial results that will later be combined by
APPLY to form the final output.
The input set
Now that we have the function, we can move on to produce the input set, which will drive the APPLY. The query to list all male students is trivial:
Writing the APPLY
The final step is to APPLY the input set to the function. The APPLY slots into the FROM clause of the input set query, and the function sits to the right of the APPLY operator.
The input set passes student ids, one at a time, to the
function through its parameter. Since we are using APPLY, we can pass this
value as a direct column reference.
The result of executing that query is:
The column data shown in dark green comes from the input
set, and
the data shown in blue comes from the function. Values from the input set are repeated
as necessary, to match the number of rows returned by the function. The
repeated rows are shown in a lighter green.
APPLY and User-Defined
Functions
As was mentioned in the section on query design, APPLY is not
limited to using user-defined functions created using the CREATE FUNCTION
statement. In the previous example, we could have omitted the user-defined
function completely, and written the query like this:
Instead of passing the student id to the function as a
parameter, the inner WHERE clause now contains a direct reference to a column
from the input set. This connection between input set and function is known as
a correlation. Some people find it useful to think of this usage of APPLY
as a correlated join.
You might be wondering why we went to the trouble of creating
a user-defined function in the previous section's example. We will discuss
those reasons next.
The advantages of in-line
user-defined functions
There are two types of user-defined table-valued function: multi-statement and in-line. For brevity, we will refer
to the in-line variety as an iTVF from now on.
An iTVF consists of a single SELECT statement, which can
include references to the function's parameters. See CREATE FUNCTION in the SQL Server documentation for syntax
details, more examples, and the list of restrictions. You might find it useful
to think of an iTVF as a view that accepts parameters. Just as for views, SQL
Server expands the definition of an iTVF directly into the query plan of an
enclosing query, before optimization is performed.
The effect is that SQL Server is able to apply its full range
of optimizations, considering the query as a whole. It is just as if you had
written the expanded query out by hand. This makes iTVFs a great way to:
1.
Encapsulate
logic;
2.
Produce
compact and readable code;
3.
Promote
code-reuse;
4.
Improve
consistency; and
5.
Encourage
a modular programming style
These advantages can help to reduce the number of bugs,
shorten development time, and make training new staff a quicker and easier
process. Many professionals maintain a library of iTVFs, for precisely these
reasons.
Multi-statement functions
A multi-statement user-defined function works quite
differently. In particular, it does not share the view-like property of being
expanded into the containing query before optimization. For this reason,
multi-statement functions tend to perform much less well than iTVFs. If you
decide to write a multi-statement function, be sure to test it thoroughly if
performance is an important consideration.
End of Part One
As you may have noticed from the preceding code examples, the
APPLY operator cannot be used by itself in the FROM clause. There are also two
forms of APPLY, CROSS and OUTER. We will discuss these in Part II of this series.
The Resources section below contains heavily annotated
versions of all the code featured in this part of the article, together with a
script to create the sample data used. It also includes an extra example,
showing how to return the TOP N scores for each student.