824.TXT
=====================================================================
Find the Maximum Record
=====================================================================
PRODUCT: R:BASE VERSION: 4.5 or Higher
=====================================================================
CATALOG: Programming in R:BASE AREA : Logic & Data Manipulation
=====================================================================
Suppose you want to find the last date on which a customer ordered
product, or the last date on which a customer paid, or was called,
or was sent a mailing; but for each customer, the last date is
different. A report like this for all the customers in California,
for example, might have a different last date for each customer.
You can't use the special WHERE clause operators LIMIT or COUNT; they
won't find different rows for each person. You might think of trying
WHERE COUNT = LAST AND custid = 1001, but COUNT = LAST is a special
condition that finds the last row in the table. It can't be combined
with any other condition in the WHERE clause.
You can find the last record for a customer using SELECT functions and
a correlated sub-SELECT. The sub-SELECT uses the MAX function to
generate a list with the last record for each customer. That list is
then compared with the data in the main table.
Following is a step-by-step explanation showing how to build a SELECT
command that selects transaction information about the last
transaction for each customer.
First select just the customer id and the maximum date. This is the
basis of the technique. A GROUP BY clause on a SELECT command easily
returns the data.
SELECT custid, MAX(transdate) FROM Transmaster GROUP BY custid
custid MAX (tra
---------- --------
100 07/25/95
101 08/17/95
102 05/19/95
103 07/12/95
104 08/06/95
105 07/13/95
106 07/29/95
107 08/22/95
We can't add other columns to the list of columns to display, however,
as we'd need to group by the other columns or use them in an aggregate
function. Grouping by the other columns doesn't give the desired
results, it changes the uniqueness qualifiers. Using the other
columns, such as invoicetotal, in aggregate functions doesn't work
either; we don't get all the information from the same
row_MAX(invoicetotal) is not necessarily on the same row as
MAX(transdate).
2. We need to use the MAX(transdate) in a WHERE clause to qualify the
rows. The WHERE clause, WHERE transdate = MAX(transdate), is invalid,
however. An aggregate function can only be used in the SELECT column
list or in a HAVING clause, and must be used with a GROUP BY. We can
use MAX(transdate) in a sub-SELECT and compare to transdate in the
WHERE clause.
The command, SELECT MAX(transdate) FROM Transmaster GROUP BY custid,
generates a list of dates, one for each customer. The list of dates
can be compared to the date column. An exception to the aggregate
function rule and GROUP BY is where you select only the function
result. The SELECT then can return a single value with no ambiguity.
We don't need to include custid in the SELECT command.
SELECT * +
FROM Transmaster +
WHERE transdate IN +
(SELECT MAX(transdate) +
FROM Transmaster +
GROUP BY custid) +
ORDER BY custid
transid custid empid transdate netamount
-------- -------- -------- --------- ------------
5080 100 133 07/25/95 $800.00
5000 101 102 08/17/95 $290.00
4865 102 129 05/19/95 $291.25
4970 103 131 07/12/95 $1,522.50
5070 104 129 08/06/95 $95,500.00
4780 105 160 07/13/95 $100.00
5065 106 160 07/29/95 $1,403.00
5085 107 131 08/22/95 $74,250.00
3. The command returns what looks like correct data. But, in actual
fact, we get the data for any row where the transaction date value
matches one of the maximum transaction date values, even if that date
is not the maximum date for a customer. You might get the correct data
one day, add records, and the next day the data returned is wrong.
Look at the following data set, for example. There are two rows for
customer #101. Customer #101 has a data record where the value for
transdate matches the last transdate value for another customer,
customer #100.
transid custid empid transdate netamount
-------- -------- -------- -------- ------------
5080 100 133 07/25/95 $800.00
5000 101 102 08/17/95 $290.00
4975 101 102 07/25/95 $875.00
4865 102 129 05/19/95 $291.25
4970 103 131 07/12/95 $1,522.50
5070 104 129 08/06/95 $95,500.00
4780 105 160 07/13/95 $100.00
5065 106 160 07/29/95 $1,403.00
5085 107 131 08/22/95 $74,250.00
The sub-SELECT just makes a list of values and the command can
retrieve extra data. This is where the data you loaded into your test
database is important. If you don't think of all possible data
combinations and don't have data that meets a particular condition,
you might think the command in Step 2 returns correct data every time.
To compare to just the rows for the same customer you use a correlated
sub-SELECT. A correlated sub-SELECT connects the row in the sub-SELECT
to the row of data being displayed.
SELECT * +
FROM Transmaster +
WHERE transdate = +
(SELECT MAX(transdate) +
FROM Transmaster t2 +
WHERE t2.custid = transmaster.custid) +
ORDER BY custid
transid custid empid transdate netamount
-------- -------- -------- --------- ------------
5080 100 133 07/25/95 $800.00
5000 101 102 08/17/95 $290.00
4865 102 129 05/19/95 $291.25
4970 103 131 07/12/95 $1,522.50
5070 104 129 08/06/95 $95,500.00
4780 105 160 07/13/95 $100.00
5065 106 160 07/29/95 $1,403.00
5085 107 131 08/22/95 $74,250.00
Also, the GROUP BY is no longer used in the sub-SELECT. We are
returning only one value_the maximum for a single customer. The GROUP
BY is not required when a single value is returned by an aggregate
function. As each row in the Transmaster table is selected, the
sub-SELECT computes the MAX(transdate) value for the customer id on
that row in the table and then compares the transdate column to that
value. If the values match, the row is displayed. Notice that the
transmaster table is used both in the sub-SELECT and in the SELECT
command that retrieves and displays the data. The transmaster table in
the sub-SELECT is given an alias name or correlation name. This tells
R:BASE to treat the transmaster table in the sub-SELECT as a different
copy of the table. The two copies of the table are then linked in the
WHERE clause of the sub-SELECT.
We get the desired results and aren't limited in the data we can
display. This same technique can be used to find the last record for
any number of conditions. All you need is a column to "group by" and
a column that identifies the last record.