824.TXT ===================================================================== Find the Maximum Record ===================================================================== PRODUCT: R:BASE VERSION: 4.5 or Higher ===================================================================== CATALOG: Programming in R:BASE AREA : Logic & Data Manipulation ===================================================================== Suppose you want to find the last date on which a customer ordered product, or the last date on which a customer paid, or was called, or was sent a mailing; but for each customer, the last date is different. A report like this for all the customers in California, for example, might have a different last date for each customer. You can't use the special WHERE clause operators LIMIT or COUNT; they won't find different rows for each person. You might think of trying WHERE COUNT = LAST AND custid = 1001, but COUNT = LAST is a special condition that finds the last row in the table. It can't be combined with any other condition in the WHERE clause. You can find the last record for a customer using SELECT functions and a correlated sub-SELECT. The sub-SELECT uses the MAX function to generate a list with the last record for each customer. That list is then compared with the data in the main table. Following is a step-by-step explanation showing how to build a SELECT command that selects transaction information about the last transaction for each customer. First select just the customer id and the maximum date. This is the basis of the technique. A GROUP BY clause on a SELECT command easily returns the data. SELECT custid, MAX(transdate) FROM Transmaster GROUP BY custid custid MAX (tra ---------- -------- 100 07/25/95 101 08/17/95 102 05/19/95 103 07/12/95 104 08/06/95 105 07/13/95 106 07/29/95 107 08/22/95 We can't add other columns to the list of columns to display, however, as we'd need to group by the other columns or use them in an aggregate function. Grouping by the other columns doesn't give the desired results, it changes the uniqueness qualifiers. Using the other columns, such as invoicetotal, in aggregate functions doesn't work either; we don't get all the information from the same row_MAX(invoicetotal) is not necessarily on the same row as MAX(transdate). 2. We need to use the MAX(transdate) in a WHERE clause to qualify the rows. The WHERE clause, WHERE transdate = MAX(transdate), is invalid, however. An aggregate function can only be used in the SELECT column list or in a HAVING clause, and must be used with a GROUP BY. We can use MAX(transdate) in a sub-SELECT and compare to transdate in the WHERE clause. The command, SELECT MAX(transdate) FROM Transmaster GROUP BY custid, generates a list of dates, one for each customer. The list of dates can be compared to the date column. An exception to the aggregate function rule and GROUP BY is where you select only the function result. The SELECT then can return a single value with no ambiguity. We don't need to include custid in the SELECT command. SELECT * + FROM Transmaster + WHERE transdate IN + (SELECT MAX(transdate) + FROM Transmaster + GROUP BY custid) + ORDER BY custid transid custid empid transdate netamount -------- -------- -------- --------- ------------ 5080 100 133 07/25/95 $800.00 5000 101 102 08/17/95 $290.00 4865 102 129 05/19/95 $291.25 4970 103 131 07/12/95 $1,522.50 5070 104 129 08/06/95 $95,500.00 4780 105 160 07/13/95 $100.00 5065 106 160 07/29/95 $1,403.00 5085 107 131 08/22/95 $74,250.00 3. The command returns what looks like correct data. But, in actual fact, we get the data for any row where the transaction date value matches one of the maximum transaction date values, even if that date is not the maximum date for a customer. You might get the correct data one day, add records, and the next day the data returned is wrong. Look at the following data set, for example. There are two rows for customer #101. Customer #101 has a data record where the value for transdate matches the last transdate value for another customer, customer #100. transid custid empid transdate netamount -------- -------- -------- -------- ------------ 5080 100 133 07/25/95 $800.00 5000 101 102 08/17/95 $290.00 4975 101 102 07/25/95 $875.00 4865 102 129 05/19/95 $291.25 4970 103 131 07/12/95 $1,522.50 5070 104 129 08/06/95 $95,500.00 4780 105 160 07/13/95 $100.00 5065 106 160 07/29/95 $1,403.00 5085 107 131 08/22/95 $74,250.00 The sub-SELECT just makes a list of values and the command can retrieve extra data. This is where the data you loaded into your test database is important. If you don't think of all possible data combinations and don't have data that meets a particular condition, you might think the command in Step 2 returns correct data every time. To compare to just the rows for the same customer you use a correlated sub-SELECT. A correlated sub-SELECT connects the row in the sub-SELECT to the row of data being displayed. SELECT * + FROM Transmaster + WHERE transdate = + (SELECT MAX(transdate) + FROM Transmaster t2 + WHERE t2.custid = transmaster.custid) + ORDER BY custid transid custid empid transdate netamount -------- -------- -------- --------- ------------ 5080 100 133 07/25/95 $800.00 5000 101 102 08/17/95 $290.00 4865 102 129 05/19/95 $291.25 4970 103 131 07/12/95 $1,522.50 5070 104 129 08/06/95 $95,500.00 4780 105 160 07/13/95 $100.00 5065 106 160 07/29/95 $1,403.00 5085 107 131 08/22/95 $74,250.00 Also, the GROUP BY is no longer used in the sub-SELECT. We are returning only one value_the maximum for a single customer. The GROUP BY is not required when a single value is returned by an aggregate function. As each row in the Transmaster table is selected, the sub-SELECT computes the MAX(transdate) value for the customer id on that row in the table and then compares the transdate column to that value. If the values match, the row is displayed. Notice that the transmaster table is used both in the sub-SELECT and in the SELECT command that retrieves and displays the data. The transmaster table in the sub-SELECT is given an alias name or correlation name. This tells R:BASE to treat the transmaster table in the sub-SELECT as a different copy of the table. The two copies of the table are then linked in the WHERE clause of the sub-SELECT. We get the desired results and aren't limited in the data we can display. This same technique can be used to find the last record for any number of conditions. All you need is a column to "group by" and a column that identifies the last record.