841.TXT ===================================================================== Multi-Level Grouping with SELECT ===================================================================== PRODUCT: R:BASE VERSION: 5.5 or Higher ===================================================================== CATALOG: Programming in R:BASE AREA : SELECT ===================================================================== Database queries don't always start off by sounding difficult. Many times the query sounds fairly simple, even trivial, and then you find yourself spending hours, or even days, trying to come up with that special SELECT command to return the desired data. Not every database query can be solved using a single SELECT command. Sometimes the solution can be found easier and faster by using a view. One database problem that sounds simple but isn't, is when you need to do multiple level grouping of data. Often this type of problem involves counting data or putting some type of a qualifier on a group of data. For example, in a mailing application, you may need to look at the number of pieces to be mailed for a particular zipcode and route combination; if there are more than ten pieces for a group then special handling is required. Maybe you want to know which customers have placed orders from at least three different sales reps. Or, a utility company charges different rates depending on the usage of electricity. You need to generate a list of customers who have been charged at least two different rates in a month. None of these problems sound difficult when stated like this, but when you try to build a query, you find out that the problem is more difficult than it sounds. The difficulty with building the query is that what really needs to be done is to use a GROUP BY clause on the results of a GROUP BY clause. This is easily solved by using views. Let's look more closely at the power utility application. We have a table, usage, with account and usage information, and a table of rate information, rates. Here's some sample data: The usage table: Account# ServiceDate HoursUsed -------- ----------- ---------- AA-0001 08/09/96 22 AA-0002 08/10/96 13 AA-0001 08/11/96 16 AA-0002 08/11/96 11 AA-0001 08/12/96 18 AA-0003 08/10/96 9 AA-0003 08/09/96 7 The rates table: MinHours MaxHours RatePerHour ---------- ---------- --------------- 0 10 $15.00 10 20 $14.50 20 50 $14.00 50 100 $13.00 100 1000000 $12.00 First, we need to join the two tables and add the rate per hour information to the rest of the account and usage information. We can use a view to do this. The view then has all the relevant information. This is what our view definition would be: CREATE VIEW rates_view AS + SELECT Account#, ServiceDate, HoursUsed, RatePerHour + FROM usage, rates + WHERE HoursUsed BETWEEN MinHours AND MaxHours The data contained in the view now includes the rate each account is charged per day depending on usage. Because this is a small amount of data, we can look at the data and see that only account #AA-0001 has been charged more than one rate in August. Account# ServiceDate HoursUsed RatePerHour -------- ----------- ---------- --------------- AA-0001 08/09/96 22 $14.00 AA-0001 08/11/96 16 $14.50 AA-0001 08/12/96 18 $14.50 AA-0002 08/10/96 13 $14.50 AA-0002 08/11/96 11 $14.50 AA-0003 08/09/96 7 $15.00 AA-0003 08/10/96 9 $15.00 But assuming we have hundreds, or even thousands, of rows, what command will give us the information we need? Using the GROUP BY option of the SELECT command, we can group the data in the view by account number and rate. SELECT Account#, RatePerHour FROM rates_view + GROUP BY Account#, RatePerHour Account# RatePerHour -------- --------------- AA-0001 $14.00 AA-0001 $14.50 AA-0002 $14.50 AA-0003 $15.00 This shows us the accounts that are charged more than one rate, but the results include all accounts. We only want to display the accounts charged more than one rate. Up to this point the query is fairly straightforward. Most users with some knowledge of SQL would be able to generate this list. The difficulty comes when trying to restrict the list to just accounts charged more than one rate. Any number of HAVING clauses and sub-SELECTS can be added to the above SELECT command, but turning the SELECT command into a view makes the problem simple. Use this command to generate the view: CREATE VIEW Rates_View2 AS + SELECT Account#, RatePerHour FROM rates_view + GROUP BY Account#, RatePerHour One feature of the GROUP BY option of SELECT is the HAVING clause. The HAVING clause is a way to specify conditions on groups, much like the WHERE clause specifies conditions on rows. The HAVING clause counts the rows in a group, for example, and only returns data where there is more than one row in the group. We can't use a HAVING clause on the original SELECT command because each group is for a single rate; remember the data is grouped by both account number and rate. What we can do is take the results of this SELECT (now a view) and group the data just by account number. SELECT Account#, COUNT(*) FROM Rates_View2 + GROUP BY Account# Account# COUNT (*) -------- ---------- AA-0001 2 AA-0002 1 AA-0003 1 Note that only the account charged more than one rate, AA-0001, has more than one row in the view data and has a count greater than one. Adding a HAVING clause onto this SELECT returns just the data we need. SELECT Account# FROM Rates_View2 + GROUP BY Account# HAVING COUNT(*) > 1 account# -------- AA-0001 Only the one account that was charged two different rates is returned. By using sequences of views we are able to get the information we need from the database quickly and easily. Once we can retrieve the account numbers, the SELECT command can be used in a sub-SELECT with other R:BASE commands to edit data or print reports.