825.TXT ===================================================================== Finding The Top "n" ===================================================================== PRODUCT: R:BASE VERSION: 4.5 or Higher ===================================================================== CATALOG: Programming in R:BASE AREA : Data Manipulation ===================================================================== Using views, the "top n" problem (you want to find the top 3 or the top 5 records) is a snap. The view makes it easy to select and rank the data, then display just the top records. Ranking the data is easy, you just order the data. The difficult part of the problem is displaying only the top "n" rows. For example, to rank salespeople by sales amount, just select the data and order in reverse order_the highest amount first: SELECT empid, invoicetotal FROM transmaster ORDER BY invoicetotal+ DESC empid invoicetotal ---------- --------------- 102 $192,016.00 160 $182,469.75 129 $169,650.50 131 $166,104.75 160 $153,067.30 131 $118,646.25 129 $104,190.50 102 $95,462.50 133 $87,280.00 131 $81,006.75 129 $68,733.00 129 $61,368.75 131 $45,822.00 102 $32,730.00 129 $31,775.37 102 $31,639.00 133 $29,457.00 102 $24,547.50 The salesperson with the highest sale is listed first. If you don't have many rows of data, this command may be sufficient, but often you want to see just the top salesperson, or just the top 3. You can't use the WHERE clause conditions LIMIT and COUNT to retrieve the top "n" records. They always work on the data before it has been sorted; you need to retrieve the top "n" after the data is sorted. Using either of these conditions returns just the first 2 rows from the table. SELECT empid, invoicetotal FROM transmaster WHERE LIMIT = 2 ORDER BY+ invoicetotal DESC empid invoicetotal ---------- --------------- 133 $29,457.00 160 $10,910.00 SELECT empid, invoicetotal FROM transmaster WHERE COUNT <= 2 ORDER+ BY invoicetotal DESC empid invoicetotal ---------- --------------- 133 $29,457.00 160 $10,910.00 There is a technique using the SELECT command that ranks the data and selects the top "n", but only if there is no duplicate data in the rows. If you have duplicate data values in the comparison column, then this technique returns incorrect data. SELECT empid,invoicetotal FROM transmaster t1 WHERE 2 >= (SELECT COUNT(*) FROM transmaster t2 WHERE t1.invoicetotal <= t2.invoicetotal) empid invoicetotal ---------- --------------- 102 $192,016.00 160 $182,469.75 The easy way to solve the problem is to create a view with the desired data in sorted order. Since the data retrieved by the view is already sorted, the LIMIT operator quickly retrieves the top "n" rows. CREATE VIEW rank1 AS SELECT empid,invoicetotal FROM transmaster ORDER BY invoicetotal DESC SELECT * FROM rank1 empid invoicetotal ---------- --------------- 102 $192,016.00 160 $182,469.75 129 $169,650.50 131 $166,104.75 160 $153,067.30 131 $118,646.25 129 $104,190.50 102 $95,462.50 133 $87,280.00 131 $81,006.75 129 $68,733.00 129 $61,368.75 131 $45,822.00 102 $32,730.00 129 $31,775.37 102 $31,639.00 133 $29,457.00 102 $24,547.50 160 $10,910.00 SELECT * FROM rank1 WHERE LIMIT = 2 empid invoicetotal ---------- --------------- 102 $192,016.00 160 $182,469.75 You can also use a view to find the "top n" when you want the ranking based on a sum or other calculation. CREATE VIEW rank2 (empid, total) AS SELECT empid, SUM(invoicetotal) FROM transmaster GROUP BY empid ORDER BY 2 DESC The ORDER BY clause supports ordering by expressions, so we can order by total sales in reverse order (DESC is for descending). And once the data in the view is in the correct order, the LIMIT operator retrieves just the specified rows.