"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" MAKING SUB-SELECTS, OUTER JOINS, & RULES FASTER """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" PRODUCT : R:BASE VERSION : 3.1 CATEGORY : PROGRAMMING SUBCATEGORY : TOOLS """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" By making sub-SELECTs faster, you can speed up the commands, outer joins, and rules that use them in their WHERE clauses. Introduction to Sub-SELECTs """"""""""""""""""""""""""" For an introduction to sub-SELECTs, outer joins, and other SQL SELECT concepts, read the September/October 1990 R:BASE EXCHANGE, especially "Introducing SQL SELECT" and "SQL Outer Join Makes Budget Management Easier." A sub-SELECT is a SELECT command enclosed in parentheses and used in a WHERE clause on any command. A sub-SELECT returns a list of values that are compared to another value using this structure: ...WHERE colname OP (sub-SELECT) "OP" is the operator--IN or NOT IN are the most frequently used operators with a sub-SELECT. It's called a sub-SELECT because it's not the main command. It's subordinate to the main command. The main command could be SELECT, EDIT USING, PRINT, EDIT, BROWSE or any other command that can use a WHERE clause. Faster Sub-SELECTs """""""""""""""""" The operator tells R:BASE how to compare the list of items returned by the sub-SELECT to the list of rows requested by the main command. R:BASE doesn't use indexes to speed up this comparison, so it's important to use other techniques. Here are four ways to speed up a sub-SELECT: o Replace the sub-SELECT with a multi-table SELECT. o Correlate the sub-SELECT to the main part of the query using an indexed WHERE clause. o Use EXISTS or NOT EXISTS as the operator instead of IN or NOT IN when it only matters whether or not a sub-SELECT finds a value. Note that neither EXISTS nor NOT EXISTS use a column name. o Add an indexed WHERE clause to the sub-SELECT to limit the number of items in the list it returns. R:BASE will use the index in the sub-SELECT's WHERE clause. Multi-table SELECT """""""""""""""""" If you're using the sub-SELECT with the IN operator in a SELECT command's WHERE clause, replace the sub-SELECT with a multi-table SELECT. Incorporate the sub-SELECT's table and conditions into the main SELECT. R:BASE uses indexes when joining tables in a multi-table SELECT. Correlated Sub-SELECT """"""""""""""""""""" If you can't use a multi-table SELECT, correlate the sub-SELECT to the main part of the query using an indexed column. That is, add a WHERE clause to the sub-SELECT (or add onto an existing sub-SELECT WHERE clause). Make sure the WHERE clause will use indexes to create the comparison list by choosing rows that match on a linking column with the main part of the command. When a sub-SELECT can stand on its own, it isn't correlated. A correlated sub-SELECT uses a "table.column" item in the sub-SELECT's WHERE clause that is not in the sub-SELECT's list, so only the main command has access to the item. In other words, to correlate a sub- SELECT to the main query, add a WHERE clause to the sub-SELECT that uses an indexed item that only the main query can access. In some cases, you can do this by using a copy of the table (T2) in the sub-SELECT and the table itself in the main command. Here's an example using RULES as the main command. The sub-SELECT is correlated to the main command making the uniqueness rule faster. RULES 'Value must be unique.' + FOR tblname SUCCEEDS + WHERE colname IS NOT NULL + AND colname NOT IN + (SELECT colname FROM tblname t2 + WHERE t2.colname = tblname.colname) The sub-SELECT is correlated because it can't stand on its own. It uses TBLNAME.COLNAME in its WHERE clause, which only the main command can access. The sub-SELECT's table is T2, not TBLNAME. The correlated sub-SELECT is faster because it forces R:BASE to do an internal join, which always use indexes. EXISTS or NOT EXISTS """""""""""""""""""" Use EXISTS or NOT EXISTS as the operator instead of using IN or NOT IN. For example, when you use a rule to ensure that a new value is unique, you don't care what the value is, just whether or not it already exists. Therefore, you can speed up the same rule by using the NOT EXISTS operator instead of NOT IN, as in the example shown on the below. RULES 'Value must be unique.' + FOR tblname SUCCEEDS + WHERE colname IS NOT NULL + AND NOT EXISTS + (SELECT colname FROM tblname t2 + WHERE t2.colname = tblname.colname) Limit the Sub-SELECT's List """"""""""""""""""""""""""" If you can't use a multi-table SELECT, and you can't correlate the sub-SELECT, add an indexed WHERE clause to the sub-SELECT to limit the number of items in the list it returns. The fewer items that need to be compared, the faster the sub-SELECT. Speedy Outer Joins """""""""""""""""" Multi-table SELECTs join tables together. A join can be a self join, an inner join, or an outer join. A self join, like the old APPEND command, appends rows from a single table onto itself. When the multiple tables are different, they're joined by a linking column in the WHERE clause. An inner join, like the INTERSECT command, includes only those rows that match on the linking columns. An outer join, like SUBTRACT, includes only those rows that don't match on the linking columns. Most of the time, you'll do a self join or an inner join, but sometimes it's useful to do an outer join. For example, you need both an inner and outer join (like the UNION command) to get all the rows in these cases: o Join a CUSTOMER table with an ORDERS table and list the customers who ordered something this month (inner join) as well as those who didn't order anything (outer join). o Join a BUDGET table with an EXPENSE table and list each budget item whether or not there was an expense for that item this month. o Check a master (header) table against a transaction (detail) table to see all the headers whether or not they have associated details. Three Examples Show How """"""""""""""""""""""" Below, from slowest to fastest, are three examples of how to list all the invoice numbers in an INVOICE table whether or not they have related rows in a TRANSX table. Here INVOICE has 1,000 rows and TRANSX has 8,000 rows. There are 20 matches and 980 non-matches. In other words, the first SELECT (inner join) in each example finds 20 rows and the second SELECT (outer join) in each example finds 980 rows. Uncorrelated Sub-SELECT--Slow """"""""""""""""""""""""""""" This first example shows how to do it with a simple sub-SELECT that doesn't have a WHERE clause correlating it to the main SELECT. It's slow, taking one hour, seven minutes and 39 seconds to complete. SELECT t2.invid, SUM(t3.tprice) + FROM invoice t2, transx t3 + WHERE t3.invid = t2.invid + GROUP BY t2.invid + UNION + SELECT invoice.invid, $0.00 + FROM invoice + WHERE invid NOT IN + (SELECT invid FROM transx) Correlated Sub-SELECT--Faster """"""""""""""""""""""""""""" Make it 20 times faster by adding a correlated WHERE clause to the sub-SELECT. Now, it takes only three minutes and 48 seconds to complete. SELECT t2.invid, SUM(t3.tprice) + FROM invoice t2, transx t3 + WHERE t3.invid = t2.invid + GROUP BY t2.invid + UNION + SELECT t1.invid, $0.00 + FROM invoice t1 + WHERE invid NOT IN + (SELECT invid FROM transx + WHERE transx.invid = t1.invid ) Correlated & NOT EXISTS--Fastest """""""""""""""""""""""""""""""" By changing NOT IN to NOT EXISTS for use with the correlated sub- SELECT, you can add a little more speed. Now it takes only three minutes and 37 seconds to complete. SELECT t2.invid, SUM(t3.tprice) + FROM invoice t2, transx t3 + WHERE t3.invid = t2.invid + GROUP BY t2.invid + UNION + SELECT t1.invid, $0.00 + FROM invoice t1 + WHERE NOT EXISTS + (SELECT invid FROM transx + WHERE transx.invid = t1.invid ) BUDCOMP.CMD Example """"""""""""""""""" "SQL Outer Join Makes Budget Management Easier" in the September/October 1990 issue has a command file named BUDCOMP.CMD that demonstrates an outer join. Here's the SELECT command from BUDCOMP.CMD: SELECT t1.bud_item, + (MAX(t1.bud_amt * .vmonths))=s, + (SUM(t2.chk_amt))=s, + (MAX(t1.bud_amt) * .vmonths + - SUM(t2.chk_amt))=s + FROM budget t1, checks t2 + WHERE t1.bud_code = t2.bud_code + AND t2.chk_date BETWEEN + .vsdate AND .vedate + GROUP BY t1.bud_item + UNION + SELECT bud_item, + (bud_amt * .vmonths), '$0.00', + (bud_amt * .vmonths) + FROM budget + WHERE bud_code NOT IN + (SELECT bud_code + FROM checks WHERE + chk_date BETWEEN + .vsdate AND .vedate) This sub-SELECT isn't correlated because it can stand on its own. To speed up the sub-SELECT, turn it into a correlated sub-SELECT and use NOT EXISTS instead of NOT IN. Here's the revised SELECT command: SELECT t1.bud_item, + (MAX(t1.bud_amt * .vmonths))=s, + (SUM(t2.chk_amt))=s, + (MAX(t1.bud_amt) * .vmonths + - SUM(t2.chk_amt))=s + FROM budget t1, checks t2 + WHERE t1.bud_code = t2.bud_code + AND t2.chk_date BETWEEN + .vsdate AND .vedate + GROUP BY t1.bud_item + UNION + SELECT t3.bud_item, + (t3.bud_amt * .vmonths), '$0.00', + (t3.bud_amt * .vmonths) + FROM budget t3 + WHERE NOT EXISTS + (SELECT bud_code + FROM checks WHERE + checks.bud_code = t3.bud_code + AND chk_date BETWEEN + .vsdate AND .vedate) Now the sub-SELECT is correlated because it uses the T3 table, a table that exists only in the main SELECT. By applying these techniques to your applications, you'll speed up sub-SELECTs, outer joins, and rules.