""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   MAKING SUB-SELECTS, OUTER JOINS, & RULES FASTER
   """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   PRODUCT   :  R:BASE                  VERSION      :  3.1
   CATEGORY  :  PROGRAMMING             SUBCATEGORY  :  TOOLS
   """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   By making sub-SELECTs faster, you can speed up the commands, outer
   joins, and rules that use them in their WHERE clauses.
 
 
   Introduction to Sub-SELECTs
   """""""""""""""""""""""""""
   For an introduction to sub-SELECTs, outer joins, and other SQL SELECT
   concepts, read the September/October 1990 R:BASE EXCHANGE, especially
   "Introducing SQL SELECT" and "SQL Outer Join Makes Budget Management
   Easier."
 
   A sub-SELECT is a SELECT command enclosed in parentheses and used in a
   WHERE clause on any command. A sub-SELECT returns a list of values
   that are compared to another value using this structure:
 
     ...WHERE colname OP (sub-SELECT)
 
   "OP" is the operator--IN or NOT IN are the most frequently used
   operators with a sub-SELECT.
 
   It's called a sub-SELECT because it's not the main command. It's
   subordinate to the main command. The main command could be SELECT,
   EDIT USING, PRINT, EDIT, BROWSE or any other command that can use a
   WHERE clause.
 
 
   Faster Sub-SELECTs
   """"""""""""""""""
   The operator tells R:BASE how to compare the list of items returned by
   the sub-SELECT to the list of rows requested by the main command.
   R:BASE doesn't use indexes to speed up this comparison, so it's
   important to use other techniques. Here are four ways to speed up a
   sub-SELECT:
 
     o  Replace the sub-SELECT with a multi-table SELECT.
     o  Correlate the sub-SELECT to the main part of the query using an
        indexed WHERE clause.
     o  Use EXISTS or NOT EXISTS as the operator instead of IN or NOT IN
        when it only matters whether or not a sub-SELECT finds a value.
        Note that neither EXISTS nor NOT EXISTS use a column name.
     o  Add an indexed WHERE clause to the sub-SELECT to limit the number
        of items in the list it returns. R:BASE will use the index in the
        sub-SELECT's WHERE clause.
 
 
   Multi-table SELECT
   """"""""""""""""""
   If you're using the sub-SELECT with the IN operator in a SELECT
   command's WHERE clause, replace the sub-SELECT with a multi-table
   SELECT. Incorporate the sub-SELECT's table and conditions into the
   main SELECT. R:BASE uses indexes when joining tables in a multi-table
   SELECT.
 
 
   Correlated Sub-SELECT
   """""""""""""""""""""
   If you can't use a multi-table SELECT, correlate the sub-SELECT to the
   main part of the query using an indexed column. That is, add a WHERE
   clause to the sub-SELECT (or add onto an existing sub-SELECT WHERE
   clause). Make sure the WHERE clause will use indexes to create the
   comparison list by choosing rows that match on a linking column with
   the main part of the command.
 
   When a sub-SELECT can stand on its own, it isn't correlated. A
   correlated sub-SELECT uses a "table.column" item in the sub-SELECT's
   WHERE clause that is not in the sub-SELECT's list, so only the main
   command has access to the item. In other words, to correlate a sub-
   SELECT to the main query, add a WHERE clause to the sub-SELECT that
   uses an indexed item that only the main query can access.
 
   In some cases, you can do this by using a copy of the table (T2) in
   the sub-SELECT and the table itself in the main command. Here's an
   example using RULES as the main command. The sub-SELECT is correlated
   to the main command making the uniqueness rule faster.
 
     RULES 'Value must be unique.' +
       FOR tblname SUCCEEDS +
       WHERE colname IS NOT NULL +
         AND colname NOT IN +
         (SELECT colname FROM tblname t2 +
         WHERE t2.colname = tblname.colname)
 
   The sub-SELECT is correlated because it can't stand on its own. It
   uses TBLNAME.COLNAME in its WHERE clause, which only the main command
   can access. The sub-SELECT's table is T2, not TBLNAME.
 
   The correlated sub-SELECT is faster because it forces R:BASE to do an
   internal join, which always use indexes.
 
 
   EXISTS or NOT EXISTS
   """"""""""""""""""""
   Use EXISTS or NOT EXISTS as the operator instead of using IN or NOT
   IN. For example, when you use a rule to ensure that a new value is
   unique, you don't care what the value is, just whether or not it
   already exists. Therefore, you can speed up the same rule by using the
   NOT EXISTS operator instead of NOT IN, as in the example shown on the
   below.
 
     RULES 'Value must be unique.' +
       FOR tblname SUCCEEDS +
       WHERE colname IS NOT NULL +
         AND NOT EXISTS +
         (SELECT colname FROM tblname t2 +
         WHERE t2.colname = tblname.colname)
 
 
   Limit the Sub-SELECT's List
   """""""""""""""""""""""""""
   If you can't use a multi-table SELECT, and you can't correlate the
   sub-SELECT, add an indexed WHERE clause to the sub-SELECT to limit the
   number of items in the list it returns. The fewer items that need to
   be compared, the faster the sub-SELECT.
 
 
   Speedy Outer Joins
   """"""""""""""""""
   Multi-table SELECTs join tables together. A join can be a self join,
   an inner join, or an outer join.
 
   A self join, like the old APPEND command, appends rows from a single
   table onto itself. When the multiple tables are different, they're
   joined by a linking column in the WHERE clause. An inner join, like
   the INTERSECT command, includes only those rows that match on the
   linking columns. An outer join, like SUBTRACT, includes only those
   rows that don't match on the linking columns.
 
   Most of the time, you'll do a self join or an inner join, but
   sometimes it's useful to do an outer join. For example, you need both
   an inner and outer join (like the UNION command) to get all the rows
   in these cases:
 
     o  Join a CUSTOMER table with an ORDERS table and list the customers
        who ordered something this month (inner join) as well as those
        who didn't order anything (outer join).
     o  Join a BUDGET table with an EXPENSE table and list each budget
        item whether or not there was an expense for that item this
        month.
     o  Check a master (header) table against a transaction (detail)
        table to see all the headers whether or not they have associated
        details.
 
 
   Three Examples Show How
   """""""""""""""""""""""
   Below, from slowest to fastest, are three examples of how to list all
   the invoice numbers in an INVOICE table whether or not they have
   related rows in a TRANSX table.
 
   Here INVOICE has 1,000 rows and TRANSX has 8,000 rows. There are 20
   matches and 980 non-matches. In other words, the first SELECT (inner
   join) in each example finds 20 rows and the second SELECT (outer join)
   in each example finds 980 rows.
 
 
   Uncorrelated Sub-SELECT--Slow
   """""""""""""""""""""""""""""
   This first example shows how to do it with a simple sub-SELECT that
   doesn't have a WHERE clause correlating it to the main SELECT. It's
   slow, taking one hour, seven minutes and 39 seconds to complete.
 
     SELECT t2.invid, SUM(t3.tprice) +
       FROM invoice t2, transx t3 +
       WHERE t3.invid = t2.invid +
       GROUP BY t2.invid +
       UNION +
         SELECT invoice.invid, $0.00 +
           FROM invoice +
           WHERE invid NOT IN +
             (SELECT invid FROM transx)
 
 
   Correlated Sub-SELECT--Faster
   """""""""""""""""""""""""""""
   Make it 20 times faster by adding a correlated WHERE clause to the
   sub-SELECT. Now, it takes only three minutes and 48 seconds to
   complete.
 
     SELECT t2.invid, SUM(t3.tprice) +
       FROM invoice t2, transx t3 +
       WHERE t3.invid = t2.invid +
       GROUP BY t2.invid +
       UNION +
         SELECT t1.invid, $0.00 +
           FROM invoice t1 +
           WHERE invid NOT IN +
             (SELECT invid FROM transx +
             WHERE transx.invid = t1.invid )
 
 
   Correlated & NOT EXISTS--Fastest
   """"""""""""""""""""""""""""""""
   By changing NOT IN to NOT EXISTS for use with the correlated sub-
   SELECT, you can add a little more speed. Now it takes only three
   minutes and 37 seconds to complete.
 
     SELECT t2.invid, SUM(t3.tprice) +
       FROM invoice t2, transx t3 +
       WHERE t3.invid = t2.invid +
       GROUP BY t2.invid +
       UNION +
         SELECT t1.invid, $0.00 +
           FROM invoice t1 +
           WHERE NOT EXISTS +
             (SELECT invid FROM transx +
             WHERE transx.invid = t1.invid )
 
 
   BUDCOMP.CMD Example
   """""""""""""""""""
   "SQL Outer Join Makes Budget Management Easier" in the
   September/October 1990 issue has a command file named BUDCOMP.CMD that
   demonstrates an outer join. Here's the SELECT command from
   BUDCOMP.CMD:
 
     SELECT t1.bud_item, +
       (MAX(t1.bud_amt * .vmonths))=s, +
       (SUM(t2.chk_amt))=s, +
       (MAX(t1.bud_amt) * .vmonths +
       - SUM(t2.chk_amt))=s +
       FROM budget t1, checks t2 +
       WHERE t1.bud_code = t2.bud_code +
         AND t2.chk_date BETWEEN +
           .vsdate AND .vedate +
       GROUP BY t1.bud_item +
       UNION +
         SELECT bud_item, +
           (bud_amt * .vmonths), '$0.00', +
           (bud_amt * .vmonths) +
           FROM budget +
           WHERE bud_code NOT IN +
             (SELECT bud_code +
             FROM checks WHERE +
             chk_date BETWEEN +
             .vsdate AND .vedate)
 
   This sub-SELECT isn't correlated because it can stand on its own. To
   speed up the sub-SELECT, turn it into a correlated sub-SELECT and use
   NOT EXISTS instead of NOT IN.
 
   Here's the revised SELECT command:
 
     SELECT t1.bud_item, +
       (MAX(t1.bud_amt * .vmonths))=s, +
       (SUM(t2.chk_amt))=s, +
       (MAX(t1.bud_amt) * .vmonths +
       - SUM(t2.chk_amt))=s +
       FROM budget t1, checks t2 +
       WHERE t1.bud_code = t2.bud_code +
         AND t2.chk_date BETWEEN +
           .vsdate AND .vedate +
       GROUP BY t1.bud_item +
       UNION +
         SELECT t3.bud_item, +
           (t3.bud_amt * .vmonths), '$0.00', +
           (t3.bud_amt * .vmonths) +
           FROM budget t3 +
           WHERE NOT EXISTS +
             (SELECT bud_code +
             FROM checks WHERE +
             checks.bud_code = t3.bud_code +
             AND chk_date BETWEEN +
             .vsdate AND .vedate)
 
   Now the sub-SELECT is correlated because it uses the T3 table, a table
   that exists only in the main SELECT.
 
   By applying these techniques to your applications, you'll speed up
   sub-SELECTs, outer joins, and rules.