791.TXT
     =====================================================================
     Using SELECT
     =====================================================================
     PRODUCT:  R:BASE                   VERSION: 3.1 or Higher
     =====================================================================
     CATALOG:  Programming in R:BASE    AREA   : Select 
     =====================================================================
 
 
     SELECT is a very powerful and useful command. It is used throughout 
     R:BASE, not just in commands at the R> prompt. For example, the 
     SELECT command is used when you save a view in QBE, create a rule, 
     define a cursor, and anytime R:BASE retrieves data. Understanding 
     the workings of the SELECT command improves your ability to use 
     R:BASE effectively.
 
     SELECT Command Parts
     Although the SELECT command syntax looks intimidating, you can master 
     it by breaking it down into its component parts. The SELECT command 
     is made up of six parts or clauses, each defined by a keyword: 
     SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.
 
       SELECT_defines what is displayed by the command. The SELECT 
       clause lists the columns, literal text, SELECT functions 
       (SUM, AVG, COUNT, MIN, and MAX), and expressions that are 
       manipulated and displayed by the SELECT command. The 
       columns listed in this clause must be in the tables listed 
       in the FROM clause. The SELECT clause is required in all 
       SELECT commands.
       FROM_lists the tables from which data is to be displayed.
       Tables listed in this clause must match the columns listed 
       in the SELECT clause. More than one table can be listed. 
       If more than one table is listed, a WHERE clause is required 
       to link the tables. The FROM clause is required in all 
       SELECT commands.
       WHERE_limits the rows of data retrieved from the tables
       listed in the FROM clause and links the tables in a multi-
       table SELECT. Columns used in this clause must be in the
       tables listed in the FROM clause, but do not need to be
       included in the SELECT clause. You can put conditions on
       columns that are not displayed. The WHERE clause is
       optional unless more than one table is listed in the FROM
       clause, then the WHERE clause is required.
       GROUP BY_groups data for display and performs computations 
       using the SELECT functions. All columns in the SELECT clause 
       must be listed here unless they are used in one of the SELECT 
       functions. The GROUP BY clause is optional.
       HAVING_assigns WHERE clause type criteria to each group defined 
       by a GROUP BY clause. The WHERE clause limits the rows of data 
       selected by the command based on tables in the FROM clause; 
       the HAVING clause limits the groups of data selected based on 
       columns in the GROUP BY clause.
       The HAVING clause is optional.
       ORDER BY_sorts the rows of data retrieved by the other clauses 
       of the command. The sorting is done on the final set of data 
       after all the clauses of the command have been completed. The 
       ORDER BY clause is optional.
 
     These are the basic clauses of the SELECT command. A complete SELECT 
     command can have many clauses. One option of a WHERE clause, for 
     example, is a sub-SELECT, another entire SELECT command. To make a 
     SELECT command easy to read and understand, place each clause on a 
     separate line, using the continuation character (+) to tell R:BASE 
     that the entire command encompasses more than one line. For example,
 
     SELECT empid, custid, COUNT(transid), SUM(netamount) +
     FROM transmaster +
     WHERE netamount < $100,000 +
     GROUP BY empid, custid +
     HAVING COUNT(transid) > 1 +
       AND SUM(netamount) > $100,000 ORDER BY 4 DESC
 
     By writing the command like this, it is easy to see that the columns 
     in the SELECT clause are in the GROUP BY, and the FROM tables match 
     the column list. This SELECT command is relatively simple even though 
     it uses all the clauses; only one table and a few columns are being 
     selected. The benefits are even more obvious with SELECT commands 
     that retrieve many columns from more than one table and have complex 
     WHERE clauses.
 
     The SELECT clause
     In addition to columns, variables, text strings and expressions can 
     be included in the SELECT clause part of the SELECT command. Any of 
     the R:BASE SuperMath functions can be used to perform calculations on 
     the data retrieved and columns can be concatenated for better display. 
     Often, when using variables or text strings in a SELECT, you also SET
     HEADINGS OFF. This setting turns off the display of the default column 
     headings of the SELECT command. Instead, use the WRITE command to 
     generate your own headings. For example:
 
     SET HEADINGS OFF
     WRITE 'Employee Name                 Phone Number  Ext'
     WRITE '----------------------------  ------------  ---'
     SELECT (empfname & emplname)=30, empphone=12, empext=3 +
     FROM employee
 
     Employee Name                  Phone Number  Ext
     ----------------------------   ------------  ---
      Peter Coffin                   649-4567     305
      Mary Simpson                   649-4567     303
      Sandi Watson                   649-4567     292
 
     Distinct
     By default, the SELECT command returns all rows of data. The DISTINCT 
     option can be specified in the SELECT clause to return only unique 
     rows of data. The DISTINCT option is followed by a list of columns, 
     and other valid items and operates only on the columns listed. If two 
     columns are listed, the DISTINCT option returns one row for each set 
     of rows where both column values are the same, ignoring data that is 
     not selected. For example:
 
     emptitle       emplname       empfname
     --------       --------       --------
     Manager        Johnson        Jerry
     Manager        Johnson        Susan
     Representative Johnson        Michael
 
     There are two rows where the values for emptitle and emplname are 
     identical. These rows compress into one row in the final result set 
     when the DISTINCT option is used and only the title and last name 
     columns are selected. If the first name was also selected, all three 
     rows would be returned since there are no duplicates.
 
     SELECT DISTINCT emptitle, emplname FROM employee 
     emptitle       emplname
     --------       --------
     Manager        Johnson
     Representative Johnson
 
     Into
     Data is selected directly from a table into variables by using the 
     INTO option, but you must include a WHERE clause to ensure that the 
     SELECT command finds only one row. The number of items in the SELECT 
     clause must match the number of variables in the INTO variable list. 
     The INTO variable list immediately follows the complete list of 
     columns. Don't put the corresponding variable next to the column. 
     For example, this syntax is wrong:
 
     SELECT custid INTO vcustid, company INTO vcompany FROM customer
 
     This syntax is correct:
 
     SELECT custid, company INTO vcustid, vcompany FROM customer
 
     The INTO variable list is often used with a GROUP BY clause to 
     retrieve summary data into variables.
 
     The FROM clause
     You can retrieve data from more than one table in the SELECT command. 
     To use more than one table, list all the columns you want to display 
     in the SELECT clause, list the tables in the FROM clause, and then 
     link the tables together in the WHERE clause. This is called a 
     multi-table SELECT.
 
     Multi-table SELECT commands are widely used in relational databases. 
     Information is cross-referenced among many tables, but detailed, 
     descriptive information is only stored in one table. For example, 
     a customer number is referenced in an orders table, the customer name 
     and address information is stored separately in a customer table. 
     When retrieving order data, a multi-table select is used to display 
     the customer name along with the order data.
 
     Multi-table SELECT commands commonly use alias names to identify the 
     tables. An alias name identifies which column goes with which table. 
     For example, T_1 is an alias name for the first table in the FROM 
     clause and T_2 is the alias name for the second table listed. Instead 
     of an alias name, you can use just the table name. These two commands 
     are equivalent:
 
      SELECT T_1.empid, T_1.hiredate, T_2.salesbonus +
      FROM employee T_1, salesbonus T_2 +
      WHERE T_1.empid = T_2.empid
 
      SELECT employee.empid, employee.hiredate,salesbonus.salesbonus +
      FROM employee, salesbonus +
      WHERE employee.empid = salesbonus.empid
 
     Either the alias name or the table name explicitly identifies from 
     which table to retrieve the data. This is necessary when linking the 
     tables in the WHERE clause of the command. If an alias name or table 
     name is not specified, R:BASE retrieves the data from the first 
     table in the FROM clause that contains the column. For example, the 
     following command always retrieves data for the empid column from 
     the Employee table.
 
      SELECT empid, hiredate, salesbonus +
      FROM employee, salesbonus +
      WHERE empid = empid
 
     R:BASE never looks at the data for the empid column from the 
     Salesbonus table. The link in the WHERE clause doesn't work correctly 
     because it compares the empid value from Employee with itself. Change 
     the command to use either the table name or an alias name in the 
     WHERE clause and the correct data is retrieved.
 
      SELECT empid, hiredate, salesbonus +
      FROM employee, salesbonus +
      WHERE employee.empid = salesbonus.empid
 
     The WHERE clause
     The WHERE clause of the SELECT command is used to put conditions on 
     the rows of data to be retrieved. Instead of getting all the rows, 
     you can specify conditions to identify a particular subset of data. 
     Multiple conditions are combined with AND or OR. The Command 
     Dictionary section in the R:BASE Reference Manual lists all of the 
     WHERE clause operators and conditions under the keyword WHERE.
 
     In addition, the WHERE clause links the tables in a multi-table 
     SELECT command. Here's an example of a three-table SELECT command 
     displaying companies, order numbers, and order detail information. 
     The WHERE clause links the three tables together:
 
      SELECT T_1.company, T_2.transid, T_3.model, T_3.units +
      FROM customer T_1, transaction T_2, transdetail T_3 +
      WHERE T_1.custid = T_2.custid AND T_2.transid = T_3.transid
 
     The WHERE clause can include conditions to qualify the rows as well 
     as linking tables. For example, to select just data from September, 
     the above command is modified to include a condition on the transdate 
     column.
 
      SELECT T_1.company, T_2.transid, T_3.model, T_3.units +
      FROM customer T_1, transaction T_2, transdetail T_3 +
      WHERE T_1.custid = T_2.custid AND T_2.transid = T_3.transid+
       AND transdate BETWEEN 9/1/94 AND 9/30/94
 
     An easy way to structure the WHERE clause of a multi-table SELECT 
     command is to put the links first, and then add other conditions.
 
     Sub-SELECTs
     A special option of a WHERE clause is the sub-SELECT. A sub-SELECT 
     creates a list of values that qualify a row for inclusion in the 
     final result set. A sub-SELECT uses any of the six clauses of the 
     SELECT command except ORDER BY. A sub-SELECT is always enclosed in 
     parentheses. Only one column, expression, or SELECT function is used 
     in the SELECT clause of a sub-SELECT. Usually the IN or NOT IN 
     operator is used to compare a column with a sub-SELECT although 
     other operators can be used. The IN and NOT IN operators return rows 
     when the comparison column is in (or not in) the list of values 
     returned by the sub-SELECT; it is very easy to understand the result 
     of the comparison.
 
     A sub-SELECT is often used to compare a column value to data in a 
     table not included in the FROM clause. For example, to see order 
     information about customers from California, a sub-SELECT is used 
     because the state information is not stored in the transmaster 
     table. 
 
     SELECT * +
     FROM transmaster +
     WHERE custid IN +
          (SELECT custid +
           FROM customer +
           WHERE custstate = 'CA')
 
     In many cases a sub-SELECT and multi-table SELECT are 
     interchangeable. Which one you use depends on the data values 
     displayed by the SELECT clause. In the above command, for example, 
     only data stored in the transmaster table is displayed. The company 
     name is not available, only the transmaster table is listed in the 
     FROM clause. To view the company name along with the transaction 
     data, a multi-table SELECT is used instead of a sub-SELECT and both 
     the transmaster and the customer tables are listed in the FROM clause.
 
     A sub-SELECT is very useful when a GROUP BY clause is used to 
     restrict the data, but a number of columns needs to be displayed, not 
     all of which should be included in the GROUP BY clause. For example, 
     the following command returns a list of customer ID numbers where the 
     customer has more than one order, and the total of all their orders 
     is more than $100,000.00.
 
      SELECT custid +
      FROM transmaster +
      WHERE netamount < $100,000 +
      GROUP BY custid +
      HAVING COUNT(*) > 1 +
             AND SUM(netamount) > $100,000
 
     Including other columns in the SELECT clause and thus in the GROUP BY 
     clause changes the grouping criteria. If you include the transaction 
     number, for example, no rows are returned because the transaction 
     number is unique and no single row meets the qualifying criteria of 
     both the WHERE and the HAVING clauses. A row becomes a group by 
     itself. Instead, use the SELECT with the GROUP BY clause in a sub-
     SELECT to view all the data in the transmaster table for those 
     customers meeting the sub-SELECT criteria.
 
     SELECT * +
     FROM transmaster +
     WHERE custid IN +
          ( SELECT custid +
            FROM transmaster +
            WHERE netamount < $100,000 +
            GROUP BY custid +
            HAVING COUNT(*) > 1 AND +
                   SUM(netamount) > $100,000)
 
     The GROUP BY clause
     The GROUP BY clause of the SELECT command is like a break point in a 
     report. This clause allows you to group the data, returning one row 
     of output for each different group of values. It's common to use 
     SELECT functions with the GROUP BY to get subtotals, averages, 
     minimums, maximums, and counts for the groups.
 
     All items listed in the SELECT clause must also be included in the 
     GROUP BY clause unless you use a SELECT function on the column. 
     Forgetting this rule is the most common mistake made with the GROUP 
     BY clause. For example, the following command won't work; a column 
     is included in the SELECT clause that is not included in the GROUP 
     BY clause. 
 
      SELECT col1, col2 FROM tblname GROUP BY col1
 
     Both of the following commands work:
 
      SELECT col1, col2 FROM tblname GROUP BY col1, col2
      SELECT col1, SUM(col2) FROM tblname GROUP BY col1
 
     Either both columns are included in the GROUP BY clause, or the 
     second column is used in a SELECT function. 
 
     A GROUP BY clause is similar to using the DISTINCT option in the 
     SELECT clause. The result set has one row for each unique set of 
     values. One difference is that a GROUP BY clause allows you to use 
     the SELECT functions in the SELECT clause.
 
     SELECT Functions
     The SELECT functions (SUM, AVG, MIN, MAX, and COUNT) are placed in 
     the SELECT clause of the SELECT command, but once you add a SELECT 
     function, you can't specify any other columns unless you use the 
     GROUP BY clause. A SELECT function returns a single value, not a 
     list of values. You can return a single value for the entire table, 
     or a single value for each group of rows.
 
     The following command is valid because it returns a single value, the 
     sum of all rows in the table. 
 
     SELECT SUM(netamount) FROM transmaster
 
     This next command is invalid because the SUM function returns a 
     single value, but the transid column returns a list of values. There 
     is no way R:BASE can display the output.
 
     SELECT transid, SUM(netamount) FROM transmaster
 
     Adding a GROUP BY clause allows R:BASE to calculate and display a sum 
     for each unique transid value.
 
     SELECT transid, SUM(netamount) FROM transmaster GROUP BY transid
 
     The SELECT functions do not include NULL values in their computations. 
     Generally this is fine unless you want to count all the rows in a 
     table. COUNT (colname) excludes NULL values from the count; COUNT (*) 
     includes NULL values in the count. For example, the following command 
     counts all rows including null values:
 
       SELECT COUNT(*) FROM tblname
 
     In general, unless you want to specifically count the non-NULL data 
     values in a column, always use COUNT(*). A common technique using the 
     COUNT(*) is to emulate the results of a TALLY command. The following 
     commands are equivalent:
 
     TALLY empid FROM transmaster
     SELECT empid, COUNT(*) FROM transmaster GROUP BY empid
 
     The advantage of using SELECT over TALLY is that the count can be 
     calculated for more than one column, or for data from more than one 
     table. Using SELECT, the employee name could be displayed with the 
     result, for example:
 
     SELECT empid, emplname, COUNT(*) +
     FROM transmaster, employee +
     WHERE transmaster.empid = employee.empid +
     GROUP BY empid, emplname
 
     Note that two tables are now listed in the FROM clause. They are 
     linked in the WHERE clause and both columns displayed are listed in 
     the GROUP BY clause. The result set is much easier to read than a 
     TALLY, which just lists the employee ID number and the count.
 
      empid      emplname         COUNT (*)
      ---------- ---------------- ----------
             102 Wilson                    6
             129 Hernandez                 2
             131 Simpson                   4
             133 Coffin                    2
             165 Williams                  5
             167 Watson                    1
 
     The HAVING clause
     The HAVING clause which is like a WHERE clause for the GROUP BY 
     clause is used to exclude certain groups from the final result set. 
     A WHERE clause excludes individual rows; a HAVING clause excludes 
     groups of rows_groups created by the GROUP BY clause.
 
     The HAVING clause syntax is identical to the WHERE clause syntax 
     except that the HAVING clause allows SELECT functions to be used in 
     comparisons. For example, to retrieve only groups having a sum 
     greater than a certain amount use this command:
 
      SELECT custid +
      FROM transmaster +
      GROUP BY custid +
      HAVING SUM(netamount) > $100,000
 
     A list of customers whose total transactions are more than $100,000 
     is returned.
 
     The HAVING clause is often used with a GROUP BY clause to check for 
     duplicate entries. The SELECT syntax that emulates a TALLY command is 
     used with a HAVING clause to return only groups with more than one 
     value. For example, to check for duplicate invoice entries use the 
     command:
 
     SELECT transid +
     FROM transmaster +
     GROUP BY transid +
     HAVING COUNT(*) > 1
 
     Only invoice numbers that occur more than once in the transmaster 
     table are returned. To view all the data about these invoices, use 
     the SELECT command in a sub-SELECT as in the following example:
 
     EDIT * FROM transmaster +
     WHERE transid IN +
           (SELECT transid +
            FROM transmaster +
            GROUP BY transid +
            HAVING COUNT(*) > 1)
 
 
     The ORDER BY clause
     The ORDER BY clause of the SELECT command is used to order the final 
     result set. After all the other clauses have executed, the resulting 
     data is ordered as specified.
 
     You can use the position number from the SELECT clause in the ORDER 
     BY clause to order by an expression or a function.
     For example, the following two commands are equivalent:
 
       SELECT custid, custstate FROM customer ORDER BY 2 
       SELECT custid, custstate FROM customer ORDER BY custstate
 
     The 2 in the ORDER BY clause represents the second item in the 
     SELECT clause of the command. This feature is used to sort by 
     expressions, SELECT functions, or literal text. For example, to sort 
     tally results in descending order, use the following SELECT command:
 
     SELECT empid, emplname, COUNT(*) +
     FROM transmaster, employee +
     WHERE transmaster.empid = employee.empid +
     GROUP BY empid, emplname +
     ORDER BY 2 DESC
 
     The COUNT function has no column name associated with it that can be 
     used in the ORDER BY clause. Instead, use its position in the SELECT 
     clause_2. The DESC indicates to put the largest number at the top of 
     the list.
 
      empid      emplname         COUNT (*)
      ---------- ---------------- ----------
             102 Wilson                    6
             165 Williams                  5
             131 Simpson                   4
             129 Hernandez                 2
             133 Coffin                    2
             167 Watson                    1
 
     UNION SELECT
     The UNION operator in a SELECT command joins two distinct SELECT 
     commands. The SELECT commands can be from the same or different 
     tables. Generally, use UNION SELECT when you want to append the 
     results of one SELECT onto the bottom of the results of another 
     SELECT to select mutually exclusive groups of data. By default, 
     UNION is DISTINCT; duplicate rows are not displayed. If you want 
     to see duplicate rows, use UNION ALL. The SELECT clauses of the 
     commands that you join with the UNION operator must each contain the 
     same number of items, and the data types of the items must match. 
     The SELECT clauses can, however, contain constants and expressions as 
     well as column names.
 
     The UNION operator joins the two SELECT command results into one 
     result set. A UNION SELECT can be used to perform an outer join, or 
     you can use the new outer join syntax of the SELECT command added 
     in R:BASE 4.5 Plus!. Refer to the article "Easy Outer Joins" in the 
     January/February 1994 Exchange for information about the new outer 
     join syntax and the article "Working With Outer Joins" in the 
     January/February 1993 Exchange (document #682 on the FAX server) for 
     more information on using UNION SELECT to create an outer join.
 
     Using SELECT in other commands
     Other commands also use SELECT to retrieve rows of data. The CREATE 
     VIEW, INSERT, and DECLARE CURSOR commands use SELECT to select the 
     rows of data used by that command. By using the #VALUES option on the 
     CHOOSE command, you can take advantage of some of the SELECT command 
     features to make better menus. The multi-table UPDATE command joins 
     the tables like a SELECT command. When you build and save a query 
     from QBE, you're actually building and saving a SELECT statement. The 
     SELECT command built by QBE is a subset of the SELECT command 
     available at the R> prompt.
 
     By using the SELECT command, you create faster and more efficient 
     code. In addition, you can test your command first by using only the 
     SELECT command. If data is correctly retrieved, then you know the 
     view is correct, the appropriate rows are inserted, and the 
     operations performed on each row by the DECLARE CURSOR command 
     operate on the right rows. This is an easy and quick way to verify 
     data before modifying it.
 
     The SELECT command is a language in itself. By using it, you'll write 
     faster and more efficient code, and often accomplish tasks with one 
     command instead of writing program code.