791.TXT ===================================================================== Using SELECT ===================================================================== PRODUCT: R:BASE VERSION: 3.1 or Higher ===================================================================== CATALOG: Programming in R:BASE AREA : Select ===================================================================== SELECT is a very powerful and useful command. It is used throughout R:BASE, not just in commands at the R> prompt. For example, the SELECT command is used when you save a view in QBE, create a rule, define a cursor, and anytime R:BASE retrieves data. Understanding the workings of the SELECT command improves your ability to use R:BASE effectively. SELECT Command Parts Although the SELECT command syntax looks intimidating, you can master it by breaking it down into its component parts. The SELECT command is made up of six parts or clauses, each defined by a keyword: SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. SELECT_defines what is displayed by the command. The SELECT clause lists the columns, literal text, SELECT functions (SUM, AVG, COUNT, MIN, and MAX), and expressions that are manipulated and displayed by the SELECT command. The columns listed in this clause must be in the tables listed in the FROM clause. The SELECT clause is required in all SELECT commands. FROM_lists the tables from which data is to be displayed. Tables listed in this clause must match the columns listed in the SELECT clause. More than one table can be listed. If more than one table is listed, a WHERE clause is required to link the tables. The FROM clause is required in all SELECT commands. WHERE_limits the rows of data retrieved from the tables listed in the FROM clause and links the tables in a multi- table SELECT. Columns used in this clause must be in the tables listed in the FROM clause, but do not need to be included in the SELECT clause. You can put conditions on columns that are not displayed. The WHERE clause is optional unless more than one table is listed in the FROM clause, then the WHERE clause is required. GROUP BY_groups data for display and performs computations using the SELECT functions. All columns in the SELECT clause must be listed here unless they are used in one of the SELECT functions. The GROUP BY clause is optional. HAVING_assigns WHERE clause type criteria to each group defined by a GROUP BY clause. The WHERE clause limits the rows of data selected by the command based on tables in the FROM clause; the HAVING clause limits the groups of data selected based on columns in the GROUP BY clause. The HAVING clause is optional. ORDER BY_sorts the rows of data retrieved by the other clauses of the command. The sorting is done on the final set of data after all the clauses of the command have been completed. The ORDER BY clause is optional. These are the basic clauses of the SELECT command. A complete SELECT command can have many clauses. One option of a WHERE clause, for example, is a sub-SELECT, another entire SELECT command. To make a SELECT command easy to read and understand, place each clause on a separate line, using the continuation character (+) to tell R:BASE that the entire command encompasses more than one line. For example, SELECT empid, custid, COUNT(transid), SUM(netamount) + FROM transmaster + WHERE netamount < $100,000 + GROUP BY empid, custid + HAVING COUNT(transid) > 1 + AND SUM(netamount) > $100,000 ORDER BY 4 DESC By writing the command like this, it is easy to see that the columns in the SELECT clause are in the GROUP BY, and the FROM tables match the column list. This SELECT command is relatively simple even though it uses all the clauses; only one table and a few columns are being selected. The benefits are even more obvious with SELECT commands that retrieve many columns from more than one table and have complex WHERE clauses. The SELECT clause In addition to columns, variables, text strings and expressions can be included in the SELECT clause part of the SELECT command. Any of the R:BASE SuperMath functions can be used to perform calculations on the data retrieved and columns can be concatenated for better display. Often, when using variables or text strings in a SELECT, you also SET HEADINGS OFF. This setting turns off the display of the default column headings of the SELECT command. Instead, use the WRITE command to generate your own headings. For example: SET HEADINGS OFF WRITE 'Employee Name Phone Number Ext' WRITE '---------------------------- ------------ ---' SELECT (empfname & emplname)=30, empphone=12, empext=3 + FROM employee Employee Name Phone Number Ext ---------------------------- ------------ --- Peter Coffin 649-4567 305 Mary Simpson 649-4567 303 Sandi Watson 649-4567 292 Distinct By default, the SELECT command returns all rows of data. The DISTINCT option can be specified in the SELECT clause to return only unique rows of data. The DISTINCT option is followed by a list of columns, and other valid items and operates only on the columns listed. If two columns are listed, the DISTINCT option returns one row for each set of rows where both column values are the same, ignoring data that is not selected. For example: emptitle emplname empfname -------- -------- -------- Manager Johnson Jerry Manager Johnson Susan Representative Johnson Michael There are two rows where the values for emptitle and emplname are identical. These rows compress into one row in the final result set when the DISTINCT option is used and only the title and last name columns are selected. If the first name was also selected, all three rows would be returned since there are no duplicates. SELECT DISTINCT emptitle, emplname FROM employee emptitle emplname -------- -------- Manager Johnson Representative Johnson Into Data is selected directly from a table into variables by using the INTO option, but you must include a WHERE clause to ensure that the SELECT command finds only one row. The number of items in the SELECT clause must match the number of variables in the INTO variable list. The INTO variable list immediately follows the complete list of columns. Don't put the corresponding variable next to the column. For example, this syntax is wrong: SELECT custid INTO vcustid, company INTO vcompany FROM customer This syntax is correct: SELECT custid, company INTO vcustid, vcompany FROM customer The INTO variable list is often used with a GROUP BY clause to retrieve summary data into variables. The FROM clause You can retrieve data from more than one table in the SELECT command. To use more than one table, list all the columns you want to display in the SELECT clause, list the tables in the FROM clause, and then link the tables together in the WHERE clause. This is called a multi-table SELECT. Multi-table SELECT commands are widely used in relational databases. Information is cross-referenced among many tables, but detailed, descriptive information is only stored in one table. For example, a customer number is referenced in an orders table, the customer name and address information is stored separately in a customer table. When retrieving order data, a multi-table select is used to display the customer name along with the order data. Multi-table SELECT commands commonly use alias names to identify the tables. An alias name identifies which column goes with which table. For example, T_1 is an alias name for the first table in the FROM clause and T_2 is the alias name for the second table listed. Instead of an alias name, you can use just the table name. These two commands are equivalent: SELECT T_1.empid, T_1.hiredate, T_2.salesbonus + FROM employee T_1, salesbonus T_2 + WHERE T_1.empid = T_2.empid SELECT employee.empid, employee.hiredate,salesbonus.salesbonus + FROM employee, salesbonus + WHERE employee.empid = salesbonus.empid Either the alias name or the table name explicitly identifies from which table to retrieve the data. This is necessary when linking the tables in the WHERE clause of the command. If an alias name or table name is not specified, R:BASE retrieves the data from the first table in the FROM clause that contains the column. For example, the following command always retrieves data for the empid column from the Employee table. SELECT empid, hiredate, salesbonus + FROM employee, salesbonus + WHERE empid = empid R:BASE never looks at the data for the empid column from the Salesbonus table. The link in the WHERE clause doesn't work correctly because it compares the empid value from Employee with itself. Change the command to use either the table name or an alias name in the WHERE clause and the correct data is retrieved. SELECT empid, hiredate, salesbonus + FROM employee, salesbonus + WHERE employee.empid = salesbonus.empid The WHERE clause The WHERE clause of the SELECT command is used to put conditions on the rows of data to be retrieved. Instead of getting all the rows, you can specify conditions to identify a particular subset of data. Multiple conditions are combined with AND or OR. The Command Dictionary section in the R:BASE Reference Manual lists all of the WHERE clause operators and conditions under the keyword WHERE. In addition, the WHERE clause links the tables in a multi-table SELECT command. Here's an example of a three-table SELECT command displaying companies, order numbers, and order detail information. The WHERE clause links the three tables together: SELECT T_1.company, T_2.transid, T_3.model, T_3.units + FROM customer T_1, transaction T_2, transdetail T_3 + WHERE T_1.custid = T_2.custid AND T_2.transid = T_3.transid The WHERE clause can include conditions to qualify the rows as well as linking tables. For example, to select just data from September, the above command is modified to include a condition on the transdate column. SELECT T_1.company, T_2.transid, T_3.model, T_3.units + FROM customer T_1, transaction T_2, transdetail T_3 + WHERE T_1.custid = T_2.custid AND T_2.transid = T_3.transid+ AND transdate BETWEEN 9/1/94 AND 9/30/94 An easy way to structure the WHERE clause of a multi-table SELECT command is to put the links first, and then add other conditions. Sub-SELECTs A special option of a WHERE clause is the sub-SELECT. A sub-SELECT creates a list of values that qualify a row for inclusion in the final result set. A sub-SELECT uses any of the six clauses of the SELECT command except ORDER BY. A sub-SELECT is always enclosed in parentheses. Only one column, expression, or SELECT function is used in the SELECT clause of a sub-SELECT. Usually the IN or NOT IN operator is used to compare a column with a sub-SELECT although other operators can be used. The IN and NOT IN operators return rows when the comparison column is in (or not in) the list of values returned by the sub-SELECT; it is very easy to understand the result of the comparison. A sub-SELECT is often used to compare a column value to data in a table not included in the FROM clause. For example, to see order information about customers from California, a sub-SELECT is used because the state information is not stored in the transmaster table. SELECT * + FROM transmaster + WHERE custid IN + (SELECT custid + FROM customer + WHERE custstate = 'CA') In many cases a sub-SELECT and multi-table SELECT are interchangeable. Which one you use depends on the data values displayed by the SELECT clause. In the above command, for example, only data stored in the transmaster table is displayed. The company name is not available, only the transmaster table is listed in the FROM clause. To view the company name along with the transaction data, a multi-table SELECT is used instead of a sub-SELECT and both the transmaster and the customer tables are listed in the FROM clause. A sub-SELECT is very useful when a GROUP BY clause is used to restrict the data, but a number of columns needs to be displayed, not all of which should be included in the GROUP BY clause. For example, the following command returns a list of customer ID numbers where the customer has more than one order, and the total of all their orders is more than $100,000.00. SELECT custid + FROM transmaster + WHERE netamount < $100,000 + GROUP BY custid + HAVING COUNT(*) > 1 + AND SUM(netamount) > $100,000 Including other columns in the SELECT clause and thus in the GROUP BY clause changes the grouping criteria. If you include the transaction number, for example, no rows are returned because the transaction number is unique and no single row meets the qualifying criteria of both the WHERE and the HAVING clauses. A row becomes a group by itself. Instead, use the SELECT with the GROUP BY clause in a sub- SELECT to view all the data in the transmaster table for those customers meeting the sub-SELECT criteria. SELECT * + FROM transmaster + WHERE custid IN + ( SELECT custid + FROM transmaster + WHERE netamount < $100,000 + GROUP BY custid + HAVING COUNT(*) > 1 AND + SUM(netamount) > $100,000) The GROUP BY clause The GROUP BY clause of the SELECT command is like a break point in a report. This clause allows you to group the data, returning one row of output for each different group of values. It's common to use SELECT functions with the GROUP BY to get subtotals, averages, minimums, maximums, and counts for the groups. All items listed in the SELECT clause must also be included in the GROUP BY clause unless you use a SELECT function on the column. Forgetting this rule is the most common mistake made with the GROUP BY clause. For example, the following command won't work; a column is included in the SELECT clause that is not included in the GROUP BY clause. SELECT col1, col2 FROM tblname GROUP BY col1 Both of the following commands work: SELECT col1, col2 FROM tblname GROUP BY col1, col2 SELECT col1, SUM(col2) FROM tblname GROUP BY col1 Either both columns are included in the GROUP BY clause, or the second column is used in a SELECT function. A GROUP BY clause is similar to using the DISTINCT option in the SELECT clause. The result set has one row for each unique set of values. One difference is that a GROUP BY clause allows you to use the SELECT functions in the SELECT clause. SELECT Functions The SELECT functions (SUM, AVG, MIN, MAX, and COUNT) are placed in the SELECT clause of the SELECT command, but once you add a SELECT function, you can't specify any other columns unless you use the GROUP BY clause. A SELECT function returns a single value, not a list of values. You can return a single value for the entire table, or a single value for each group of rows. The following command is valid because it returns a single value, the sum of all rows in the table. SELECT SUM(netamount) FROM transmaster This next command is invalid because the SUM function returns a single value, but the transid column returns a list of values. There is no way R:BASE can display the output. SELECT transid, SUM(netamount) FROM transmaster Adding a GROUP BY clause allows R:BASE to calculate and display a sum for each unique transid value. SELECT transid, SUM(netamount) FROM transmaster GROUP BY transid The SELECT functions do not include NULL values in their computations. Generally this is fine unless you want to count all the rows in a table. COUNT (colname) excludes NULL values from the count; COUNT (*) includes NULL values in the count. For example, the following command counts all rows including null values: SELECT COUNT(*) FROM tblname In general, unless you want to specifically count the non-NULL data values in a column, always use COUNT(*). A common technique using the COUNT(*) is to emulate the results of a TALLY command. The following commands are equivalent: TALLY empid FROM transmaster SELECT empid, COUNT(*) FROM transmaster GROUP BY empid The advantage of using SELECT over TALLY is that the count can be calculated for more than one column, or for data from more than one table. Using SELECT, the employee name could be displayed with the result, for example: SELECT empid, emplname, COUNT(*) + FROM transmaster, employee + WHERE transmaster.empid = employee.empid + GROUP BY empid, emplname Note that two tables are now listed in the FROM clause. They are linked in the WHERE clause and both columns displayed are listed in the GROUP BY clause. The result set is much easier to read than a TALLY, which just lists the employee ID number and the count. empid emplname COUNT (*) ---------- ---------------- ---------- 102 Wilson 6 129 Hernandez 2 131 Simpson 4 133 Coffin 2 165 Williams 5 167 Watson 1 The HAVING clause The HAVING clause which is like a WHERE clause for the GROUP BY clause is used to exclude certain groups from the final result set. A WHERE clause excludes individual rows; a HAVING clause excludes groups of rows_groups created by the GROUP BY clause. The HAVING clause syntax is identical to the WHERE clause syntax except that the HAVING clause allows SELECT functions to be used in comparisons. For example, to retrieve only groups having a sum greater than a certain amount use this command: SELECT custid + FROM transmaster + GROUP BY custid + HAVING SUM(netamount) > $100,000 A list of customers whose total transactions are more than $100,000 is returned. The HAVING clause is often used with a GROUP BY clause to check for duplicate entries. The SELECT syntax that emulates a TALLY command is used with a HAVING clause to return only groups with more than one value. For example, to check for duplicate invoice entries use the command: SELECT transid + FROM transmaster + GROUP BY transid + HAVING COUNT(*) > 1 Only invoice numbers that occur more than once in the transmaster table are returned. To view all the data about these invoices, use the SELECT command in a sub-SELECT as in the following example: EDIT * FROM transmaster + WHERE transid IN + (SELECT transid + FROM transmaster + GROUP BY transid + HAVING COUNT(*) > 1) The ORDER BY clause The ORDER BY clause of the SELECT command is used to order the final result set. After all the other clauses have executed, the resulting data is ordered as specified. You can use the position number from the SELECT clause in the ORDER BY clause to order by an expression or a function. For example, the following two commands are equivalent: SELECT custid, custstate FROM customer ORDER BY 2 SELECT custid, custstate FROM customer ORDER BY custstate The 2 in the ORDER BY clause represents the second item in the SELECT clause of the command. This feature is used to sort by expressions, SELECT functions, or literal text. For example, to sort tally results in descending order, use the following SELECT command: SELECT empid, emplname, COUNT(*) + FROM transmaster, employee + WHERE transmaster.empid = employee.empid + GROUP BY empid, emplname + ORDER BY 2 DESC The COUNT function has no column name associated with it that can be used in the ORDER BY clause. Instead, use its position in the SELECT clause_2. The DESC indicates to put the largest number at the top of the list. empid emplname COUNT (*) ---------- ---------------- ---------- 102 Wilson 6 165 Williams 5 131 Simpson 4 129 Hernandez 2 133 Coffin 2 167 Watson 1 UNION SELECT The UNION operator in a SELECT command joins two distinct SELECT commands. The SELECT commands can be from the same or different tables. Generally, use UNION SELECT when you want to append the results of one SELECT onto the bottom of the results of another SELECT to select mutually exclusive groups of data. By default, UNION is DISTINCT; duplicate rows are not displayed. If you want to see duplicate rows, use UNION ALL. The SELECT clauses of the commands that you join with the UNION operator must each contain the same number of items, and the data types of the items must match. The SELECT clauses can, however, contain constants and expressions as well as column names. The UNION operator joins the two SELECT command results into one result set. A UNION SELECT can be used to perform an outer join, or you can use the new outer join syntax of the SELECT command added in R:BASE 4.5 Plus!. Refer to the article "Easy Outer Joins" in the January/February 1994 Exchange for information about the new outer join syntax and the article "Working With Outer Joins" in the January/February 1993 Exchange (document #682 on the FAX server) for more information on using UNION SELECT to create an outer join. Using SELECT in other commands Other commands also use SELECT to retrieve rows of data. The CREATE VIEW, INSERT, and DECLARE CURSOR commands use SELECT to select the rows of data used by that command. By using the #VALUES option on the CHOOSE command, you can take advantage of some of the SELECT command features to make better menus. The multi-table UPDATE command joins the tables like a SELECT command. When you build and save a query from QBE, you're actually building and saving a SELECT statement. The SELECT command built by QBE is a subset of the SELECT command available at the R> prompt. By using the SELECT command, you create faster and more efficient code. In addition, you can test your command first by using only the SELECT command. If data is correctly retrieved, then you know the view is correct, the appropriate rows are inserted, and the operations performed on each row by the DECLARE CURSOR command operate on the right rows. This is an easy and quick way to verify data before modifying it. The SELECT command is a language in itself. By using it, you'll write faster and more efficient code, and often accomplish tasks with one command instead of writing program code.