DOCUMENT #680 ======================================================================= BASIC STATISTICS: COUNT, AVERAGE, SUM, MINIMUM, MAXIMUM ======================================================================= PRODUCT: R:BASE VERSION : 3.1 or Higher ======================================================================= CATALOG: Programming In R:BASE AREA : Other ======================================================================= R:BASE contains over 70 SuperMath functions for computations in addition to the simple addition, subtraction, multiplication and division oper- ations. However, the SuperMath functions don't apply when doing a count, average, sum, minimum, or maximum on a group of rows from a table. The SuperMath functions LAVG, LMIN, and LMAX work across a row, i.e. they compute the average, minimum and maximum column values for a group of columns from a given row, they do not return the average, minimum or maximum value for a group of rows from a given column. Use the AVG, MIN or MAX function with the SELECT or COMPUTE command to return the value for a column. The LAVG, LMIN and LMAX SuperMath functions are not interchangeable with the AVG, MIN and MAX functions. The SELECT and COMPUTE commands also have COUNT and SUM functions. In addition to the functions used with SELECT and COMPUTE, count, average, sum, minimum, and maximum are calculated using a row-by-row processing method and the R:BASE programming language. A program does row-by-row processing using DECLARE CURSOR or performs the calculations with one command using SELECT or COMPUTE. Note that COMPUTE does provide additional calculations not available with SELECT: standard deviation and variance. A report is processed row-by-row; expressions are evaluated with each row that is read from the driving table or view. A report uses basically the same procedure to calculate the count, average, sum, minimum, or m aximum as in a program. Functions in the SELECT command can also be used in reports, but are evaluated on every row, not just once. This can slow down a report. The COMPUTE command functions are not available in reports. Before you start ================ Before you start, you need to know a little bit about your data. Does the column you're doing computations on contain NULL values? A NULL value is how R:BASE represents no data entered in a field. The SELECT and COMPUTE functions do not include NULL values in the computation, but with row-by-row processing methods you need to consider NULLs and explicitly exclude them, if necessary, to return a correct result. Also, when doing row-by-row processing, you must SET ZERO ON. The setting of ZERO tells R:BASE how to treat NULLs in calculations. If ZERO is OFF, then calculations involving NULLs return NULL. When ZERO is ON, NULLs are treated as if they were 0. SET ZERO ON to ensure totals accumulate correctly and comparisons evaluate properly. The count, average, sum, minimum, and maximum described below work on INTEGER, REAL, DOUBLE or CURRENCY datatype columns. They can be done on all the rows in a table or on a group of rows by adding an appropriate WHERE clause to the command. In a report, rows are grouped by using breakpoints. COUNT ===== Using one command ----------------- SELECT COUNT(*) FROM tblname - counts all the rows in a table, including NULL values - same as COMPUTE ROWS FROM tblname - can be used in a report, is evaluated for every row SELECT COUNT(colname) FROM tblname - counts all the rows where the specified column is not NULL - store the result in a variable SELECT COUNT(colname) INTO varname FROM tblname - same as COMPUTE COUNT colname FROM tblname - can be used in a report, is evaluated for every row; format is similar to a lookup expression varname INTEGER = COUNT(colname) IN tblname WHERE linkcol = linkcol Using row-by-row processing --------------------------- SET VAR vcount INTEGER = (.vcount + 1) - often called a counter variable - used in a program in WHILE or other looping structure - counts the number of rows processed - increments itself by one as each row is read - counts all rows, including NULLs - must use SET ZERO ON to accumulate the count - in a report is used without the SET VAR vcount INTEGER = (.vcount + 1) Conditional count ----------------- SET VAR vcount INTEGER = (IFEQ(name,value,(.vcount+1),.vcount)) - compares name and value, if they are equal the count is incremented by one; name and value can be columns (reports only), variables or constants - NULL values are considered equal - function can be nested if name and value are non-TEXT datatypes - use in a report without the SET VAR vcount INTEGER = (IFEQ(name,value,(.vcount+1),.vcount)) SUM === Using one command ----------------- SELECT SUM(colname) FROM tblname - calculates the sum of all non-NULL values in the specified column - store the result in a variable SELECT SUM(colname) INTO varname FROM tblname - variable is automatically datatyped based on the column - same as COMPUTE SUM colname FROM tblname - can be used in a report, is evaluated for every row, format is similar to a lookup expression varname = SUM(colname) IN tblname WHERE linkcol = linkcol Using row-by-row processing --------------------------- SET VAR vtotal= (.vtotal + value) - must initialize the variables - often called an accumulator variable or a running total - value can be a column (reports only), variable or constant - adds value to the variable vtotal on a row-by-row basis - must use SET ZERO ON to accumulate the total - use in a report without the SET VAR vtotal = (.vtotal + value) Conditional sum --------------- SET VAR vtotal = (IFEQ(name, value1,(.vtotal + value2), .vtotal)) - compares name and value1, if they are equal, adds value2 to the variable vtotal - name, value1 and value2 can be columns (reports only), variables or constants - function can be nested if name and value1 are non-TEXT datatypes - use in a report without the SET VAR vtotal = (IFEQ(name, value1,(.vtotal + value2),.vtotal)) AVERAGE ======= Using one command ----------------- SELECT AVG(colname) FROM tblname - calculates the average of all non-NULL values in the specified column - store the result in a variable SELECT AVG(colname) INTO varname FROM tblname - variable is automatically datatyped based on the column - same as COMPUTE AVG colname FROM tblname - can be used in a report, is evaluated for every row, format is similar to a lookup expression varname = AVG(colname) IN tblname WHERE linkcol = linkcol Using row-by-row processing --------------------------- An average is a sum divided by a count. First count the rows, then sum the rows, then divide the sum by the count. The sum and the count can be the result of SELECT functions, of row-by-row processing, or of a conditional operation. If there are NULLs in the data, be sure to exclude them from the count. If you don't, the average will be wrong. They will be counted and included in the average as zero values. SELECT COUNT(*) INTO var_count FROM tblname SELECT SUM(colname) INTO var_sum FROM tblname SET VAR var_avg = (.var_sum/.var_count) MINIMUM ======= Using one command ----------------- SELECT MIN(colname) FROM tblname - calculates the minimum value of all non-NULL values in the specified column - store the result in a variable SELECT MIN(colname) INTO varname FROM tblname - variable is automatically datatyped based on the column - same as COMPUTE MIN colname FROM tblname - can be used in a report, is evaluated for every row, format is similar to a lookup expression variable = MIN(colname) IN tblname WHERE linkcol = linkcol Using row-by-row processing --------------------------- In a program, you get the first row from the table, then compare that value with each succeeding row. If the new value is less, it becomes the minimum. For example, SET VAR vmin = colname IN tblname WHERE LIMIT=1 DECLARE c1 CURSOR FOR SELECT colname FROM tblname OPEN c1 FETCH c1 INTO vcol ind1 WHILE SQLCODE <> 100 THEN SET VAR vmin = (IFLT(.vcol,.vmin,.vcol,.vmin)) FETCH c1 INTO vcol ind1 ENDW In a report, the process is similar, but because you can only set variables, the logic is a little bit different. Use a counter variable to determine the initial value; if the counter is 1, it is the first row. Then, compare each row, replacing the minimum if the new value is smaller. If computing the minimum for each group, reset the variable vcount at the breakpoint. vcount INTEGER = (.vcount + 1) vmin = (IFEQ(.vcount,1,colname,(IFLT(colname,.vmin, colname,.vmin)) )) MAXIMUM ======= SELECT MAX(colname) FROM tblname - calculates the maximum of all non-NULL values in the specified column - store the result in a variable SELECT MAX(colname) INTO varname FROM tblname - variable is automatically datatyped based on the column - same as COMPUTE MAX colname FROM tblname - can be used in a report, is evaluated every row, format is similar to a lookup expression varname = MAX(colname) IN tblname WHERE linkcol = linkcol Using row-by-row processing --------------------------- In a program you get the first value from the table, then compare that with each succeeding row. If the new value is greater, it becomes the maximum For example, SET VAR vmax = colname IN tblname WHERE LIMIT=1 DECLARE c1 CURSOR FOR SELECT colname FROM tblname OPEN c1 FETCH c1 INTO vcol ind1 WHILE SQLCODE <> 100 THEN SET VAR vmax = (IFGT(.vcol,.vmax,.vcol,.vmax)) FETCH c1 INTO vcol ind1 ENDW In a report, the process is similar, but because you can only set variables, the logic is a little bit different. Use a counter variable to determine the initial value, if the counter is 1, it is the first row. Then compare each row, replacing the maximum if the new value is greater. If computing the maximum for each group, reset the variable vcount at the breakpoint. vcount INTEGER = (.vcount + 1) vmax = (IFEQ(.vcount,1,colname,(IFGT(colname,.vmax, colname,.vmax)) ))