820.TXT ===================================================================== Optimizing Application Code ===================================================================== PRODUCT: R:BASE VERSION: 4.5+ and above ===================================================================== CATALOG: General Inforation AREA : Memory Management ===================================================================== Because the most obvious or logical approach to a programming task is not necessarily the fastest, optimizing application code is a necessary skill. This article gives some specific examples of different ways to accomplish the same tasks and shows the speed differences between them. It also provides some general techniques for improving the performance of your applications. How can you tell what is the fastest way to accomplish a programming task? The only way to tell for sure is to try each command or set of commands, time it, and compare the times. Often a sequence of commands will be fast on one database but slow on another because the structure (tables and columns) and data of the databases are different. Speed can vary from database to database and from machine to machine. Times given below for different programming techniques are for comparison purposes only. Actual times that you will see using the same methods will vary on the basis of your hardware and your database. Timing Code Use the following commands to find the execution time for a section of application code: SET TIME FORMAT HH:MM:SS.sss SET VAR vstart = .#time -- commands to time go here SET VAR vend = .#time SET VAR vdiff = (.vend - .vstart) SET VAR vtime = (RTIME(0,0,0,.vdiff)) The result, stored in the variable vtime, is in an easy to read hours, minutes, and seconds format. When checking the performance of a SELECT or a PRINT command, set the output to a temporary file and set LINES to 0. This tests the time to select all the data for the command. For example: SET LINES 0 SET TIME FORMAT HH:MM:SS.sss SET VAR vstart = .#time OUTPUT temp.dat PRINT invoices OUTPUT SCREEN SET VAR vend = .#time SET VAR vdiff = (.vend - .vstart) SET VAR vtime = (RTIME(0,0,0,.vdiff)) SET LINES 20 To test how long it takes to retrieve a single row of data, use the WHERE clause, WHERE LIMIT = 1. The LIMIT keyword can be combined with other WHERE clause conditions to limit the number of rows retrieved. For example: SELECT * FROM transmaster WHERE transdate + BETWEEN 1/1/95 AND 6/31/95 AND LIMIT=1 This command retrieves the first row that meets the WHERE clause condition. Use the Most Current Version of R:BASE Microrim is continually optimizing R:BASE and each new release is faster than the previous one. In addition to performance enhancements included in a new release, there are feature enhancements that enable you to modify and reduce your program code for further performance increases. An application written in R:BASE 3.1, for example, used DECLARE CURSOR to update a column in one table with values from another table. This process took 0:00:07.14. DECLARE C1 CURSOR FOR SELECT invid,sum(tprice) FROM transx + GROUP BY invid OPEN c1 FETCH c1 INTO vinvid ind1,vprice ind2 WHILE sqlcode <> 100 THEN UPDATE invoice SET invtotal = .vprice WHERE invid=.vinvid FETCH c1 INTO vinvid ind1,vprice ind2 ENDW R:BASE 4.0 added the multi-table option to the UPDATE command. Instead of a DECLARE CURSOR, you now can do the update using a view and one UPDATE command. By keeping abreast of feature enhancements in new versions and updating application code, the process time was reduced almost in half to 0:00:04.29. CREATE VIEW v_trans (invid, amount) AS SELECT invid, SUM(tprice) + FROM transx GROUP BY invid UPDATE invoice SET invtotal = amount FROM invoice ,v_trans T2 + WHERE invoice .invid = T2.invid Think Globally Instead of Row-by-Row One of the best ways to achieve significant speed improvements is to use a global command rather than a row-by- row command. Instead of doing row-by-row processing using a DECLARE CURSOR, for example, try using a single UPDATE or INSERT command, which takes advantage of the SQL select capabilities of R:BASE. For example, suppose you want to create a table to hold information about how many of each product a customer has purchased. The most logical approach might be to set up a cursor to go through the detail table and count the rows for each customer, product combination and then insert that data into the new table. On a table with 3500 rows, processing all the rows took 1:15:00 (one hour, 15 minutes). By replacing the DECLARE CURSOR loop with a single insert command, the time required to process all the rows was cut to 0:09:00 (nine minutes). Compare the two pieces of code: 1. The DECLARE CURSOR code. Commands are repeated for each customer, product combination. DECLARE c1 CURSOR FOR SELECT cust#, partno + FROM cproddet GROUP BY cust# ,partno OPEN c1 FETCH c1 INTO vcust ind1, vpartno ind2 WHILE SQLCODE <> 100 THEN SELECT COUNT(*) INTO vcount FROM cproddet + WHERE cust# = .vcust AND partno = .vpartno INSERT INTO custprod (cust#, partno, copies) + VALUES (.vcust, .vpartno, .vcount) FETCH c1 INTO vcust ind1, vpartno ind2 ENDW DROP CURSOR c1 2. The SQL code. A single command is used instead of a series of repeated commands. One command is almost always faster than repeating a group of commands. INSERT INTO custprod (cust#, partno, copies) + SELECT cust#, partno, count(*) + FROM cproddet GROUP BY cust# ,partno Reduce the Number of Expressions in Reports Because all report expressions are processed for each row of data read from the driving table or view of the report, reducing the number of expressions increases the printing speed of a report. An alternative to reducing the number of expressions is to use the section evaluation feature added in R:BASE 4.5 Plus!. This feature allows you to indicate the section, F1, for example, at which an expression is evaluated. The expression is then not evaluated at every row, but only when that section is processed. Either method can reduce printing time almost by half. If you have many lookups in a report, basing the report on a view instead of on a table will usually be faster. A view combines all the data just once; a lookup needs to find the data for each row in the driving table, even if the lookup is printing only on a header line. A sample report, based on a table with 8000 rows and containing 14 lookup expressions (24 total expressions), took 0:20:45.54 to print. As an alternative, create a four-table view and base the report on the view. Remove the lookup expressions from the report, leaving only 10 expressions to be processed each row. Printing the report by using a view instead of lookup expressions takes only 0:15:16.98, a definite performance improvement. Using the report section evaluation feature introduced in 4.5 Plus! is even better. The original report with 14 lookups prints in only 0:12:57.41 when the lookup expressions are specified to evaluate only at the section (header or footer) where they are located. The report based on the view does not have a significant performance increase using the section evaluation as most of the expressions are summing data and must be evaluated at the detail section. The performance increase you see depends on your report design and data. R:BASE does not require a global variable to be equated to a report variable; you can locate global variables on the report or reference them directly in an expression. For example, you don't need to define RPTDATE = . #DATE and then locate RPTDATE; just locate #DATE directly. Set constant variables outside the report. They are evaluated only once instead of for every row. Abbreviate Commands to Four Characters R:BASE parses command lines into 4-byte tokens_the fewer tokens to read, the faster the command executes. Abbreviating commands to three characters does not provide any additional performance. Combine Commands Whenever Possible Many R:BASE commands allow access to more than one column or variable and don't require using separate commands for each. This reduces the number of times R:BASE must parse and read a command. The fewer commands to read, the faster R:BASE executes. The most frequently used commands that fit into this category are SET VAR, UPDATE, SELECT, COMPUTE, and WRITE. For example, assign data types to variables and set the variables to values in one command, like this: SET VAR v1 TEXT = 'abcd', v2 INTEGER = 100, + v3 DATE = .#date Make Cursors Faster Sometimes a task can be accomplished only by stepping through each row in a table. If you need to use a DECLARE CURSOR routine, make it as fast as possible by using the following tips: Include all columns and computations in the DECLARE CURSOR statement so you can FETCH the values once instead of doing SET VAR commands repeatedly in the WHILE loop. Take advantage of SQL SELECT when doing your DECLARE CURSOR. This can reduce the number of times the commands in the WHILE loop must be repeated, thus improving performance. Compare the following two code excerpts: 1.This example using the SUM function directly in the DECLARE CURSOR statement took about three minutes on an 8,000-row transaction table: DECLARE C1 CURSOR FOR SELECT invid, SUM(tprice) + FROM transx GROUP BY invid OPEN c1 FETCH c1 INTO vinvid ind1, vprice ind2 WHILE sqlcode <> 100 THEN UPDATE invoice SET invtotal = .vprice + WHERE invid = .vinvid FETCH c1 INTO vinvid ind1, vprice ind2 ENDW 2.By contrast, doing the SUM within the WHILE loop for each row took about 12 minutes, or four times as long. SET VAR vtotprice CURR = 0 SET VAR voldinv = invid IN transx WHERE LIMIT=1 DECLARE C1 CURSOR FOR SELECT invid, tprice + FROM transx ORDER BY invid OPEN c1 FETCH c1 INTO vinvid ind1, vprice ind2 WHILE sqlcode <> 100 THEN SET VAR vtotprice = (.vtotprice + .vprice) IF vinvid <> voldinv THEN UPDATE invoice SET invtotal = .vtotprice WHERE invid = .voldinv SET VAR vtotprice = 0 ENDIF SET VAR voldinv = .vinvid FETCH c1 INTO vinvid ind1, vprice ind2 ENDW Fetch the index columns into variables and then use WHERE colname=.varname instead of WHERE CURRENT OF cursor. Using an indexed where clause is slightly faster. When nesting cursors, do all the DECLAREs at the top of the file. Nest the OPEN and FETCH commands, not the DECLARE CURSOR. When nesting cursors, use the RESET option on the OPEN cursor command. R:BASE does not need to reprocess the entire query when you use RESET, instead, R:BASE just re-evaluates the WHERE clause. Try Both Sub-Selects and Multi-Table Selects In R:BASE 4.5 and higher, a multi-table select uses indexes to join the tables, a correlated sub-select uses indexes to restrict the list of values to be compared within the sub-select, and a plain sub-select uses a temporary index created by R:BASE. Because the index is already created, a multi-table select or a correlated sub-select is slightly faster than a plain sub-select. The amount of data selected, number of matches, and number of unique matches all affect the performance. To illustrate, compare the results obtained by using the different techniques on two tables, each with 1,000 unique rows (each row in table1 has only one match in table2). 1.A multi-table select, shown below, took 0:00:01.54. SELECT * FROM table1,table2 + WHERE table1.linkcol=table2.linkcol 2.A correlated sub-select, which looked at every row in table1 but needed to look at only one row in table2 for each row in table1, took 0:00:01.37. The code is shown below: SELECT collist FROM table1 WHERE linkcol IN + (SELECT linkcol FROM table2 + WHERE table2.linkcol=table1.linkcol) 3.A plain sub-select, using a temporary index built by R:BASE when the command executed took slightly longer, 0:00:2.31. This example code, is shown below: SELECT collist FROM table1 WHERE linkcol IN + (SELECT linkcol FROM table2) SELECT is a command in which your data can affect the performance. The results you see and the method you find that works the best depends on the number of rows and distribution of data in the tables you are combining. Also, the different commands retrieve different columns of data for display. Use the MICRORIM variables In R:BASE 4.5, optimization variables were added to allow application developers more control over the R:BASE environment. The ones most commonly used allow you to increase the default buffers for reading in data and indexes. The variables MICRORIM_F2MAXBLK and MICRORIM_TMPMAXBLK increase the number of buffers for reading in data. The variables MICRORIM_F3MAXBLK and MICRORIM_BTMAXBLK increase the number of buffers for reading in indexes. To use the variables, set them to the desired number of buffers before connecting to a database. Usually you set all four variables. For example: SET VAR MICRORIM_F2MAXBLK = 64, + MICRORIM_TMPMAXBLK = 64, + MICRORIM_F3MAXBLK = 256, + MICRORIM_F2MAXBLK = 256 CONNECT concomp In the above example of joining two 1,000 row tables, the performance of the plain sub-select was significantly improved by setting the MICRORIM variables. The multi-table select and the correlated sub-select are already fairly well optimized with their use of indexes and the performance improvement is not as great. Experiment with using these variables to see if they will increase performance of your application. For more information about the MICRORIM variables refer to the R:BASE 4.5 Plus! Startup & New Features Guide, Inside R:BASE, or R:BASE 5.1 on-line Help (Reference Topics, Optimizing R:BASE 4.5 and Higher) Databases, Cache Memory Buffers). Practice Smart Indexing In a WHERE clause, R:BASE uses only one index (except when joining tables). If more than one indexed column is referenced in a WHERE clause, R:BASE looks for what it considers the "best" index_the one that places the greatest restriction on the number of rows returned. An index on a column that contains unique values is always better than an index on a column that can have duplicate values. R:BASE also looks to see what operator is being used. R:BASE first looks for an indexed column using =, then checks for IS NULL, and finally looks at BETWEEN. The following example uses the index on indexcol2 because it uses = and R:BASE assumes that comparison restricts the query to fewer rows than the BETWEEN: WHERE indexcol1 BETWEEN .var1 AND .var2 + AND indexcol2 = .var3 The next example uses the index on indexcol2 because both conditions use = and indexcol2 is an index with unique data and indexcol1 is an index on a column with duplicate data. The unique index is the "best" index. WHERE indexcol1 =.var1 AND indexcol2 = .var2 This example won't use indexes because the conditions are joined with OR. R:BASE does not use indexes to retrieve data if conditions are joined with OR, or if the indexed column is being compared to an expression. WHERE indexcol1 =.var1 OR indexcol2 = .var2 You can force R:BASE to use a particular index by placing the comparison value in parentheses, making it an expression. WHERE indexcol1 =.var1 AND indexcol2 = (.var2) By knowing which indexed column R:BASE will use in a WHERE clause, you can structure your conditions so that R:BASE uses the most unique (and thus most helpful) index. Knowing your data helps you to modify commands and gain speed. Use Multi-Column Indexes If you routinely use the same set of columns in the WHERE clause, consider defining them as a multi-column index rather than as separate indexes. A multi-column index is faster when a set of columns is used in a sort clause or in a WHERE clause. While the columns in a multi-column index can be used separately, the index is most effective when all the columns in the index are used. For example, if you always retrieve records using a job# and a po#, define a multi-column index for the two columns, rather than two, single-column indexes. Drop Indexes to Load Data If you are loading or inserting many rows of data at one time, it can be faster to drop the indexes, load the data, and then rebuild the indexes. It is much faster for R:BASE to build the indexes in one chunk rather than having them updated with each row as it is loaded. Be Creative R:BASE almost always offers two or more ways of doing something. Look for an alternative method for accomplishing a task_it just might be considerably faster. Review Your Code Did you write that piece of code three months ago, six months ago, or perhaps longer ago still? As you learn more about R:BASE, about your data, and about how people are using the database and your application, you'll easily recognize areas in your code that can be modified for performance improvements. You'll also want to incorporate new techniques and new features from the current version of R:BASE. Have someone else review your code. Someone else might see areas for improvement and suggest other techniques that you miss because you are so familiar with your work. It is true that two heads are better than one. Other Sources These are just some of the techniques you can use to optimize your code to achieve the maximum speed potential of R:BASE. Some of these techniques provide dramatic improvements in speed; others provide smaller and less noticeable speed improvements. Taken together, these techniques can significantly increase the overall speed of your application. Remember that the times given here are relative_the actual times you see vary on the basis of your database (structure and data) and your hardware. You can find more information about optimization techniques in Inside R:BASE (available from the Microrim Sales Department, 1-800-628-6990), in various issues of the R:BASE Exchange, in the R:BASE Reference Manual, and in on-line Help.