================================================================== OPTIMIZING APPLICATION CODE FOR SPEED ================================================================== PRODUCT : R:BASE VERSION : 3.1 AND HIGHER CATEGORY: PROGRAMMING SUBCATEGORY: PERFORMANCE ================================================================== Because the most obvious or logical way to approach a programming task is not necessarily the fastest, optimizing application code is a necessary skill. This article gives some specific examples of different ways to accomplish the same tasks and shows the speed differences between them. It also provides other general techniques for improving the perfor- mance of your applications. How can you tell what the fastest way to accomplish a programming task is? The only way to tell for sure is to try each command or set of commands, time it, and compare the times. Often a sequence of commands will be fast on one database but slow on another because the structure (tables and columns) and data of the databases are different. Speed can vary from database to database and from machine to machine. The times given below for different programming techniques are for comparison purposes only. Actual times that you will see using the same methods will vary on the basis of your hardware and your database. OPTIMIZATION TECHNIQUES Use the most current version of R:BASE 3.1 ========================================== Microrim is continually making speed improvements, and each new release is faster than the previous one. For example, building indexes in R:BASE 3.1C is significantly faster than in previous versions. PACKing a data- base in 3.1C took just 2 1/2 hours; R:BASE 3.1B took almost six hours to pack the same database. Think globally instead of row by row ==================================== This is one of the best ways to achieve some significant speed improve- ments. Instead of doing row-by-row processing using a DECLARE CURSOR, try using a single UPDATE or INSERT command, which will take advantage of the SQL select capabilities of 3.1C. <> For example, suppose you want to create a table to hold information about how many of each product a customer has purchased. The most logical approach is to set up a cursor to go through the detail table and count the rows for each customer/product combination and then insert that data into the new table. On a table with 3500 rows, processing all the rows took 1:15:00 (one hour, 15 minutes). By replacing the DECLARE CURSOR loop with a single insert command, the time required to process all the rows was cut to 0:09:00 (nine minutes). Compare the two pieces of code: DECLARE c1 CURSOR FOR SELECT cust#,partno FROM cproddet + GROUP BY cust#,partno OPEN c1 FETCH c1 INTO vcust ind1,vpartno ind2 WHILE SQLCODE < > 100 THEN SELECT COUNT(*) INTO vcount FROM cproddet WHERE + cust#=.vcust AND partno=.vpartno INSERT INTO custprod (cust#,partno,copies) VALUES + (.vcust,.vpartno,.vcount) FETCH c1 INTO vcust ind1,vpartno ind2 ENDW DROP CURSOR c1 vs. INSERT INTO custprod (cust#,partno,copies) SELECT cust#,+ partno,count(*) FROM cproddet GROUP BY cust#,partno Be Creative =========== R:BASE almost always offers two or more ways of doing something. Look for an alternative method for accomplishing a task - it might just be considerably faster. <> For example, following are three ways to delete 600 rows from a table of 8,000 rows. Notice the time differential among the three methods. We can cut the time in half (or more) by being creative. First, a straightforward use of the DELETE ROWS command, shown below, took 06:40; SET CLEAR OFF reduced the time slightly to 06:29. DELETE ROWS FROM transx WHERE invid=10 Second, using the PROJECT command to create a new table with the rows to keep took 06:00; SET CLEAR OFF significantly reduced the time to just 01:02. This method is shown below: PROJECT newtrans FROM transx USI ALL WHERE invid < > 10 DROP TAB transx REN TABLE newtrans TO transx NOCHECK CREATE INDEX ON transx invid CREATE INDEX ON transx prodid Finally, UNLOADing the data to an ASCII file, then reloading the table with that data using the LOAD AS ASCII feature of 3.1B took only 03:56; SET CLEAR OFF cut the time about in half to 02:01. This method is shown below: OUTPUT transx.dat UNLOAD DATA FROM transx AS ASCII WHERE invid < > 10 OUTPUT SCREEN DEL ROW FROM transx SET RULE OFF DROP INDEX invid IN transx DROP INDEX prodid IN transx LOAD TRANSX FROM transx.dat AS ASCII CREATE INDEX ON transx invid CREATE INDEX ON transx prodid DEL transx.dat <> Here's another example of creative programming. Use the SELECT command to write a file of R:BASE commands, then RUN that file. This is a faster method of posting (updating a column in one table with values from another table) than using a DECLARE CURSOR command. Use the SELECT command to create a file of UPDATE commands, then execute the UPDATE commands by running the file. This technique, shown below, took just 00:40 (40 seconds). See the January/February 1992 and November/ December 1990 issues of the R:BASE Exchange for other examples. SET HEADINGS OFF OUTPUT updinv.dat SELECT 'UPD INVOICE SET INVTOTAL = '=28,sum(tprice), + 'WHERE invid = '=16,invid FROM transx GROUP BY invid OUTPUT SCREEN SET HEADINGS ON RUN updinv.dat In contrast, doing a DECLARE CURSOR loop took 01:54, about three times as long. This code is shown below: DECLARE C1 CURSOR FOR SELECT invid,sum(tprice) FROM transx + GROUP BY invid OPEN c1 FETCH c1 INTO vinvid ind1,vprice ind2 WHILE sqlcode < > 100 THEN UPDATE invoice SET invtotal=.vprice WHERE invid=.vinvid FETCH c1 INTO vinvid ind1,vprice ind2 ENDW Change the environment ====================== The R:BASE environment settings that have the most effect on performance are SET RULES, SET CLEAR, and SET MESSAGES. <> If you can do a process in a single-user environment, you'll improve performance by using SET CLEAR OFF before executing commands that change or add rows to the database. SET CLEAR OFF sets up a 5K buffer to hold changes. Changes are written to disk only when you SET CLEAR ON, when the buffer is full or is needed for the next page of data, or when you disconnect the database or exit from R:BASE. <> If you know that the data being loaded or changed is correct, SET RULES OFF. You can see a small performance increase even if you don't have any rules. If many rules have to be checked you will see a larger performance increase. <> SET MESSAGES OFF eliminates the time required for screen display of messages. This is particularly noticeable if you are doing repetitive UPDATE or INSERT commands. Reduce Number of Expressions in Reports ======================================= Because all report expressions are processed for each row of data read from the reports driving table or view, reducing the number of expressions in a report increases the printing speed of a report. <> If you have many lookups in a report, basing the report on a view instead of on a table will usually be faster. A view combines all the data just once; a lookup needs to find the data for each row in the driving table, even if the lookup is printing only on a header line. A sample report, based on a table with 8000 rows and containing 14 lookup expressions (24 total expressions), took 37:00 to print. As an alternative, create a four-table view and base the report on the view. Remove the lookup expressions from the report, leaving only 10 expressions to be processed each row. Printing the report by using a view instead of lookup expressions takes only 23:00, almost twice as fast. <> Because R:BASE 3.1 no longer requires a global variable to be equated to a report variable, you can locate global variables on the report or reference them directly in an expression. For example, you don't need to define RPTDTE=. #DATE and then locate RPTDATE; just locate #DATE directly. Set constant variables outside the report; they'll be evaluated only once instead of for every row. Abbreviate commands to four characters ====================================== R:BASE parses command lines into 4-byte tokens - the fewer tokens to read, the faster the command executes. Combine commands whenever possible ================================== Many R:BASE commands allow access to more than one column or variable and don't require using separate commands for each. This reduces the number of times R:BASE must parse and read a command. The fewer commands to read, the faster R:BASE executes. <> The most frequently used commands that fit into this category are SET VAR, UPDATE, SELECT, COMPUTE, and WRITE. For example, in 3.1 you can assign data types variables and set the variables to values in one command, like this: SET VAR v1 TEXT = 'abcd', v2 INTEGER=100, v3 DATE = .#date The COMPUTE command was modified in R:BASE 3.1A to allow multiple computations in one command, as was the WRITE command. See the Upgrade Express booklet for syntax. Make cursors faster =================== Sometimes a task can be accomplished only by stepping through each row in a table. If you need to use a DECLARE CURSOR routine, make it as fast as possible by using the following tips: <> Include all columns and computations in the DECLARE CURSOR statement so you can FETCH the values once instead of doing SET VAR commands repeatedly in the WHILE loop. Take advantage of SQL SELECT when doing your DECLARE CURSOR. This can reduce the number of times the commands in the WHILE loop must be repeated, thus improving performance. The following example took about three minutes on an 8,000-row trans- action table: DECLARE C1 CURSOR FOR SELECT invid,sum(tprice) FROM transx + GROUP BY invid OPEN c1 FETCH c1 INTO vinvid ind1,vprice ind2 WHILE sqlcode < > 100 THEN UPDATE invoice SET invtotal=.vprice WHERE invid=.vinvid FETCH c1 INTO vinvid ind1,vprice ind2 ENDW This example, by contrast, took about 12 minutes on an 8,000-row trans- action table: SET VAR vtotprice CURR=0 SET VAR voldinv=invid IN transx WHERE LIMIT=1 DECLARE C1 CURSOR FOR SELECT invid,tprice FROM transx ORDER BY invid OPEN c1 FETCH c1 INTO vinvid ind1,vprice ind2 WHILE sqlcode < > 100 THEN SET VAR vtotprice=(.vtotprice+.vprice) IF vinvid < > voldinv THEN UPDATE invoice SET invtotal=.vtotprice WHERE invid=.voldinv SET VAR vtotprice=0 ENDIF SET VAR voldinv=.vinvid FETCH c1 INTO vinvid ind1,vprice ind2 ENDW Fetch the index columns into variables and then use WHERE colname= .varname instead of WHERE CURRENT OF. Using an indexed where clause is slightly faster. <> When nesting cursors, do all the DECLAREs at the top of the file. Nest the OPEN and FETCH commands, not the DECLARE CURSOR. A good example of nesting cursors is presented in Supercharging R:BASE. <> When using DECLARE CURSOR routines, SET CLEAR to OFF. Doing so typically improves performance by at least half. Use correlated sub-selects or multi-table selects instead of sub-selects ======================================================================== A multi-table select uses indexes to join the tables. A correlated sub- select uses indexes to restrict the list of values to be compared within the sub-select. A plain sub-select will not use indexes. Each row of data from the main query must be compared to every row selected by the sub- select. To illustrate, compare the results obtained by using the different techniques on two tables, each with 1,000 unique rows (each row in table1 has only one match in table2). (1) A multi-table select, shown below, took 00:05. SELECT collist FROM table1,table2 WHERE table1.linkcol=table2.linkcol (2) A correlated sub-select, which looked at every row in table1 but needed to look at only one row in table2 for each row in table1, took 00:10. The example is shown below: SELECT collist FROM table1 WHERE linkcol IN + (SELECT linkcol FROM table2 WHERE table2.linkcol=table1.linkcol) (3) A plain sub-select, which had to look for a match in every row in table2 for each of the 1,000 rows of table1, took 3:00 (three minutes) to complete. This example, by far the slowest, is shown below: SELECT collist FROM table1 WHERE linkcol IN + (SELECT linkcol FROM table2) SELECT is a command in which your data really affects the performance. The results you see and the method you find best will depend on the number of rows and distribution of data in the tables you are combining. Try using manual optimization ============================= The multi-table selects discussed above were done with MANOPT set ON to show time differences based on table order. With MANOPT set ON, R:BASE joins tables in the order they are listed in the FROM clause of the SELECT command. The R:BASE optimizer generally picks the most efficient way to join the tables. You have the option in R:BASE 3.1C of manually specifying the order by using the MANOPT setting. Doing so might improve the performance of your multi-table selects and views. See your Upgrade Express booklet for complete information on using MANOPT. Practice smart indexing ======================= In a WHERE clause, R:BASE uses only one index (except when joining tables). If more than one indexed column is referenced in a WHERE clause, R:BASE looks for the one that places the greatest restriction on the number of rows returned. It first looks for an indexed column using =, then checks for IS NULL, and finally looks at BETWEEN. It uses the first index it finds with the most restrictive operator. <> The following example uses the index on indexcol2 because it uses = and R:BASE assumes that that will restrict the query to fewer rows than the BETWEEN: WHERE indexcol1 BETWEEN .var1 AND + .var2 AND indexcol2 = .var3 The next example uses the index on indexcol1 because both conditions use = and indexcol1 is first in the WHERE clause: WHERE indexcol1 =.var1 AND indexcol2 = .var2 The third example won't use indexes because the conditions are joined with OR (R:BASE won't use indexes if conditions are joined with OR or if the indexed column is being compared to an expression): WHERE indexcol1 =.var1 OR indexcol2 = .var2 By knowing which indexed column R:BASE will use in a WHERE clause, you can structure your conditions so that R:BASE uses the most unique (and thus most helpful) index. This is where knowing your data helps you to modify commands and gain speed. Check your WHERE clauses to make sure the most unique index is the one being used. See INDEXES entry in the Command Dictionary section of the Reference Manual for more information on how R:BASE chooses indexed columns. <> Version 3.1A added the IHASH function to R:BASE. You can use IHASH instead of text indexes for faster access to data. See your Upgrade Express booklet for complete information on using IHASH. <> If you are loading many rows of data at one time, it is considerably faster to drop the indexes, load the data, and then rebuild the indexes. It is much faster for R:BASE to build the indexes in one chunk rather than having them updated with each row as it is loaded. And 3.1C has improved the speed of the CREATE INDEX command by about half over previous versions. The four-minute process shown in the delete row example would have taken 18 minutes had the indexes not been dropped. Use UDFs ======== R:BASE 3.1C includes the UDF function, which allows you to call C routines from within your R:BASE appli- cation, form, report, or table. A C routine can be faster than the same function written in the R:BASE program- ming language. See the article "UDFs Now Available" in this issue of the R:BASE Exchange for a listing of ready-made UDFs available for purchase from Microrim. Review your code ================ Did you write that piece of code three months ago, six months ago, or perhaps longer ago still? As you learn more about R:BASE, about your data, and about how people are using the database and your application, you'll easily recognize areas in your code you can modify for performance improvements. You'll want to incorporate new techniques and new features from the current version of R:BASE. <> For example, the January/February 1991 R:BASE Exchange included an article showing how to create a Yes/No dialog box. It's about 70 lines of code. That same option was added to the DIALOG command in R:BASE 3.1B. You could replace 70 lines of code with a single line and thus speed up that section of the application<197>but you have to review your code to find such opportunities. <> Also, have someone else review your code. Others can see areas for improvement and suggest other techniques that you miss because you are so familiar with your work. It is true that two heads are better than one. Other Sources ============= These are just some of the techniques you can use to optimize your code to achieve the maximum speed potential of R:BASE 3.1. Some of these techniques will provide dramatic improvements in speed; others will provide smaller and less noticeable speed improvements. Taken together, these techniques will significantly increase the overall speed of your application. Remember that the times given here are relative - the actual times you see will vary on the basis of your database (structure and data) and your hardware. You can find other information about optimization techniques in SUPER- CHARGING R:BASE (available from the Microrim Sales Department, 1-425-649-9500) and in various issues of the R:BASE EXCHANGE.