==================================================================
OPTIMIZING APPLICATION CODE FOR SPEED
==================================================================
PRODUCT : R:BASE VERSION : 3.1 AND HIGHER
CATEGORY: PROGRAMMING SUBCATEGORY: PERFORMANCE
==================================================================
Because the most obvious or logical way to approach a programming task
is not necessarily the fastest, optimizing application code is a
necessary skill. This article gives some specific examples of different
ways to accomplish the same tasks and shows the speed differences between
them. It also provides other general techniques for improving the perfor-
mance of your applications.
How can you tell what the fastest way to accomplish a programming task
is? The only way to tell for sure is to try each command or set of
commands, time it, and compare the times. Often a sequence of commands
will be fast on one database but slow on another because the structure
(tables and columns) and data of the databases are different. Speed can
vary from database to database and from machine to machine.
The times given below for different programming techniques are for
comparison purposes only. Actual times that you will see using the same
methods will vary on the basis of your hardware and your database.
OPTIMIZATION TECHNIQUES
Use the most current version of R:BASE 3.1
==========================================
Microrim is continually making speed improvements, and each new release
is faster than the previous one. For example, building indexes in R:BASE
3.1C is significantly faster than in previous versions. PACKing a data-
base in 3.1C took just 2 1/2 hours; R:BASE 3.1B took almost six hours to
pack the same database.
Think globally instead of row by row
====================================
This is one of the best ways to achieve some significant speed improve-
ments. Instead of doing row-by-row processing using a DECLARE CURSOR,
try using a single UPDATE or INSERT command, which will take advantage
of the SQL select capabilities of 3.1C.
<> For example, suppose you want to create a table to hold information
about how many of each product a customer has purchased. The most
logical approach is to set up a cursor to go through the detail table
and count the rows for each customer/product combination and then
insert that data into the new table. On a table with 3500 rows,
processing all the rows took 1:15:00 (one hour, 15 minutes). By
replacing the DECLARE CURSOR loop with a single insert command, the
time required to process all the rows was cut to 0:09:00 (nine
minutes).
Compare the two pieces of code:
DECLARE c1 CURSOR FOR SELECT cust#,partno FROM cproddet +
GROUP BY cust#,partno
OPEN c1
FETCH c1 INTO vcust ind1,vpartno ind2
WHILE SQLCODE < > 100 THEN
SELECT COUNT(*) INTO vcount FROM cproddet WHERE +
cust#=.vcust AND partno=.vpartno
INSERT INTO custprod (cust#,partno,copies) VALUES +
(.vcust,.vpartno,.vcount)
FETCH c1 INTO vcust ind1,vpartno ind2
ENDW
DROP CURSOR c1
vs.
INSERT INTO custprod (cust#,partno,copies) SELECT cust#,+
partno,count(*) FROM cproddet GROUP BY cust#,partno
Be Creative
===========
R:BASE almost always offers two or more ways of doing something. Look
for an alternative method for accomplishing a task - it might just be
considerably faster.
<> For example, following are three ways to delete 600 rows from a
table of 8,000 rows. Notice the time differential among the three
methods. We can cut the time in half (or more) by being creative.
First, a straightforward use of the DELETE ROWS command, shown below,
took 06:40; SET CLEAR OFF reduced the time slightly to 06:29.
DELETE ROWS FROM transx WHERE invid=10
Second, using the PROJECT command to create a new table with the rows to
keep took 06:00; SET CLEAR OFF significantly reduced the time to just
01:02. This method is shown below:
PROJECT newtrans FROM transx USI ALL WHERE invid < > 10
DROP TAB transx
REN TABLE newtrans TO transx NOCHECK
CREATE INDEX ON transx invid
CREATE INDEX ON transx prodid
Finally, UNLOADing the data to an ASCII file, then reloading the table
with that data using the LOAD AS ASCII feature of 3.1B took only 03:56;
SET CLEAR OFF cut the time about in half to 02:01. This method is shown
below:
OUTPUT transx.dat
UNLOAD DATA FROM transx AS ASCII WHERE invid < > 10
OUTPUT SCREEN
DEL ROW FROM transx
SET RULE OFF
DROP INDEX invid IN transx
DROP INDEX prodid IN transx
LOAD TRANSX FROM transx.dat AS ASCII
CREATE INDEX ON transx invid
CREATE INDEX ON transx prodid
DEL transx.dat
<> Here's another example of creative programming. Use the SELECT command
to write a file of R:BASE commands, then RUN that file. This is a
faster method of posting (updating a column in one table with values
from another table) than using a DECLARE CURSOR command.
Use the SELECT command to create a file of UPDATE commands, then execute
the UPDATE commands by running the file. This technique, shown below,
took just 00:40 (40 seconds). See the January/February 1992 and November/
December 1990 issues of the R:BASE Exchange for other examples.
SET HEADINGS OFF
OUTPUT updinv.dat
SELECT 'UPD INVOICE SET INVTOTAL = '=28,sum(tprice), +
'WHERE invid = '=16,invid FROM transx GROUP BY invid
OUTPUT SCREEN
SET HEADINGS ON
RUN updinv.dat
In contrast, doing a DECLARE CURSOR loop took 01:54, about three times
as long. This code is shown below:
DECLARE C1 CURSOR FOR SELECT invid,sum(tprice) FROM transx +
GROUP BY invid
OPEN c1
FETCH c1 INTO vinvid ind1,vprice ind2
WHILE sqlcode < > 100 THEN
UPDATE invoice SET invtotal=.vprice WHERE invid=.vinvid
FETCH c1 INTO vinvid ind1,vprice ind2
ENDW
Change the environment
======================
The R:BASE environment settings that have the most effect on performance
are SET RULES, SET CLEAR, and SET MESSAGES.
<> If you can do a process in a single-user environment, you'll improve
performance by using SET CLEAR OFF before executing commands that
change or add rows to the database. SET CLEAR OFF sets up a 5K buffer
to hold changes. Changes are written to disk only when you SET CLEAR
ON, when the buffer is full or is needed for the next page of data,
or when you disconnect the database or exit from R:BASE.
<> If you know that the data being loaded or changed is correct, SET
RULES OFF. You can see a small performance increase even if you don't
have any rules. If many rules have to be checked you will see a larger
performance increase.
<> SET MESSAGES OFF eliminates the time required for screen display of
messages. This is particularly noticeable if you are doing repetitive
UPDATE or INSERT commands.
Reduce Number of Expressions in Reports
=======================================
Because all report expressions are processed for each row of data
read from the reports driving table or view, reducing the number of
expressions in a report increases the printing speed of a report.
<> If you have many lookups in a report, basing the report on a view
instead of on a table will usually be faster. A view combines all the
data just once; a lookup needs to find the data for each row in the
driving table, even if the lookup is printing only on a header line.
A sample report, based on a table with 8000 rows and containing 14 lookup
expressions (24 total expressions), took 37:00 to print.
As an alternative, create a four-table view and base the report on the
view. Remove the lookup expressions from the report, leaving only 10
expressions to be processed each row. Printing the report by using a view
instead of lookup expressions takes only 23:00, almost twice as fast.
<> Because R:BASE 3.1 no longer requires a global variable to be equated
to a report variable, you can locate global variables on the report or
reference them directly in an expression. For example, you don't need
to define RPTDTE=. #DATE and then locate RPTDATE; just locate #DATE
directly. Set constant variables outside the report; they'll be
evaluated only once instead of for every row.
Abbreviate commands to four characters
======================================
R:BASE parses command lines into 4-byte tokens - the fewer tokens to
read, the faster the command executes.
Combine commands whenever possible
==================================
Many R:BASE commands allow access to more than one column or variable
and don't require using separate commands for each. This reduces the
number of times R:BASE must parse and read a command. The fewer commands
to read, the faster R:BASE executes.
<> The most frequently used commands that fit into this category are SET
VAR, UPDATE, SELECT, COMPUTE, and WRITE.
For example, in 3.1 you can assign data types variables and set the
variables to values in one command, like this:
SET VAR v1 TEXT = 'abcd', v2 INTEGER=100, v3 DATE = .#date
The COMPUTE command was modified in R:BASE 3.1A to allow multiple
computations in one command, as was the WRITE command. See the Upgrade
Express booklet for syntax.
Make cursors faster
===================
Sometimes a task can be accomplished only by stepping through each row
in a table. If you need to use a DECLARE CURSOR routine, make it as fast
as possible by using the following tips:
<> Include all columns and computations in the DECLARE CURSOR statement
so you can FETCH the values once instead of doing SET VAR commands
repeatedly in the WHILE loop. Take advantage of SQL SELECT when doing
your DECLARE CURSOR. This can reduce the number of times the commands
in the WHILE loop must be repeated, thus improving performance.
The following example took about three minutes on an 8,000-row trans-
action table:
DECLARE C1 CURSOR FOR SELECT invid,sum(tprice) FROM transx +
GROUP BY invid
OPEN c1
FETCH c1 INTO vinvid ind1,vprice ind2
WHILE sqlcode < > 100 THEN
UPDATE invoice SET invtotal=.vprice WHERE invid=.vinvid
FETCH c1 INTO vinvid ind1,vprice ind2
ENDW
This example, by contrast, took about 12 minutes on an 8,000-row trans-
action table:
SET VAR vtotprice CURR=0
SET VAR voldinv=invid IN transx WHERE LIMIT=1
DECLARE C1 CURSOR FOR SELECT invid,tprice FROM transx ORDER BY invid
OPEN c1
FETCH c1 INTO vinvid ind1,vprice ind2
WHILE sqlcode < > 100 THEN
SET VAR vtotprice=(.vtotprice+.vprice)
IF vinvid < > voldinv THEN
UPDATE invoice SET invtotal=.vtotprice WHERE invid=.voldinv
SET VAR vtotprice=0
ENDIF
SET VAR voldinv=.vinvid
FETCH c1 INTO vinvid ind1,vprice ind2
ENDW
Fetch the index columns into variables and then use WHERE colname=
.varname instead of WHERE CURRENT OF. Using an indexed where clause is
slightly faster.
<> When nesting cursors, do all the DECLAREs at the top of the file.
Nest the OPEN and FETCH commands, not the DECLARE CURSOR. A good
example of nesting cursors is presented in Supercharging R:BASE.
<> When using DECLARE CURSOR routines, SET CLEAR to OFF. Doing so
typically improves performance by at least half.
Use correlated sub-selects or multi-table selects instead of sub-selects
========================================================================
A multi-table select uses indexes to join the tables. A correlated sub-
select uses indexes to restrict the list of values to be compared within
the sub-select. A plain sub-select will not use indexes. Each row of data
from the main query must be compared to every row selected by the sub-
select.
To illustrate, compare the results obtained by using the different
techniques on two tables, each with 1,000 unique rows (each row in table1
has only one match in table2).
(1) A multi-table select, shown below, took 00:05.
SELECT collist FROM table1,table2 WHERE table1.linkcol=table2.linkcol
(2) A correlated sub-select, which looked at every row in table1 but
needed to look at only one row in table2 for each row in table1, took
00:10. The example is shown below:
SELECT collist FROM table1 WHERE linkcol IN +
(SELECT linkcol FROM table2 WHERE table2.linkcol=table1.linkcol)
(3) A plain sub-select, which had to look for a match in every row in
table2 for each of the 1,000 rows of table1, took 3:00 (three minutes)
to complete. This example, by far the slowest, is shown below:
SELECT collist FROM table1 WHERE linkcol IN +
(SELECT linkcol FROM table2)
SELECT is a command in which your data really affects the performance.
The results you see and the method you find best will depend on the
number of rows and distribution of data in the tables you are combining.
Try using manual optimization
=============================
The multi-table selects discussed above were done with MANOPT set ON to
show time differences based on table order. With MANOPT set ON, R:BASE
joins tables in the order they are listed in the FROM clause of the
SELECT command. The R:BASE optimizer generally picks the most efficient
way to join the tables. You have the option in R:BASE 3.1C of manually
specifying the order by using the MANOPT setting. Doing so might improve
the performance of your multi-table selects and views. See your Upgrade
Express booklet for complete information on using MANOPT.
Practice smart indexing
=======================
In a WHERE clause, R:BASE uses only one index (except when joining
tables). If more than one indexed column is referenced in a WHERE
clause, R:BASE looks for the one that places the greatest restriction
on the number of rows returned. It first looks for an indexed column
using =, then checks for IS NULL, and finally looks at BETWEEN. It uses
the first index it finds with the most restrictive operator.
<> The following example uses the index on indexcol2 because it uses =
and R:BASE assumes that that will restrict the query to fewer rows
than the BETWEEN:
WHERE indexcol1 BETWEEN .var1 AND + .var2 AND indexcol2 = .var3
The next example uses the index on indexcol1 because both conditions use
= and indexcol1 is first in the WHERE clause:
WHERE indexcol1 =.var1 AND indexcol2 = .var2
The third example won't use indexes because the conditions are joined
with OR (R:BASE won't use indexes if conditions are joined with OR or if
the indexed column is being compared to an expression):
WHERE indexcol1 =.var1 OR indexcol2 = .var2
By knowing which indexed column R:BASE will use in a WHERE clause, you
can structure your conditions so that R:BASE uses the most unique (and
thus most helpful) index. This is where knowing your data helps you to
modify commands and gain speed. Check your WHERE clauses to make sure the
most unique index is the one being used. See INDEXES entry in the Command
Dictionary section of the Reference Manual for more information on how
R:BASE chooses indexed columns.
<> Version 3.1A added the IHASH function to R:BASE. You can use IHASH
instead of text indexes for faster access to data. See your Upgrade
Express booklet for complete information on using IHASH.
<> If you are loading many rows of data at one time, it is considerably
faster to drop the indexes, load the data, and then rebuild the
indexes. It is much faster for R:BASE to build the indexes in one
chunk rather than having them updated with each row as it is loaded.
And 3.1C has improved the speed of the CREATE INDEX command by about
half over previous versions. The four-minute process shown in the
delete row example would have taken 18 minutes had the indexes not
been dropped.
Use UDFs
========
R:BASE 3.1C includes the UDF function, which allows you to call C
routines from within your R:BASE appli- cation, form, report, or table.
A C routine can be faster than the same function written in the R:BASE
program- ming language. See the article "UDFs Now Available" in this
issue of the R:BASE Exchange for a listing of ready-made UDFs available
for purchase from Microrim.
Review your code
================
Did you write that piece of code three months ago, six months ago, or
perhaps longer ago still? As you learn more about R:BASE, about your
data, and about how people are using the database and your application,
you'll easily recognize areas in your code you can modify for performance
improvements. You'll want to incorporate new techniques and new features
from the current version of R:BASE.
<> For example, the January/February 1991 R:BASE Exchange included an
article showing how to create a Yes/No dialog box. It's about 70 lines
of code. That same option was added to the DIALOG command in R:BASE
3.1B. You could replace 70 lines of code with a single line and thus
speed up that section of the application<197>but you have to review
your code to find such opportunities.
<> Also, have someone else review your code. Others can see areas for
improvement and suggest other techniques that you miss because you
are so familiar with your work. It is true that two heads are better
than one.
Other Sources
=============
These are just some of the techniques you can use to optimize your code to
achieve the maximum speed potential of R:BASE 3.1. Some of these
techniques will provide dramatic improvements in speed; others will
provide smaller and less noticeable speed improvements. Taken together,
these techniques will significantly increase the overall speed of your
application. Remember that the times given here are relative - the
actual times you see will vary on the basis of your database (structure
and data) and your hardware.
You can find other information about optimization techniques in SUPER-
CHARGING R:BASE (available from the Microrim Sales Department, 1-425-649-9500)
and in various issues of the R:BASE EXCHANGE.