783.TXT ===================================================================== OPTIMIZING CURSORS ===================================================================== PRODUCT: R:BASE VERSION: 4.5+ or Higher ===================================================================== CATALOG: PROGRAMMING IN R:BASE AREA : Logic & Data Manipulation ===================================================================== DECLARE CURSOR is not always the fastest way to accomplish a task, particularly an UPDATE or an INSERT. If you can replace your DECLARE CURSOR routine with a single SQL command, you will dramatically improve performance. However, some tasks require a DECLARE CURSOR. Let the cursor do the work To improve the performance of a DECLARE CURSOR routine, do as much work in the DECLARE CURSOR as possible. This is the single most important factor in improving cursor performance. Do whatever work can be done in the SELECT command part of the DECLARE CURSOR - select as many columns of data as possible and also do calculations there if you can. The DECLARE CURSOR does the operation only once; inside the WHILE loop, the command is repeated for each row that is stepped through. To do actions for unique rows only, use SELECT DISTINCT in the cursor definition instead of adding code to your WHILE loop to test the row values to see if they are the same or different. Use the SELECT functions to sum, average, count and so on in the cursor definition instead of for each row in the WHILE loop. Select as many columns as possible in the DECLARE CURSOR rather than retrieve the data each row in the WHILE loop. The fewer commands repeated in the WHILE loop, the faster your DECLARE CURSOR will run. Remember that each command in the WHILE loop is repeated for each row retrieved by the DECLARE CURSOR. Use optimized variables in the WHILE loop -initialize each variable outside the WHILE loop, and do not change the data type of variables in the loop. Following are two examples showing progressive changes made to a DECLARE CURSOR routine to improve performance. Example 1 - posting. The task is to sum the extended price column in the transaction detail, Transdetail, table for each transaction ID, then update the transaction header, Transmaster, table with the sum. An initial approach is to declare a cursor on the header table, then step through all matching rows in the detail table. After all the matching detail rows have been processed, the header table is updated. *(POST1.CMD -- the worst case) -- nested declare cursors -- strictly linear programming SET VAR vtotal CURR DECLARE c1 CURSOR FOR SELECT transid, netamount + FROM transmaster OPEN c1 FETCH c1 INTO vtransid vind1,vnetamount vind2 WHILE SQLCODE <> 100 THEN DECLARE c2 CURSOR FOR SELECT extprice + FROM transdetail WHERE transid = .vtransid OPEN c2 FETCH c2 INTO vprice vind3 WHILE SQLCODE <> 100 THEN SET VAR vtotal = (.vtotal + .vprice) FETCH c2 INTO vprice vind3 ENDWHILE DROP CURSOR c2 UPDATE transmaster SET netamount = .vtotal + WHERE CURRENT OF c1 SET VAR vtotal = NULL FETCH c1 INTO vtransid vind1,vnetamount vind2 ENDWHILE DROP CURSOR c1 We can speed up this code by following the recommended structure for nested cursors. If we move the second DECLARE CURSOR out of the WHILE loop and reset the cursor instead of dropping it, this command file will execute faster. However, the best way to improve this code is by removing the second DECLARE CURSOR altogether. We don't need to step through all the rows in the detail table - we can compute the sum with a single SELECT command. *(POST2.CMD - a little bit better) -- use the SELECT or COMPUTE command -- to calculate the sum instead of a nested cursor SET VAR vprice CURR = NULL DECLARE c1 CURSOR FOR SELECT transid, netamount + FROM transmaster OPEN c1 FETCH c1 INTO vtransid vind1,vamount vind2 WHILE SQLCODE <> 100 THEN SELECT SUM(extprice) INTO vprice + FROM transdetail WHERE transid = .vtransid -- if no matching rows in the Transdetail table, -- vprice is null IF vprice IS NOT NULL THEN UPDATE transmaster SET netamount = .vprice + WHERE CURRENT OF c1 ENDIF SET VAR vprice = NULL FETCH c1 INTO vtransid vind1,vamount vind2 ENDWHILE DROP CURSOR c1 This simple change reduced the number of commands in the program, which in turn improved performance. All the commands inside the WHILE loop still need to be executed for as many rows as are in the Transmaster table, however. The Transmaster table has fewer rows than the Transdetail table, so a valid assumption is to place the cursor on the Transmaster table to repeat the WHILE loop the fewest times. However, if we place the cursor on the detail table instead of on the header table, the sum can be calculated directly in the DECLARE CURSOR. Because the command is grouped by the transaction ID, the same number of rows is retrieved by the cursor. The only commands to repeat in the WHILE loop are the UPDATE and the FETCH to get the next row. At first this might seem backwards, but computing the sum in the DECLARE CURSOR is much faster. *(POST3.CMD - better yet) -- declare the cursor on the detail table and -- do the sum directly in the cursor definition DECLARE c1 CURSOR FOR SELECT transid, SUM(extprice) + FROM transdetail GROUP BY transid OPEN c1 FETCH c1 INTO vtransid vind1,vprice vind2 WHILE SQLCODE <> 100 THEN -- this is a non-updatable cursor so an explicit -- WHERE clause is used UPDATE transmaster SET netamount = .vprice + WHERE transid = .vtransid FETCH c1 INTO vtransid vind1,vprice vind2 ENDWHILE DROP CURSOR c1 The number of commands has been reduced by over half from the first program, and performance by more than that. In R:BASE 4.0, 4.5 and 4.5 Plus! the multi-table update command is actually the fastest way to accomplish this task. *(POST4.CMD - do a multi-table update if you can) -- multi table update command, a view is used -- to first calculate the sum and create a -- one-one relationship DROP VIEW v_trans CREATE VIEW v_trans (transid, amount) AS + SELECT transid, SUM(extprice) + FROM transdetail GROUP BY transid UPDATE transmaster SET netamount = amount + FROM transmaster ,v_trans t2 + WHERE transmaster.transid = t2.transid Example 2 - a quick report. The task here is to create a quick report of companies from the Customer table and their corresponding contact names from the Contact table. Using nested cursors makes printing the company information once followed by the many rows of contact information easier. *(CUSTREP1.CMD - the worst case) -- nested cursors are used with the declare for -- the second cursor inside the while loop of -- the first cursor. Also, the data is retrieved -- with a SELECT command instead of in the -- cursor definition -- Dropping a cursor before you declare it is a -- technique used to guarantee that the cursor does -- not exist in memory. The DROP CURSOR normally -- returns an error message, that is ok. DROP CURSOR c1 DROP CURSOR c2 -- Only the unique row identifier is specified in -- the cursor definition DECLARE c1 CURSOR FOR SELECT custid FROM customer + ORDER BY custid OPEN c1 FETCH c1 INTO vcustid ind1 WHILE SQLCODE <> 100 THEN -- Retrieve and display the rest of the -- data for a customer SELECT company, custaddress, custcity, + custstate, custzip, custphone INTO + vcompany vi1, vaddress vi2, vcity vi3, + vstate vi4, vzipcode vi5, vphone vi6 + FROM customer WHERE custid = .vcustid SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode) WRITE .vcustid, .vcompany WRITE .vaddress WRITE .vcsz -- Declare a cursor to identify matching contact rows DECLARE c2 CURSOR FOR SELECT contfname, contlname + FROM contact WHERE custid = .vcustid OPEN c2 FETCH c2 INTO vfname i1, vlname i2 WHILE SQLCODE <> 100 THEN SET VAR vfullname = (.vfname & .vlname) WRITE .vfullname FETCH c2 INTO vfname i1, vlname i2 ENDWHILE DROP CURSOR c2 FETCH c1 INTO vcustid ind1 ENDWHILE DROP CURSOR c1 The next code segment shows the recommended structure for nested cursors. The second DECLARE CURSOR is moved to the top of the program, and the second cursor is opened and closed, not declared and dropped. Just this simple change improves performance. *(CUSTREP2.CMD - move cursor out of WHILE loop) DROP CURSOR c1 DROP CURSOR c2 SET VAR vcustid INTEGER DECLARE c1 CURSOR FOR SELECT custid + FROM customer ORDER BY custid DECLARE c2 CURSOR FOR SELECT contfname, contlname + FROM contact WHERE custid = .vcustid -- Get the first row of data for a customer OPEN c1 FETCH c1 INTO vcustid ind1 WHILE SQLCODE <> 100 THEN -- Retrieve and display the rest of the -- data for a customer SELECT company, custaddress, custcity, + custstate, custzip, custphone INTO + vcompany vi1, vaddress vi2, vcity vi3, + vstate vi4, vzipcode vi5, vphone vi6 + FROM customer WHERE custid = .vcustid SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode) WRITE .vcustid, .vcompany WRITE .vaddress WRITE .vcsz -- Open cursor c2, retrieve and display -- the matching contact data OPEN c2 FETCH c2 INTO vfname i1, vlname i2 WHILE SQLCODE <> 100 THEN SET VAR vfullname = (.vfname & .vlname) WRITE .vfullname FETCH c2 INTO vfname i1, vlname i2 ENDWHILE -- Close cursor c2 and get the next row of -- customer data CLOSE c2 FETCH c1 INTO vcustid ind1 ENDWHILE DROP CURSOR c1 DROP CURSOR c2 Moving the data retrieval to the DECLARE CURSOR command instead of using a separate SELECT command again improves performance. *(CUSTREP3.CMD retrieve data through DECLARE CURSOR) DROP CURSOR c1 DROP CURSOR c2 -- retrieve all the data through the DECLARE CURSOR -- command instead of SELECT SET VAR vcustid INTEGER DECLARE c1 CURSOR FOR SELECT custid, company, + custaddress, custcity ,custstate, custzip, + custphone FROM customer ORDER BY custid DECLARE c2 CURSOR FOR SELECT contfname, contlname + FROM contact WHERE custid = .vcustid OPEN c1 -- Get the first row of customer data FETCH c1 INTO vcustid ind1, vcompany ind2,+ vaddress ind3, vcity ind4, vstate ind5, + vzip ind6, vphone ind7 WHILE SQLCODE <> 100 THEN -- Display the customer data and open cursor c2 to -- retrieve the matching contact data SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode) WRITE .vcustid, .vcompany WRITE .vaddress WRITE .vcsz OPEN c2 FETCH c2 INTO vfname i1, vlname i2 WHILE SQLCODE <> 100 THEN SET VAR vfullname = (.vfname & .vlname) WRITE .vfullname FETCH c2 INTO vfname i1, vlname i2 ENDWHILE -- Close cursor c2 and get the next row of -- customer data CLOSE c2 FETCH c1 INTO vcustid ind1, vcompany ind2,+ vaddress ind3, vcity ind4, vstate ind5, + vzip ind6, vphone ind7 ENDWHILE DROP CURSOR c1 DROP CURSOR c2 Another small change also improves performance - instead of using SET VAR commands within the WHILE loops to concatenate city, state and zipcode together, and first and last name together, the concatenation operation can be done in the DECLARE CURSOR command. The concatenation in the DECLARE CURSOR reduces the number of commands that are repeated for each row and moves the work to the DECLARE CURSOR command. *(CUSTREP4.CMD add the concatenation to the DECLARE CURSOR) DROP CURSOR c1 DROP CURSOR c2 SET VAR vcustid INTEGER -- Replace SET VAR commands with expressions in -- the DECLARE CURSOR DECLARE c1 CURSOR FOR SELECT custid, company, + custaddress, (custcity + ',' & custstate & custzip), + custphone FROM customer ORDER BY custid DECLARE c2 CURSOR FOR SELECT (contfname & contlname) + FROM contact WHERE custid = .vcustid OPEN c1 -- Retrieve and display the customer data FETCH c1 INTO vcustid ind1, vcompany ind2, + vaddress ind3, vcsz ind4, vphone ind5 WHILE SQLCODE <> 100 THEN WRITE .vcustid, .vcompany WRITE .vaddress WRITE .vcsz -- Retrieve and display the contact data OPEN c2 FETCH c2 INTO vfullname i1 WHILE SQLCODE <> 100 THEN WRITE .vfullname FETCH c2 INTO vfullname i1 ENDWHILE -- Close cursor c2 and get the next row of -- customer data CLOSE c2 FETCH c1 INTO vcustid ind1, vcompany ind2, + vaddress ind3, vcsz ind4, vphone ind5 ENDWHILE DROP CURSOR c1 DROP CURSOR c2 The final change to improve performance is to use the RESET option on the OPEN c2 command instead of CLOSE c2. Overall, we have improved performance on this small set of rows by a full second. On a larger data set you can expect to see a greater performance improvement. *(CUSTREP5.CMD reset cursor 2 instead of close and open) DROP CURSOR c1 DROP CURSOR c2 SET VAR vcustid INTEGER DECLARE c1 CURSOR FOR SELECT custid, company, + custaddress, (custcity + ',' & custstate & custzip), + custphone FROM customer ORDER BY custid DECLARE c2 CURSOR FOR SELECT (contfname & contlname) + FROM contact WHERE custid = .vcustid OPEN c1 FETCH c1 INTO vcustid ind1, vcompany ind2, + vaddress ind3, vcsz ind4, vphone ind5 WHILE SQLCODE <> 100 THEN WRITE .vcustid, .vcompany WRITE .vaddress WRITE .vcsz -- Open cursor c2 with the RESET option, -- no CLOSE command is needed OPEN c2 RESET FETCH c2 INTO vfullname i1 WHILE SQLCODE <> 100 THEN WRITE .vfullname FETCH c2 INTO vfullname i1 ENDWHILE FETCH c1 INTO vcustid ind1, vcompany ind2, + vaddress ind3, vcsz ind4, vphone ind5 ENDWHILE DROP CURSOR c1 DROP CURSOR c2 As you can see from the above examples, maximizing the work of the DECLARE CURSOR command provides significant performance improvements. The changes were small and they didn't involve a lot of time or programming effort, but these changes did result in definite performance benefits. Customize the environment In addition to optimizing your programming code, you can improve cursor performance by optimizing the environment. Obviously, code runs faster in the 386/486 version of R:BASE than in the 286 version, and a 486 computer runs programs faster than a 386. Outside of upgrading your hardware, however, certain R:BASE environment settings can be used to improve performance. These settings generally improve overall performance as well as cursor performance. In single-user mode, SET CLEAR OFF, which is really useful when updating data because disk writes are buffered. This setting is not available in multi-user mode. In either single- or multi-user mode, use the Microrim variables; a cursor is the same as any other query. MICRORIM_F2MAXBLK and MICRORIM_TMPMAXBLK affect actual data retrieval; MICRORIM_F3MAXBLK and MICRORIM_BTMAXBLK are used with index retrieval. A scrolling cursor uses a temporary internal table and can be sped up by using MICRORIM_TMPMAXBLK. Refer to Chapter 6 in the Startup and New Features Guide for more information about using these variables. Note that you may not see any performance improvement unless you are working with a large amount of data. If you never retrieve more data than what fits in the default buffers, you won't see a visible performance increase. Look at the EXPLAIN.DAT file generated by the MICRORIM_EXPLAIN variable to see the cursor query optimization. The OPEN command actually executes the query. Each query executed in your program puts an entry in EXPLAIN.DAT; for example, SELECT or UPDATE commands in the WHILE loop are reflected. You might also see a query reference to the SYS_RULES table, which is used for multi-user locking control. By using EXPLAIN.DAT, you can easily see why using the RESET option on OPEN is faster. Normally, each OPEN redoes the query. When RESET is used, the query is only optimized once. The EXPLAIN.DAT entries for the last two command files from Example 2 earlier in this article are shown here. The first entry shows nested cursors using the OPEN and CLOSE commands. The second entry shows using the RESET option on OPEN. Cursor c1 on the Customer table is accessed sequentially, all rows in the table are retrieved, and no WHERE clause is used. If an indexed WHERE clause was used, EXPLAIN.DAT would show the index used. The second cursor on the Contact table does use an indexed WHERE clause to define the query. This query is redone each time the cursor is opened with a different vcustid value. D:\XCHNG\JULAUG\CODE\custrep4.cmd 417 05/20/94 14:56:37.580 SelectCost=1 (OptimizationTime=0ms) customer Sequential D:\XCHNG\JULAUG\CODE\custrep4.cmd 822 05/20/94 14:56:37.970 SelectCost=0.06962963 (OptimizationTime=0ms) contact (ColumnName=custid,Type=I) Random Dup=1.666667 Adj=1 D:\XCHNG\JULAUG\CODE\custrep4.cmd 822 05/20/94 14:56:38.240 SelectCost=0.06962963 (OptimizationTime=50ms) contact (ColumnName=custid,Type=I) Random Dup=1.666667 Adj=1 D:\XCHNG\JULAUG\CODE\custrep4.cmd 822 05/20/94 14:56:38.460 SelectCost=0.06962963 (OptimizationTime=0ms) contact (ColumnName=custid,Type=I) Random Dup=1.666667 Adj=1 ............ D:\XCHNG\JULAUG\CODE\custrep4.cmd 822 05/20/94 14:56:40.110 SelectCost=0.06962963 (OptimizationTime=0ms) contact (ColumnName=custid,Type=I) Random Dup=1.666667 Adj=1 D:\XCHNG\JULAUG\CODE\custrep4.cmd 836 05/20/94 14:56:40.330 SelectCost=1 (OptimizationTime=0ms) SYS_RULES Sequential The following EXPLAIN.DAT entry uses OPEN c2 RESET. The same query is used each time cursor c2 is accessed. The query does not need to be reoptimized each time the cursor is opened. D:\XCHNG\JULAUG\CODE\custrep5.cmd 1441 05/20/94 14:56:40.770 SelectCost=1 (OptimizationTime=60ms) customer Sequential D:\XCHNG\JULAUG\CODE\custrep5.cmd 1839 05/20/94 14:56:41.100 SelectCost=0.06962963 (OptimizationTime=60ms) contact (ColumnName=custid,Type=I) Random Dup=1.666667 Adj=1 D:\XCHNG\JULAUG\CODE\custrep5.cmd 1867 05/20/94 14:56:42.530 SelectCost=1 (OptimizationTime=0ms) SYS_RULES Sequential