"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" INDEXES & IHASH ADD SPEED """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" PRODUCT : R:BASE VERSION : 3.1 CATEGORY : SPEED & PERFORMANCE SUBCATEGORY : INDEXES & IHASH """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" By indexing the right columns, you can speed up your applications. Processes that used to take 20 minutes or more may be completed in only a few seconds. An index on an R:BASE column speeds up access to a row of data in much the same way an index in a book speeds up access to a page. Get a good introduction to indexing from your R:BASE manuals and "Maximize R:BASE 3.0 Performance with Index Processing" in the May/June 1990 R:BASE EXCHANGE. Deciding What to Index """""""""""""""""""""" It's a good idea to put an index on key columns and on linking columns. A key column is a column that uniquely identifies rows. A linking or common column is a column that exists in two or more tables in order to establish a relationship between the tables. If you need to use more than one column to uniquely identify a row, it's a good idea to combine them into a computed column, and then index the computed column. Use CREATE INDEX to index a column or to rebuild existing indexes. Optimizing Indexes """""""""""""""""" Optimize indexes by paying attention to the data type of the indexed column and by minimizing the number of duplicate values in the column. The fastest, most efficient data types for indexed access are INTEGER, REAL, DATE, TIME, TEXT with a defined length of four characters or less. or NUMERIC with a defined precision of two or less. Indexes on other columns will be less effective. The more unique the column's values, the more effective the index will be. The best columns to index are those that don't have duplicate values. Hashing Long TEXT Values """""""""""""""""""""""" R:BASE has to hash (convert) TEXT values that contain more than four characters to create a four-byte index called a "hash value." Even though a TEXT column contains completely unique values, it can end up with duplicate hash values. How R:BASE Hashes TEXT """""""""""""""""""""" R:BASE uses the following technique to make the hash values as unique as possible: o R:BASE uses the binary equivalents of the first and second characters in the TEXT string to fill the first and second bytes of the four-byte hash value. o R:BASE sums the binary values of all the odd-numbered characters in the TEXT string beginning with the third. Then it divides the sum by 256 and uses the remainder to fill the third byte of the hash value. o R:BASE sums the binary values of all the even-numbered characters in the TEXT string beginning with the fourth. Then it divides the sum by 256 and uses the remainder to fill the fourth byte of the hash value. As a result, two hash values, created by hashing two different TEXT strings, may be identical. In that case, R:BASE builds a "multiple occurrence table" to store the duplicate hash values. This is why TEXT strings that begin with the same two characters (often the case with part numbers) that are otherwise unique may have the same hash value, thereby reducing the effectiveness of the index. If you can ensure that the first two characters of each TEXT column value are unique, you can improve the performance of an index on that TEXT column because you'll eliminate all the "multiple occurrence tables." To make hash values as unique as possible, try to define indexed TEXT columns with as few characters as possible, and make the column values unique, especially with regard to the first two characters. If you must define an indexed TEXT column with a width greater than four and you can't ensure the uniqueness of the first two characters, that column may be a good candidate for use with the IHASH function added to R:BASE in R:BASE 3.1A (the first release of Upgrade "EXPRESS"). IHASH to the Rescue """"""""""""""""""" If the first two characters are identical, as is often true with part numbers (AB-100, AB-101, AB-102, and so on), use the new IHASH function and a computed column to make your index more unique. It's important to understand that IHASH doesn't guarantee a unique hash value. IHASH may hash two different TEXT values to the same hash value (index). IHASH makes hash values more unique but doesn't guarantee complete uniqueness. If you have R:SCOPE (available for purchase by calling 800-248-2001), you can check to see how unique the index values (hash values) are. Choose Indexes under Check, then choose the table and column you want to check. R:SCOPE will tell you how many unique and duplicate hash values there are. How to Use IHASH """""""""""""""" IHASH is a conversion function, not a command. It's syntax is simple: (IHASH(arg,n)) The TEXT argument (ARG) can be a TEXT column, a dotted variable, or a value. The width (N) tells R:BASE how many characters, starting with the first, are needed to establish uniqueness in the string. If N is zero, R:BASE uses the entire length of the string. But, you need to know more than just the syntax to use IHASH correctly. First, you need to decide whether or not to use it. For example, IHASH may not improve performance when used on a TEXT column that holds unique text names. But it will improve performance on a TEXT column that has many values that have identical first and second characters. It all depends on whether IHASH can make the hash values for the column more unique. You can use R:SCOPE to check it out. IHASH Step by Step """""""""""""""""" As an example, you can use IHASH to speed up indexed access and joins by making a key, indexed TEXT column named (PARTID) faster. Follow these steps: STEP 1 ====== Define an INTEGER computed column including the IHASH function: ALTER TABLE tblname ADD hashid + = (IHASH(partid,0)) INTEGER If PARTID exists in other tables, you may want to add HASHID to them too. For example, if PARTID links a PARTS table with an INVOICE table, use these commands to add HASHID: ALTER TABLE parts ADD hashid + = (IHASH(partid,0)) INTEGER ALTER TABLE invoice ADD hashid STEP 2 ====== Drop the index from PARTID and index the computed column HASHID: DROP INDEX partid IN tblname CREATE INDEX ON tblname hashid Index HASHID in all the tables where it appears. STEP 3 ====== Use the newly indexed computed column in WHERE clauses and multi-table SELECT commands, but be sure to continue to include the actual TEXT column too. For example, to find a specific part number, instead of using this WHERE clause: Command ... WHERE partid = .vpartid use this SET VAR and WHERE clause: SET VAR vhash = (IHASH(.vpartid,0)) Command ... WHERE hashid = .vhash + AND partid = .vpartid You must use the dotted variable (.VHASH) instead of the using the expression directly in the WHERE clause, because R:BASE doesn't use indexes if the WHERE clause uses an expression. You must include "AND partid = .vpartid" to ensure that you get the right value. Remember, there's no guarantee that the IHASH value will be unique. If you use the computed IHASH column (HASHID) in a multi-table join, you must also remember to include both connections in the WHERE clause. For example, the following rule checks for a unique value: RULES 'Value must be unique.' + FOR tblname SUCCEEDS + WHERE partid IS NOT NULL + AND NOT EXISTS + (SELECT partid FROM tblname t2 + WHERE t2.hashid = tblname.hashid + AND t2.partid = tblname.partid) It uses a correlated sub-SELECT (explained in "Making Sub-SELECTs, Outer Joins, & Rules Faster"). Correlated sub-SELECTs force R:BASE to do an internal join using indexes. Here's another example of a multi-table join using HASHID: SELECT t1.partid, SUM(t2.price) + FROM parts t1, invoices t2 + WHERE t2.hashid = t1.hashid + AND t2.partid = t1.partid + GROUP BY t1.partid Speed Up IHASH """""""""""""" When you use IHASH, try not to use zero as the N width. Zero causes R:BASE to use the entire string. If you're sure, for example, that the first 10 characters in a TEXT 50 column are enough to establish uniqueness, use 10 rather than zero in the IHASH. WHERE Clauses & Indexes """"""""""""""""""""""" A WHERE clause can be in a command, a rule, or a lookup expression. If a WHERE clause has only one condition, R:BASE uses the index on the column if the conditional operator following the indexed column is EQ, =, BETWEEN, or IS NULL. The BETWEEN operator won't use the index on a data type that must be hashed, so if the operator is BETWEEN, the data type of the indexed column must be DATE, TIME, INTEGER, REAL, NUMERIC with a defined precision of two or less, or TEXT with defined length of four or less. R:BASE won't use the index if the WHERE clause contains a wildcard character or an expression. If you want R:BASE to use the index, replace expressions with constants or dotted variables, and get rid of the wildcards. If a WHERE clause contains more than one condition and all conditions are combined with AND, the clause must have at least one indexed column that uses EQ, =, BETWEEN, or IS NULL if you want R:BASE to use indexes. Under these conditions, R:BASE chooses the condition that places the greatest restriction on the WHERE clause for the indexed search. Here are the three levels of restriction: o = or EQ is most restrictive. o IS NULL is next most restrictive. o BETWEEN is least restrictive. If there are two conditions and both are at the same level of restriction, R:BASE uses the first one. For optimum performance, make the most restrictive indexed condition the first condition in the WHERE clause. ORDER BY Clauses & Indexes """""""""""""""""""""""""" Under the following conditions, R:BASE uses indexes to sort quickly: o The ORDER BY clause contains a single, indexed column that is DATE, TIME, INTEGER, REAL, NUMERIC with a defined precision of two or less, or TEXT with a defined length of four or less. o The command doesn't use a view. o If the command also includes a WHERE clause, the same indexed column used in the ORDER BY clause is also used in the WHERE clause along with one of the index supporting operators (=, IS NULL, or BETWEEN).