""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
INDEXES & IHASH ADD SPEED
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
PRODUCT : R:BASE VERSION : 3.1
CATEGORY : SPEED & PERFORMANCE SUBCATEGORY : INDEXES & IHASH
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
By indexing the right columns, you can speed up your applications.
Processes that used to take 20 minutes or more may be completed in
only a few seconds. An index on an R:BASE column speeds up access to a
row of data in much the same way an index in a book speeds up access
to a page.
Get a good introduction to indexing from your R:BASE manuals and
"Maximize R:BASE 3.0 Performance with Index Processing" in the
May/June 1990 R:BASE EXCHANGE.
Deciding What to Index
""""""""""""""""""""""
It's a good idea to put an index on key columns and on linking
columns. A key column is a column that uniquely identifies rows. A
linking or common column is a column that exists in two or more tables
in order to establish a relationship between the tables.
If you need to use more than one column to uniquely identify a row,
it's a good idea to combine them into a computed column, and then
index the computed column.
Use CREATE INDEX to index a column or to rebuild existing indexes.
Optimizing Indexes
""""""""""""""""""
Optimize indexes by paying attention to the data type of the indexed
column and by minimizing the number of duplicate values in the column.
The fastest, most efficient data types for indexed access are INTEGER,
REAL, DATE, TIME, TEXT with a defined length of four characters or
less. or NUMERIC with a defined precision of two or less. Indexes on
other columns will be less effective.
The more unique the column's values, the more effective the index will
be. The best columns to index are those that don't have duplicate
values.
Hashing Long TEXT Values
""""""""""""""""""""""""
R:BASE has to hash (convert) TEXT values that contain more than four
characters to create a four-byte index called a "hash value." Even
though a TEXT column contains completely unique values, it can end up
with duplicate hash values.
How R:BASE Hashes TEXT
""""""""""""""""""""""
R:BASE uses the following technique to make the hash values as unique
as possible:
o R:BASE uses the binary equivalents of the first and second
characters in the TEXT string to fill the first and second bytes
of the four-byte hash value.
o R:BASE sums the binary values of all the odd-numbered characters
in the TEXT string beginning with the third. Then it divides the
sum by 256 and uses the remainder to fill the third byte of the
hash value.
o R:BASE sums the binary values of all the even-numbered characters
in the TEXT string beginning with the fourth. Then it divides the
sum by 256 and uses the remainder to fill the fourth byte of the
hash value.
As a result, two hash values, created by hashing two different TEXT
strings, may be identical. In that case, R:BASE builds a "multiple
occurrence table" to store the duplicate hash values.
This is why TEXT strings that begin with the same two characters
(often the case with part numbers) that are otherwise unique may have
the same hash value, thereby reducing the effectiveness of the index.
If you can ensure that the first two characters of each TEXT column
value are unique, you can improve the performance of an index on that
TEXT column because you'll eliminate all the "multiple occurrence
tables."
To make hash values as unique as possible, try to define indexed TEXT
columns with as few characters as possible, and make the column values
unique, especially with regard to the first two characters.
If you must define an indexed TEXT column with a width greater than
four and you can't ensure the uniqueness of the first two characters,
that column may be a good candidate for use with the IHASH function
added to R:BASE in R:BASE 3.1A (the first release of Upgrade
"EXPRESS").
IHASH to the Rescue
"""""""""""""""""""
If the first two characters are identical, as is often true with part
numbers (AB-100, AB-101, AB-102, and so on), use the new IHASH
function and a computed column to make your index more unique.
It's important to understand that IHASH doesn't guarantee a unique
hash value. IHASH may hash two different TEXT values to the same hash
value (index). IHASH makes hash values more unique but doesn't
guarantee complete uniqueness.
If you have R:SCOPE (available for purchase by calling 800-248-2001),
you can check to see how unique the index values (hash values) are.
Choose Indexes under Check, then choose the table and column you want
to check. R:SCOPE will tell you how many unique and duplicate hash
values there are.
How to Use IHASH
""""""""""""""""
IHASH is a conversion function, not a command. It's syntax is simple:
(IHASH(arg,n))
The TEXT argument (ARG) can be a TEXT column, a dotted variable, or a
value. The width (N) tells R:BASE how many characters, starting with
the first, are needed to establish uniqueness in the string. If N is
zero, R:BASE uses the entire length of the string.
But, you need to know more than just the syntax to use IHASH
correctly. First, you need to decide whether or not to use it. For
example, IHASH may not improve performance when used on a TEXT column
that holds unique text names. But it will improve performance on a
TEXT column that has many values that have identical first and second
characters. It all depends on whether IHASH can make the hash values
for the column more unique. You can use R:SCOPE to check it out.
IHASH Step by Step
""""""""""""""""""
As an example, you can use IHASH to speed up indexed access and joins
by making a key, indexed TEXT column named (PARTID) faster. Follow
these steps:
STEP 1
======
Define an INTEGER computed column including the IHASH function:
ALTER TABLE tblname ADD hashid +
= (IHASH(partid,0)) INTEGER
If PARTID exists in other tables, you may want to add HASHID to them
too. For example, if PARTID links a PARTS table with an INVOICE table,
use these commands to add HASHID:
ALTER TABLE parts ADD hashid +
= (IHASH(partid,0)) INTEGER
ALTER TABLE invoice ADD hashid
STEP 2
======
Drop the index from PARTID and index the computed column HASHID:
DROP INDEX partid IN tblname
CREATE INDEX ON tblname hashid
Index HASHID in all the tables where it appears.
STEP 3
======
Use the newly indexed computed column in WHERE clauses and multi-table
SELECT commands, but be sure to continue to include the actual TEXT
column too. For example, to find a specific part number, instead of
using this WHERE clause:
Command ... WHERE partid = .vpartid
use this SET VAR and WHERE clause:
SET VAR vhash = (IHASH(.vpartid,0))
Command ... WHERE hashid = .vhash +
AND partid = .vpartid
You must use the dotted variable (.VHASH) instead of the using the
expression directly in the WHERE clause, because R:BASE doesn't use
indexes if the WHERE clause uses an expression.
You must include "AND partid = .vpartid" to ensure that you get the
right value. Remember, there's no guarantee that the IHASH value will
be unique.
If you use the computed IHASH column (HASHID) in a multi-table join,
you must also remember to include both connections in the WHERE
clause. For example, the following rule checks for a unique value:
RULES 'Value must be unique.' +
FOR tblname SUCCEEDS +
WHERE partid IS NOT NULL +
AND NOT EXISTS +
(SELECT partid FROM tblname t2 +
WHERE t2.hashid = tblname.hashid +
AND t2.partid = tblname.partid)
It uses a correlated sub-SELECT (explained in "Making Sub-SELECTs,
Outer Joins, & Rules Faster"). Correlated sub-SELECTs force R:BASE to
do an internal join using indexes.
Here's another example of a multi-table join using HASHID:
SELECT t1.partid, SUM(t2.price) +
FROM parts t1, invoices t2 +
WHERE t2.hashid = t1.hashid +
AND t2.partid = t1.partid +
GROUP BY t1.partid
Speed Up IHASH
""""""""""""""
When you use IHASH, try not to use zero as the N width. Zero causes
R:BASE to use the entire string. If you're sure, for example, that the
first 10 characters in a TEXT 50 column are enough to establish
uniqueness, use 10 rather than zero in the IHASH.
WHERE Clauses & Indexes
"""""""""""""""""""""""
A WHERE clause can be in a command, a rule, or a lookup expression. If
a WHERE clause has only one condition, R:BASE uses the index on the
column if the conditional operator following the indexed column is EQ,
=, BETWEEN, or IS NULL.
The BETWEEN operator won't use the index on a data type that must be
hashed, so if the operator is BETWEEN, the data type of the indexed
column must be DATE, TIME, INTEGER, REAL, NUMERIC with a defined
precision of two or less, or TEXT with defined length of four or less.
R:BASE won't use the index if the WHERE clause contains a wildcard
character or an expression. If you want R:BASE to use the index,
replace expressions with constants or dotted variables, and get rid of
the wildcards.
If a WHERE clause contains more than one condition and all conditions
are combined with AND, the clause must have at least one indexed
column that uses EQ, =, BETWEEN, or IS NULL if you want R:BASE to use
indexes. Under these conditions, R:BASE chooses the condition that
places the greatest restriction on the WHERE clause for the indexed
search. Here are the three levels of restriction:
o = or EQ is most restrictive.
o IS NULL is next most restrictive.
o BETWEEN is least restrictive.
If there are two conditions and both are at the same level of
restriction, R:BASE uses the first one. For optimum performance, make
the most restrictive indexed condition the first condition in the
WHERE clause.
ORDER BY Clauses & Indexes
""""""""""""""""""""""""""
Under the following conditions, R:BASE uses indexes to sort quickly:
o The ORDER BY clause contains a single, indexed column that is
DATE, TIME, INTEGER, REAL, NUMERIC with a defined precision of
two or less, or TEXT with a defined length of four or less.
o The command doesn't use a view.
o If the command also includes a WHERE clause, the same indexed
column used in the ORDER BY clause is also used in the WHERE
clause along with one of the index supporting operators (=, IS
NULL, or BETWEEN).