Because the defined length of TEXT values is often more than four characters, R:BASE must hash (convert) the TEXT values containing more than four characters to create a four-byte index called a hash value. Even though a TEXT column may contain unique text values, it may generate duplicate hash values.
How R:BASE Hashes TEXT Data
R:BASE attempts to create unique hash values by performing the following operations:
•Fills the first and second bytes of the four-byte hash value with the binary equivalents of the first and second characters in the TEXT string.
•Sums the binary values of all the odd-numbered characters in the TEXT string beginning with the third. Then it divides the sum by 256 and uses the remainder to fill the third byte of the hash value.
•Sums the binary values of all the even-numbered characters in the TEXT string beginning with the fourth. Then it divides the sum by 256 and uses the remainder to fill the fourth byte of the hash value.
If two hash values created by hashing two different TEXT strings are identical, R:BASE builds a multiple occurrence table in file 3 to store the duplicate hash values.
This is why TEXT strings that share only their beginning two characters (often the case with part numbers and with form, report, and label names) may have the same hash value, thereby reducing the effectiveness of the index. If you can ensure that the first two characters of each TEXT column value are unique, you can improve the performance of an index on that TEXT column, because you'll eliminate all of the multiple occurrence tables.
Another way to make hash values unique is to define indexed TEXT columns with as few characters as possible while remembering to make the column values unique, especially with regard to the first two characters.
IHASH to the Rescue
If the first two characters of TEXT values are identical, which is often true with part numbers, such as AB-100, AB-101, AB-102, and so on, try using the IHASH function and a computed column to reduce the likelihood of duplicates. It's important to understand that IHASH doesn't guarantee a unique hash value. IHASH may actually hash two different TEXT values to the same hash value (index). What IHASH does is to create more non-duplicate values, though it doesn't guarantee that all values will be unique.
How to Use IHASH
IHASH is a conversion function, not a command. Its syntax is simple:
(IHASH(arg,n))
Arg can be a TEXT column, a dotted variable, or a value. The width, n, tells R:BASE how many characters, starting with the first, are needed to establish a unique string. If n is zero, R:BASE uses the entire length of the string.
The first consideration in using IHASH is to decide whether to use it at all. For example, IHASH may not improve performance when used on a TEXT column that holds unique text names of four characters or less. But it will improve performance on a TEXT column that has many values with identical first and second characters. It all depends on whether IHASH can reduce the number of identical values in the index.
IHASH Step by Step
Here's an example that uses IHASH to speed up indexed access and joins by increasing the speed of a key, indexed TEXT column named partid. Follow these steps:
1. | Define an INTEGER computed column, including the IHASH function: |
ALTER TABLE tblname ADD hashid = (IHASH(partid,0)) INTEGER
If partid exists in other tables, you may want to add hashid to them too. For example, if partid links a parts table with an invoice table, use these commands to add hashid:
ALTER TABLE parts ADD hashid = (IHASH(partid,0)) INTEGER
ALTER TABLE invoice ADD hashid
2. | Drop the index from partid and index the computed column hashid: |
DROP INDEX partid IN tblname
CREATE INDEX ON tblname hashid
Index hashid in all the tables where it appears.
3. | Use the newly indexed computed column in WHERE clauses and multi-table SELECT commands, but be sure to continue to include the actual TEXT column also. For example, use it to find a specific part number instead of using this WHERE clause: |
command ... WHERE partid = .vpartid
Use this SET VAR and WHERE clause:
SET VAR vhash = (IHASH(.vpartid,0))
command ... WHERE (hashid = .vhash AND partid = .vpartid)
You must use the dotted variable (.vhash) instead of directly using the expression in the WHERE clause, because R:BASE doesn't use an index if the WHERE clause uses an expression.
You must include AND partid = . vpartid to ensure that you get the right value. Remember, there's no guarantee that the IHASH value will be unique.
If you use the computed IHASH column (hashid) in a multi-table join, you must also remember to include both conditions in the WHERE clause. For example, the following rule checks for a unique value:
RULES 'Value must be unique.' +
FOR tblname SUCCEEDS +
WHERE partid IS NOT NULL +
AND NOT EXISTS +
(SELECT partid FROM tblname t2 +
WHERE t2.hashid = tblname.hashid +
AND t2.partid = tblname.partid)
Correlated sub-SELECTs force R:BASE to do an internal join using indexes.
Here's another example of a multi-table join using hashid:
SELECT t1.partid, SUM(t2.price) +
FROM parts t1, invoices t2 +
WHERE t2.hashid = t1.hashid +
AND t2.partid = t1.partid +
GROUP BY t1.partid
Speeding Up IHASH
When you use IHASH, try not to use zero as the width (n). Zero causes R:BASE to use the entire string. For example, if you're sure that the first 10 characters in a TEXT 50 column are enough to establish it as unique, use 10 rather than zero with IHASH.