======================================================================== WORKING WITH TEXT STRINGS USING R:BASE FUNCTIONS ======================================================================== PRODUCT : R:BASE VERSION : 3.1 or Higher CATEGORY : PROGRAMMING SUBCATEGORY : STRING MANIPULATION ======================================================================== From William B. Driskell, 6536 20th Ave. N.E., Seattle, WA 98115. Bill is a marine biologist, a computer consultant, an active participant in the Seattle-area R:BASE user's group, and a frequent contributor to the R:BASE EXCHANGE. In computer jargon, a string is a sequence of characters that is handled as one chunk of data. R:BASE provides string functions to manipulate strings in various ways: you can peek inside, extract portions, or even move pieces around. In R:BASE, a string can be stored in a TEXT column or variable or it can be a quote-enclosed set of characters. R:BASE string functions can handle all three - variables, columns, and values - interchangeably, with only two restrictions: <> If you use a column name in a string function, R:BASE must be able to determine the table. Column names work if you use the string function in a computed column expression, a form expression, a report expression, or a SELECT command expression all places where R:BASE knows the table. But you can't use a column name in string manipulation functions in a SET VAR command because the table is unknown. <> Expressions are not allowed inside string functions. String Manipulation Functions ============================= These are all the string manipulation functions available in R:BASE 3.1C: <> (SFIL(chr,nchar)) to fill a string (CHR) with a specified number of characters (NCHAR). <> (SPUT(text,string,pos)) to put a string of characters (STRING) into another string (TEXT) beginning at a specified position number (POS) in the destination string (TEXT). <> (SGET(text,nchar,pos)) to get a specified number of characters (NCHAR) out of a string (TEXT) beginning at a specified position number (POS). <> (SLEN(text)) to find out how many characters are in a string (TEXT). <> (SSUB(text,n)) to pull the nth piece out of a string (TEXT) when each piece is separated from the next by a comma. <> (SLOC(text,string)) to locate a substring (STRING) inside another string (TEXT). If the substring (STRING) is found, SLOC returns a number representing the substring's starting position in the search string (TEXT). <> (CTR(text,width)) to center a string (TEXT) in a specified number of characters (WIDTH). <> (SMOVE(text,pos1,nchar,string,pos2)) to place a piece of one string (TEXT) into another string (STRING). The substring is defined by a specified starting position (POS1) in the source string (TEXT) along with a specified number of characters (NCHAR). R:BASE places that substring in the destination string (STRING) beginning at a specified position number (POS2). <> (LJS(text,width)) to left justify a string (TEXT) in a specified number of characters (WIDTH). <> (RJS(text,width)) to right justify a string (TEXT) in a specified number of characters (WIDTH). <> (FORMAT(value,'picture-format')) to use picture formats to format a string (VALUE). <> (STRIM(text)) to trim blanks from a string (TEXT). <> (ULC(text)) to change all the characters in a string (TEXT) to lowercase. <> (LUC(text)) to change all the characters in a string (TEXT) to uppercase. <> (ICAP1(text)) to make the first character in the string uppercase and all other characters lowercase. <> (ICAP2(text)) to make the first character of each word in the string uppercase and all other characters lowercase. Text Conversion Functions ========================= R:BASE also includes several conversion functions that either return text or accept text parameters: <> (ICHAR(chr)) returns the ASCII code for a character (CHR) <> (CTXT(arg)) changes a value (ARG), no matter what its data type, into a text string <> (CHAR(integer)) looks up the character defined by the specified ASCII code (INTEGER) <> (INT(text)) changes a non-decimal number (TEXT) that is currently defined as TEXT into an INTEGER <> (NINT(text)) rounds a decimal number (TEXT) that is currently defined as TEXT to the nearest INTEGER and returns an INTEGER <> (FLOAT(text)) changes a number (TEXT) that is currently defined as TEXT into the same number defined as a DOUBLE data type <> (IHASH(text,width)) returns a hash value for a specified number (WIDTH) of characters in a text string (TEXT) beginning with the first character Building a String ================= You can build a string in several ways. Each of the following commands builds a string: SET VAR vstring1 = ('explicit (literal) characters') SET VAR vstring2 = ('letters' & 'more letters') SET VAR vstring3 = (CTXT(#DATE)) SELECT textcolname INTO vstring4 FROM tblname WHERE... In the second example, the ampersand (&) pastes (concatenates) two strings together so that the result has a blank space between the two parts. If you don't want any space between the two parts, use the plus sign (+) instead of the ampersand (&). In the third example, the CTXT conversion function changes the current date into a text string. In the fourth example, because the variable is loaded from a query, the WHERE clause must be specific enough to find just one match or R:BASE will display this error message: -ERROR- Too many rows returned. You can also use the SFIL function, which creates a string of any desired length using just one character, to build a string: SET VAR vstring = (SFIL(.vchar,.vlength)) Locating & Measuring Strings ============================ Two primary questions exist for any string: <> How long is it? <> What's inside it? Use SLEN like this to find the length of a string: SET VAR vlen INTEGER = (SLEN(.vstring)) Use SLOC like this to find the starting location of one string embedded in another string: SET VAR vpos INTEGER = (SLOC(.vhaystack,.vneedle)) This SLOC example takes the value stored in vneedle and looks for that string in the value stored in vhaystack. If the search is successful, the function returns the starting location in vhaystack. If no match is found, the function returns a zero. For example, if vneedle is equal to 'tiny needle' and vhaystack is equal to 'big haystack,' the function will return a zero. But if vneedle is equal to 'hay' and vhaystack is equal to 'big haystack,' the function will return a five because 'hay' begins with the fifth character in 'big haystack.' The following example uses SLOC and SLEN to test a Social Security number's format. It produces a list of Social Security numbers that are missing hyphens, given that the correct format is 123-45-6789. SELECT ssn FROM employee WHERE (SLOC(ssn,'-')) = 0 OR (SLEN(ssn)) < 11 Building a Table of Contents ============================ By combining the SLEN and the SFIL functions, you can build a table-of- contents format in which each line has, for example, an item followed by a variable number of periods followed by a related currency value. Here's an example: SET VAR vaddlen = (40 - SLEN(.vitem)) SET VAR vdots = (SFIL('.',.vaddlen)) SET VAR voutput = (.vitem & .vdots & .vcost) WRITE .voutput This produces a list that looks like this: Arthur A. Aardvark ..................... $237.57 Bettsy Boop ............................ $199.95 Parsing a String ================ The word parsing means separating into elemental parts. For example, you can parse a name field by splitting it into the first, middle, and last names. For example, the following routine extracts a person's last name from the full name by using SLOC to locate the last blank space and SGET to extract the desired segment. The arguments in SGET tell it which string to use, how many characters to extract, and where to begin extracting. SET VAR vname TEXT = ('Arthur A. Aardvark') -- Locate first blank in the name. SET VAR vloc INTEGER = (SLOC(.vname,'~')) SET VAR vlen INTEGER = (SLEN(.vname)) -- Trim the name & locate next blank. WHILE vloc > 0 THEN SET VAR vlen = (.vlen - .vloc) SET VAR vloc = (.vloc + 1) SET VAR vname = (SGET(.vname,.vlen,.vloc)) SET VAR vloc = (SLOC(.vname,' ')) ENDWHILE When the routine finishes, the variable vname contains the last name (Aardvark). You could use a similar routine to parse a telephone number by looking for the embedded hyphen locations or parse an address by looking for the commas. Using SPUT as a Paste Editor ============================ The SPUT function puts characters into a string at specified locations, overwriting the previous characters in the process. Its functionality is best described as a paste editor. But it can also assist in parsing. For example, you can change the above routine to use SPUT to place an at symbol (@) on top of the blanks to allow the search to proceed to the next blank. -- Locate first blank in the name SET VAR vloc INTEGER = (SLOC(.vname,'~')) -- Fill it with @ & locate next blank. WHILE vloc > 0 THEN SET VAR vname = (SPUT(.vname,'@',.vloc)) SET VAR vlttrloc = (.vloc + 1) SET VAR vloc = (SLOC(.vname,' ')) ENDWHILE -- Extract the last name. SET VAR vlen = (SLEN(.vname) - .vlttrloc + 1) SET VAR vlastname = (SGET(.vname,.vlen,.vlttrloc)) Using SSUB to Parse a String ============================ You can also use the SSUB function to parse strings. SSUB extracts items in a comma-delimited string based on the item's numbered position in the list. In the next example, the commas in vaddr define a list of four items: SET VAR vaddr = ('15395 SE 30th Place,Bellevue,WA,98007') SET VAR vcity TEXT = (SSUB(.vaddr,2)) SET VAR vstate TEXT = (SSUB(.vaddr,3)) SET VAR vzip TEXT = (SSUB(.vaddr,4)) Now, for example, the variable vstate will contain WA. The variable vzip will contain the TEXT zip code, but you can convert it into a numeric value by using the new conversion capabilities of R:BASE 3.1B's FLOAT, INT, or NINT functions. The SSUB function is very handy for parsing out the selections from a two-column menu or from a checkbox menu. Moving From One String to Another ================================= The SMOVE function is an extract-and-merge command, combining the functions of SGET and SPUT. Consider the following example: SET VAR vdate=(CTXT(.#DATE)), + vproj# = ('64-951.00-SPD'), + vjobid = (SMOVE(.vdate,7,2,.vproj#,8)) The variable vjobid contains 64-951.91-SPD because two characters (the seventh and eighth) of the year were moved into vproj# starting in position number eight and vjobid was set to the result. Formatting Strings ================== R:BASE provides two groups of string formatting functions: the capitalization functions and the justification functions. The Capitalization Functions ============================ The capitalization functions cover most case conversion needs, as the following table shows: SET VAR vstring = ('Arthur A. Aardvark') Function Result ULC(.vstring) 'arthur a. aardvark' LUC(.vstring) 'ARTHUR A. AARDVARK' ICAP1(.vstring) 'Arthur a. aardvark' ICAP2(.vstring) 'Arthur A. Aardvark' Names like O'Toole or McDonald, which have embedded capitals, present special problems. ICAP2 won't work on names such as these, but you can use what I call brute-force ASCII manipulation - subtracting 32 from the ASCII value to force the capitalization. Here's a routine that deals with names containing an apostrophe by using a variety of functions. SET QUOTE=" *( In the next command, the item to the) *( right of the = is a single quotation) *( mark enclosed in double quotation marks.) SET VAR vappos TEXT = "'" SET VAR vloc = (SLOC(.vname,.vappos) + 1) IF vloc > 0 THEN SET VAR vcap = (SGET(.vname,1,.vloc)) SET VAR vcap = (LUC(.vcap)) SET VAR .vname = (SPUT(.vname,.vcap,.vloc)) ENDIF SET QUOTE=' Justification Functions ======================= The justification functions control placement of a string within a prescribed field length. These functions can be particularly handy for aligning columns on a decimal point, centering titles, or precisely placing a currency amount on a bank check. Function Result RJS(.vstring,26) 'Arthur A. Aardvark' LJS(.vstring,26) 'Arthur A. Aardvark' CTR(.vstring,26) 'Arthur A. Aardvark' With R:BASE 3.1C, you can use the new FORMAT function to format strings. FORMAT works like the WRITE command's USING format. For example, you can use the FORMAT function in commands like SELECT to align DOUBLE values on a decimal point or to place commas in INTEGER values. The final string function is STRIM. It trims trailing blanks from a text string. R:BASE currently automatically trims trailing blanks, so you won't need to use STRIM; it's a remnant from previous versions and products. These string functions can make your job easier and your output more attractive.