"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" BUILDING CONNECTED SOLUTIONS """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" PRODUCT : R:BASE VERSION : 3.1 CATEGORY : PROGRAMMING SUBCATEGORY : TOOLS """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" From Michael Mayer, founder of Mayer Labs, Ltd., 24063 Western Ave., Park Forest, IL 60466. Michael develops high performance R:BASE applications specializing in the Programmer Interface (PI). He is a member of the Steering Committee of the Chicago Area R:BASE User Group. He can be reached at 708-481-4536. About two years ago, a client contacted me for help with an unusual business problem. The client explained that they had a UNIX system in their factory that was controlling dozens of different analog and digital I/O (input/output) devices. He went on to say that their UNIX control system was working satisfactorily, but there were two small problems. The UNIX System didn't have a way to permanently store the data, and they could not create the exact report they desired. He pointed at a box of several hundred 1.2 megabyte diskettes on the floor. He mentioned that they have more boxes back at the factory just like that one. We both looked down at the box and shook our heads. The Client's Needs """""""""""""""""" Specifically, they needed to be able to capture the data permanently, and to be able to generate real-time reports from the data at any time. Sounded simple enough, so I asked, "Tell me, just how much data are we talking about each day?" The response was "Between 1.5 and 2 megabytes a day, right now, but we plan on adding four more UNIX systems and quadrupling our production capacity by the end of the year." They wanted to feed all of this data into a single computer for archival and reporting purposes. The total amount of data to be managed would be in the range of 2.5 to 3 gigabytes a year! Next, I asked, "And what would you like to do with all this data?" The response was that they simply wanted to go back to the data and create this one special report. He handed me a 70 page real-time printout that was well beyond even the capabilities of the Extended Report Writer (XRW). Report Requirements """"""""""""""""""" The report required two separate columns, linked throughout with the common thread of time stamps. The left side of the report carried what we call "process data" and the right side of the report carried "alarm data." It's sort of an action/reaction type of report. They needed to be synchronized in real time, but the two columns of data were coming from separate sources, so they may not have the exact same time stamps. Further, each report would need to consist of a combination of production phases, 34 in total, all of which could be called in any order at any time during the report. Data was gathered from each input source at variable rates, ranging from 15 seconds to 15 minutes. Another requirement was that the person requesting the report would identify the increments that were to be reported, such as every one, two, five, 10 or 15 minutes. As the conversation went on, many other major issues came out, like the fact that the report had to be completely printed in under 15 minutes from the time they started it. Also, that each report would require some 250,000 rows of data be loaded and processed in order to print. The only real questions in their mind was if I could do it and how long it would take, as they needed it last month. 
Eyeing the box of disks on the floor, I had a few more questions of my own--for example, how many more times in my life will I hear that line? There were other technical requirements, but the point here is that to the average person, this clearly is NOT an R:BASE database application. For that matter, it's probably impossible to handle it all on a PC in any reasonable manner. As systems grow, we as developers will more and more often be faced with problems that require creative solutions that stretch our imaginations and capabilities. This was definitely one of them.

The Solution
""""""""""""
To solve the problem, I broke it down into manageable segments. First, I needed to decide how to store gigabytes of data. The thought of trying to sort a multi-gigabyte database on a PC gave me the chills, let alone the backup, restoration, and integrity issues such a large database would entail. My solution was to use a WORM (Write Once Read Many) optical disk drive. WORM drives are great for applications that have to store and report on a lot of data but don't need to have the data changed or edited. You write the data to the WORM drive once, and then you can read it as many times as you need for a report, as long as you don't try to change what's already there. WORM drives are designed to handle large quantities of data with speed.

Storing Data on the WORM
""""""""""""""""""""""""
I knew I couldn't store the data on the WORM disks in the traditional R:BASE file format. Too many problems would exist when data from multiple databases had to be joined at report time. Storing data on the WORM disk would have to be handled in daily batches. I chose to write this data to disk in ASCII format after loading each batch of UNIX data. The R:BASE application generates a unique file name based on the date (for example, 11-21-90.DAT). We use these .DAT files to generate the "process data"--the left side of the report. We handle the "alarm data" in a similar fashion, using the date as the file name but giving the "alarm data" files a .PLM extension (for process alarm).

Next Step of the Solution
"""""""""""""""""""""""""
At this point, we can store unlimited data, but we still needed a little more information on each WORM disk. Because each WORM disk could only store 470 megabytes per side (940 megabytes total), it was possible that a report would require the user to flip or change disks in order to get all of the files needed for the report. To address this issue, I wrote a simple program in the C programming language. The C routine writes a VOLUME ID name on each side of the WORM disk whenever a new WORM disk is formatted. The VOLUME ID identifies each side of each disk; it encodes the main UNIX input source, the year, and the quarter in a serialized format. Another C routine prompts the user to put the correct WORM disk in at report time, and then verifies that it's the correct disk and disk side before loading data. The VOLUME ID label provides the control method. The C program also prompts the user when a new WORM disk needs to be formatted and places the appropriate VOLUME ID label name on the new disk. Now we have practically unlimited storage and control--we know exactly where any day of data is on any WORM disk. All five UNIX systems feed into this one DOS system.

R:BASE's Part
"""""""""""""
R:BASE has two primary jobs: create the final report, and hold the data while waiting for a complete day of information to exist before writing a new WORM disk file.
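Two small pieces of housekeeping make that WORM archive workable: building the date-based file name for each day's batch, and checking the VOLUME ID before anything is read from or written to a disk. A rough C sketch of both ideas follows. It is an illustration only--the VOLUME.ID file name, the drive letter, and the sample label format are placeholders, not the actual values, and the real routines work with the WORM drive directly.

/*
 * Sketch only: daily batch file naming and a VOLUME ID check,
 * using plain C file I/O.  The label file name (VOLUME.ID), the
 * drive letter, and the label text shown are assumptions made for
 * this example.
 */
#include <stdio.h>
#include <string.h>

/* Build the daily batch file name from a date: 11-21-90.DAT or .PLM. */
void batch_name(int month, int day, int year, const char *ext,
                char *out, size_t outlen)
{
    /* e.g. month=11, day=21, year=90, ext="DAT" -> "11-21-90.DAT" */
    snprintf(out, outlen, "%02d-%02d-%02d.%s", month, day, year % 100, ext);
}

/* Read the label from the mounted WORM side and compare it with the
   label the report run expects.  Returns 1 if the right disk and side
   are in the drive, 0 otherwise. */
int worm_volume_ok(const char *drive, const char *expected)
{
    char path[64];
    char label[64];
    FILE *fp;

    snprintf(path, sizeof path, "%s\\VOLUME.ID", drive);  /* assumed label file */
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;                    /* no label: wrong or unformatted disk */
    if (fgets(label, sizeof label, fp) == NULL)
        label[0] = '\0';
    fclose(fp);
    label[strcspn(label, "\r\n")] = '\0';
    return strcmp(label, expected) == 0;
}

int main(void)
{
    char name[20];

    batch_name(11, 21, 90, "DAT", name, sizeof name);
    printf("Process data file : %s\n", name);

    if (!worm_volume_ok("E:", "LINE1-1990-Q4"))    /* assumed label text */
        printf("Please insert the correct WORM volume and try again.\n");
    return 0;
}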
The UNIX systems feed their data to the DOS system on diskette: new diskettes with several files of ASCII data on them arrive on a daily basis. We load these from the disks into R:BASE using a Programmer Interface (PI) routine I wrote. Microrim's PI allows programmers like me to write a routine in C and give that routine direct access to the database. The PI routine I wrote has a CRC (cyclic redundancy check) function to verify that R:BASE received the same data that the UNIX system sent. Once the R:BASE database holds a complete day of information, the R:BASE application creates the new file name, unloads the data to that file on the hard disk, prompts the user to insert the correct WORM disk, and then copies the file onto the WORM disk. This method works perfectly. We ended up using only a 60 megabyte hard disk on the DOS computer.

From File to Database
"""""""""""""""""""""
Next, I needed a way to load 250,000 rows into the R:BASE database in three minutes so R:BASE could create the report. That's over 1,000 records a second. I couldn't use the LOAD command because I needed exceptional speed, so I wrote another fast C routine and used Microrim's PI to get direct access to the ASCII files and the R:BASE database. After careful study of the PI, I came up with a lightning-fast program that met the loading specifications. The program loads some 65,000 records per minute into an R:BASE database. The PI is great for any application where you want to do a single function (like loading data) extremely fast. I have written many programs with the PI, and despite its quirks, there's no better way to achieve the fastest possible program speed with complete device and user interface control.

Process Flow
""""""""""""
The process flow now looks like this:

               +-------------+
               |  UNIX Data  |
               |  Diskettes  |
               +-------------+
                      |
                      |  Daily Upload/Archive
                      |
        +-------------<-----------+
        |                         |
        |                         |
        |                         |
 +-------------+           +-------------+
 |   R:BASE    |  +------->|  WORM Disk  |
 |  Databases  |           |   Archive   |
 +-------------+           +-------------+
        |                         |
        |                         |
        |                         |
        +------------->-----------+
                      |
                      |  Report Creation Process
                      |
                  To Printer

Report Generation
"""""""""""""""""
Generating the report was the most complex part of the problem. First, R:BASE prompts for the days on which a report cycle started and stopped. At report time, R:BASE translates these dates into the file names to be loaded from the WORM disk with the PI routine. For example, report cycle number 9001-2 had a start date of 11-20-90 and an end date of 11-22-90; in R:BASE, I translate each date value into a file name to load from the WORM disk. Next, R:BASE asks the person using the application for their preferences as to the period of time slices to report and whether they want temperature data displayed in Celsius or Fahrenheit (another minor detail that came out later in the development process). Then the PI routine loads the R:BASE database with the rows and columns required for the report from the ASCII files on the WORM disk, converting the data on the fly during the loading process. Now the data is in the R:BASE database.
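As a rough illustration of that on-the-fly work, the C sketch below keeps only the rows that fall on the requested time-slice increment and converts Celsius readings to Fahrenheit when the user asks for it. The record layout, the sample values, and the printout at the end are assumptions made for the example; the actual PI routine reads the plant's .DAT records and writes rows straight into the R:BASE database instead of printing them.

/*
 * Illustration only: time-slice filtering and temperature conversion
 * applied while loading.  The "HH:MM:SS temperature" record layout is
 * an assumption for this sketch.
 */
#include <stdio.h>

/* Keep a reading only if it falls on the requested increment
   (every 1, 2, 5, 10, or 15 minutes, on the minute). */
static int on_time_slice(int minute, int second, int increment)
{
    return second == 0 && (minute % increment) == 0;
}

/* Convert a Celsius reading to Fahrenheit when the user asked for it. */
static double to_display_units(double celsius, int want_fahrenheit)
{
    return want_fahrenheit ? celsius * 9.0 / 5.0 + 32.0 : celsius;
}

int main(void)
{
    /* Assumed sample records: a time stamp plus a Celsius reading. */
    const char *records[] = {
        "08:00:00 21.5",
        "08:02:30 21.9",   /* falls between slices: skipped below */
        "08:05:00 22.4",
    };
    int increment = 5;          /* report every 5 minutes            */
    int want_fahrenheit = 1;    /* user asked for degrees Fahrenheit */
    int i;

    for (i = 0; i < 3; i++) {
        int hh, mm, ss;
        double temp;

        if (sscanf(records[i], "%d:%d:%d %lf", &hh, &mm, &ss, &temp) != 4)
            continue;                    /* skip malformed rows */
        if (!on_time_slice(mm, ss, increment))
            continue;                    /* not on a reporting boundary */
        printf("%02d:%02d  %.1f\n", hh, mm,
               to_display_units(temp, want_fahrenheit));
    }
    return 0;
}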
The next step is to give some structure to the real-time data. The report generator consists of 34 dynamic modules--one for each of the 34 phases. Any of the 34 modules can be called at any time during the report generation process; the data dictates which module is called. Certain rows in the data contain flags (marks or notifications) that indicate when the phase changes.

When a flag is encountered, one of three situations has just occurred:

o  A module change just happened--the phase just changed.
o  A module change is about to happen--the phase is about to change.
o  An error occurred in the flag sensor, in which case the flag should be ignored.

I didn't know which of the three was true, so I wrote an R:BASE program that searches backward and forward in the data around that point to determine exactly when the phase change occurred and what the change was. Each phase is defined by a phase number (1 through 34), a starting date, a starting time, an ending time, and an ending date. R:BASE loads these values into a FLAG table in the R:BASE database as new flags are encountered in the data stream. If the R:BASE program finds that a phase change wasn't valid, its row is removed from the flag table.

Each R:BASE module has a name (1.CMD through 34.CMD) corresponding to the phase number stored in the flag table. For example, when the data indicates that we're now at phase 1, R:BASE should run 1.CMD, passing that phase's starting date, starting time, ending time, and ending date into 1.CMD as parameters. To run the correct module, I go row by row through the flag table, copying its values into variables. Then I append .CMD onto the end of the phase number to create the name of the correct program. Now I can run the program 1.CMD, passing the start and stop values as parameters. Here's a simplified example using an R:BASE 3.1 DECLARE CURSOR structure to demonstrate how I do this:

*( Example DECLARE CURSOR structure )
DECLARE cur1 CURSOR FOR SELECT +
   phase, sdate, stime, etime, edate +
   FROM flagtab
OPEN cur1
FETCH cur1 INTO v_phase IND i1, +
   v_sdate IND i2, v_stime IND i3, +
   v_etime IND i4, v_edate IND i5
WHILE SQLCODE <> 100 THEN
   SET VAR v_phaset TEXT = +
      (CTXT(.v_phase) + '.CMD')
   RUN &v_phaset USING +
      .v_sdate, .v_stime, .v_edate, .v_etime
   FETCH cur1 INTO v_phase IND i1, +
      v_sdate IND i2, v_stime IND i3, +
      v_etime IND i4, v_edate IND i5
ENDWHILE
DROP CURSOR cur1

Actually, this is only part of the report. In addition, introductory, summary, and sign-off sheets are generated.

Conclusion
""""""""""
I used the strengths of R:BASE where I could, and the speed and control of the C programming language and Microrim's PI where they were required. All of this combined produced a solution that met all of my client's requirements and was completed in about a month. Now, almost two years later, the system still runs smoothly and has grown to almost four gigabytes in size. Those boxes containing hundreds of diskettes are gone now; they've been replaced by four WORM disks. Here's an R:BASE system that doesn't store the majority of its data in R:BASE format, or even on a hard disk for that matter. Future systems will undoubtedly take this concept further, storing data in many different locations and formats. System solutions are limited only by our imaginations.