DOCUMENT #690 ======================================================================= SPORTS STATISTICS ======================================================================= PRODUCT: R:BASE VERSION : 3.1B or Higher ======================================================================= CATALOG: Programming In R:BASE AREA : Statistical Analyst ======================================================================= R:BASE quickly and easily stores data about team or individual sports and computes statistics. Use forms to enter the data and then views and reports to display statistical information. Following are examples for both team (baseball) and individual (bicycling) sports. Baseball ======== A common team statistic is won/lost percentages. With team sports, you also want to compare players. Some of the individual player statistics are batting average and on-base percentage. All of these can be quickly calculated using views. Formulas to compute these statistics are as follows: won/lost percentage = (total games won) --------------- (total games played) batting average = (total hits) ---------- (total at bats) on-base percentage = (total hits + total bases on balls) --------------------------------- (total at bats + total bases on balls) The word total in the above formulas indicates where to do a sum. The rest is simple addition and division. Create two tables, one to hold team data, one to hold the individual player data. You'll have other tables to hold more information about the players and teams. For example, league, division, player's nickname, age, right or left handed, position etc. The team_stats table is loaded with one row for each team per game. The Won and Lost columns are auto- matically computed from the score. The player_stats table is loaded with one row of information per player per game. Table: team_stats Column definitions # Name Type Index Expression 1 Team_name TEXT 20 2 Team_score INTEGER 3 Opponent TEXT 20 4 Opp_score INTEGER 5 Won INTEGER (IFGT(team_score,opp_score,1,0)) 6 Lost INTEGER (IFLT(team_score,opp_score,1,0)) 8 Game_date DATE Sample data from team_stats Team_name Team_score Opponent Opp_score Won Lost Game_Date ---------------- ---------- ---------------- --------- --- ---- --------- Oakland A's 12 NY Yankees 7 1 0 04/12/1993 NY Yankees 7 Oakland A's 12 0 1 04/12/1993 Oakland A's 2 Seattle Mariners 3 0 1 04/20/1993 Seattle Mariners 3 Oakland A's 2 1 0 04/20/1993 NY Yankees 2 Seattle Mariners 9 0 1 04/21/1993 Seattle Mariners 9 NY Yankees 2 1 0 04/21/1993 Texas Rangers 0 Seattle Mariners 1 0 1 05/01/1993 Seattle Mariners 1 Texas Rangers 0 1 0 05/01/1993 NY Yankees 2 Texas Rangers 3 0 1 04/15/1993 Texas Rangers 3 NY Yankees 2 1 0 04/15/1993 Texas Rangers 2 Oakland A's 0 1 0 04/17/1993 Oakland A's 0 Texas Rangers 2 0 1 04/17/1993 Table: player_stats Column definitions # Name Type Index Expression 1 Player_Name TEXT 20 2 AtBats INTEGER 3 Hits INTEGER 4 HomeRuns INTEGER 5 Triples INTEGER 6 Doubles INTEGER 7 Singles INTEGER 8 BaseOnBalls INTEGER 9 Gamedate DATE Sample data from player_stats Player_Name AtBats Hits HomeRuns Triples Doubles Singles BaseOnBalls Gamedt -------------- ------ ---- -------- ------- ------- ------- ----------- ------ Ken Griffey Jr. 5 3 0 1 1 1 1 04/15/93 Mickey Mantle 3 1 0 0 0 1 0 04/15/93 Bobby Bonds 5 4 2 0 0 2 1 04/15/93 Willie Mays 3 3 0 1 2 0 2 04/17/93 Henry Aaron 5 4 2 1 1 0 1 04/17/93 Joe DiMaggio 4 3 0 0 0 3 2 04/17/93 Ken Griffey Jr. 4 3 2 1 0 0 2 04/21/93 Mickey Mantle 2 0 0 0 0 0 1 04/21/93 Bobby Bonds 3 1 0 0 0 1 0 04/21/93 Willie Mays 3 1 0 0 0 1 1 05/01/93 Henry Aaron 4 2 0 1 0 1 0 05/01/93 Joe DiMaggio 3 3 0 1 0 2 1 05/01/93 The statistics are calculated using functions available with the SELECT command and stored as views. The FORMAT function is used to format output for display, but can't have computations as arguments. So two views are used; one to do the computations, one to format the output. Use the following two views to look at the team statistics. After creating the views, SELECT * FROM teams_2 displays the team standings. CREATE VIEW teams_1 + (team,won,lost,pct) AS SELECT + team_name,SUM(won),SUM(lost), + (SUM(won)/(COUNT(*))) FROM team_stats + GROUP BY team_name ORDER BY 4 DESC CREATE VIEW teams_2 + (team,won,lost,pct) AS SELECT + team,won,lost,(FORMAT(pct,'.900')) + FROM teams_1 team won lost pct ---------------- --- ---- ----- Seattle Mariners 3 0 1.000 Texas Rangers 2 1 0.666 Oakland A's 1 2 0.333 NY Yankees 0 3 0.000 Use the next two views to look at the player statistics. After creating the views, SELECT or BROWSE from player_2 to display data about all the players. CREATE VIEW player_1 + (PlayerName, BatAvg, OnBase, AtBats, + HR, Triples,Doubles, Singles, Walks) + AS SELECT + player_name,(SUM(hits)/SUM(at_bats)), + ((SUM(hits)+SUM(base_on_balls))/ + (SUM(at_bats)+SUM(base_on_balls))), + SUM (at_bats),SUM (home_runs), + SUM (triples),SUM (doubles), + SUM(singles),SUM (base_on_balls) + FROM player_stats GROUP BY player_name CREATE VIEW player_2 + (PlayerName, BatAvg, OnBase, AtBats, + HR, Triples,Doubles, Singles, Walks) + AS SELECT + PlayerName,(FORMAT(BatAvg,'.900')),+ (FORMAT(OnBase,'.900')), AtBats,HR, + Triples,Doubles,Singles,Walks + FROM player_1 ORDER BY 2 DESC PlayerName BatAvg OnBase AtBats HR Triples Doubles Singles Walks --------------- ------ ------ ------ -- ------- ------- ------- ----- Willie Mays .727 .800 11 1 2 3 2 4 Joe DiMaggio .692 .764 13 0 1 2 6 4 Bobby Bonds .666 .700 18 3 0 2 7 2 Ken Griffey Jr. .666 .736 15 2 3 3 2 4 Henry Aaron .666 .687 15 3 3 2 2 1 Mickey Mantle .363 .500 11 0 0 2 2 3 The view Player_2 automatically sorts by batting average. The player with the highest batting average is listed first. To see the players ordered by on base percentage use the following command: SELECT PlayerName,BatAvg,OnBase FROM player_2 ORDER BY OnBase DESC PlayerName BatAvg OnBase --------------- ------ ------ Willie Mays .727 .800 Joe DiMaggio .692 .764 Ken Griffey Jr. .666 .736 Bobby Bonds .666 .700 Henry Aaron .666 .687 Mickey Mantle .363 .500 Bicycling --------- Statistics for individual sports are different from team sports. Instead of comparing player to player, the comparison is yourself against the clock, and how well did you do this week compared to last week. By tracking the distance traveled and the time it took, R:BASE reports can compute your speed. The reports are printed for a specified time period to track progress and performance. Reports are used instead of views because a sequence of expressions are needed to do the computations. The time must be converted to seconds and then hours in order to calculate the speed (miles per hour). This is easier to do in a report than in a SELECT command in a view. A report also lets you do multiple levels of totals. Create a table to hold information about each outing. Record the distance traveled, the time it took and the date. Add columns to hold other relevant information as desired, for example, route, weather etc. Table: CYCLE_LOG Column definitions # Name Type Index Expression 1 Ride_Date DATE * 2 Mileage INTEGER 2 Ride_Time TIME 4 Route TEXT 20 5 Ride_Notes NOTE Create a report with the following layout and expressions: RH PH PH DATE ROUTE MILEAGE TIME SPEED PH ------- --------------------- --------- ------ ------- D ride_date route mileage ride_time vspeed MpH PF RF TOTALS: vmileage vavg_speed Mph 1: INTEGER vsec = (RIDE_TIME - 00:00:00) 2: DOUBLE vhours = (.vsec / 3600) 3: DOUBLE vspeed = (MILEAGE / .vhours) 4: INTEGER vmileage = SUM OF MILEAGE 5: INTEGER vtotal_sec = (.vtotal_sec + .vsec) 6: DOUBLE vtotal_hours = (.vtotal_sec / 3600) 7: DOUBLE vavg_speed = (.vmileage / .vtotal_hours) The variable vsec converts the time to seconds, then the seconds are converted to hours (vhours) to compute the speed in miles per hour (vspeed). Variables vmileage, vtotal_sec, vtotal_hours are used to compute the average speed, vavg_speed, for all rides. Output from the report looks like this: DATE ROUTE MILEAGE TIME SPEED -------- ---------------------- ------- ------- -------- 01/11/92 Vashon Island 25 1:22:00 18.2 MpH 03/01/92 Bainbridge Island 26 1:31:00 17.1 MpH 03/29/92 Vashon Island 25 1:17:22 19.3 MpH 06/06/92 Whidbey Island 126 6:55:00 18.2 MpH 06/20/92 Vashon Island 25 1:14:30 20.1 MpH 06/27/92 STP 195 11:45:00 16.5 MpH 07/03/92 Southworth 60 3:06:00 9.3 MpH 07/15/92 Vashon Island 25 1:13:42 20.3 MpH 07/23/92 Vashon Island 25 1:15:26 19.8 MpH 07/30/92 RAMROD 154 9:23:00 16.4 MpH 08/02/92 Vashon Island 25 1:16:50 19.5 MpH 08/25/92 Vashon Island 25 1:08:47 21.8 MpH 09/13/92 Vashon Island 25 1:15:23 19.8 MpH 10/03/92 West Seattle/I-90 30 3:00:00 10.0 MpH 10/04/92 West Seattle/I-90 15 1:30:00 10.0 MpH TOTALS: 806 17.0 Mph To do a summary report by month, add an expression to find the month, vmonth = (TMON(ride_date)), and break on the variable. Change the Detail line to a Break Footer and reset the accumulator variables, vmileage and vtotal_sec, at the break. Add a second group of expressions to do report totals. The report generates summary data to track monthly performance. Expressions for summary report: 1: INTEGER vsec = (RIDE_TIME - 00:00:00) 2: INTEGER vtotal_sec = (.vtotal_sec + .vsec) 3: DOUBLE vhours = (.vtotal_sec / 3600) 4: INTEGER vmileage = SUM OF MILEAGE 5: DOUBLE vspeed = (.vmileage / .vhours) 6: TEXT vmonth = (TMON(ride_date)) 7: INTEGER vtotal_miles = SUM OF MILEAGE 8: INTEGER vrtotal_sec = (.vrtotal_sec + .vsec) 9: DOUBLE vtotal_hours = (.vrtotal_sec / 3600) 10: DOUBLE vavg_speed = (.vtotal_miles/.vtotal_hours) When printing the monthly summary report, order by the date so it will break correctly on month. R>PRINT mthcycle ORDER BY Ride_Date MONTH MILEAGE SPEED --------- --------- ------- January 25 18.2 MpH March 51 18.1 MpH June 346 17.3 MpH July 264 17.6 MpH August 50 20.6 MpH September 25 19.8 MpH October 45 10.0 MpH Totals: 806 17.0 Mph This structure can easily be adapted to other individual sports such as running, swimming, kyaking etc.