Title: | Data for the Book "R by Example" |
---|---|
Description: | Data for the examples and exercises in the book "R by Example" by Jim Albert and Maria Rizzo (2012, ISBN 978-1-4614-1365-3). |
Authors: | Maria Rizzo [aut, cre], Jim Albert [aut] |
Maintainer: | Maria Rizzo <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.101 |
Built: | 2024-11-01 11:29:16 UTC |
Source: | https://github.com/mariarizzo/rbyexample |
Batting data for all Major League players with at least 300 at-bats for the 2021 season. Data is from the Lahman database available through the Lahman package.
batting_avg_2021
batting_avg_2021
231 obs. of 5 variables:
Name of player
League
Hits
At bats
Batting average
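The batting average variable is hits divided by at-bats. A minimal sketch of that calculation follows; the column names `H`, `AB`, and `AVG` and all values are hypothetical stand-ins, since the actual column names of batting_avg_2021 are not shown here.

```r
# Illustrative values only; not taken from batting_avg_2021.
# Column names H, AB, AVG are assumptions for this sketch.
toy <- data.frame(
  Player = c("A", "B", "C"),
  H  = c(150, 120, 90),
  AB = c(500, 480, 300)
)

# Batting average = hits / at-bats, conventionally reported to three decimals.
toy$AVG <- round(toy$H / toy$AB, 3)

# Keep only players meeting the 300 at-bat threshold described above.
qualified <- toy[toy$AB >= 300, ]
print(qualified)
```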
Major League Baseball data on batting: number of hits, doubles, home runs, and other statistics by season. The data was extracted from the baseball-reference.com website.
battinghistory
battinghistory
140 obs. of 27 variables:
season
number of teams
number of players
batter's average age
runs scored
games played
plate appearances
at-bats
hits
doubles
triples
home runs
runs batted in
stolen bases
number caught stealing
walks
strikeouts
batting average
on-base percentage
slugging percentage
OBP plus SLG
total bases
ground into double plays
hit by pitches
sacrifice hits
sacrifice flies
intentional walks
This version of the data is sorted in ascending order of Year. There are missing values, especially in early years.
baseball-reference.com.
Description: Game averages for NCAA basketball
bball
bball
43 obs. of 20 variables:
season
number of teams
average number of games played
average number of field goals
average number of field goal attempts
FG%
field goal percentage
average number of three pointers
average number of three point attempts
three-point percentage
average number of free throws
average number of free throw attempts
FT%
free-throw percentage
average number of total rebounds
average number of assists
average number of steals
average number of blocks
average number of turnovers
average number of personal fouls
average number of points scored
Year season started
factor: "M" or "W" (men or women)
The data is from Sports Reference https://www.sports-reference.com/cbb/seasons/game-averages.html
Sports Reference
Description: Game averages for NCAA men's basketball
bball.men
bball.men
77 obs. of 20 variables:
season
number of teams
average number of games played
average number of field goals
average number of field goal attempts
FG%
field goal percentage
average number of three pointers
average number of three point attempts
three-point percentage
average number of free throws
average number of free throw attempts
FT%
free-throw percentage
average number of total rebounds
average number of assists
average number of steals
average number of blocks
average number of turnovers
average number of personal fouls
average number of points scored
Year season started
The data is from Sports Reference https://www.sports-reference.com/cbb/seasons/game-averages.html
Sports Reference
Description: Game averages for NCAA women's basketball
bball.women
bball.women
43 obs. of 20 variables:
season
number of teams
average number of games played
average number of field goals
average number of field goal attempts
FG%
field goal percentage
average number of three pointers
average number of three point attempts
three-point percentage
average number of free throws
average number of free throw attempts
FT%
free-throw percentage
average number of total rebounds
average number of assists
average number of steals
average number of blocks
average number of turnovers
average number of personal fouls
average number of points scored
Year season started
The data is from Sports Reference https://www.sports-reference.com/cbb/seasons/game-averages.html
Sports Reference
BGSU Enrollment
bgsu
bgsu
Data frame of selected BGSU enrollment data: 16 obs. of 2 variables
Year.
Enrollment.
J. Albert
Data from a study comparing brain size and intelligence.
brainsize
brainsize
40 obs. of 7 variables:
Male or Female.
Full Scale IQ scores based on four Wechsler (1981) subtests.
Verbal IQ scores based on four Wechsler (1981) subtests.
Performance IQ scores based on four Wechsler (1981) subtests.
Body weight in pounds.
Height in inches.
total pixel count from the 18 MRI scans.
There are missing values in Weight (2) and Height (1).
Willerman et al. (1991).
College Rating Data
college
college
260 obs. of 11 variables:
Name of Institution.
Enrollment of Institution.
Ranking in tiers 1, 2, 3, 4.
Pct. of freshmen who return the following year
Pct. of freshmen who graduate in six years
Pct. of classes with 20 or fewer students
Pct. of classes with 50 or fewer students
Pct. of faculty hired full-time
Pct. of incoming students who were in top 10% of high school class
Acceptance rate of students who apply
Pct. of alumni who contribute financially
There are missing values.
US News and World Report "America's Best Colleges" 2009 report, National Universities.
Maximum Intel CPU speed vs time from 1994 through 2004.
CPUspeed
CPUspeed
27 obs. of 6 variables:
calendar year
month
day
time in years
Max IA-32 Speed (GHz)
logarithm base 10 of speed
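The last two variables (time in years and log base 10 of speed) lend themselves to a straight-line fit on the log scale. The sketch below is one way to do that, assuming hypothetical column names `time`, `speed`, and `log10speed` and made-up values that double every two years; it is not the analysis from the book.

```r
# Illustrative values only; not taken from CPUspeed.
# Column names are assumptions for this sketch.
toy <- data.frame(
  time  = c(0, 2, 4, 6, 8),           # years since a reference date
  speed = c(0.1, 0.2, 0.4, 0.8, 1.6)  # GHz, doubling every two years
)

# Exponential growth in speed becomes linear after a log10 transform.
toy$log10speed <- log10(toy$speed)

# Fit a line to log10(speed) against time.
fit <- lm(log10speed ~ time, data = toy)
slope <- unname(coef(fit)["time"])

# Doubling time in years implied by the fitted slope.
doubling_time <- log10(2) / slope
print(doubling_time)
```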
Number of crimes per 100,000 population, as of 1970 for 16 large cities in the US. Table 1.1 in Chapter 1 of Hartigan (1975). All variables are numeric except city, which is character type.
crime.bigcity
crime.bigcity
16 obs. of 8 variables:
name of city (character)
murder rate
rape rate
robbery rate
assault rate
burglary rate
larceny rate
auto crime rate
United States Statistical Abstracts (1970). https://people.sc.fsu.edu/~jburkardt/datasets/hartigan/file03.txt
Hartigan, J. A. (1975). Clustering Algorithms, John Wiley, New York.
Data from the 1970 military draft lottery. The lottery assigned numbers to potential draftees by their birth date. Those with lower draft numbers were drafted first.
draftlottery
draftlottery
31 obs. of 13 variables
Day of month.
Draft numbers for January birthdays by day of month.
Draft numbers for February birthdays by day of month.
Draft numbers for March birthdays by day of month.
Draft numbers for April birthdays by day of month.
Draft numbers for May birthdays by day of month.
Draft numbers for June birthdays by day of month.
Draft numbers for July birthdays by day of month.
Draft numbers for August birthdays by day of month.
Draft numbers for September birthdays by day of month.
Draft numbers for October birthdays by day of month.
Draft numbers for November birthdays by day of month.
Draft numbers for December birthdays by day of month.
This is the data in "draft-lottery.txt".
Moore, David S. and George P. McCabe (1989). Introduction to the Practice of Statistics.
See Fienberg, S. E. (1971), Starr, N. (1997), and "Draft Lottery (1969)", Wikipedia.org for further discussion.
This data provides measurements of ancient Etruscan skulls and modern Italian skulls.
EtruscanItalian
EtruscanItalian
154 obs. of 2 variables:
skull measurement
character: Etruscan or Italian
Critical flicker frequency and iris color of the eye for 19 individuals.
flicker
flicker
19 obs. of 2 variables:
Eye colour: Brown, Green, or Blue
Critical flicker frequency in cycles/sec.
Critical flicker frequency is the highest frequency at which the flicker in a flickering light source can be detected by the individual.
http://www.statsci.org/data/general/flicker.txt
https://gksmyth.github.io/ozdasl/general/flicker.html
Smyth, Gordon K (2011). Australasian Data and Story Library (OzDASL). https://gksmyth.github.io/ozdasl.
Hit and home run data grouped by region of the zone for four players over the 2018-2023 baseball seasons. Data is from Baseball Savant https://baseballsavant.mlb.com/
four_players
four_players
64 obs. of 12 variables:
interval of values of plate_x
interval of values of plate_z
count of balls in play
count of hits
count of home runs
hit rate
home run rate
z-score of hit rate
z-score of home run rate
chr: Player name
midpoint of PX interval
midpoint of PZ interval
Distances and velocities measured for 24 galaxies containing Cepheid stars to measure the Hubble constant.
hubble
hubble
24 obs. of 3 variables:
A label to identify the galaxy (a factor)
Relative velocity in kilometers per second
Distance in megaparsecs
Freedman et al. 2001. The Astrophysical Journal 553:47-72: Tables 4 and 5.
Freedman et al. (2001) Final results from the Hubble space telescope key project to measure the Hubble constant. The Astrophysical Journal (553), 47-72.
Wood, S.N. (2017) Generalized Additive Models: An Introduction with R. CRC
Data from an 1854 survey by the Massachusetts Commission on Lunacy.
lunatics
lunatics
14 obs. of 6 variables:
Name of county.
Number of lunatics by county.
Distance to nearest mental health center.
County population 1950 (thousands).
County population density per square mile.
Percent of lunatics cared for at home.
J.M. Hunter, "Need and Demand for Mental Health Care: Massachusetts 1854," The Geographic Review, 77:2 (April 1987), pp 139-156.
Gender, age, and completion time (in minutes) for 276 people who completed the 2010 New York City Marathon.
nyc.marathon
nyc.marathon
276 obs. of 3 variables:
female or male
Time of runner in minutes
Age of runner
Survival times of cancer patients with advanced cancer of the stomach, bronchus, colon, ovary or breast, whose treatment included supplemental ascorbate.
PATIENT
PATIENT
17 obs. of 5 variables:
survival times for stomach cancer patients
survival times for bronchus cancer patients
survival times for colon cancer patients
survival times for ovary cancer patients
survival times for breast cancer patients
See the text for details on how to input this data directly from the file PATIENT.DAT.
This is the data from "PATIENT.DAT" with column headings added. As input, the data is in wide format and should be stacked (long format) for a one-way ANOVA. See the text for details.
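The wide-to-long step mentioned above can be sketched with `stack()` from base R. The column names and values below are made up for illustration, and the sketch assumes that shorter columns are padded with NA; see the text for the book's own treatment.

```r
# Illustrative values only; not taken from PATIENT.DAT.
# Assumes unequal group sizes are padded with NA in the wide layout.
wide <- data.frame(
  stomach = c(10, 20, 30, NA),
  colon   = c(40, 50, 60, 70),
  breast  = c(80, 90, NA, NA)
)

# stack() returns two columns: values (numeric) and ind (factor of column names).
long <- stack(wide)
names(long) <- c("time", "type")

# Drop the NA padding so each row is one observed survival time.
long <- na.omit(long)

# One-way ANOVA of survival time on cancer type.
fit <- aov(time ~ type, data = long)
print(summary(fit))
```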
Hand et al. (1994).
Cameron and Pauling (1978).
The peanuts data records levels of a toxin (aflatoxin) in batches of peanuts.
peanuts
peanuts
34 obs. of 2 variables:
percentage of non-contaminated peanuts in the batch
average level of aflatoxin in parts per billion
Hand et al. (1994)
Survival times in units of 10 hours for animals exposed to different poisons.
poison
poison
48 obs. of 3 variables:
survival time in units of 10 hours
poison: I, II, III
treatment: A, B, C, D
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978), Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, Wiley, New York.
Times required to round first base for 22 baseball players using three styles: rounding out, a narrow angle and a wide angle. The goal is to determine if the method of rounding first base has a significant effect on times to round first base.
rounding
rounding
66 obs. of 3 variables:
time
factor with 3 levels: NarrowAngle, RoundOut, WideAngle
player ID (integer)
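Because each player is timed under all three styles, the player acts as a block. One common nonparametric option for such a layout is the Friedman test, sketched below on simulated times; the column names `time`, `method`, and `player` are assumptions, and the book may use a different analysis.

```r
# Simulated times only; not taken from the rounding data.
# Column names are assumptions for this sketch.
set.seed(1)
toy <- expand.grid(
  player = 1:6,
  method = c("NarrowAngle", "RoundOut", "WideAngle")
)
toy$time <- round(rnorm(nrow(toy), mean = 5.5, sd = 0.2), 2)

# Friedman rank sum test: method is the treatment, player is the block.
res <- friedman.test(time ~ method | player, data = toy)
print(res)
```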
Hollander and Wolfe (1999) Table 7.1, page 274.
Measurements of bulk resistivity of silicon wafers made at NIST with 5 probing instruments on each of 5 days.
SiRstv
SiRstv
25 obs. of 2 variables:
replicate
resistance
https://www.itl.nist.gov/div898/strd/anova/SiRstv_info.html
https://www.itl.nist.gov/div898/strd/anova/SiRstv.html
NIST Standard Reference Datasets: https://www.itl.nist.gov/div898/strd/index.html
Total snowfall in inches for the cities Buffalo and Cleveland for the seasons 1968-69 through 2008-09.
snowfall
snowfall
41 obs. of 3 variables:
character: winter season identified by years
Cleveland snowfall
Buffalo snowfall
Grades from an undergraduate statistics class at BGSU.
statgrades
statgrades
23 obs. of 7 variables:
Student ID; integer 1:23
Percent grade on Exam 1
Percent grade on Exam 2
Percent grade on homework
Percent grade on Final Exam
Major coded 1, 2, 3
Group coded 1, 2
Twins IQ Data
twinIQ
twinIQ
Data frame of Burt's IQ data for twins: 27 obs. of 3 variables
IQ of twin raised with foster parents.
IQ of twin raised with biological parents.
Social class of biological parents (high, low, middle)
Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotic twins reared together and apart. Br. J. Psych., 57, 147-153. Data is provided in R packages faraway and UsingR.
The data were collected at the 16th Annual Twins Day Festival in Twinsburg, Ohio, in August 1991. 495 adult twins were interviewed. The original study aimed to investigate 'By how much will another year of schooling most likely raise one's income?' Pairs of twins provide a control on confounding factors such as intelligence, family background, etc.
twins
twins
183 obs. of 16 variables:
the difference (twin 1 minus twin 2) in the logarithm of hourly wage, given in dollars.
the difference (twin 1 minus twin 2) in self-reported education, given in years.
Age in years of twin 1.
AGE squared.
Hourly wage of twin 2.
1 if twin 2 is white, 0 otherwise.
1 if twin 2 is male, 0 otherwise.
Self-reported education (in years) of twin 2.
Hourly wage of twin 1.
1 if twin 1 is white, 0 otherwise.
1 if twin 1 is male, 0 otherwise.
Self-reported education (in years) of twin 1.
the difference (twin 1 minus twin 2) in cross-reported education.
the difference (twin 1 minus twin 2) in tenure, or number of years at current job.
the difference (twin 1 minus twin 2) in marital status, where 1 signifies "married" and 0 signifies "unmarried".
the difference (twin 1 minus twin 2) in union coverage, where 1 signifies "covered" and 0 "uncovered".
There are 183 cases; 147 complete cases. Twin 1's cross-reported education is the number of years of schooling completed by twin 1 as reported by twin 2. For data analysis, the logarithm of the hourly wage is typically used instead of hourly wage.
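A minimal sketch of the two steps mentioned above, restricting to complete cases and working with the logarithm of hourly wage. The column names `wage1`, `wage2`, `educ1`, `educ2` and all values are hypothetical; the actual names in twins may differ.

```r
# Illustrative values only; not taken from twins.
# Column names are assumptions for this sketch.
toy <- data.frame(
  wage1 = c(10, 20, NA, 15),
  wage2 = c(12, 18, 9, NA),
  educ1 = c(12, 16, 14, 12),
  educ2 = c(12, 14, 14, 13)
)

# Keep only rows with no missing values in any column.
complete <- toy[complete.cases(toy), ]

# Within-pair differences (twin 1 minus twin 2) on the log-wage scale.
complete$dlwage <- log(complete$wage1) - log(complete$wage2)
complete$deduc  <- complete$educ1 - complete$educ2
print(complete)
```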
Guido Imbens, PhD. UCLA, Department of Economics.
Ashenfelter, Orley and Krueger, Alan. "Estimates of the Economic Return to Schooling from a New Sample of Twins." The American Economic Review 84.5 (Dec. 1994) 1157-1173.
Chase Utley's Hitting Data for 2006
utley2006
utley2006
160 obs. of 6 variables:
game
date
plate appearances
at-bats
home runs
hits
During the 2006 baseball season, Chase Utley of the Philadelphia Phillies had a hitting streak of 35 games, which is one of the best hitting streaks in baseball history.
J. Albert
The 'Waste Run-up' data (Koopmans 1987, p. 86) reports weekly percentage waste of cloth by five different supplier plants of Levi-Strauss, relative to cutting from a computer pattern.
wasterunup
wasterunup
22 obs. of 5 variables:
weekly percentage waste of cloth for Plant 1
weekly percentage waste of cloth for Plant 2
weekly percentage waste of cloth for Plant 3
weekly percentage waste of cloth for Plant 4
weekly percentage waste of cloth for Plant 5
There are missing values.
The number of daily visits to the author's website was obtained using Google Analytics. The data is summarized by week.
webhits
webhits
35 obs. of 2 variables:
Week number
Number of web hits
J. Albert
Mile run world record progression as recorded by the International Amateur Athletics Federation (IAAF). The dataset includes 32 world records for men ratified by the IAAF, and 29 world records for women both in the pre-IAAF and IAAF eras.
world.record.mile
world.record.mile
276 obs. of 3 variables:
chr: female or male
chr: time as "mm:ss"
num: The whole minutes "mm" part of Time
num: The seconds "ss" part of Time
num: time expressed in seconds
chr: Name
chr: nationality
chr: date
num: year
Wikipedia page https://en.wikipedia.org/wiki/Mile_run_world_record_progression
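The relationship between the "mm:ss" time string, its minutes and seconds parts, and the time in seconds can be sketched as follows. The strings below are example inputs rather than rows of world.record.mile, and the variable names are assumptions.

```r
# Example time strings only; not taken from world.record.mile.
time <- c("4:30.0", "4:01.5", "3:59.4")

# Split each string at the colon into minutes and seconds parts.
parts   <- strsplit(time, ":", fixed = TRUE)
minutes <- as.numeric(sapply(parts, `[`, 1))
seconds <- as.numeric(sapply(parts, `[`, 2))

# Total time expressed in seconds.
total <- 60 * minutes + seconds
print(data.frame(time, minutes, seconds, total))
```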