See if this works. If not, I am working on that screen print. Also, don't forget to exclude 1.b) and 1.c)
Activity 6.2
Lines of Best Fit
It is often desirable to use a linear function to model a given set of data.
In this activity, you will work with several data sets, and for each, you will
look for a suitable line that approximates the given data. You will also work
with the line that is generally used as the best-fitting line, the least-squares
regression line. You will learn how to use Excel to find this line and its
equation.
For the following scatterplots a, b, and c, draw an appropriate line to fit
the data by "eye-balling" the graph and judging what line comes closest to
all points. In the space following each graph, indicate whether the slope
of the line you drew is positive, negative, or zero. For graph d, explain
why a line would not be a good fit. (Source:
The Wall Street Journal Almanac 1999.)
1.
SSAc06.indd 452 6/20/06 12:56:55 PM
Volume (billions of dollars charged)
1984 1986 1988 1990 1992 1994 1996 1998
60
50
40
30
20
10
0
Discover Card Market Share
Year
Slope (negative, positive, or 0): _______________
Number of Fatalities
1980 1985 1990 1995 2000
30,000
25,000
20,000
15,000
10,000
5,000
0
Total Fatalities in Alcohol-Related Crashes
Year
Slope (negative, positive, or 0): _______________
a.
b.
Activity 6.2: Lines of Best Fit 453
SSAc06.indd 453 6/20/06 12:56:57 PM
454 Excel Activities
Percent Change in Population
0 10 20 30 40 50 60 70
20
15
10
5
0
-5
-10
-15
Normal Minimum Jan. Temperature and Percent Change in Population
in the 1990s for Selected U.S. Cities
Normal Minimum Jan. Temperature in Degrees F
Slope (negative, positive, or 0): _______________
Postage (in cents)
1900 1920 1940 1960 1980 2000
35
30
25
20
15
10
5
0
Postage in Years in Which There Was an Increase (1919-1991)
Year
c.
d.
SSAc06.indd 454 6/20/06 12:56:58 PM
Why is a line not a good fit for graph d?
Because individual people might draw different lines, especially when the
data is scattered as it is on graph c, we need a way to construct a line that doesn't
depend on an individual's perception. The most commonly used method to construct
such a line results in the
least-squares regression line or just the
regression line. This line, among all possible lines we could draw, is the one that makes
the sum of the squares of the vertical distances of the data points from the line as
small as possible. This line is easy for a computer or calculator to find because it
involves calculations using straightforward (but kind of messy) formulas.
The regression line is used to show how a response variable changes, on average,
as an explanatory variable changes. You can use such a line to predict the
value of the response variable for a particular value of the explanatory variable.
The least-squares regression line, for the data given in #1c, is shown on
the following graph.
Percent Change in Population
0 10 20 30 40 50 60 70
20
15
10
5
0
-5
-10
-15
Normal Minimum Jan. Temperature and Percent Change in Population
in the 1990s for Selected U.S. Cities
Normal Minimum Jan. Temperature in Degrees F
2.
Activity 6.2: Lines of Best Fit 455
SSAc06.indd 455 6/20/06 12:56:58 PM
456 Excel Activities
Draw in the vertical distances from all points in the data set to the leastsquares
line.
How many data points lie above the line? ___________
How many data points lie below the line? ___________
How can you tell from a scatterplot of the data, whether the slope of the
regression line will be positive or negative?
Retrieve the data set "EA6.2.1 Verbal SAT Data.xls" from the CD or website
and create a scatterplot of "percent taking the test" and "verbal SAT score."
When creating the scatterplot, highlight only the two columns of data corresponding
to the two quantitative variables; do not select the names of the
states. Include appropriate titles, and change the scale on the vertical axis to
go from 450 to 600. (See Activity 2.1 if you need a refresher on creating scatterplots
and changing the scale.)
Then use the following instructions to find the regression line for these data.
Instructions to Use Excel to Find the
Regression Line
There are several ways in Excel to find the least-squares regression line, but
the easiest way is to point the cursor at one of the data points on the scatterplot
and then right-click on the point to select it.
Select Add trendline from the menu bar.
Click the Type tab and select Linear.
a.
b.
c.
d.
3.
4.
a.
b.
c.
SSAc06.indd 456 6/20/06 12:56:59 PM
Click the Options tab and select Automatic: Linear. Also click to place a
check mark in the box Display equation on chart. Make sure there are
no check marks in the other boxes. Then click OK. Your graph should display
the regression line and its equation. You may need to click and drag
the equation to a spot on the graph where you can read it clearly.
Write the equation of your line and indicate what the variables x and y represent.
What is the slope of the line and what does it represent? Interpret the slope in
the context of the data.
What is the y-intercept of the line and what does it represent? Interpret the
y-intercept in the context of the data.
Use the line you found to predict the average verbal SAT score for a state in
which 60 percent of students take the exam. Where does this value appear on
the graph? Mark it on a copy of the graph.
Retrieve the data set "EA6.2.2 Data Movies and Vid.xls" from the CD or website.
This file contains data collected from a sample of college students. Create
d.
5.
6.
7.
8.
9.
Activity 6.2: Lines of Best Fit 457
SSAc06.indd 457 6/20/06 12:56:59 PM
458 Excel Activities
a scatterplot of the two quantitative variables and find the regression line for
these data.
Write the equation of the line and indicate what the variables x and y represent
in the equation.
Is there a clear choice of explanatory variable and response variable in this
data set? Why or why not?
Describe what your scatterplot and line show.
There is one clearly unusual data point in the data set-the male who estimated
he saw 200 movies at a theater last year. Delete this case and look at
the scatterplot for these adjusted data. Write the equation of this adjusted
line and describe how the least-squares line changed when the outlier was
deleted.
Summary
In this activity, you practiced creating scatterplots and learned how to find
the regression line for a set of data using Excel. You interpreted the slope and
y-intercept of a regression line in the context of the data set from which it was
obtained, and used the regression line to predict values of the response variable.
You also explored how an unusual data point can affect the regression line.
a.
b.
c.
d.
SSAc06.indd 458 6/20/06 12:57:00 PM