The Olympic gold medal-winning height for the women's high jump, \(\textit{Wgold}\), is often lower than the best height achieved in other international women's high jump competitions in that same year.
The table below lists the Olympic year, \(\textit{year}\), the gold medal-winning height, \(\textit{Wgold}\), in metres, and the best height achieved in all international women's high jump competitions in that same year, \(\textit{Wbest}\), in metres, for each Olympic year from 1972 to 2020.
\begin{array}{|c|c|c|}
\hline
\textit{year} & \textit{Wgold}\ \text{(m)} & \textit{Wbest}\ \text{(m)} \\
\hline
1972 & 1.92 & 1.94 \\
1976 & 1.93 & 1.96 \\
1980 & 1.97 & 1.98 \\
1984 & 2.02 & 2.07 \\
1988 & 2.03 & 2.07 \\
1992 & 2.02 & 2.05 \\
1996 & 2.05 & 2.05 \\
2000 & 2.01 & 2.02 \\
2004 & 2.06 & 2.06 \\
2008 & 2.05 & 2.06 \\
2012 & 2.05 & 2.05 \\
2016 & 1.97 & 2.01 \\
2020 & 2.04 & 2.05 \\
\hline
\end{array}
A scatterplot of \(\textit{Wbest}\) versus \(\textit{Wgold}\) for this data is also provided.
When a least squares line is fitted to the scatterplot, the equation is found to be:
\(\textit{Wbest} = 0.300 + 0.860 \times \textit{Wgold}\)
The correlation coefficient is 0.9318
\begin{array}{|l|l|}
\hline
\rule{0pt}{2.5ex}\text { strength } \rule[-1ex]{0pt}{0pt} & \quad \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\\
\hline
\rule{0pt}{2.5ex}\text { direction } \rule[-1ex]{0pt}{0pt} & \\
\hline
\end{array}
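As a check on the stated equation and correlation coefficient, the following Python sketch fits a least squares line to the table values using the standard sums-of-squares formulas. The function and variable names (`least_squares`, `wgold`, `wbest`) are illustrative choices and not part of the question; a CAS calculator would normally be used in the exam.

```python
from math import sqrt

# Heights in metres, copied from the table (Olympic years 1972 to 2020).
wgold = [1.92, 1.93, 1.97, 2.02, 2.03, 2.02, 2.05, 2.01, 2.06, 2.05, 2.05, 1.97, 2.04]
wbest = [1.94, 1.96, 1.98, 2.07, 2.07, 2.05, 2.05, 2.02, 2.06, 2.06, 2.05, 2.01, 2.05]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


intercept, slope, r = least_squares(wgold, wbest)
print(f"Wbest = {intercept:.3f} + {slope:.3f} x Wgold, r = {r:.4f}")
```

With Wgold as the explanatory variable, the rounded output should agree with the equation and correlation coefficient quoted in the question.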
Data Analysis, GEN1 2023 VCAA 7-8 MC
A teacher analysed the class marks of 15 students who sat two tests.
The test 1 mark and test 2 mark, all whole number values, are shown in the scatterplot below.
A least squares line has been fitted to the scatterplot.
Question 7
The equation of the least squares line is closest to
- test 2 mark = – 6.83 + 1.55 × test 1 mark
- test 2 mark = 15.05 + 0.645 × test 1 mark
- test 2 mark = – 6.78 + 0.645 × test 1 mark
- test 2 mark = 1.36 + 1.55 × test 1 mark
- test 2 mark = 6.83 + 1.55 × test 1 mark
Question 8
The least squares line shows the predicted test 2 mark for each student based on their test 1 mark.
The number of students whose actual test 2 mark was within two marks of that predicted by the line is
- 3
- 4
- 5
- 6
- 7
CORE, FUR2 2021 VCAA 3
The time series plot below shows the winning time, in seconds, for the women's 100 m freestyle swim plotted against year, for each year that the Olympic Games were held during the period 1956 to 2016.
A least squares line has been fitted to the plot to model the decreasing trend in the winning time over this period.
The equation of the least squares line is
winning time = 357.1 – 0.1515 × year
The coefficient of determination is 0.8794
- Name the explanatory variable in this time series plot. (1 mark)
- Determine the value of the correlation coefficient (`r`).
Round your answer to three decimal places. (1 mark)
- Write down the average decrease in winning time, in seconds per year, during the period 1956 to 2016. (1 mark)
- The predicted winning time for the women's 100 m freestyle in 2000 was 54.10 seconds.
The actual winning time for the women's 100 m freestyle in 2000 was 53.83 seconds.
Determine the residual value in seconds. (1 mark)
- The following equation can be used to predict the winning time for the women's 100 m freestyle in the future.
winning time = 357.1 – 0.1515 × year
- i. Show that the predicted winning time for the women's 100 m freestyle in 2032 is 49.252 seconds. (1 mark)
- ii. What assumption is being made when this equation is used to predict the winning time for the women's 100 m freestyle in 2032? (1 mark)
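The numerical parts of this question can be sketched in Python as follows. This is a minimal illustration using only the values quoted in the question; the names `predict`, `r`, `residual_2000` and `prediction_2032` are assumptions made for the example. The correlation coefficient takes the sign of the slope, so the negative square root of the coefficient of determination is used.

```python
from math import sqrt

INTERCEPT = 357.1
SLOPE = -0.1515      # seconds per year; negative, so the trend is decreasing
R_SQUARED = 0.8794


def predict(year):
    """Winning time (seconds) predicted by the least squares line."""
    return INTERCEPT + SLOPE * year


# r has the same sign as the slope of the least squares line.
r = -sqrt(R_SQUARED) if SLOPE < 0 else sqrt(R_SQUARED)

# Residual = actual value - predicted value (both given in the question).
residual_2000 = 53.83 - 54.10

prediction_2032 = predict(2032)

print(f"r = {r:.3f}")                               # r = -0.938
print(f"residual (2000) = {residual_2000:.2f} s")    # residual (2000) = -0.27 s
print(f"prediction (2032) = {prediction_2032:.3f} s")  # prediction (2032) = 49.252 s
```

The 2032 value is an extrapolation, which is why part ii. asks about the assumption behind it.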
CORE, FUR1 2021 VCAA 11 MC
The table below shows the weight, in kilograms, and the height, in centimetres, of 10 adults.
A least squares line is fitted to the data.
The least squares line enables an adult's weight to be predicted from their height.
The number of times that the predicted value of an adult's weight is greater than the actual value of their weight is
- 3
- 4
- 5
- 6
- 7
CORE, FUR2 2020 VCAA 6
The table below shows the mean age, in years, and the mean height, in centimetres, of 648 women from seven different age groups.
- What was the difference, in centimetres, between the mean height of the women in their twenties and the mean height of the women in their eighties? (1 mark)
A scatterplot displaying this data shows an association between the mean height and the mean age of these women. In an initial analysis of the data, a line is fitted to the data by eye, as shown.
- Describe this association in terms of strength and direction. (1 mark)
- The line on the scatterplot passes through the points (20,168) and (85,157).
Using these two points, determine the equation of this line. Write the values of the intercept and the slope in the appropriate boxes below.
Round your answers to three significant figures. (1 mark)
mean height = ______ + ______ × mean age
- In a further analysis of the data, a least squares line was fitted.
The associated residual plot that was generated is shown below.
The residual plot indicates that the association between the mean height and the mean age of women is non-linear.
The data presented in the table in part a is repeated below. It can be linearised by applying an appropriate transformation to the variable mean age.
Apply an appropriate transformation to the variable mean age to linearise the data. Fit a least squares line to the transformed data and write its equation below.
Round the values of the intercept and the slope to four significant figures. (2 marks)
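For the line fitted by eye through (20,168) and (85,157), the slope and intercept follow from the two-point formula. The sketch below is one way to carry out that arithmetic in Python; `line_through` and `round_sig` are hypothetical helper names introduced for this example. The transformation in the final part is not attempted because the data table is not reproduced here.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


def round_sig(value, figures=3):
    """Round a number to the given number of significant figures."""
    return float(f"{value:.{figures}g}")


intercept, slope = line_through((20, 168), (85, 157))
print(round_sig(intercept), round_sig(slope))  # 171.0 -0.169
```

The unrounded slope is kept when computing the intercept so that rounding error is not carried into the second value.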
CORE, FUR2 2020 VCAA 5
The scatterplot below shows body density, in kilograms per litre, plotted against waist measurement, in centimetres, for 250 men.
When a least squares line is fitted to the scatterplot, the equation of this line is
body density = 1.195 – 0.001512 × waist measurement
- Draw the graph of this least squares line on the scatterplot above. (1 mark)
(Answer on the scatterplot above.)
- Use the equation of this least squares line to predict the body density of a man whose waist measurement is 65 cm.
Round your answer to two decimal places. (1 mark)
- When using the equation of this least squares line to make the prediction in part b., are you extrapolating or interpolating? (1 mark)
- Interpret the slope of this least squares line in terms of a man’s body density and waist measurement. (1 mark)
- In this study, the body density of the man with a waist measurement of 122 cm was 0.995 kg/litre.
Show that, when this least squares line is fitted to the scatterplot, the residual, rounded to two decimal places, is –0.02 (1 mark)
- The coefficient of determination for this data is 0.6783
Write down the value of the correlation coefficient `r`.
Round your answer to three decimal places. (1 mark)
- The residual plot associated with fitting a least squares line to this data is shown below.
Does this residual plot support the assumption of linearity that was made when fitting this line to this data? Briefly explain your answer. (1 mark)
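The calculations in this question (the prediction at 65 cm, the residual at 122 cm and the correlation coefficient) can be checked with a short Python sketch. The constant and function names are illustrative assumptions; only the numbers come from the question.

```python
from math import sqrt

INTERCEPT = 1.195
SLOPE = -0.001512    # kg/litre per cm of waist measurement


def predicted_density(waist_cm):
    """Body density (kg/litre) predicted by the least squares line."""
    return INTERCEPT + SLOPE * waist_cm


prediction_65 = predicted_density(65)

# Residual = actual - predicted, for the man with a 122 cm waist.
residual_122 = 0.995 - predicted_density(122)

# The slope is negative, so r is the negative root of r squared.
r = -sqrt(0.6783)

print(f"{prediction_65:.2f} {residual_122:.2f} {r:.3f}")  # 1.10 -0.02 -0.824
```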
CORE, FUR2-NHT 2019 VCAA 4
The scatterplot below plots the variable life span, in years, against the variable sleep time, in hours, for a sample of 19 types of mammals.
On the assumption that the association between sleep time and life span is linear, a least squares line is fitted to this data with sleep time as the explanatory variable.
The equation of this least squares line is
life span = 42.1 – 1.90 × sleep time
The coefficient of determination is 0.416
- Draw the graph of the least squares line on the scatterplot above. (1 mark)
- Describe the linear association between life span and sleep time in terms of strength and direction. (2 marks)
- Interpret the slope of the least squares line in terms of life span and sleep time. (2 marks)
- Interpret the coefficient of determination in terms of life span and sleep time. (1 mark)
- The life span of the mammal with a sleep time of 12 hours is 39.2 years.
Show that, when the least squares line is used to predict the life span of this mammal, the residual is 19.9 years. (2 marks)
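The "show that" step in the last part is a direct substitution into the least squares equation. A minimal Python sketch, with `life_span` as an assumed function name:

```python
def life_span(sleep_time_hours):
    """Life span (years) predicted from sleep time (hours)."""
    return 42.1 - 1.90 * sleep_time_hours


predicted = life_span(12)     # 42.1 - 22.8
residual = 39.2 - predicted   # actual - predicted

print(f"predicted = {predicted:.1f} years, residual = {residual:.1f} years")
# predicted = 19.3 years, residual = 19.9 years
```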
CORE, FUR2 2019 VCAA 5
The scatterplot below shows the atmospheric pressure, in hectopascals (hPa), at 3 pm (pressure 3 pm) plotted against the atmospheric pressure, in hectopascals, at 9 am (pressure 9 am) for 23 days in November 2017 at a particular weather station.
A least squares line has been fitted to the scatterplot as shown.
The equation of this line is
pressure 3 pm = 111.4 + 0.8894 × pressure 9 am
- Interpret the slope of this least squares line in terms of the atmospheric pressure at this weather station at 9 am and at 3 pm. (1 mark)
- Use the equation of the least squares line to predict the atmospheric pressure at 3 pm when the atmospheric pressure at 9 am is 1025 hPa.
Round your answer to the nearest whole number. (1 mark)
- Is the prediction made in part b. an example of extrapolation or interpolation? (1 mark)
- Determine the residual when the atmospheric pressure at 9 am is 1013 hPa.
Round your answer to the nearest whole number. (1 mark)
- The mean and the standard deviation of pressure 9 am and pressure 3 pm for these 23 days are shown in Table 4 below.
- Use the equation of the least squares line and the information in Table 4 to show that the correlation coefficient for this data, rounded to three decimal places, is `r` = 0.966 (1 mark)
- What percentage of the variation in pressure 3 pm is explained by the variation in pressure 9 am?
Round your answer to one decimal place. (1 mark)
- The residual plot associated with the least squares line is shown below.
- The residual plot above can be used to test one of the assumptions about the nature of the association between the atmospheric pressure at 3 pm and the atmospheric pressure at 9 am.
What is this assumption? (1 mark)
- The residual plot above does not support this assumption.
Explain why. (1 mark)
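Two of the parts above use only numbers stated in the question: the prediction at 1025 hPa and the percentage of variation explained. The sketch below works them in Python, assuming the rounded value `r` = 0.966 is acceptable for the percentage. The residual at 1013 hPa and the Table 4 calculation are left out because the plotted values and the table are not reproduced here.

```python
def pressure_3pm(pressure_9am):
    """Predicted 3 pm pressure (hPa) from the 9 am pressure (hPa)."""
    return 111.4 + 0.8894 * pressure_9am


prediction = pressure_3pm(1025)

# Coefficient of determination as a percentage, from the stated r.
r = 0.966
percent_explained = 100 * r ** 2

print(round(prediction), round(percent_explained, 1))  # 1023 93.3
```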
CORE, FUR2 2018 VCAA 2
The congestion level in a city can be recorded as the percentage increase in travel time due to traffic congestion in peak periods (compared to non-peak periods).
This is called the percentage congestion level.
The percentage congestion levels for the morning and evening peak periods for 19 large cities are plotted on the scatterplot below.
- Determine the median percentage congestion level for the morning peak period and the evening peak period.
Write your answers in the appropriate boxes provided below. (2 marks)
Median percentage congestion level for morning peak period: ______ %
Median percentage congestion level for evening peak period: ______ %
A least squares line is to be fitted to the data with the aim of predicting evening congestion level from morning congestion level.
The equation of this line is
evening congestion level = 8.48 + 0.922 × morning congestion level
- Name the response variable in this equation. (1 mark)
- Use the equation of the least squares line to predict the evening congestion level when the morning congestion level is 60%. (1 mark)
- Determine the residual value when the equation of the least squares line is used to predict the evening congestion level when the morning congestion level is 47%.
Round your answer to one decimal place. (2 marks)
- The value of the correlation coefficient `r` is 0.92
What percentage of the variation in the evening congestion level can be explained by the variation in the morning congestion level?
Round your answer to the nearest whole number. (1 mark)
CORE, FUR1 2018 VCAA 14 MC
A least squares line is fitted to a set of bivariate data.
Another least squares line is fitted with response and explanatory variables reversed.
Which one of the following statistics will not change in value?
- the residual values
- the predicted values
- the correlation coefficient `r`
- the slope of the least squares line
- the intercept of the least squares line
CORE, FUR1 2018 VCAA 7-9 MC
The scatterplot below displays the resting pulse rate, in beats per minute, and the time spent exercising, in hours per week, of 16 students. A least squares line has been fitted to the data.
Part 1
Using this least squares line to model the association between resting pulse rate and time spent exercising, the residual for the student who spent four hours per week exercising is closest to
- –2.0 beats per minute.
- –1.0 beats per minute.
- –0.3 beats per minute.
- 1.0 beats per minute.
- 2.0 beats per minute.
Part 2
The equation of this least squares line is closest to
- resting pulse rate = 67.2 – 0.91 × time spent exercising
- resting pulse rate = 67.2 – 1.10 × time spent exercising
- resting pulse rate = 68.3 – 0.91 × time spent exercising
- resting pulse rate = 68.3 – 1.10 × time spent exercising
- resting pulse rate = 67.2 + 1.10 × time spent exercising
Part 3
The coefficient of determination is 0.8339
The correlation coefficient `r` is closest to
- –0.913
- –0.834
- –0.695
- 0.834
- 0.913
CORE, FUR2 2017 VCAA 3
The number of male moths caught in a trap set in a forest and the egg density (eggs per square metre) in the forest are shown in the table below.
- Determine the equation of the least squares line that can be used to predict the egg density in the forest from the number of male moths caught in the trap.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to one decimal place. (2 marks)
- The number of female moths caught in a trap set in a forest and the egg density (eggs per square metre) in the forest can also be examined.
A scatterplot of the data is shown below.
The equation of the least squares line is
egg density = 191 + 31.3 × number of female moths
- Draw the graph of this least squares line on the scatterplot (provided above). (1 mark)
- Interpret the slope of the regression line in terms of the variables egg density and number of female moths caught in the trap. (1 mark)
- The egg density is 1500 when the number of female moths caught is 55.
Determine the residual value if the least squares line is used to predict the egg density for this number of female moths. (1 mark)
- The correlation coefficient is `r = 0.862`
Determine the percentage of the variation in egg density in the forest explained by the variation in the number of female moths caught in the trap.
Round your answer to one decimal place. (1 mark)
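For the female moth line, the residual and the percentage of variation explained can be sketched as follows. The function name `egg_density` is an assumption made for the example; the male moth regression is not attempted because the data table is not reproduced here.

```python
def egg_density(female_moths):
    """Egg density (eggs per square metre) predicted by the least squares line."""
    return 191 + 31.3 * female_moths


# Residual = actual - predicted, for 55 female moths and an egg density of 1500.
residual = 1500 - egg_density(55)

# Percentage of variation explained is 100 * r^2.
percent_explained = 100 * 0.862 ** 2

print(f"{residual:.1f} {percent_explained:.1f}")  # -412.5 74.3
```

A negative residual means the line over-predicts the egg density for this trap count.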
CORE, FUR1 2017 VCAA 8-10 MC
The scatterplot below shows the wrist circumference and ankle circumference, both in centimetres, of 13 people. A least squares line has been fitted to the scatterplot with ankle circumference as the explanatory variable.
Part 1
The equation of the least squares line is closest to
- ankle = 10.2 + 0.342 × wrist
- wrist = 10.2 + 0.342 × ankle
- ankle = 17.4 + 0.342 × wrist
- wrist = 17.4 + 0.342 × ankle
- wrist = 17.4 + 0.731 × ankle
Part 2
When the least squares line on the scatterplot is used to predict the wrist circumference of the person with an ankle circumference of 24 cm, the residual will be closest to
- `–0.7`
- `–0.4`
- `–0.1`
- `0.4`
- `0.7`
Part 3
The residuals for this least squares line have a mean of 0.02 cm and a standard deviation of 0.4 cm.
The value of the residual for one of the data points is found to be – 0.3 cm.
The standardised value of this residual is
- `–0.8`
- `–0.7`
- `–0.3`
- `0.7`
- `0.8`
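Part 3 standardises a residual in the same way as any other data value, by subtracting the mean and dividing by the standard deviation. A minimal Python sketch, with `standardise` as an assumed helper name:

```python
def standardise(value, mean, sd):
    """Return the standardised (z) score of a value."""
    return (value - mean) / sd


z = standardise(-0.3, mean=0.02, sd=0.4)
print(round(z, 1))  # -0.8
```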
CORE, FUR2 2016 VCAA 3
The data in the table below shows a sample of actual temperatures and apparent temperatures recorded at a weather station. A scatterplot of the data is also shown.
The data will be used to investigate the association between the variables apparent temperature and actual temperature.
- Use the scatterplot to describe the association between apparent temperature and actual temperature in terms of strength, direction and form. (1 mark)
- Determine the equation of the least squares line that can be used to predict the apparent temperature from the actual temperature.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to two significant figures. (3 marks)
apparent temperature = ______ + ______ × actual temperature
- Interpret the intercept of the least squares line in terms of the variables apparent temperature and actual temperature. (1 mark)
- The coefficient of determination for the association between the variables apparent temperature and actual temperature is 0.97
Interpret the coefficient of determination in terms of these variables. (1 mark)
- The residual plot obtained when the least squares line was fitted to the data is shown below.
- A residual plot can be used to test an assumption about the nature of the association between two numerical variables.
What is this assumption? (1 mark)
- Does the residual plot above support this assumption? Explain your answer. (1 mark)
CORE, FUR1 2016 VCAA 9-10 MC
The scatterplot below shows life expectancy in years (life expectancy) plotted against the Human Development Index (HDI) for a large number of countries in 2011.
A least squares line has been fitted to the data and the resulting residual plot is also shown.
The equation of this least squares line is
life expectancy = 43.0 + 0.422 × HDI
The coefficient of determination is `r^2` = 0.875
Part 1
Given the information above, which one of the following statements is not true?
- The value of the correlation coefficient is close to 0.94
- 12.5% of the variation in life expectancy is not explained by the variation in the Human Development Index.
- On average, life expectancy increases by 43.0 years for each 10-point increase in the Human Development Index.
- Ignoring any outliers, the association between life expectancy and the Human Development Index can be described as strong, positive and linear.
- Using the least squares line to predict the life expectancy in a country with a Human Development Index of 75 is an example of interpolation.
Part 2
In 2011, life expectancy in Australia was 81.8 years and the Human Development Index was 92.9
When the least squares line is used to predict life expectancy in Australia, the residual is closest to
- `–0.6`
- `–0.4`
- `0.4`
- `11.1`
- `42.6`
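The residual in Part 2 comes from substituting Australia's Human Development Index into the least squares equation and subtracting the result from the actual life expectancy. A short Python sketch, with illustrative variable names:

```python
hdi_australia = 92.9
actual_life_expectancy = 81.8

# life expectancy = 43.0 + 0.422 * HDI
predicted = 43.0 + 0.422 * hdi_australia

# Residual = actual - predicted.
residual = actual_life_expectancy - predicted

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
# predicted = 82.2, residual = -0.4
```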
CORE, FUR2 2007 VCAA 3
The table below displays the mean surface temperature (in °C) and the mean duration of warm spell (in days) in Australia for 13 years selected at random from the period 1960 to 2005.
This data set has been used to construct the scatterplot below. The scatterplot is incomplete.
- Complete the scatterplot below by plotting the bold data values given in the table above. Mark the point with a cross (×). (1 mark)
- Mean surface temperature is the explanatory variable.
- Determine the equation of the least squares regression line for this set of data. Write the equation in terms of the variables mean duration of warm spell and mean surface temperature. Write the value of the coefficients correct to one decimal place. (2 marks)
- Plot the least squares regression line on Scatterplot 1. (1 mark)
The residual plot below was constructed to test the assumption of linearity for the relationship between the variables mean duration of warm spell and the mean surface temperature.
- Explain why this residual plot supports the assumption of linearity for this relationship. (1 mark)
- Write down the percentage of variation in the mean duration of a warm spell that is explained by the variation in mean surface temperature. Write your answer correct to the nearest per cent. (1 mark)
- Describe the relationship between the mean duration of a warm spell and the mean surface temperature in terms of strength, direction and form. (2 marks)
CORE, FUR2 2007 VCAA 2
The mean surface temperature (in °C) of Australia for the period 1960 to 2005 is displayed in the time series plot below.
- In what year was the lowest mean surface temperature recorded? (1 mark)
The least squares method is used to fit a trend line to the time series plot.
- The equation of this trend line is found to be
mean surface temperature = – 12.361 + 0.013 × year
- Use the trend line to predict the mean surface temperature (in °C) for 2010. Write your answer correct to two decimal places. (1 mark)
The actual mean surface temperature in the year 2000 was 13.55°C.
- Determine the residual value (in °C) when the trend line is used to predict the mean surface temperature for this year. Write your answer correct to two decimal places. (1 mark)
- By how many degrees does the trend line predict Australia's mean surface temperature will rise each year? Write your answer correct to three decimal places. (1 mark)
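The trend line parts of this question can be worked as below. This is a sketch only; `trend` is an assumed function name, and the yearly rise is read directly from the slope of the trend line.

```python
INTERCEPT = -12.361
SLOPE = 0.013    # degrees Celsius per year


def trend(year):
    """Mean surface temperature (degrees Celsius) predicted by the trend line."""
    return INTERCEPT + SLOPE * year


prediction_2010 = trend(2010)

# Residual = actual - predicted for the year 2000.
residual_2000 = 13.55 - trend(2000)

print(f"{prediction_2010:.2f} {residual_2000:.2f} {SLOPE:.3f}")
# 13.77 -0.09 0.013
```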
CORE, FUR2 2012 VCAA 2
The maximum temperature and the minimum temperature at this weather station on each of the 30 days in November 2011 are displayed in the scatterplot below.
The correlation coefficient for this data set is `r = 0.630`.
The equation of the least squares regression line for this data set is
maximum temperature = `13 + 0.67` × minimum temperature
- Draw this least squares regression line on the scatterplot above. (1 mark)
- Interpret the vertical intercept of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
- Describe the relationship between the maximum temperature and the minimum temperature in terms of strength and direction. (1 mark)
- Interpret the slope of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
- Determine the percentage of variation in the maximum temperature that may be explained by the variation in the minimum temperature.
Write your answer, correct to the nearest percentage. (1 mark)
On the day that the minimum temperature was 11.1 °C, the actual maximum temperature was 12.2 °C.
- Determine the residual value for this day if the least squares regression line is used to predict the maximum temperature.
Write your answer, correct to the nearest degree. (2 marks)
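The two numerical parts of this question, the percentage of variation explained and the residual, can be sketched in Python using the stated values. The name `max_temperature` is an assumption made for the example.

```python
def max_temperature(min_temperature):
    """Maximum temperature predicted from the minimum temperature (degrees Celsius)."""
    return 13 + 0.67 * min_temperature


# Percentage of variation explained is 100 * r^2, with r = 0.630.
percent_explained = 100 * 0.630 ** 2

# Residual = actual - predicted on the day the minimum was 11.1 degrees.
residual = 12.2 - max_temperature(11.1)

print(round(percent_explained), round(residual))  # 40 -8
```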
CORE, FUR2 2013 VCAA 3
The development index and the average pay rate for workers, in dollars per hour, for a selection of 25 countries are displayed in the scatterplot below.
The table below contains the values of some statistics that have been calculated for this data.
- Determine the standardised value of the development index (`z` score) for a country with a development index of 91.
Write your answer, correct to one decimal place. (1 mark)
- Use the information in the table to show that the equation of the least squares regression line for a country’s development index, `y`, in terms of its average pay rate, `x`, is given by
`y = 81.3 + 0.272x` (2 marks)
- The country with an average pay rate of $14.30 per hour has a development index of 83.
Determine the residual value when the least squares regression line given in part (b) is used to predict this country’s development index.
Write your answer, correct to one decimal place. (2 marks)
CORE, FUR1 2006 VCAA 8 MC
The waist measurement (cm) and weight (kg) of 12 men are displayed in the table below.
Using this data, the equation of the least squares regression line that enables weight to be predicted from waist measurement is
`text(weight = – 20 + 1.11 × waist)`
When this equation is used to predict the weight of the man with a waist measurement of 80 cm, the residual value is closest to
A. `–11\ text(kg)`
B. `11\ text(kg)`
C. `–2\ text(kg)`
D. `2\ text(kg)`
E. `69\ text(kg)`
CORE, FUR1 2009 VCAA 11 MC
The table below lists the average body weight (in kg) and average brain weight (in g) of nine animal species.
A least squares regression line is fitted to the data using body weight as the explanatory variable.
The equation of the least squares regression line is
`text(brain weight) = 49.4 + 2.68 xx text(body weight)`
This equation is then used to predict the brain weight (in g) of the baboon.
The residual value (in g) for this prediction will be closest to
A. `–351`
B. `–102`
C. `–78`
D. `78`
E. `102`
CORE, FUR1 2010 VCAA 7-9 MC
The height (in cm) and foot length (in cm) for each of eight Year 12 students were recorded and displayed in the scatterplot below.
A least squares regression line has been fitted to the data as shown.
Part 1
By inspection, the value of the product-moment correlation coefficient `(r)` for this data is closest to
- `0.98`
- `0.78`
- `0.23`
- `– 0.44`
- `– 0.67`
Part 2
The explanatory variable is foot length.
The equation of the least squares regression line is closest to
- height = –110 + 0.78 × foot length.
- height = 141 + 1.3 × foot length.
- height = 167 + 1.3 × foot length.
- height = 167 + 0.67 × foot length.
- foot length = 167 + 1.3 × height.
Part 3
The plot of the residuals against foot length is closest to