The Olympic gold medal-winning height for the women's high jump, \(\textit{Wgold}\), is often lower than the best height achieved in other international women's high jump competitions in that same year.
The table below lists the Olympic year, \(\textit{year}\), the gold medal-winning height, \(\textit{Wgold}\), in metres, and the best height achieved in all international women's high jump competitions in that same year, \(\textit{Wbest}\), in metres, for each Olympic year from 1972 to 2020. A scatterplot of \(\textit{Wbest}\) versus \(\textit{Wgold}\) for this data is also provided.
\(\textit{year}\) | \(\textit{Wgold}\) (m) | \(\textit{Wbest}\) (m)
1972 | 1.92 | 1.94
1976 | 1.93 | 1.96
1980 | 1.97 | 1.98
1984 | 2.02 | 2.07
1988 | 2.03 | 2.07
1992 | 2.02 | 2.05
1996 | 2.05 | 2.05
2000 | 2.01 | 2.02
2004 | 2.06 | 2.06
2008 | 2.05 | 2.06
2012 | 2.05 | 2.05
2016 | 1.97 | 2.01
2020 | 2.04 | 2.05
When a least squares line is fitted to the scatterplot, the equation is found to be:
\(\textit{Wbest} = 0.300 + 0.860 \times \textit{Wgold}\)
The correlation coefficient is 0.9318
\begin{array}{|l|l|}
\hline
\rule{0pt}{2.5ex}\text { strength } \rule[-1ex]{0pt}{0pt} & \quad \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\\
\hline
\rule{0pt}{2.5ex}\text { direction } \rule[-1ex]{0pt}{0pt} & \\
\hline
\end{array}
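As a rough check on the tabulated heights, the least squares coefficients and the correlation coefficient can be recomputed directly from the data. The following is a minimal Python sketch using the standard textbook formulas; the variable names are illustrative and not part of the exam.

```python
import math

# Heights in metres for the Olympic years 1972 to 2020, as listed in the question.
wgold = [1.92, 1.93, 1.97, 2.02, 2.03, 2.02, 2.05, 2.01, 2.06, 2.05, 2.05, 1.97, 2.04]
wbest = [1.94, 1.96, 1.98, 2.07, 2.07, 2.05, 2.05, 2.02, 2.06, 2.06, 2.05, 2.01, 2.05]

n = len(wgold)
mean_x = sum(wgold) / n
mean_y = sum(wbest) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wgold, wbest))
s_xx = sum((x - mean_x) ** 2 for x in wgold)
s_yy = sum((y - mean_y) ** 2 for y in wbest)

slope = s_xy / s_xx                    # least squares slope
intercept = mean_y - slope * mean_x    # least squares intercept
r = s_xy / math.sqrt(s_xx * s_yy)      # Pearson correlation coefficient

print(f"Wbest = {intercept:.3f} + {slope:.3f} x Wgold, r = {r:.4f}")
```

The rounded values should agree with the equation and correlation coefficient quoted in the question.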
Data Analysis, GEN1 2024 VCAA 9-10 MC
The least squares equation for the relationship between the average number of male athletes per competing nation, \(males\), and the number of the Summer Olympic Games, \(number\), is
\(males =67.5-1.27 \times number\)
Part A
The summary statistics for the variables number and males are shown in the table below.
The value of Pearson's correlation coefficient, \(r\), rounded to three decimal places, is closest to
- \(-0.569\)
- \(-0.394\)
- \(0.394\)
- \(0.569\)
Part B
At which Summer Olympic Games will the predicted average number of \(males\) be closest to 25.6?
- 31st
- 32nd
- 33rd
- 34th
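One way to approach Part B is to evaluate the least squares equation at each listed option and pick the prediction nearest to 25.6, or to rearrange the equation for \(number\). The sketch below does both; the function and variable names are illustrative assumptions.

```python
def predicted_males(number):
    """Predicted average number of male athletes per nation at Games `number`."""
    return 67.5 - 1.27 * number

target = 25.6
options = [31, 32, 33, 34]

# Option whose prediction lies nearest to the target value.
closest = min(options, key=lambda n: abs(predicted_males(n) - target))

# Rearranging males = 67.5 - 1.27 * number for number.
exact = (67.5 - target) / 1.27

print(closest, round(exact, 2))
```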
Data Analysis, GEN2 2023 VCAA 3
The scatterplot below plots the average monthly ice cream consumption, in litres/person, against average monthly temperature, in °C. The data for the graph was recorded in the Northern Hemisphere.
When a least squares line is fitted to the scatterplot, the equation is found to be:
consumption = 0.1404 + 0.0024 × temperature
The coefficient of determination is 0.7212
- Draw the least squares line on the scatterplot graph above. (1 mark)
- Determine the value of the correlation coefficient \(r\).
Round your answer to three decimal places. (1 mark)
- Describe the association between average monthly ice cream consumption and average monthly temperature in terms of strength, direction and form. (1 mark)
\begin{array} {|l|c|}
\hline
\rule{0pt}{2.5ex} \textbf{strength} \rule[-1ex]{0pt}{0pt} & \quad \quad \quad \quad \quad \quad \quad \quad \\
\hline
\rule{0pt}{2.5ex} \textbf{direction} \rule[-1ex]{0pt}{0pt} & \\
\hline
\rule{0pt}{2.5ex} \textbf{form} \rule[-1ex]{0pt}{0pt} & \\
\hline
\end{array}
- Referring to the equation of the least squares line, interpret the value of the intercept in terms of the variables consumption and temperature. (1 mark)
- Use the equation of the least squares line to predict the average monthly ice cream consumption, in litres per person, when the monthly average temperature is –6°C. (1 mark)
- Write down whether this prediction is an interpolation or an extrapolation. (1 mark)
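The correlation coefficient can be recovered from the coefficient of determination by taking its square root and giving it the sign of the slope. A minimal Python sketch for the ice cream model follows; the variable names are illustrative.

```python
import math

intercept, slope = 0.1404, 0.0024   # coefficients quoted in the question
r_squared = 0.7212                  # coefficient of determination

# r has the same sign as the slope of the least squares line.
r = math.copysign(math.sqrt(r_squared), slope)

# Predicted consumption (litres/person) at a monthly average of -6 degrees C.
consumption_at_minus_6 = intercept + slope * (-6)

print(round(r, 3), round(consumption_at_minus_6, 3))
```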
Data Analysis, GEN1 2022 VCAA 12-14 MC
The scatterplot below displays the body length, in centimetres, of 17 crocodiles, plotted against their head length, in centimetres. A least squares line has been fitted to the scatterplot. The explanatory variable is head length.
Question 12
The equation of the least squares line is closest to
- head length = –40 + 7 × body length
- body length = –40 + 7 × head length
- head length = 168 + 7 × body length
- body length = 168 – 40 × head length
- body length = 7 + 168 × head length
Question 13
The median head length of the 17 crocodiles, in centimetres, is closest to
- 49
- 51
- 54
- 300
- 345
Question 14
The correlation coefficient \(r\) is equal to 0.963
The percentage of variation in body length that is not explained by the variation in head length is closest to
- 0.9%
- 3.7%
- 7.3%
- 92.7%
- 96.3%
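For Question 14, the percentage of variation left unexplained is the complement of the coefficient of determination, \(1 - r^2\). A short Python sketch of that arithmetic, with illustrative variable names:

```python
r = 0.963                              # correlation coefficient from the question

explained_percent = 100 * r ** 2       # variation in body length explained by head length
unexplained_percent = 100 - explained_percent

print(round(explained_percent, 1), round(unexplained_percent, 1))
```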
Data Analysis, GEN1 2023 VCAA 10 MC
A study of Year 10 students shows that there is a negative association between the scores of topic tests and the time spent on social media. The coefficient of determination is 0.72
From this information it can be concluded that
- a decreased time spent on social media is associated with an increased topic test score.
- less time spent on social media causes an increase in topic test performance.
- an increased time spent on social media is associated with an increased topic test score.
- too much time spent on social media causes a reduction in topic test performance.
- a decreased time spent on social media is associated with a decreased topic test score.
CORE, FUR2 2021 VCAA 3
The time series plot below shows the winning time, in seconds, for the women's 100 m freestyle swim plotted against year, for each year that the Olympic Games were held during the period 1956 to 2016.
A least squares line has been fitted to the plot to model the decreasing trend in the winning time over this period.
The equation of the least squares line is
winning time = 357.1 – 0.1515 × year
The coefficient of determination is 0.8794
- Name the explanatory variable in this time series plot. (1 mark)
- Determine the value of the correlation coefficient (`r`).
Round your answer to three decimal places. (1 mark)
- Write down the average decrease in winning time, in seconds per year, during the period 1956 to 2016. (1 mark)
- The predicted winning time for the women's 100 m freestyle in 2000 was 54.10 seconds.
The actual winning time for the women's 100 m freestyle in 2000 was 53.83 seconds.
Determine the residual value in seconds. (1 mark)
- The following equation can be used to predict the winning time for the women's 100 m freestyle in the future.
winning time = 357.1 – 0.1515 × year
- i. Show that the predicted winning time for the women's 100 m freestyle in 2032 is 49.252 seconds. (1 mark)
- ii. What assumption is being made when this equation is used to predict the winning time for the women's 100 m freestyle in 2032? (1 mark)
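A residual is the actual value minus the predicted value, and a future prediction comes from substituting the year into the fitted line. The Python sketch below works through both for the 100 m freestyle model; the names are illustrative.

```python
intercept, slope = 357.1, -0.1515    # winning time = 357.1 - 0.1515 * year

def predicted_time(year):
    """Predicted winning time in seconds for a given year."""
    return intercept + slope * year

# Residual for 2000 = actual - predicted, using the two times quoted in the question.
residual_2000 = 53.83 - 54.10

# Extrapolated prediction for 2032.
predicted_2032 = predicted_time(2032)

# The slope gives the average change per year; its magnitude is the average decrease.
average_decrease_per_year = abs(slope)

print(round(residual_2000, 2), round(predicted_2032, 3), average_decrease_per_year)
```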
CORE, FUR2 2021 VCAA 2
The two running events in the heptathlon are the 200 m run and the 800 m run. The times taken by the athletes in these two events, time200 and time800, are linearly related.
When a least squares line is fitted to the data, the equation of this line is found to be
time800 = 0.03931 + 5.2756 × time200
- Round the values for the intercept and the slope to three significant figures. Write your answers in the boxes provided. (1 mark)
time800 = _______ + _______ × time200
- The mean and the standard deviation for each variable, time200 and time800, are shown in the table below.
The equation of the least squares line is
time800 = 0.03931 + 5.2756 × time200
Use this information to calculate the coefficient of determination as a percentage. Round your answer to the nearest percentage. (2 marks)
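The slope of a least squares line satisfies \(b = r \times s_y / s_x\), so the correlation coefficient can be recovered as \(r = b \times s_x / s_y\). The summary table is not reproduced in this text, so the standard deviations in the sketch below are hypothetical placeholders chosen only to illustrate the calculation; they are not the exam values.

```python
def r_from_slope(slope, sd_x, sd_y):
    """Correlation coefficient from a least squares slope: r = b * s_x / s_y."""
    return slope * sd_x / sd_y

slope = 5.2756        # slope quoted in the question
sd_time200 = 1.0      # HYPOTHETICAL standard deviation of time200 (not from the exam table)
sd_time800 = 8.0      # HYPOTHETICAL standard deviation of time800 (not from the exam table)

r = r_from_slope(slope, sd_time200, sd_time800)
r_squared_percent = round(100 * r ** 2)   # coefficient of determination as a whole percentage

print(round(r, 4), r_squared_percent)
```

Substituting the actual standard deviations from the exam table in place of the placeholders gives the required answer.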
CORE, FUR2 2020 VCAA 5
The scatterplot below shows body density, in kilograms per litre, plotted against waist measurement, in centimetres, for 250 men.
When a least squares line is fitted to the scatterplot, the equation of this line is
body density = 1.195 – 0.001512 × waist measurement
- Draw the graph of this least squares line on the scatterplot above. (1 mark)
(Answer on the scatterplot above.)
- Use the equation of this least squares line to predict the body density of a man whose waist measurement is 65 cm.
Round your answer to two decimal places. (1 mark)
- When using the equation of this least squares line to make the prediction in part b., are you extrapolating or interpolating? (1 mark)
- Interpret the slope of this least squares line in terms of a man’s body density and waist measurement. (1 mark)
- In this study, the body density of the man with a waist measurement of 122 cm was 0.995 kg/litre.
Show that, when this least squares line is fitted to the scatterplot, the residual, rounded to two decimal places, is –0.02 (1 mark)
- The coefficient of determination for this data is 0.6783
Write down the value of the correlation coefficient `r`.
Round your answer to three decimal places. (1 mark)
- The residual plot associated with fitting a least squares line to this data is shown below.
Does this residual plot support the assumption of linearity that was made when fitting this line to this data? Briefly explain your answer. (1 mark)
CORE, FUR2 2020 VCAA 4
The age, in years, body density, in kilograms per litre, and weight, in kilograms, of a sample of 12 men aged 23 to 25 years are shown in the table below.
Age (years) | Body density (kg/litre) | Weight (kg)
23 | 1.07 | 70.1
23 | 1.07 | 90.4
23 | 1.08 | 73.2
23 | 1.08 | 85.0
24 | 1.03 | 84.3
24 | 1.05 | 95.6
24 | 1.07 | 71.7
24 | 1.06 | 95.0
25 | 1.07 | 80.2
25 | 1.09 | 87.4
25 | 1.02 | 94.9
25 | 1.09 | 65.3
- For these 12 men, determine
- i. their median age, in years (1 mark)
- ii. the mean of their body density, in kilograms per litre. (1 mark)
- A least squares line is to be fitted to the data with the aim of predicting body density from weight.
- i. Name the explanatory variable for this least squares line. (1 mark)
- ii. Determine the slope of this least squares line.
Round your answer to three significant figures. (1 mark)
- What percentage of the variation in body density can be explained by the variation in weight?
Round your answer to the nearest percentage. (1 mark)
CORE, FUR2-NHT 2019 VCAA 4
The scatterplot below plots the variable life span, in years, against the variable sleep time, in hours, for a sample of 19 types of mammals.
On the assumption that the association between sleep time and life span is linear, a least squares line is fitted to this data with sleep time as the explanatory variable.
The equation of this least squares line is
life span = 42.1 – 1.90 × sleep time
The coefficient of determination is 0.416
- Draw the graph of the least squares line on the scatterplot above. (1 mark)
- Describe the linear association between life span and sleep time in terms of strength and direction. (2 marks)
- Interpret the slope of the least squares line in terms of life span and sleep time. (2 marks)
- Interpret the coefficient of determination in terms of life span and sleep time. (1 mark)
- The life span of the mammal with a sleep time of 12 hours is 39.2 years.
Show that, when the least squares line is used to predict the life span of this mammal, the residual is 19.9 years. (2 marks)
CORE, FUR2-NHT 2019 VCAA 3
The life span, in years, and gestation period, in days, for 19 types of mammals are displayed in the table below.
- A least squares line that enables life span to be predicted from gestation period is fitted to this data.
Name the explanatory variable in the equation of this least squares line. (1 mark)
- Determine the equation of the least squares line in terms of the variables life span and gestation period.
Write your answers in the appropriate boxes provided below.
Round the numbers representing the intercept and slope to three significant figures. (2 marks)
_______ = _______ + _______ × _______
- Write the value of the correlation coefficient, rounded to three decimal places. (1 mark)
`r =`
CORE, FUR1-NHT 2019 VCAA 13 MC
The association between amount of protein consumed (in grams/day) and family income (in dollars) is best displayed using
- a scatterplot.
- a time series plot.
- parallel boxplots.
- back-to-back stem plots.
- a two-way frequency table.
CORE, FUR1-NHT 2019 VCAA 12 MC
Which one of the following statements could be true when written as part of the results of a statistical investigation?
- The correlation coefficient between height (in centimetres) and foot length (in centimetres) was found to be `r = 1.24`
- The correlation coefficient between height (below average, average, above average) and arm span (in centimetres) was found to be `r = 0.64`
- The correlation coefficient between blood pressure (low, normal, high) and age (under 25, 25–49, over 50) was found to be `r = 0.74`
- The correlation coefficient between the height of students of the same age (in centimetres) and the money they spent on snack food (in dollars) was found to be `r = 0.22`
- The correlation coefficient between height of wheat (in centimetres) and grain yield (in tonnes) was found to be `r = –0.40` and the coefficient of determination was found to be `r^2 = –0.16`
CORE, FUR2 2019 VCAA 5
The scatterplot below shows the atmospheric pressure, in hectopascals (hPa), at 3 pm (pressure 3 pm) plotted against the atmospheric pressure, in hectopascals, at 9 am (pressure 9 am) for 23 days in November 2017 at a particular weather station.
A least squares line has been fitted to the scatterplot as shown.
The equation of this line is
pressure 3 pm = 111.4 + 0.8894 × pressure 9 am
- Interpret the slope of this least squares line in terms of the atmospheric pressure at this weather station at 9 am and at 3 pm. (1 mark)
- Use the equation of the least squares line to predict the atmospheric pressure at 3 pm when the atmospheric pressure at 9 am is 1025 hPa.
Round your answer to the nearest whole number. (1 mark)
- Is the prediction made in part b. an example of extrapolation or interpolation? (1 mark)
- Determine the residual when the atmospheric pressure at 9 am is 1013 hPa.
Round your answer to the nearest whole number. (1 mark)
- The mean and the standard deviation of pressure 9 am and pressure 3 pm for these 23 days are shown in Table 4 below.
Use the equation of the least squares line and the information in Table 4 to show that the correlation coefficient for this data, rounded to three decimal places, is `r` = 0.966 (1 mark)
- What percentage of the variation in pressure 3 pm is explained by the variation in pressure 9 am?
Round your answer to one decimal place. (1 mark)
- The residual plot associated with the least squares line is shown below.
- The residual plot above can be used to test one of the assumptions about the nature of the association between the atmospheric pressure at 3 pm and the atmospheric pressure at 9 am.
What is this assumption? (1 mark)
- The residual plot above does not support this assumption.
Explain why. (1 mark)
CORE, FUR2 2019 VCAA 4
The relative humidity (%) at 9 am and 3 pm on 14 days in November 2017 is shown in Table 3 below.
A least squares line is to be fitted to the data with the aim of predicting the relative humidity at 3 pm (humidity 3 pm) from the relative humidity at 9 am (humidity 9 am).
- Name the explanatory variable. (1 mark)
- Determine the values of the intercept and the slope of this least squares line.
Round both values to three significant figures and write them in the appropriate boxes provided.
humidity 3 pm = _______ + _______ × humidity 9 am (1 mark)
- Determine the value of the correlation coefficient for this data set.
Round your answer to three decimal places. (1 mark)
CORE, FUR1 2019 VCAA 9-10 MC
A least squares line is used to model the relationship between the monthly average temperature and latitude recorded at seven different weather stations. The equation of the least squares line is found to be
`quad text(average temperature) = 42.9842 - 0.877447 xx text(latitude)`
Part 1
When the numbers in this equation are correctly rounded to three significant figures, the equation will be
- `text(average temperature) = 42.984 - 0.877 xx text(latitude)`
- `text(average temperature) = 42.984 - 0.878 xx text(latitude)`
- `text(average temperature) = 43.0 - 0.878 xx text(latitude)`
- `text(average temperature) = 42.9 - 0.878 xx text(latitude)`
- `text(average temperature) = 43.0 - 0.877 xx text(latitude)`
Part 2
The coefficient of determination was calculated to be 0.893743
The value of the correlation coefficient, rounded to three decimal places, is
- −0.945
- −0.898
- 0.806
- 0.898
- 0.945
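Rounding to three significant figures depends on the position of the first non-zero digit, not on the number of decimal places. The Python sketch below shows one common way to do this; `round_sig` is a hypothetical helper name, and the sign of `r` is taken from the negative slope of the line.

```python
import math

def round_sig(value, figures=3):
    """Round `value` to the given number of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))   # position of the leading digit
    return round(value, figures - 1 - exponent)

intercept = round_sig(42.9842)      # leading digit in the tens place
slope = round_sig(0.877447)         # leading digit in the tenths place

# Part 2: r is the square root of the coefficient of determination,
# negative here because the slope of the least squares line is negative.
r = -math.sqrt(0.893743)

print(intercept, slope, round(r, 3))
```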
CORE, FUR2 2018 VCAA 2
The congestion level in a city can be recorded as the percentage increase in travel time due to traffic congestion in peak periods (compared to non-peak periods).
This is called the percentage congestion level.
The percentage congestion levels for the morning and evening peak periods for 19 large cities are plotted on the scatterplot below.
- Determine the median percentage congestion level for the morning peak period and the evening peak period.
Write your answers in the appropriate boxes provided below. (2 marks)
Median percentage congestion level for morning peak period | _______ %
Median percentage congestion level for evening peak period | _______ %
A least squares line is to be fitted to the data with the aim of predicting evening congestion level from morning congestion level.
The equation of this line is
evening congestion level = 8.48 + 0.922 × morning congestion level
- Name the response variable in this equation. (1 mark)
- Use the equation of the least squares line to predict the evening congestion level when the morning congestion level is 60%. (1 mark)
- Determine the residual value when the equation of the least squares line is used to predict the evening congestion level when the morning congestion level is 47%.
Round your answer to one decimal place. (2 marks)
- The value of the correlation coefficient `r` is 0.92
What percentage of the variation in the evening congestion level can be explained by the variation in the morning congestion level?
Round your answer to the nearest whole number. (1 mark)
CORE, FUR1 2018 VCAA 14 MC
A least squares line is fitted to a set of bivariate data.
Another least squares line is fitted with response and explanatory variables reversed.
Which one of the following statistics will not change in value?
- the residual values
- the predicted values
- the correlation coefficient `r`
- the slope of the least squares line
- the intercept of the least squares line
CORE, FUR1 2018 VCAA 7-9 MC
The scatterplot below displays the resting pulse rate, in beats per minute, and the time spent exercising, in hours per week, of 16 students. A least squares line has been fitted to the data.
Part 1
Using this least squares line to model the association between resting pulse rate and time spent exercising, the residual for the student who spent four hours per week exercising is closest to
- –2.0 beats per minute.
- –1.0 beats per minute.
- –0.3 beats per minute.
- 1.0 beats per minute.
- 2.0 beats per minute.
Part 2
The equation of this least squares line is closest to
- resting pulse rate = 67.2 – 0.91 × time spent exercising
- resting pulse rate = 67.2 – 1.10 × time spent exercising
- resting pulse rate = 68.3 – 0.91 × time spent exercising
- resting pulse rate = 68.3 – 1.10 × time spent exercising
- resting pulse rate = 67.2 + 1.10 × time spent exercising
Part 3
The coefficient of determination is 0.8339
The correlation coefficient `r` is closest to
- –0.913
- –0.834
- –0.695
- 0.834
- 0.913
CORE, FUR2 2017 VCAA 3
The number of male moths caught in a trap set in a forest and the egg density (eggs per square metre) in the forest are shown in the table below.
- Determine the equation of the least squares line that can be used to predict the egg density in the forest from the number of male moths caught in the trap.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to one decimal place. (2 marks)
- The number of female moths caught in a trap set in a forest and the egg density (eggs per square metre) in the forest can also be examined.
A scatterplot of the data is shown below.
The equation of the least squares line is
egg density = 191 + 31.3 × number of female moths
- Draw the graph of this least squares line on the scatterplot (provided above). (1 mark)
- Interpret the slope of the regression line in terms of the variables egg density and number of female moths caught in the trap. (1 mark)
- The egg density is 1500 when the number of female moths caught is 55.
Determine the residual value if the least squares line is used to predict the egg density for this number of female moths. (1 mark)
- The correlation coefficient is `r = 0.862`
Determine the percentage of the variation in egg density in the forest explained by the variation in the number of female moths caught in the trap.
Round your answer to one decimal place. (1 mark)
CORE, FUR1 2017 VCAA 12 MC
Data collected over a period of 10 years indicated a strong, positive association between the number of stray cats and the number of stray dogs reported each year (`r = 0.87`) in a large, regional city.
A positive association was also found between the population of the city and both the number of stray cats (`r = 0.61`) and the number of stray dogs (`r = 0.72`).
During the time that the data was collected, the population of the city grew from 34 564 to 51 055.
From this information, we can conclude that
- if cat owners paid more attention to keeping dogs off their property, the number of stray cats reported would decrease.
- the association between the number of stray cats and stray dogs reported cannot be causal because only a correlation of +1 or –1 shows causal relationships.
- there is no logical explanation for the association between the number of stray cats and stray dogs reported in the city so it must be a chance occurrence.
- because larger populations tend to have both a larger number of stray cats and stray dogs, the association between the number of stray cats and the number of stray dogs can be explained by a common response to a third variable, which is the increasing population size of the city.
- more stray cats were reported because people are no longer as careful about keeping their cats properly contained on their property as they were in the past.
CORE, FUR2 2016 VCAA 3
The data in the table below shows a sample of actual temperatures and apparent temperatures recorded at a weather station. A scatterplot of the data is also shown.
The data will be used to investigate the association between the variables apparent temperature and actual temperature.
- Use the scatterplot to describe the association between apparent temperature and actual temperature in terms of strength, direction and form. (1 mark)
- Determine the equation of the least squares line that can be used to predict the apparent temperature from the actual temperature.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to two significant figures. (3 marks)
apparent temperature `=` _______ `+` _______ `xx` actual temperature
- Interpret the intercept of the least squares line in terms of the variables apparent temperature and actual temperature. (1 mark)
- The coefficient of determination for the association between the variables apparent temperature and actual temperature is 0.97
Interpret the coefficient of determination in terms of these variables. (1 mark)
- The residual plot obtained when the least squares line was fitted to the data is shown below.
- A residual plot can be used to test an assumption about the nature of the association between two numerical variables.
What is this assumption? (1 mark)
- Does the residual plot above support this assumption? Explain your answer. (1 mark)
CORE, FUR1 2016 VCAA 11-12 MC
The table below gives the Human Development Index (HDI) and the mean number of children per woman (children) for 14 countries in 2007.
A scatterplot of the data is also shown.
Part 1
The scatterplot is non-linear.
A log transformation applied to the variable children can be used to linearise the scatterplot.
With HDI as the explanatory variable, the equation of the least squares line fitted to the linearised data is closest to
- log(children) = 1.1 – 0.0095 × HDI
- children = 1.1 – 0.0095 × log(HDI)
- log(children) = 8.0 – 0.77 × HDI
- children = 8.0 – 0.77 × log(HDI)
- log(children) = 21 – 10 × HDI
Part 2
There is a strong positive association between a country’s Human Development Index and its carbon dioxide emissions.
From this information, it can be concluded that
- increasing a country’s carbon dioxide emissions will increase the Human Development Index of the country.
- decreasing a country’s carbon dioxide emissions will increase the Human Development Index of the country.
- this association must be a chance occurrence and can be safely ignored.
- countries that have higher human development indices tend to have higher levels of carbon dioxide emissions.
- countries that have higher human development indices tend to have lower levels of carbon dioxide emissions.
CORE, FUR1 2016 VCAA 9-10 MC
The scatterplot below shows life expectancy in years (life expectancy) plotted against the Human Development Index (HDI) for a large number of countries in 2011.
A least squares line has been fitted to the data and the resulting residual plot is also shown.
The equation of this least squares line is
life expectancy = 43.0 + 0.422 × HDI
The coefficient of determination is `r^2` = 0.875
Part 1
Given the information above, which one of the following statements is not true?
- The value of the correlation coefficient is close to 0.94
- 12.5% of the variation in life expectancy is not explained by the variation in the Human Development Index.
- On average, life expectancy increases by 43.0 years for each 10-point increase in the Human Development Index.
- Ignoring any outliers, the association between life expectancy and the Human Development Index can be described as strong, positive and linear.
- Using the least squares line to predict the life expectancy in a country with a Human Development Index of 75 is an example of interpolation.
Part 2
In 2011, life expectancy in Australia was 81.8 years and the Human Development Index was 92.9
When the least squares line is used to predict life expectancy in Australia, the residual is closest to
- `–0.6`
- `–0.4`
- `0.4`
- `11.1`
- `42.6`
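Both parts of this question rest on reading the fitted line correctly: the slope gives the change in life expectancy per one-point change in HDI, and a residual is the actual value minus the predicted value. A minimal Python sketch with illustrative names:

```python
intercept, slope = 43.0, 0.422    # life expectancy = 43.0 + 0.422 * HDI

def predicted_life_expectancy(hdi):
    """Predicted life expectancy in years for a given Human Development Index."""
    return intercept + slope * hdi

# Part 1: average change in life expectancy for a 10-point increase in HDI.
increase_per_10_points = slope * 10

# Part 2: residual for Australia in 2011 = actual - predicted.
residual_australia = 81.8 - predicted_life_expectancy(92.9)

print(round(increase_per_10_points, 2), round(residual_australia, 1))
```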
CORE, FUR2 2006 VCAA 2
The heights (in cm) and ages (in months) of a random sample of 15 boys have been plotted in the scatterplot below. The least squares regression line has been fitted to the data.
The equation of the least squares regression line is
`text(height = 75.4 + 0.53 × age)`
The correlation coefficient is `r= 0.7541`
- Complete the following sentence.
On average, the height of a boy increases by _______ cm for each one-month increase in age. (1 mark)
- Evaluate the coefficient of determination.
Write your answer, as a percentage, correct to one decimal place. (1 mark)
- Interpret the coefficient of determination in terms of the variables height and age. (1 mark)
CORE, FUR2 2007 VCAA 3
The table below displays the mean surface temperature (in °C) and the mean duration of warm spell (in days) in Australia for 13 years selected at random from the period 1960 to 2005.
This data set has been used to construct the scatterplot below. The scatterplot is incomplete.
- Complete the scatterplot below by plotting the bold data values given in the table above. Mark the point with a cross (×). (1 mark)
- Mean surface temperature is the explanatory variable.
- Determine the equation of the least squares regression line for this set of data. Write the equation in terms of the variables mean duration of warm spell and mean surface temperature. Write the value of the coefficients correct to one decimal place. (2 marks)
- Plot the least squares regression line on Scatterplot 1. (1 mark)
The residual plot below was constructed to test the assumption of linearity for the relationship between the variables mean duration of warm spell and the mean surface temperature.
- Explain why this residual plot supports the assumption of linearity for this relationship. (1 mark)
- Write down the percentage of variation in the mean duration of a warm spell that is explained by the variation in mean surface temperature. Write your answer correct to the nearest per cent. (1 mark)
- Describe the relationship between the mean duration of a warm spell and the mean surface temperature in terms of strength, direction and form. (2 marks)
CORE, FUR2 2009 VCAA 3
The scatterplot below shows the rainfall (in mm) and the percentage of clear days for each month of 2008.
An equation of the least squares regression line for this data set is
rainfall = 131 – 2.68 × percentage of clear days
- Draw this line on the scatterplot. (1 mark)
- Use the equation of the least squares regression line to predict the rainfall for a month with 35% of clear days. Write your answer in mm correct to one decimal place. (1 mark)
- The coefficient of determination for this data set is 0.8081.
- Interpret the coefficient of determination in terms of the variables rainfall and percentage of clear days. (1 mark)
- Determine the value of Pearson’s product moment correlation coefficient. Write your answer correct to three decimal places. (2 marks)
CORE, FUR2 2011 VCAA 2
Table 1 shows information about a particular country. It shows the percentage of women by age at first marriage, for the years 1986, 1996 and 2006.
- Of the women who first married in 1986, what percentage were aged 20 to 29 years inclusive? (1 mark)
- Does the information in Table 1 support the opinion that, for the years 1986, 1996 and 2006, the age of women at first marriage was associated with years of marriage?
Justify your answer by quoting appropriate percentages. It is sufficient to consider one age group only when justifying your answer. (2 marks)
CORE, FUR2 2012 VCAA 2
The maximum temperature and the minimum temperature at this weather station on each of the 30 days in November 2011 are displayed in the scatterplot below.
The correlation coefficient for this data set is `r = 0.630`.
The equation of the least squares regression line for this data set is
maximum temperature = `13 + 0.67` × minimum temperature
- Draw this least squares regression line on the scatterplot above. (1 mark)
- Interpret the vertical intercept of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
- Describe the relationship between the maximum temperature and the minimum temperature in terms of strength and direction. (1 mark)
- Interpret the slope of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
- Determine the percentage of variation in the maximum temperature that may be explained by the variation in the minimum temperature.
Write your answer, correct to the nearest percentage. (1 mark)
On the day that the minimum temperature was 11.1 °C, the actual maximum temperature was 12.2 °C.
- Determine the residual value for this day if the least squares regression line is used to predict the maximum temperature.
Write your answer, correct to the nearest degree. (2 marks)
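The calculations in this question can be checked with a short sketch. It assumes only the values quoted above (`r = 0.630`, the fitted line, and the one observed day); the function names are illustrative, not part of the question.

```python
def predict_max_temp(min_temp: float) -> float:
    """Least squares line quoted above: maximum = 13 + 0.67 x minimum."""
    return 13 + 0.67 * min_temp

def residual(actual: float, predicted: float) -> float:
    """Residual = actual value - predicted value."""
    return actual - predicted

# Percentage of variation explained = r^2 x 100
percent_explained = 100 * 0.630 ** 2

# Day with minimum 11.1 and actual maximum 12.2 (degrees Celsius)
predicted_max = predict_max_temp(11.1)        # about 20.437
temp_residual = residual(12.2, predicted_max)  # about -8.237

print(round(percent_explained), round(temp_residual))
```

A negative residual means the line over-predicts the maximum temperature for that day.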
CORE, FUR2 2014 VCAA 4
The scatterplot below shows the population density, in people per square kilometre, and the area, in square kilometres, of 38 inner suburbs of the same city.
For this scatterplot, `r^2 = 0.141`
- Describe the association between the variables population density and area for these suburbs in terms of strength, direction and form. (1 mark)
- The mean and standard deviation of the variables population density and area for these 38 inner suburbs are shown in the table below.
- One of these suburbs has a population density of 3082 people per square kilometre.
Determine the standard `z`-score of this suburb’s population density.
Write your answer, correct to one decimal place. (1 mark)
Assume the areas of these inner suburbs are approximately normally distributed.
- How many of these 38 suburbs are expected to have an area that is two standard deviations or more above the mean?
Write your answer, correct to the nearest whole number. (1 mark)
- How many of these 38 inner suburbs actually have an area that is two standard deviations or more above the mean? (1 mark)
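For the expected count under the normality assumption, one approach is the 68–95–99.7% rule: about 95% of values lie within two standard deviations of the mean, so about 2.5% lie two or more standard deviations above it. A minimal sketch of that arithmetic follows; the variable names are illustrative.

```python
n_suburbs = 38

# 68-95-99.7% rule: 95% within two standard deviations,
# so the remaining 5% is split evenly between the two tails.
upper_tail_fraction = (1 - 0.95) / 2

expected_count = upper_tail_fraction * n_suburbs  # about 0.95
print(round(expected_count))
```

The actual count in the final part has to be read from the scatterplot, so it is not computed here.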
CORE, FUR2 2014 VCAA 2
The scatterplot below shows the population and area (in square kilometres) of a sample of inner suburbs of a large city.
The equation of the least squares regression line for the data in the scatterplot is
population = 5330 + 2680 × area
- Write down the response variable. (1 mark)
- Draw the least squares regression line on the scatterplot above.
(Answer on the scatterplot above.) (1 mark)
- Interpret the slope of this least squares regression line in terms of the variables area and population. (2 marks)
- Wiston is an inner suburb. It has an area of 4 km² and a population of 6690.
The correlation coefficient, `r`, is equal to 0.668
- Calculate the residual when the least squares regression line is used to predict the population of Wiston from its area. (1 mark)
- What percentage of the variation in the population of the suburbs is explained by the variation in area?
Write your answer, correct to one decimal place. (1 mark)
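The Wiston calculations can be sketched as follows, using only the figures stated in the question (the fitted line, an area of 4 km², a population of 6690 and `r = 0.668`). The function name is an illustrative assumption.

```python
def predict_population(area_km2: float) -> float:
    """Least squares line quoted above: population = 5330 + 2680 x area."""
    return 5330 + 2680 * area_km2

# Residual = actual population - predicted population
wiston_predicted = predict_population(4)
wiston_residual = 6690 - wiston_predicted

# Percentage of variation in population explained by area = r^2 x 100
pop_percent_explained = round(100 * 0.668 ** 2, 1)

print(wiston_residual, pop_percent_explained)
```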
CORE, FUR2 2015 VCAA 4
The table below shows male life expectancy (male) and female life expectancy (female) for a number of countries in 2013. The scatterplot has been constructed from this data.
- Use the scatterplot to describe the association between male life expectancy and female life expectancy in terms of strength, direction and form. (1 mark)
- Determine the equation of a least squares regression line that can be used to predict male life expectancy from female life expectancy for the year 2013.
Complete the equation for the least squares regression line below by writing the intercept and slope in the space provided.
Write these values correct to two decimal places. (1 mark)
male = ______________ + ______________ × female
CORE, FUR2 2015 VCAA 3
The scatterplot below plots male life expectancy (male) against female life expectancy (female) in 1950 for a number of countries. A least squares regression line has been fitted to the scatterplot as shown.
The slope of this least squares regression line is 0.88
- Interpret the slope in terms of the variables male life expectancy and female life expectancy. (1 mark)
The equation of this least squares regression line is
male = 3.6 + 0.88 × female
- In a particular country in 1950, female life expectancy was 35 years.
Use the equation to predict male life expectancy for that country. (1 mark)
- The coefficient of determination is 0.95
Interpret the coefficient of determination in terms of male life expectancy and female life expectancy. (1 mark)
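A short sketch of the prediction step, assuming only the equation quoted above; the function name is illustrative.

```python
def predict_male_life_expectancy(female: float) -> float:
    """Least squares line quoted above: male = 3.6 + 0.88 x female."""
    return 3.6 + 0.88 * female

# Female life expectancy of 35 years
male_predicted = round(predict_male_life_expectancy(35), 1)
print(male_predicted)
```

The slope of 0.88 means that, on average, predicted male life expectancy rises by 0.88 years for each one-year increase in female life expectancy.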
CORE, FUR1 2007 VCAA 7-8 MC
The lengths and diameters (in mm) of a sample of jellyfish selected were recorded and displayed in the scatterplot below. The least squares regression line for this data is shown.
The equation of the least squares regression line is
length = 3.5 + 0.87 × diameter
The correlation coefficient is `r = 0.9034`
Part 1
Written as a percentage, the coefficient of determination is closest to
- `0.816 text(%)`
- `0.903text(%)`
- `81.6text(%)`
- `90.3text(%)`
- `95.0text(%)`
Part 2
From the equation of the least squares regression line, it can be concluded that for these jellyfish, on average
- there is a 3.5 mm increase in diameter for each 1 mm increase in length.
- there is a 3.5 mm increase in length for each 1 mm increase in diameter.
- there is a 0.87 mm increase in diameter for each 1 mm increase in length.
- there is a 0.87 mm increase in length for each 1 mm increase in diameter.
- there is a 4.37 mm increase in diameter for each 1 mm increase in length.
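Part 1 relies on the coefficient of determination being the square of `r`, written as a percentage. A minimal check of that arithmetic, with illustrative variable names:

```python
r_jellyfish = 0.9034

# Coefficient of determination = r^2, expressed as a percentage
r_squared_percent = round(100 * r_jellyfish ** 2, 1)
print(r_squared_percent)
```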
CORE, FUR1 2011 VCAA 11 MC
For a group of 15-year-old students who regularly played computer games, the correlation between the time spent playing computer games and fitness level was found to be `r = -0.56`.
On the basis of this information it can be concluded that
- 56% of these students were not very fit.
- these students would become fitter if they spent less time playing computer games.
- these students would become fitter if they spent more time playing computer games.
- the students in the group who spent a short amount of time playing computer games tended to be fitter.
- the students in the group who spent a large amount of time playing computer games tended to be fitter.
CORE, FUR1 2011 VCAA 6-8 MC
When blood pressure is measured, both the systolic (or maximum) pressure and the diastolic (or minimum) pressure are recorded.
Table 1 displays the blood pressure readings, in mmHg, that result from fifteen successive measurements of the same person's blood pressure.
Part 1
Correct to one decimal place, the mean and standard deviation of this person's systolic blood pressure measurements are respectively
A. `124.9 and 4.4`
B. `125.0 and 5.8`
C. `125.0 and 6.0`
D. `125.9 and 5.8`
E. `125.9 and 6.0`
Part 2
Using systolic blood pressure (systolic) as the response variable, and diastolic blood pressure (diastolic) as the explanatory variable, a least squares regression line is fitted to the data in Table 1.
The equation of the least squares regression line is closest to
A. `text(systolic) = 70.3 + 0.790 xx text(diastolic)`
B. `text(diastolic) = 70.3 + 0.790 xx text(systolic)`
C. `text(systolic) = 29.3 + 0.330 xx text(diastolic)`
D. `text(diastolic) = 0.330 + 29.3 xx text(systolic)`
E. `text(systolic) = 0.790 + 70.3 xx text(diastolic)`
Part 3
From the fifteen blood pressure measurements for this person, it can be concluded that the percentage of the variation in systolic blood pressure that is explained by the variation in diastolic blood pressure is closest to
A. `25.8text(%)`
B. `50.8text(%)`
C. `55.4text(%)`
D. `71.9text(%)`
E. `79.0text(%)`
CORE, FUR1 2008 VCAA 10 MC
A large study of Year 12 students shows that there is a negative association between the time spent doing homework each week and the time spent watching television. The correlation coefficient is `r = – 0.6`.
From this information it can be concluded that
- the time spent doing homework is 60% lower than the time spent watching television.
- 36% of students spend more time watching television than doing homework.
- the slope of the least squares regression line is 0.6.
- if a student spends less time watching television, they will do more homework.
- an increased time spent watching television is associated with a decreased time doing homework.
CORE, FUR1 2010 VCAA 7-9 MC
The height (in cm) and foot length (in cm) for each of eight Year 12 students were recorded and displayed in the scatterplot below.
A least squares regression line has been fitted to the data as shown.
Part 1
By inspection, the value of the product-moment correlation coefficient `(r)` for this data is closest to
- `0.98`
- `0.78`
- `0.23`
- `– 0.44`
- `– 0.67`
Part 2
The explanatory variable is foot length.
The equation of the least squares regression line is closest to
- height = –110 + 0.78 × foot length.
- height = 141 + 1.3 × foot length.
- height = 167 + 1.3 × foot length.
- height = 167 + 0.67 × foot length.
- foot length = 167 + 1.3 × height.
Part 3
The plot of the residuals against foot length is closest to
CORE, FUR1 2012 VCAA 7 MC
The table below shows the percentage of students in two age groups (15–19 years and 20–24 years) who regularly use the internet at one or more of three locations.
- at home
- at an educational institution
- at work
For the students surveyed, which one of the following statements, by itself, supports the contention that the location of internet use is associated with the age group of the internet user?
- 85% of students aged 15–19 years used the internet at an educational institution.
- 95% of students aged 15–19 years used the internet at home, but only 38% of 15–19 year olds used it at work.
- 95% of students aged 15–19 years used the internet at home and 18% of 20–24 year olds used the internet at an educational institution.
- The percentage of students who used the internet at an educational institution decreased from 85% for those aged 15–19 years to 18% for those aged 20–24 years.
- The percentage of students who used the internet at home was 95% for those aged 15–19 years and 95% for those aged 20–24 years.
CORE, FUR1 2009 VCAA 9-10 MC
The table below lists the average life span (in years) and average sleeping time (in hours/day) of 12 animal species.
Part 1
Using sleeping time as the independent variable, a least squares regression line is fitted to the data.
The equation of the least squares regression line is closest to
A. life span = 38.9 – 2.36 × sleeping time.
B. life span = 11.7 – 0.185 × sleeping time.
C. life span = – 0.185 – 11.7 × sleeping time.
D. sleeping time = 11.7 – 0.185 × life span.
E. sleeping time = 38.9 – 2.36 × life span.
Part 2
The value of Pearson’s product-moment correlation coefficient for life span and sleeping time is closest to
A. `–0.6603`
B. `–0.4360`
C. `–0.1901`
D. `0.4360`
E. `0.6603`
CORE, FUR1 2013 VCAA 8 MC
CORE, FUR1 2013 VCAA 7 MC
For a city, the correlation coefficient between
- population density and distance from the centre of the city is `r` = – 0.563
- house size and distance from the centre of the city is `r` = 0.357.
Given this information, which one of the following statements is true?
- Around 31.7% of the variation observed in house size in the city can be explained by the variation in distance from the centre of the city.
- Population density tends to increase as the distance from the centre of the city increases.
- House sizes tend to be larger as the distance from the centre of the city decreases.
- The slope of a least squares regression line relating population density to distance from the centre of the city is positive.
- Population density is more strongly associated with distance from the centre of the city than is house size.
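The options can be tested numerically. The sketch below assumes only the two correlation coefficients given; strength is compared using the size of `r`, and the percentage of variation explained is `r` squared. Variable names are illustrative.

```python
r_density = -0.563   # population density vs distance from city centre
r_house = 0.357      # house size vs distance from city centre

# Percentage of variation explained by distance = r^2 x 100
density_explained = round(100 * r_density ** 2, 1)
house_explained = round(100 * r_house ** 2, 1)

# The association with the larger |r| is the stronger one
density_is_stronger = abs(r_density) > abs(r_house)

print(density_explained, house_explained, density_is_stronger)
```

The sign of each `r` gives the direction of the association and of the corresponding least squares slope.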