The Olympic gold medal-winning height for the women's high jump, \(\textit{Wgold}\), is often lower than the best height achieved in other international women's high jump competitions in that same year.
The table below lists the Olympic year, \(\textit{year}\), the gold medal-winning height, \(\textit{Wgold}\), in metres, and the best height achieved in all international women's high jump competitions in that same year, \(\textit{Wbest}\), in metres, for each Olympic year from 1972 to 2020. A scatterplot of \(\textit{Wbest}\) versus \(\textit{Wgold}\) for this data is also provided.
\begin{array}{|c|c|c|}
\hline
\rule{0pt}{2.5ex} \textit{year} \rule[-1ex]{0pt}{0pt} & \textit{Wgold} \text{ (m)} & \textit{Wbest} \text{ (m)} \\
\hline
1972 & 1.92 & 1.94 \\
1976 & 1.93 & 1.96 \\
1980 & 1.97 & 1.98 \\
1984 & 2.02 & 2.07 \\
1988 & 2.03 & 2.07 \\
1992 & 2.02 & 2.05 \\
1996 & 2.05 & 2.05 \\
2000 & 2.01 & 2.02 \\
2004 & 2.06 & 2.06 \\
2008 & 2.05 & 2.06 \\
2012 & 2.05 & 2.05 \\
2016 & 1.97 & 2.01 \\
2020 & 2.04 & 2.05 \\
\hline
\end{array}
When a least squares line is fitted to the scatterplot, the equation is found to be:
\(\textit{Wbest} = 0.300 + 0.860 \times \textit{Wgold}\)
The correlation coefficient is 0.9318
\begin{array}{|l|l|}
\hline
\rule{0pt}{2.5ex}\text { strength } \rule[-1ex]{0pt}{0pt} & \quad \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\\
\hline
\rule{0pt}{2.5ex}\text { direction } \rule[-1ex]{0pt}{0pt} & \\
\hline
\end{array}
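As a cross-check on the stated line, the sketch below refits a least squares line to the thirteen (Wgold, Wbest) pairs from the table using plain Python. The helper name `fit_least_squares` is illustrative and not part of the question. The printed values should agree with the quoted intercept, slope and correlation coefficient up to rounding.

```python
from math import sqrt

# (Wgold, Wbest) pairs in metres for the Olympic years 1972 to 2020, from the table.
wgold = [1.92, 1.93, 1.97, 2.02, 2.03, 2.02, 2.05, 2.01, 2.06, 2.05, 2.05, 1.97, 2.04]
wbest = [1.94, 1.96, 1.98, 2.07, 2.07, 2.05, 2.05, 2.02, 2.06, 2.06, 2.05, 2.01, 2.05]


def fit_least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


intercept, slope, r = fit_least_squares(wgold, wbest)
print(f"Wbest = {intercept:.3f} + {slope:.3f} x Wgold, r = {r:.4f}")
```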
Data Analysis, GEN1 2024 VCAA 9-10 MC
The least squares equation for the relationship between the average number of male athletes per competing nation, \(males\), and the number of the Summer Olympic Games, \(number\), is
\(males =67.5-1.27 \times number\)
Part A
The summary statistics for the variables number and males are shown in the table below.
The value of Pearson's correlation coefficient, \(r\), rounded to three decimal places, is closest to
- \(-0.569\)
- \(-0.394\)
- \(0.394\)
- \(0.569\)
Part B
At which Summer Olympic Games will the predicted average number of \(males\) be closest to 25.6?
- 31st
- 32nd
- 33rd
- 34th
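One way to approach Part B is to rearrange the least squares equation for \(number\) and then compare the listed options. The sketch below does both. The function and variable names are illustrative only. Part A is not covered because the summary statistics table is not reproduced here.

```python
# Least squares line from the question: males = 67.5 - 1.27 * number
INTERCEPT = 67.5
SLOPE = -1.27


def predicted_males(number):
    """Predicted average number of male athletes per nation at a given Games."""
    return INTERCEPT + SLOPE * number


# Rearranging the line gives the (non-integer) Games number for a target prediction.
target = 25.6
exact_number = (target - INTERCEPT) / SLOPE

# Compare the Games listed in the answer options and pick the closest prediction.
options = [31, 32, 33, 34]
closest = min(options, key=lambda n: abs(predicted_males(n) - target))

print(round(exact_number, 2), closest)
```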
Data Analysis, GEN2 2023 VCAA 3
The scatterplot below plots the average monthly ice cream consumption, in litres/person, against average monthly temperature, in °C. The data for the graph was recorded in the Northern Hemisphere.
When a least squares line is fitted to the scatterplot, the equation is found to be:
consumption = 0.1404 + 0.0024 × temperature
The coefficient of determination is 0.7212
- Draw the least squares line on the scatterplot graph above. (1 mark)
- Determine the value of the correlation coefficient \(r\).
Round your answer to three decimal places. (1 mark)
- Describe the association between average monthly ice cream consumption and average monthly temperature in terms of strength, direction and form. (1 mark)
\begin{array} {|l|c|}
\hline
\rule{0pt}{2.5ex} \textbf{strength} \rule[-1ex]{0pt}{0pt} & \quad \quad \quad \quad \quad \quad \quad \quad \\
\hline
\rule{0pt}{2.5ex} \textbf{direction} \rule[-1ex]{0pt}{0pt} & \\
\hline
\rule{0pt}{2.5ex} \textbf{form} \rule[-1ex]{0pt}{0pt} & \\
\hline
\end{array}
- Referring to the equation of the least squares line, interpret the value of the intercept in terms of the variables consumption and temperature. (1 mark)
- Use the equation of the least squares line to predict the average monthly ice cream consumption, in litres per person, when the monthly average temperature is –6°C. (1 mark)
- Write down whether this prediction is an interpolation or an extrapolation. (1 mark)
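The numerical parts of this question can be sketched as follows, assuming the usual convention that \(r\) takes the sign of the slope and has magnitude equal to the square root of the coefficient of determination. The names are illustrative. Whether the –6°C prediction is an interpolation depends on the temperature range in the scatterplot, which the code does not know.

```python
from math import sqrt

# Values quoted in the question.
INTERCEPT = 0.1404   # litres/person
SLOPE = 0.0024       # litres/person per degree Celsius
R_SQUARED = 0.7212   # coefficient of determination

# r has the same sign as the slope; its magnitude is sqrt(r^2).
r = sqrt(R_SQUARED) if SLOPE > 0 else -sqrt(R_SQUARED)


def predicted_consumption(temperature):
    """Predicted average monthly consumption for a given average temperature."""
    return INTERCEPT + SLOPE * temperature


print(round(r, 3), round(predicted_consumption(-6), 3))
```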
CORE, FUR2 2021 VCAA 4
The time series plot below shows that the winning time for both men and women in the 100 m freestyle swim in the Olympic Games has been decreasing during the period 1912 to 2016.
Least squares lines are used to model the trend for both men and women.
The least squares line for the men's winning time has been drawn on the time series plot above.
The equation of the least squares line for men is
winning time men = 356.9 – 0.1544 × year
The equation of the least squares line for women is
winning time women = 538.9 – 0.2430 × year
- Draw the least squares line for winning time women on the time series plot above. (1 mark)
- The difference between the women's predicted winning time and the men's predicted winning time can be calculated using the formula
difference = winning time women – winning time men
Use the equations of the least squares lines and the formula above to calculate the difference predicted for the 2024 Olympic Games.
Round your answer to one decimal place. (2 marks)
- The Olympic Games are held every four years. The next Olympic Games will be held in 2024, then 2028, 2032 and so on.
In which Olympic year do the two least squares lines predict that the winning time for women will first be faster than the winning time for men in the 100 m freestyle? (2 marks)
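A minimal sketch of the two calculations, using the two equations quoted above. The function names are illustrative, and the search for the first qualifying Games assumes Olympic years continue as 2024, 2028, 2032 and so on, as stated in the question.

```python
def winning_time_men(year):
    return 356.9 - 0.1544 * year


def winning_time_women(year):
    return 538.9 - 0.2430 * year


def difference(year):
    """Women's predicted time minus men's predicted time, in seconds."""
    return winning_time_women(year) - winning_time_men(year)


print(round(difference(2024), 1))

# The two lines cross where the difference is zero.
crossing_year = (538.9 - 356.9) / (0.2430 - 0.1544)

# Step through Olympic years until the women's predicted time is the faster one.
first_year = 2024
while difference(first_year) >= 0:
    first_year += 4

print(round(crossing_year, 1), first_year)
```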
CORE, FUR2 2021 VCAA 3
The time series plot below shows the winning time, in seconds, for the women's 100 m freestyle swim plotted against year, for each year that the Olympic Games were held during the period 1956 to 2016.
A least squares line has been fitted to the plot to model the decreasing trend in the winning time over this period.
The equation of the least squares line is
winning time = 357.1 – 0.1515 × year
The coefficient of determination is 0.8794
- Name the explanatory variable in this time series plot. (1 mark)
- Determine the value of the correlation coefficient (`r`).
Round your answer to three decimal places. (1 mark)
- Write down the average decrease in winning time, in seconds per year, during the period 1956 to 2016. (1 mark)
- The predicted winning time for the women's 100 m freestyle in 2000 was 54.10 seconds.
The actual winning time for the women's 100 m freestyle in 2000 was 53.83 seconds.
Determine the residual value in seconds. (1 mark)
- The following equation can be used to predict the winning time for the women's 100 m freestyle in the future.
winning time = 357.1 – 0.1515 × year
- i. Show that the predicted winning time for the women's 100 m freestyle in 2032 is 49.252 seconds. (1 mark)
- ii. What assumption is being made when this equation is used to predict the winning time for the women's 100 m freestyle in 2032? (1 mark)
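The arithmetic in this question can be checked with a short sketch. It assumes the standard definition residual = actual value − predicted value, and that \(r\) is negative because the slope of the line is negative. The names are illustrative.

```python
from math import sqrt


def predicted_winning_time(year):
    """Least squares line from the question, in seconds."""
    return 357.1 - 0.1515 * year


# The slope is negative, so r is the negative square root of r^2 = 0.8794.
r = -sqrt(0.8794)

# Residual for 2000: actual minus predicted.
residual_2000 = 53.83 - 54.10

# Substituting year = 2032 into the line.
prediction_2032 = predicted_winning_time(2032)

print(round(r, 3), round(residual_2000, 2), round(prediction_2032, 3))
```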
CORE, FUR2 2020 VCAA 5
The scatterplot below shows body density, in kilograms per litre, plotted against waist measurement, in centimetres, for 250 men.
When a least squares line is fitted to the scatterplot, the equation of this line is
body density = 1.195 – 0.001512 × waist measurement
- Draw the graph of this least squares line on the scatterplot above. (1 mark)
(Answer on the scatterplot above.)
- Use the equation of this least squares line to predict the body density of a man whose waist measurement is 65 cm.
Round your answer to two decimal places. (1 mark)
- When using the equation of this least squares line to make the prediction in part b., are you extrapolating or interpolating? (1 mark)
- Interpret the slope of this least squares line in terms of a man’s body density and waist measurement. (1 mark)
- In this study, the body density of the man with a waist measurement of 122 cm was 0.995 kg/litre.
Show that, when this least squares line is fitted to the scatterplot, the residual, rounded to two decimal places, is –0.02 (1 mark)
- The coefficient of determination for this data is 0.6783
Write down the value of the correlation coefficient `r`.
Round your answer to three decimal places. (1 mark)
- The residual plot associated with fitting a least squares line to this data is shown below.
Does this residual plot support the assumption of linearity that was made when fitting this line to this data? Briefly explain your answer. (1 mark)
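The "show that" residual and the other numerical parts can be sketched as below, again taking residual = actual − predicted and giving \(r\) the sign of the (negative) slope. The function name is illustrative.

```python
from math import sqrt


def predicted_body_density(waist_cm):
    """Least squares line from the question, in kg/litre."""
    return 1.195 - 0.001512 * waist_cm


# Prediction for a waist measurement of 65 cm.
prediction_65 = predicted_body_density(65)

# Residual for the man with waist 122 cm and actual body density 0.995 kg/litre.
residual_122 = 0.995 - predicted_body_density(122)

# Negative slope, so r is the negative square root of 0.6783.
r = -sqrt(0.6783)

print(round(prediction_65, 2), round(residual_122, 2), round(r, 3))
```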
CORE, FUR2 2019 VCAA 5
The scatterplot below shows the atmospheric pressure, in hectopascals (hPa), at 3 pm (pressure 3 pm) plotted against the atmospheric pressure, in hectopascals, at 9 am (pressure 9 am) for 23 days in November 2017 at a particular weather station.
A least squares line has been fitted to the scatterplot as shown.
The equation of this line is
pressure 3 pm = 111.4 + 0.8894 × pressure 9 am
- Interpret the slope of this least squares line in terms of the atmospheric pressure at this weather station at 9 am and at 3 pm. (1 mark)
- Use the equation of the least squares line to predict the atmospheric pressure at 3 pm when the atmospheric pressure at 9 am is 1025 hPa.
Round your answer to the nearest whole number. (1 mark)
- Is the prediction made in part b. an example of extrapolation or interpolation? (1 mark)
- Determine the residual when the atmospheric pressure at 9 am is 1013 hPa.
Round your answer to the nearest whole number. (1 mark)
- The mean and the standard deviation of pressure 9 am and pressure 3 pm for these 23 days are shown in Table 4 below.
Use the equation of the least squares line and the information in Table 4 to show that the correlation coefficient for this data, rounded to three decimal places, is `r` = 0.966 (1 mark)
- What percentage of the variation in pressure 3 pm is explained by the variation in pressure 9 am?
Round your answer to one decimal place. (1 mark)
- The residual plot associated with the least squares line is shown below.
- The residual plot above can be used to test one of the assumptions about the nature of the association between the atmospheric pressure at 3 pm and the atmospheric pressure at 9 am.
What is this assumption? (1 mark)
- The residual plot above does not support this assumption.
Explain why. (1 mark)
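A sketch of the calculations that do not depend on the scatterplot, with illustrative names. The helper `r_from_slope` rearranges the standard relationship slope = r × (s_y / s_x). It is defined but not called, because the Table 4 values are not reproduced here.

```python
def predicted_pressure_3pm(pressure_9am):
    """Least squares line from the question, in hPa."""
    return 111.4 + 0.8894 * pressure_9am


def r_from_slope(slope, sd_x, sd_y):
    """Rearrangement of slope = r * sd_y / sd_x for the correlation coefficient."""
    return slope * sd_x / sd_y


# Prediction for a 9 am pressure of 1025 hPa.
prediction = predicted_pressure_3pm(1025)

# Percentage of variation explained is 100 * r^2, using r = 0.966 from the question.
R = 0.966
percent_explained = 100 * R ** 2

print(round(prediction), round(percent_explained, 1))
```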
CORE, FUR2 2018 VCAA 3
Table 3 shows the yearly average traffic congestion levels in two cities, Melbourne and Sydney, during the period 2008 to 2016. Also shown is a time series plot of the same data.
The time series plot for Melbourne is incomplete.
- Use the data in Table 3 to complete the time series plot above for Melbourne. (1 mark)
(Answer on the time series plot above.)
- A least squares line is used to model the trend in the time series plot for Sydney. The equation is
congestion level = −2280 + 1.15 × year
- Draw this least squares line on the time series plot. (1 mark)
(Answer on the time series plot above.)
- Use the equation of the least squares line to determine the average rate of increase in percentage congestion level for the period 2008 to 2016 in Sydney.
Write your answer in the box provided below. (1 mark)
|      | % per year |
- Use the least squares line to predict when the percentage congestion level in Sydney will be 43%. (1 mark)
The yearly average traffic congestion level data for Melbourne is repeated in Table 4 below.
- When a least squares line is used to model the trend in the data for Melbourne, the intercept of this line is approximately –1514.75556
Round this value to four significant figures. (1 mark)
- Use the data in Table 4 to determine the equation of the least squares line that can be used to model the trend in the data for Melbourne. The variable year is the explanatory variable.
Write the values of the intercept and the slope of this least squares line in the appropriate boxes provided below.
Round both values to four significant figures. (2 marks)
congestion level = |      | + |      | × year
- Since 2008, the equations of the least squares lines for Sydney and Melbourne have predicted that future traffic congestion levels in Sydney will always exceed future traffic congestion levels in Melbourne.
Explain why, quoting the values of appropriate statistics. (2 marks)
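Two of the numerical steps in this question, rounding to significant figures and solving the Sydney line for a year, can be sketched as follows. The helper `round_sig` is a hypothetical convenience function, not a built-in. The Melbourne line is not fitted because Table 4 is not reproduced here.

```python
from math import floor, log10


def round_sig(value, figures):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    exponent = floor(log10(abs(value)))
    return round(value, figures - 1 - exponent)


def year_for_sydney_level(level):
    """Solve congestion level = -2280 + 1.15 * year for year."""
    return (level + 2280) / 1.15


# Melbourne intercept rounded to four significant figures.
print(round_sig(-1514.75556, 4))

# Year in which the Sydney line predicts a congestion level of 43%.
print(round(year_for_sydney_level(43)))
```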
CORE, FUR2 2017 VCAA 4
The eggs laid by the female moths hatch and become caterpillars.
The following time series plot shows the total area, in hectares, of forest eaten by the caterpillars in a rural area during the period 1900 to 1980.
The data used to generate this plot is also given.
The association between area of forest eaten by the caterpillars and year is non-linear.
A \(\log_{10}\) transformation can be applied to the variable area to linearise the data.
- When the equation of the least squares line that can be used to predict \(\log_{10}\)(area) from year is determined, the slope of this line is approximately 0.0085385
Round this value to three significant figures. (1 mark)
- Perform the \(\log_{10}\) transformation to the variable area and determine the equation of the least squares line that can be used to predict \(\log_{10}\)(area) from year.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to three significant figures. (2 marks)
- The least squares line predicts that the \(\log_{10}\)(area) of forest eaten by the caterpillars by the year 2020 will be approximately 2.85
Using this value of 2.85, calculate the expected area of forest that will be eaten by the caterpillars by the year 2020.
Round your answer to the nearest hectare. (1 mark)
- Give a reason why this prediction may have limited reliability. (1 mark)
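Undoing a log10 transformation means raising 10 to the predicted value. The sketch below shows this back-transformation for the quoted prediction of 2.85, together with the three-significant-figure rounding of the slope using Python's `g` format. The variable names are illustrative.

```python
from math import log10

# Predicted log10(area) for 2020, as quoted in the question.
predicted_log_area = 2.85

# Back-transform: area = 10 ** log10(area).
predicted_area = 10 ** predicted_log_area
print(round(predicted_area))

# Slope rounded to three significant figures via the 'g' presentation type.
slope_3sf = f"{0.0085385:.3g}"
print(slope_3sf)
```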
CORE, FUR2 2010 VCAA 2
In the scatterplot below, average annual female income, in dollars, is plotted against average annual male income, in dollars, for 16 countries. A least squares regression line is fitted to the data.
The equation of the least squares regression line for predicting female income from male income is
female income = 13 000 + 0.35 × male income
- What is the explanatory variable? (1 mark)
- Complete the following statement by filling in the missing information.
From the least squares regression line equation it can be concluded that, for these countries, on average, female income increases by $________ for each $1000 increase in male income. (1 mark)
- i. Use the least squares regression line equation to predict the average annual female income (in dollars) in a country where the average annual male income is $15 000. (1 mark)
- ii. The prediction made in part c.i. is not likely to be reliable.
Explain why. (1 mark)
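The slope of 0.35 is a change in female income per $1 of male income, so it scales linearly with the size of the increase. A short sketch of that scaling and of the $15 000 prediction, with illustrative names:

```python
SLOPE = 0.35          # dollars of female income per dollar of male income
INTERCEPT = 13_000    # dollars


def predicted_female_income(male_income):
    """Least squares regression line from the question, in dollars."""
    return INTERCEPT + SLOPE * male_income


# Average increase in female income for each $1000 increase in male income.
increase_per_1000 = SLOPE * 1000

# Prediction for an average annual male income of $15 000.
prediction = predicted_female_income(15_000)

print(round(increase_per_1000), round(prediction))
```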
CORE, FUR2 2015 VCAA 5
The time series plot below displays the life expectancy, in years, of people living in Australia and the United Kingdom (UK) for each year from 1920 to 2010.
- By how much did life expectancy in Australia increase during the period 1920 to 2010?
Write your answer correct to the nearest year. (1 mark)
- In 1975, the life expectancies in Australia and the UK were very similar.
From 1975, the gap between the life expectancies in the two countries increased, with people in Australia having a longer life expectancy than people in the UK.
To investigate the difference in life expectancies, least squares regression lines were fitted to the data for both Australia and the UK for the period 1975 to 2010.
The results are shown below.
The equations of the least squares regression lines are as follows.
Australia: life expectancy = –451.7 + 0.2657 × year
UK: life expectancy = –350.4 + 0.2143 × year
- Use these equations to predict the difference between the life expectancies of Australia and the UK in 2030.
Give your answer correct to the nearest year. (2 marks)
- Explain why this prediction may be of limited reliability. (1 mark)
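The predicted gap for 2030 comes from evaluating both regression lines at the same year and subtracting. A minimal sketch, with illustrative function names:

```python
def life_expectancy_australia(year):
    return -451.7 + 0.2657 * year


def life_expectancy_uk(year):
    return -350.4 + 0.2143 * year


# Predicted difference (Australia minus UK) in 2030, in years.
gap_2030 = life_expectancy_australia(2030) - life_expectancy_uk(2030)

# The gap grows each year by the difference between the two slopes.
gap_growth_per_year = 0.2657 - 0.2143

print(round(gap_2030), round(gap_growth_per_year, 4))
```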