SmarterEd


Data Analysis, GEN2 2024 VCAA 3

The Olympic gold medal-winning height for the women's high jump, \(\textit{Wgold}\), is often lower than the best height achieved in other international women's high jump competitions in that same year.

The table below lists the Olympic year, \(\textit{year}\), the gold medal-winning height, \(\textit{Wgold}\), in metres, and the best height achieved in all international women's high jump competitions in that same year, \(\textit{Wbest}\), in metres, for each Olympic year from 1972 to 2020.

A scatterplot of \(\textit{Wbest}\) versus \(\textit{Wgold}\) for this data is also provided.

When a least squares line is fitted to the scatterplot, the equation is found to be:

\(Wbest =0.300+0.860 \times Wgold\)

The correlation coefficient is 0.9318

  a. Name the response variable in this equation.   (1 mark)

  b. Draw the least squares line on the scatterplot above.   (1 mark)

  c. Determine the value of the coefficient of determination as a percentage. Round your answer to one decimal place.   (1 mark)

  d. Describe the association between \(\textit{Wbest}\) and \(\textit{Wgold}\) in terms of strength and direction.   (1 mark)

\begin{array}{|l|l|}
\hline
\rule{0pt}{2.5ex}\text { strength } \rule[-1ex]{0pt}{0pt} & \quad \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\\
\hline
\rule{0pt}{2.5ex}\text { direction } \rule[-1ex]{0pt}{0pt} & \\
\hline
\end{array}

  e. Referring to the equation of the least squares line, interpret the value of the slope in terms of the variables \(\textit{Wbest}\) and \(\textit{Wgold}\).   (1 mark)

  f. In 1984, the \(\textit{Wbest}\) value was 2.07 m for a \(\textit{Wgold}\) value of 2.02 m. Show that when this least squares line is fitted to the scatterplot, the residual value for this point is 0.0328.   (2 marks)

  g. The residual plot obtained when the least squares line was fitted to the data is shown below. The residual value from part f is missing from the residual plot.

    i. Complete the residual plot by adding the residual value from part f, drawn as a cross ( X ), to the residual plot above.   (1 mark)

    ii. In part b, a least squares line was fitted to the scatterplot. Does the residual plot from part g justify this? Briefly explain your answer.   (1 mark)

  h. In 1964, the gold medal-winning height, \(\textit{Wgold}\), was 1.90 m. When the least squares line is used to predict \(\textit{Wbest}\), it is found to be 1.934 m. Explain why this prediction is not likely to be reliable.   (1 mark)

Show Answers Only

a.    \(Wbest\)

b.    

c.    \(86.8\%\)

d.    \(\text{Strong, positive}\)

e.    \(Wbest\ \text{will increase, on average, by 0.86 metres for every metre of increase in}\ Wgold.\)

f.      \(Wbest\) \(=0.300 +0.86\times 2.02\)
    \(=2.0372\)

\(\therefore\ \text{Residual}\ =2.07-2.0372=0.0328\)

g.i.

g.ii.  \(\text{Yes, it is justified as there is no clear pattern, linear or otherwise.}\)

h.    \(\text{This prediction is outside the data range (1972 – 2020 → extrapolation)}\)

\(\text{and therefore cannot be relied upon.}\)

Show Worked Solution

a.    \(Wbest\)

b.    \(\text{Using points:}\ (1.90, 1.934)\ \text{and}\ (2.00, 2.02)\)
 

Mean mark (b) 51%.

c.    \(r=0.9318\ \ \Rightarrow\ \ r^2=0.9318^2=0.8682\dots\)

\(\therefore\ \text{Coefficient of determination} \approx 86.8\%\)
 

d.    \(\text{Strong, positive}\)
 

e.    \(Wbest\ \text{will increase, on average, by 0.86 metres for every metre of increase in}\ Wgold.\)
 

f.      \(Wbest\) \(=0.300 +0.86\times 2.02\)
    \(=2.0372\)

 
\(\therefore\ \text{Residual}\ =2.07-2.0372=0.0328\)
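
The arithmetic in parts b and f can be checked with a short script. This is a minimal sketch in Python; the function name `predict_wbest` and the variable names are illustrative choices, not part of the exam or the worked solution.

```python
# Least squares line given in the question: Wbest = 0.300 + 0.860 * Wgold
INTERCEPT = 0.300
SLOPE = 0.860


def predict_wbest(wgold: float) -> float:
    """Predicted Wbest (m) for a given Wgold (m)."""
    return INTERCEPT + SLOPE * wgold


# Part b: two points on the line, used to draw it on the scatterplot.
line_points = [(w, round(predict_wbest(w), 3)) for w in (1.90, 2.00)]

# Part f: residual = actual - predicted for the 1984 data point.
predicted_1984 = predict_wbest(2.02)
residual_1984 = 2.07 - predicted_1984

print(line_points)               # [(1.9, 1.934), (2.0, 2.02)]
print(round(predicted_1984, 4))  # 2.0372
print(round(residual_1984, 4))   # 0.0328
```

A positive residual means the actual \(Wbest\) value lies above the least squares line.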

♦ Mean mark (f) 48%.

g.i.

g.ii.  \(\text{Yes, it is justified as there is no clear pattern, linear or otherwise.}\)

♦ Mean mark (g)(i) 47%.
♦ Mean mark (g)(ii) 40%.

h.    \(\text{This prediction is outside the data range (1972–2020 → extrapolation)}\)

\(\text{and therefore cannot be relied upon.}\)

♦ Mean mark (h) 50%.

Filed Under: Correlation and Regression Tagged With: Band 3, Band 4, Band 5, smc-265-10-r / r^2 and Association, smc-265-30-LSRL formula, smc-265-40-Interpret Gradient, smc-265-50-Residuals, smc-265-60-Extrapolation / Interpolation, smc-265-75-Explanatory / Response

Data Analysis, GEN2 2024 VCAA 1

Table 1 lists the Olympic year, \(\textit{year}\), and the gold medal-winning height for the men's high jump, \(\textit{Mgold}\), in metres, for each Olympic Games held from 1928 to 2020. No Olympic Games were held in 1940 or 1944, and the 2020 Olympic Games were held in 2021.

Table 1

\begin{array}{|c|c|}
\hline \quad \textit{year} \quad & \textit{Mgold}\,\text{(m)} \\
\hline 1928 & 1.94 \\
\hline 1932 & 1.97 \\
\hline 1936 & 2.03 \\
\hline 1948 & 1.98 \\
\hline 1952 & 2.04 \\
\hline 1956 & 2.12 \\
\hline 1960 & 2.16 \\
\hline 1964 & 2.18 \\
\hline 1968 & 2.24 \\
\hline 1972 & 2.23 \\
\hline 1976 & 2.25 \\
\hline 1980 & 2.36 \\
\hline 1984 & 2.35 \\
\hline 1988 & 2.38 \\
\hline 1992 & 2.34 \\
\hline 1996 & 2.39 \\
\hline 2000 & 2.35 \\
\hline 2004 & 2.36 \\
\hline 2008 & 2.36 \\
\hline 2012 & 2.33 \\
\hline 2016 & 2.38 \\
\hline 2020 & 2.37 \\
\hline
\end{array}

  a. For the data in Table 1, determine:

    i. the maximum \(\textit{Mgold}\) in metres   (1 mark)

    ii. the percentage of \(\textit{Mgold}\) values greater than 2.25 m.   (1 mark)

  b. The mean of these \(\textit{Mgold}\) values is 2.23 m, and the standard deviation is 0.15 m. Calculate the standardised \(z\)-score for the 2000 \(\textit{Mgold}\) of 2.35 m.   (1 mark)

  c. Construct a boxplot for the \(\textit{Mgold}\) data in Table 1 on the grid below.   (2 marks)

  d. A least squares line can also be used to model the association between \(\textit{Mgold}\) and \(\textit{year}\). Using the data from Table 1, determine the equation of the least squares line for this data set. Use the template below to write your answer. Round the values of the intercept and slope to three significant figures.   (2 marks)

  e. The coefficient of determination is 0.857. Interpret the coefficient of determination in terms of \(\textit{Mgold}\) and \(\textit{year}\).   (1 mark)

Show Answers Only

a.i.   \(2.39\)

a.ii.  \(50\%\)

b.    \(0.8\)

c.     

d.   
    

e.    \(\text{A coefficient of determination of 0.857 means that 85.7% of the variation}\)

\(\text{in}\ Mgold\ \text{can be explained by the variation in}\ year.\)

Show Worked Solution

a.i.   \(2.39\)

a.ii.  \(\dfrac{11}{22}=50\%\)

b.     \(z\) \(=\dfrac{x-\overline x}{s_x}\)
    \(=\dfrac{2.35-2.23}{0.15}\)
    \(=0.8\)

  
c.   
\(Q_2=\dfrac{2.33+2.25}{2}=2.29\)

\(Q_1=2.12, \ Q_3=2.36\)

\(\text{Min}\ =1.94, \ \text{Max}\ =2.39\)
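
The five-number summary for the boxplot can be reproduced from Table 1. The sketch below assumes the median-of-halves method for the quartiles (which is what the worked values above correspond to) and the usual 1.5 × IQR fences for outliers; the variable names are illustrative.

```python
from statistics import median

# Mgold values (m) from Table 1, 1928 to 2020.
mgold = [1.94, 1.97, 2.03, 1.98, 2.04, 2.12, 2.16, 2.18, 2.24, 2.23, 2.25,
         2.36, 2.35, 2.38, 2.34, 2.39, 2.35, 2.36, 2.36, 2.33, 2.38, 2.37]

ordered = sorted(mgold)
half = len(ordered) // 2          # 22 values, so two halves of 11

q1 = median(ordered[:half])       # median of the lower half
q2 = median(ordered)              # mean of the 11th and 12th values
q3 = median(ordered[-half:])      # median of the upper half

five_number_summary = (ordered[0], q1, round(q2, 2), q3, ordered[-1])
print(five_number_summary)        # (1.94, 2.12, 2.29, 2.36, 2.39)

# Outlier check using the 1.5 x IQR fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
print(outliers)                   # []
```

With no values outside the fences, the whiskers extend to the minimum and maximum.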
  

d.   \(\text{Using CAS:}\)


  
 

Mean mark (d) 52%.
Mean mark (e) 52%.

e.    \(\text{A coefficient of determination of 0.857 means that 85.7% of the variation}\)

\(\text{in}\ Mgold\ \text{can be explained by the variation in}\ year.\)
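
Where a CAS is not to hand, the least squares line and the coefficient of determination can be computed directly from Table 1. The following is a sketch using the standard sums-of-squares formulas in plain Python; it is not the CAS procedure the worked solution refers to, and the variable names are assumptions.

```python
from math import fsum

years = [1928, 1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976,
         1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016, 2020]
mgold = [1.94, 1.97, 2.03, 1.98, 2.04, 2.12, 2.16, 2.18, 2.24, 2.23, 2.25,
         2.36, 2.35, 2.38, 2.34, 2.39, 2.35, 2.36, 2.36, 2.33, 2.38, 2.37]

n = len(years)
mean_x = fsum(years) / n
mean_y = fsum(mgold) / n

# Sums of squares and cross-products about the means.
sxx = fsum((x - mean_x) ** 2 for x in years)
syy = fsum((y - mean_y) ** 2 for y in mgold)
sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(years, mgold))

slope = sxy / sxx                      # year is the explanatory variable
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)

# Intercept and slope to three significant figures, as part d requires.
print(f"Mgold = {intercept:.3g} + {slope:.3g} x year")
print(f"r^2 = {r_squared:.3f}")        # 0.857, the value quoted in part e
```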

Filed Under: Correlation and Regression, Graphs - Stem/Leaf and Boxplots, Normal Distribution Tagged With: Band 2, Band 3, Band 4, smc-265-20-Find LSRL Equation/Gradient, smc-265-75-Explanatory / Response, smc-265-80-Rounding (Sig Fig), smc-600-10-Single z-score, smc-643-10-Single Box-Plots

Data Analysis, GEN2 2023 VCAA 1

Data was collected to investigate the use of electronic images to automate the sizing of oysters for sale. The variables in this study were:

    • ID: identity number of the oyster
    • weight: weight of the oyster in grams (g)
    • volume: volume of the oyster in cubic centimetres (cm³)
    • image size: oyster size determined from its electronic image (in megapixels)
    • size: oyster size when offered for sale: small, medium or large

The data collected for a sample of 15 oysters is displayed in the table.
 

  a. Write down the number of categorical variables in the table.   (1 mark)

  b. Determine, in grams:

    i. the mean weight of all the oysters in this sample.   (1 mark)

    ii. the median weight of the large oysters in this sample.   (1 mark)

  c. When a least squares line is used to model the association between oyster weight and volume, the equation is:

    \(\textit{volume} = 0.780 + 0.953 \times \textit{weight} \)

    i. Name the response variable in this equation.   (1 mark)

    ii. Complete the following sentence by filling in the blank space provided.   (1 mark)

      This equation predicts that, on average, each 10 g increase in the weight of an oyster is associated with a ________________ cm³ increase in its volume.

  d. A least squares line can also be used to model the association between an oyster's volume, in cm³, and its electronic image size, in megapixels. In this model, image size is the explanatory variable. Using data from the table, determine the equation of this least squares line. Use the template below to write your answer. Round the values of the intercept and slope to four significant figures.   (2 marks)

  e. The number of megapixels needed to construct an accurate electronic image of an oyster is approximately normally distributed. Measurements made on recently harvested oysters showed that:

    • 97.5% of the electronic images contain less than 4.6 megapixels
    • 84% of the electronic images contain more than 4.3 megapixels.

    Use the 68-95-99.7% rule to determine, in megapixels, the mean and standard deviation of this normal distribution.   (2 marks)

Show Answers Only

a.    \(\text{Categorical variables = 2 (ID and size)}\)

b.i.  \(\text{Mean weight}\ = \dfrac{\text{sum of oyster weights}}{15} = \dfrac{171.3}{15} = 11.42 \)

b.ii.  \(\text{Median}\ = 11.4 \)

c.i.   \(\text{Volume}\)

c.ii.  \(\text{Increase}\ = 0.953 \times 10 = 9.53\ \text{cm}^{3} \)

d.    \(\text{Volume}\ = 0.002857 + 2.571 \times \text{image size} \)

e.    \(s_x = 0.1 \)

\(\bar x = 4.4\) 

Show Worked Solution

a.    \(\text{Categorical variables = 2 (ID and size)}\)
 

b.i.  \(\text{Mean weight}\ = \dfrac{\text{sum of oyster weights}}{15} = \dfrac{171.3}{15} = 11.42 \)
 

b.ii.  \(\text{15 data points}\ \ \Rightarrow \ \ \text{Median = 8th data point (in order)}\)

 \(\text{Median}\ = 11.4 \)
 

c.i.   \(\text{Volume}\)
 

c.ii.  \(\text{Increase}\ = 0.953 \times 10 = 9.53\ \text{cm}^{3} \)
 

d.    \(\text{Input the image size column values}\ (x)\ \text{and volume} \)

\(\text{values}\ (y)\ \text{into the calculator:}\)

\(\textit{Volume}\ = 0.002857 + 2.571 \times \textit{image size} \)
 

e.    \(z\text{-score (4.6)}\ = 2\ \ \Rightarrow \bar x + 2 \times s_x = 4.6\ …\ (1)\)

\(z\text{-score (4.3)}\ = -1\ \ \Rightarrow \bar x-s_x = 4.3\ …\ (2)\)

\( (1)-(2) \)

\(3 s_x = 0.3 \ \ \Rightarrow\ \ s_x = 0.1 \)

\(\bar x = 4.4\) 
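
The elimination step in part e can be written out as a short calculation. This is a minimal sketch; the variable names are illustrative, and the z-scores come from the 68-95-99.7% rule as used above.

```python
# 68-95-99.7% rule: 97.5% of values lie below mean + 2*sd,
# and 84% of values lie above mean - 1*sd.
upper_value, upper_z = 4.6, 2     # equation (1): mean + 2*sd = 4.6
lower_value, lower_z = 4.3, -1    # equation (2): mean - 1*sd = 4.3

# Subtracting (2) from (1) eliminates the mean: 3*sd = 0.3.
sd = (upper_value - lower_value) / (upper_z - lower_z)

# Substitute back into equation (1) to recover the mean.
mean = upper_value - upper_z * sd

print(round(sd, 2), round(mean, 2))  # 0.1 4.4
```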

Filed Under: Correlation and Regression, Normal Distribution, Summary Statistics Tagged With: Band 3, Band 4, smc-265-20-Find LSRL Equation/Gradient, smc-265-30-LSRL formula, smc-265-75-Explanatory / Response, smc-468-20-Mean, smc-468-40-Median Mode and Range, smc-600-20-z-score Intervals

Data Analysis, GEN1 2023 VCAA 10 MC

A study of Year 10 students shows that there is a negative association between the scores of topic tests and the time spent on social media. The coefficient of determination is 0.72

From this information it can be concluded that

  A. a decreased time spent on social media is associated with an increased topic test score.
  B. less time spent on social media causes an increase in topic test performance.
  C. an increased time spent on social media is associated with an increased topic test score.
  D. too much time spent on social media causes a reduction in topic test performance.
  E. a decreased time spent on social media is associated with a decreased topic test score.
Show Answers Only

\(A\)

Show Worked Solution

\(\text{Negative association:}\)

  • \(\text{does not mean causality (eliminate B and D).}\)
  • \(\text{Option A: if time spent on social media ↓, topic test scores ↑}\ \checkmark \)
  • \(\text{Option C: if time spent on social media ↑, topic test scores ↓}\ \cross \)
  • \(\text{Option E: if time spent on social media ↓, topic test scores ↓}\ \cross \)

\(\Rightarrow A\)

Filed Under: Correlation and Regression Tagged With: Band 4, smc-265-10-r / r^2 and Association, smc-265-75-Explanatory / Response

CORE, FUR2 2021 VCAA 3

The time series plot below shows the winning time, in seconds, for the women's 100 m freestyle swim plotted against year, for each year that the Olympic Games were held during the period 1956 to 2016.

A least squares line has been fitted to the plot to model the decreasing trend in the winning time over this period.
 

The equation of the least squares line is

winning time = 357.1 – 0.1515 × year

The coefficient of determination is 0.8794

  a. Name the explanatory variable in this time series plot.   (1 mark)

  b. Determine the value of the correlation coefficient (`r`). Round your answer to three decimal places.   (1 mark)

  c. Write down the average decrease in winning time, in seconds per year, during the period 1956 to 2016.   (1 mark)

  d. The predicted winning time for the women's 100 m freestyle in 2000 was 54.10 seconds. The actual winning time for the women's 100 m freestyle in 2000 was 53.83 seconds. Determine the residual value in seconds.   (1 mark)

  e. The following equation can be used to predict the winning time for the women's 100 m freestyle in the future.

    winning time = 357.1 – 0.1515 × year

    i. Show that the predicted winning time for the women's 100 m freestyle in 2032 is 49.252 seconds.   (1 mark)

    ii. What assumption is being made when this equation is used to predict the winning time for the women's 100 m freestyle in 2032?   (1 mark)


Show Answers Only
  a. `text{year}`
  b. `-0.938`
  c. `0.1515 \ text{seconds}`
  d. `-0.27`
  e.i. `49.252 \ text{seconds}`
  e.ii. `text{The same trend continues when the graph is extended beyond 2016.}`
Show Worked Solution

a.      `text{year}`
 

b. `r^2` `= 0.8794 \ text{(given)}`
  `r` `= ± sqrt{0.8794}`
    `= ± 0.938 \ text{(to 3 d.p.)}`

 
`text{By inspection of graph, correlation is negative}`

`:. \ r = -0.938`
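
The sign of `r` always matches the sign of the slope of the least squares line, so part b can also be settled from the equation rather than the graph. A minimal sketch, with illustrative variable names:

```python
from math import copysign, sqrt

r_squared = 0.8794       # coefficient of determination (given)
slope = -0.1515          # slope of the least squares line (given)

# The magnitude of r is sqrt(r^2); its sign is the sign of the slope.
r = copysign(sqrt(r_squared), slope)

print(round(r, 3))       # -0.938
```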
 

c.    `text{Average decrease in winning time = 0.1515 seconds}`

`text{(this is given by the slope of the line.)}`
 

d.    `text{Residual Value}` `= text{actual}-text{predicted}`
    `= 53.83-54.10`
    `= -0.27`

 

e.i.      `text{winning time (2032)}` `= 357.1-0.1515 xx 2032`
      `=49.252 \ text{seconds}`

 

e.ii.  `text{The assumption is that the model remains valid when it is extended}`

  `text{beyond 2016 (i.e. the decreasing trend continues).}`
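
Parts d and e can be checked together. The sketch below is one possible way to do it: it evaluates the given equation and flags whether a year lies outside the 1956 to 2016 period used to fit the line. The function names are hypothetical.

```python
FIRST_YEAR, LAST_YEAR = 1956, 2016   # period covered by the fitted data


def predicted_winning_time(year: int) -> float:
    """Winning time (s) predicted by: winning time = 357.1 - 0.1515 * year."""
    return 357.1 - 0.1515 * year


def is_extrapolation(year: int) -> bool:
    """True when the year lies outside the range of the data."""
    return not FIRST_YEAR <= year <= LAST_YEAR


# Part d: residual = actual - predicted for 2000.
residual_2000 = 53.83 - predicted_winning_time(2000)
print(round(residual_2000, 2))                  # -0.27

# Part e: the 2032 prediction relies on the trend continuing past 2016.
print(round(predicted_winning_time(2032), 3))   # 49.252
print(is_extrapolation(2032))                   # True
```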

Filed Under: Correlation and Regression Tagged With: Band 3, Band 4, Band 5, smc-265-10-r / r^2 and Association, smc-265-30-LSRL formula, smc-265-40-Interpret Gradient, smc-265-50-Residuals, smc-265-60-Extrapolation / Interpolation, smc-265-75-Explanatory / Response

CORE, FUR1 2021 VCAA 11 MC

The table below shows the weight, in kilograms, and the height, in centimetres, of 10 adults.

A least squares line is fitted to the data.

The least squares line enables an adult's weight to be predicted from their height.

The number of times that the predicted value of an adult's weight is greater than the actual value of their weight is

  A. 3
  B. 4
  C. 5
  D. 6
  E. 7
Show Answers Only

`D`

Show Worked Solution

`text(Since weight is being predicted from height,)`

♦ Mean mark 40%.

`text{Weight} -> text{response} \ (y text{-value})`

`text{Height} -> text{explanatory} \ (x text{-value})`

`text{Using CAS, the graph and scatterplot show that 6 points lie below the regression line}`

`text{(where predicted weight > actual)}`

`=> D`

Filed Under: Correlation and Regression Tagged With: Band 5, smc-265-50-Residuals, smc-265-75-Explanatory / Response

CORE, FUR2 2020 VCAA 4

The age, in years, body density, in kilograms per litre, and weight, in kilograms, of a sample of 12 men aged 23 to 25 years are shown in the table below.
 

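\begin{array}{|c|c|c|}
\hline \text{Age (years)} & \text{Body density (kg/litre)} & \text{Weight (kg)} \\
\hline 23 & 1.07 & 70.1 \\
\hline 23 & 1.07 & 90.4 \\
\hline 23 & 1.08 & 73.2 \\
\hline 23 & 1.08 & 85.0 \\
\hline 24 & 1.03 & 84.3 \\
\hline 24 & 1.05 & 95.6 \\
\hline 24 & 1.07 & 71.7 \\
\hline 24 & 1.06 & 95.0 \\
\hline 25 & 1.07 & 80.2 \\
\hline 25 & 1.09 & 87.4 \\
\hline 25 & 1.02 & 94.9 \\
\hline 25 & 1.09 & 65.3 \\
\hline
\end{array}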
     
  a. For these 12 men, determine

    i. their median age, in years.   (1 mark)

    ii. the mean of their body density, in kilograms per litre.   (1 mark)

  b. A least squares line is to be fitted to the data with the aim of predicting body density from weight.

    i. Name the explanatory variable for this least squares line.   (1 mark)

    ii. Determine the slope of this least squares line. Round your answer to three significant figures.   (1 mark)

  c. What percentage of the variation in body density can be explained by the variation in weight? Round your answer to the nearest percentage.   (1 mark)


Show Answers Only
  a.i. `24`
  a.ii. `1.065\ text(kg/litre)`
  b.i. `text(Weight)`
  b.ii. `text(Slope) = -0.00112\ text{(by CAS)}`
  c. `29 text(%)`
Show Worked Solution
a.i.   `n = 12`  
  `text(Median)` `= (text{6th + 7th})/2`
    `= (24 + 24)/2`
    `= 24`

 

a.ii.   `text(Mean)` `= (∑\ text{body density})/12`
    `= 1.065\ text(kg/litre)`

 

b.i.   `text(Weight)`

♦ Mean mark b.ii. 29%.
MARKER’S COMMENT: Most students did not round correctly.

b.ii.   `text(Slope) = -0.00112\ text{(by CAS)}`

 

c.   `r` `= -0.53847\ text{(by CAS)}`
  `r^2` `= 0.289…`

 
`:. 29 text(%)`
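
The CAS values quoted for parts b.ii and c can be reproduced from the table. The following is a sketch using the standard least squares formulas in plain Python rather than a CAS; the variable names are illustrative.

```python
from math import fsum, sqrt

# Data from the table: weight is explanatory (x), body density is response (y).
weight = [70.1, 90.4, 73.2, 85.0, 84.3, 95.6, 71.7, 95.0, 80.2, 87.4, 94.9, 65.3]
density = [1.07, 1.07, 1.08, 1.08, 1.03, 1.05, 1.07, 1.06, 1.07, 1.09, 1.02, 1.09]

n = len(weight)
mean_x = fsum(weight) / n
mean_y = fsum(density) / n

sxx = fsum((x - mean_x) ** 2 for x in weight)
syy = fsum((y - mean_y) ** 2 for y in density)
sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(weight, density))

slope = sxy / sxx            # change in body density per kg of weight
r = sxy / sqrt(sxx * syy)    # correlation coefficient

print(f"{slope:.3g}")        # -0.00112 (three significant figures)
print(f"{r:.5f}")            # -0.53847
print(f"{r * r:.0%}")        # 29%
```

Note that the leading zeros in -0.00112 are not significant figures, which is the rounding point raised in the marker's comment.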

Filed Under: Correlation and Regression Tagged With: Band 2, Band 3, Band 4, Band 5, smc-265-10-r / r^2 and Association, smc-265-20-Find LSRL Equation/Gradient, smc-265-75-Explanatory / Response

Data Analysis, GEN2 2019 NHT 3

The life span, in years, and gestation period, in days, for 19 types of mammals are displayed in the table below.
 

  a. A least squares line that enables life span to be predicted from gestation period is fitted to this data. Name the explanatory variable in the equation of this least squares line.   (1 mark)

  b. Determine the equation of the least squares line in terms of the variables life span and gestation period. Round the numbers representing the intercept and slope to three significant figures.   (2 marks)

  c. Write the value of the correlation rounded to three decimal places.   (1 mark)


Show Answers Only
  a. `text(gestation period)`
  b. `text(life span) = 7.58 + 0.101 xx \ text(gestation period)`
  c. `0.904`
Show Worked Solution

a.    `text(gestation period)`
 

b.    `text(Input data points into CAS:)`

`text(life span) = 7.58 + 0.101 xx \ text(gestation period)`
 

c.    `r = 0.904 \ text{(by CAS)}`

Filed Under: Correlation and Regression Tagged With: Band 3, Band 4, page-break-before-question, smc-265-10-r / r^2 and Association, smc-265-20-Find LSRL Equation/Gradient, smc-265-75-Explanatory / Response, smc-265-80-Rounding (Sig Fig)

CORE, FUR2 2019 VCAA 4

The relative humidity (%) at 9 am and 3 pm on 14 days in November 2017 is shown in Table 3 below.

A least squares line is to be fitted to the data with the aim of predicting the relative humidity at 3 pm (humidity 3 pm) from the relative humidity at 9 am (humidity 9 am).

  a. Name the explanatory variable.   (1 mark)

  b. Determine the values of the intercept and the slope of this least squares line. Round both values to three significant figures and write them in the appropriate boxes provided.   (1 mark)

    humidity 3 pm = ______ + ______ × humidity 9 am

  c. Determine the value of the correlation coefficient for this data set. Round your answer to three decimal places.   (1 mark)


Show Answers Only
  a. `text(humidity 9 am)`
  b. `text(humidity 3 pm) = -1.26 + 0.765 xx text(humidity 9 am)`
  c. `r = 0.871`
Show Worked Solution

a.  `text(humidity 9 am)`
 

b.  `text(Input all data points into CAS:)`

`text(humidity 3 pm) = -1.26 + 0.765 xx text(humidity 9 am)`
 

c.    `r = 0.871\ \ text{(3 d.p.)}`

Filed Under: Correlation and Regression Tagged With: Band 3, Band 4, smc-265-10-r / r^2 and Association, smc-265-20-Find LSRL Equation/Gradient, smc-265-75-Explanatory / Response, smc-265-80-Rounding (Sig Fig)

CORE, FUR2 2018 VCAA 2

The congestion level in a city can be recorded as the percentage increase in travel time due to traffic congestion in peak periods (compared to non-peak periods).

This is called the percentage congestion level.

The percentage congestion levels for the morning and evening peak periods for 19 large cities are plotted on the scatterplot below.
 

  a. Determine the median percentage congestion level for the morning peak period and the evening peak period. Write your answers in the appropriate boxes provided below.   (2 marks)

    Median percentage congestion level for morning peak period: ______ %
    Median percentage congestion level for evening peak period: ______ %

A least squares line is to be fitted to the data with the aim of predicting evening congestion level from morning congestion level.

The equation of this line is

evening congestion level = 8.48 + 0.922 × morning congestion level

  b. Name the response variable in this equation.   (1 mark)

  c. Use the equation of the least squares line to predict the evening congestion level when the morning congestion level is 60%.   (1 mark)

  d. Determine the residual value when the equation of the least squares line is used to predict the evening congestion level when the morning congestion level is 47%. Round your answer to one decimal place.   (2 marks)

  e. The value of the correlation coefficient `r` is 0.92. What percentage of the variation in the evening congestion level can be explained by the variation in the morning congestion level? Round your answer to the nearest whole number.   (1 mark)


Show Answers Only
  a. `52 text(%);\ 56 text(%)`
  b. `text(evening congestion level)`
  c. `63.8 text(%)`
  d. `-1.8 text{% (to 1 d.p.)}`
  e. `85 text(%)`
Show Worked Solution

a.   `19\ text(data points of morning peak)`

Mean mark 56%.
COMMENT: This question was surprisingly poorly answered. Review carefully!

`text(Median is 10th data point moving left to right.)`

`:.\ text{Median (morning peak) = 52%}`

 

`text(Median of evening peak is 10th data point)`

`text(moving bottom to top.)`

`:.\ text{Median (evening peak) = 56%}`
 

b.   `text(Response variable is evening congestion level.)`
 

c.    `text(evening congestion level)` `= 8.48 + 0.922 xx 60`
    `= 63.8 text(%)`

 
d.
 `text(When morning level = 47%, Actual = 50%)`

Mean mark part (d) 53%.
COMMENT: Many students had problems at a number of stages in this part.

`text(Residual)` `=\ text(Actual evening congestion − predicted)`
  `= 50 – (8.48 + 0.922 xx 47)`
  `= -1.814`
  `= -1.8 text{% (to 1 d.p.)}`

 

e.    `r` `= 0.92`
  `r^2` `= 0.8464`
    `= 85 text{% (nearest whole)}`

 
`:. 85 text(% of the variation is explained.)`

Filed Under: Correlation and Regression Tagged With: Band 3, Band 4, smc-265-10-r / r^2 and Association, smc-265-50-Residuals, smc-265-75-Explanatory / Response

CORE, FUR1 2018 VCAA 14 MC

A least squares line is fitted to a set of bivariate data.

Another least squares line is fitted with response and explanatory variables reversed.

Which one of the following statistics will not change in value?

  A. the residual values
  B. the predicted values
  C. the correlation coefficient `r`
  D. the slope of the least squares line
  E. the intercept of the least squares line
Show Answers Only

`C`

Show Worked Solution

`text(If the variables are reversed, the equation changes.)`

♦ Mean mark 42%.

`:.\ text(Differences will occur in:)`

`text(- slope)`

`text(- intercept)`

`text(- predicted and residual values)`
 

`text(The correlation coefficient will remain unchanged, however,)`

`text(as the scattering of the points around the line of best fit is)`

`text{the same (i.e. scattering of}\ x\ text(values relative to)\ y\ text(values)`

`text(is the same as)\ y\ text(values relative to)\ x).`

`=> C`
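
A small numerical experiment illustrates the point. The data set below is hypothetical and chosen only for illustration, and the helper `fit` is an assumed name; it uses the standard least squares formulas.

```python
from math import fsum, sqrt


def fit(x, y):
    """Return (intercept, slope, r) for the least squares line predicting y from x."""
    n = len(x)
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / sqrt(sxx * syy)


# Hypothetical bivariate data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a1, b1, r1 = fit(x, y)   # y is the response variable
a2, b2, r2 = fit(y, x)   # variables reversed: x is the response variable

print(round(a1, 3), round(b1, 3), round(r1, 4))   # 2.2 0.6 0.7746
print(round(a2, 3), round(b2, 3), round(r2, 4))   # -1.0 1.0 0.7746
```

The intercept and slope both change (and with them the predicted and residual values), while `r` is identical in the two fits because its formula is symmetric in the two variables.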

Filed Under: Correlation and Regression Tagged With: Band 5, smc-265-10-r / r^2 and Association, smc-265-40-Interpret Gradient, smc-265-50-Residuals, smc-265-75-Explanatory / Response

CORE, FUR2 2008 VCAA 4

The arm spans (in cm) and heights (in cm) for a group of 13 boys have been measured. The results are displayed in the table below.
 


The aim is to find a linear equation that allows arm span to be predicted from height.

  a. What will be the explanatory variable in the equation?   (1 mark)

  b. Assuming a linear association, determine the equation of the least squares regression line that enables arm span to be predicted from height. Write this equation in terms of the variables arm span and height. Give the coefficients correct to two decimal places.   (2 marks)

  c. Using the equation that you have determined in part b., interpret the slope of the least squares regression line in terms of the variables height and arm span.   (1 mark)


Show Answers Only
  a. `text(Height)`
  b. `text(Arm span)\ = 1.09 xx text(height) – 15.63`
  c. `text(On average, arm span increases by 1.09 cm)`
    `text(for each 1 cm increase in height.)`

Show Worked Solution

a.   `text(Height)`

♦ Mean mark sub 50% (exact data not available).
MARKER’S COMMENT: Many students did not understand the term co-efficients as it applies to the regression equation.

 

b.   `text(By calculator,)`

`text(Arm span)\ = 1.09 xx text(height) – 15.63`

 

c.   `text(On average, arm span increases by 1.09 cm)`

`text(for each 1 cm increase in height.)`

Filed Under: Correlation and Regression Tagged With: Band 4, Band 5, smc-265-20-Find LSRL Equation/Gradient, smc-265-40-Interpret Gradient, smc-265-75-Explanatory / Response

CORE, FUR2 2010 VCAA 2

In the scatterplot below, average annual female income, in dollars, is plotted against average annual male income, in dollars, for 16 countries. A least squares regression line is fitted to the data.
 


 

The equation of the least squares regression line for predicting female income from male income is

female income = 13 000 + 0.35 × male income

  a. What is the explanatory variable?   (1 mark)

  b. Complete the following statement by filling in the missing information.

    From the least squares regression line equation it can be concluded that, for these countries, on average, female income increases by `text($________)` for each $1000 increase in male income.   (1 mark)

  c. i. Use the least squares regression line equation to predict the average annual female income (in dollars) in a country where the average annual male income is $15 000.   (1 mark)

    ii. The prediction made in part c.i. is not likely to be reliable. Explain why.   (1 mark)


Show Answers Only

  a. `text(Male income)`
  b. `$350`
  c.i. `$18\ 250`
  c.ii. `text(The model established by the regression)`
    `text(equation cannot be relied upon outside the)`
    `text(range of the given data set.)`

Show Worked Solution

a.   `text(Male income)`
 

b.   `text(Increase in female income)`

`= 0.35 xx 1000`

`= $350`
 

c.i.   `text(Average annual female income)`

`= 13\ 000 + 0.35 xx 15\ 000`

`= $18\ 250`

♦♦ This part was poorly answered (exact data unavailable).
MARKER’S COMMENT: Many students offered “real world” explanations which did not gain a mark here.

 
c.ii.
   `text(The model established by the regression)`

   `text(equation cannot be relied upon outside the)`

   `text(range of the given data set.)`

Filed Under: Correlation and Regression Tagged With: Band 3, Band 4, Band 5, smc-265-40-Interpret Gradient, smc-265-60-Extrapolation / Interpolation, smc-265-75-Explanatory / Response

CORE, FUR2 2014 VCAA 2

The scatterplot below shows the population and area (in square kilometres) of a sample of inner suburbs of a large city.
 


The equation of the least squares regression line for the data in the scatterplot is

population = 5330 + 2680 × area

  a. Write down the response variable.   (1 mark)

  b. Draw the least squares regression line on the scatterplot above.   (1 mark)

  c. Interpret the slope of this least squares regression line in terms of the variables area and population.   (2 marks)

  d. Wiston is an inner suburb. It has an area of 4 km² and a population of 6690. The correlation coefficient, `r`, is equal to 0.668

    i. Calculate the residual when the least squares regression line is used to predict the population of Wiston from its area.   (1 mark)

    ii. What percentage of the variation in the population of the suburbs is explained by the variation in area? Write your answer, correct to one decimal place.   (1 mark)


Show Answers Only
  a. `text(Population.)`
  b.  
  c. `text(Population increases by 2680 people, on average,)`
    `text(for each additional 1 km² in area.)`
  d.i. ` −9360`
  d.ii. `text(44.6%)`
Show Worked Solution

a.   `text(Population.)`
 

♦ Mean mark 36%.
MARKER’S COMMENT: Use the equation to draw the line and use points at the extremities.
b.   


 

c.   `text(Population increases by 2680 people, on average,)`

♦ Mean mark 41% (part (iii)).

`text(for each additional 1 km² in area.)`
 

d.i.  `text(Predicted population) = 5330 + 2680 xx 4= 16\ 050`

`:.\ text(Residual)\ = 6690-16\ 050= -9360`
 

♦ Part (iv) in total had a mean mark 42%.
d.ii.    `r` `= 0.668`
  `r^2` `= 0.668^2`
    `= 0.4462…`
    `= 44.6 text{%  (to 1 d.p.)}`

 

`:.\ text(44.6% of the variation in the population is explained)`

`text(by variation in the area.)`
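
Both parts of d can be verified numerically. This is a minimal sketch; `predict_population` is an illustrative name for the given equation, and note that the percentage of variation explained comes from `r^2`, not from `r` itself.

```python
def predict_population(area_km2: float) -> float:
    """Population predicted by: population = 5330 + 2680 * area."""
    return 5330 + 2680 * area_km2


# Part d.i: residual = actual - predicted for Wiston (4 km^2, population 6690).
residual = 6690 - predict_population(4)
print(residual)                 # -9360

# Part d.ii: percentage of variation explained = r^2 as a percentage.
r = 0.668
explained_percent = round(r ** 2 * 100, 1)
print(explained_percent)        # 44.6
```

The large negative residual shows that Wiston's actual population is well below the value predicted from its area.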

Filed Under: Correlation and Regression Tagged With: Band 4, Band 5, smc-265-10-r / r^2 and Association, smc-265-40-Interpret Gradient, smc-265-75-Explanatory / Response

Copyright © 2014–2025 SmarterEd.com.au