Problems
- (10 points) Suppose you have a dataset \( \{ \mathbf{x}\} = \{ x_1, x_2, x_3, x_4 \}\) consisting of 4 items. You know that \( x_1=25 \) and \( x_2=-15 \) and that after standardization \( \hat{x}_1= 0 \) and \( \hat{x}_2= -1 \).
  - Find \( \text{mean}\{ \mathbf{x}\} \) and \( \text{std}\{ \mathbf{x}\} \).
  - Find \( x_3 \) and \( x_4 \) given that \( x_3 \leq x_4 \).
- (10 points) Textbook problem 2.1
- (10 points) Textbook problem 2.2
- (10 points) Textbook problem 2.8 (data). Note that US state abbreviations were not standardized until 1963. This data is from 1960, so NE=Nevada and NB=Nebraska.
- (10 points) Download the daily adjusted closing stock prices for the current year for the Coca-Cola Company (KO) and PepsiCo (PEP).
  - Use this data to find the correlation coefficient between the stock prices of these two corporations.
  - Draw a scatter plot with KO prices on the horizontal axis and PEP prices on the vertical axis.
  - Add a prediction line to your plot that shows predictions of PEP prices from KO prices.
- (Extra credit: 2 points) Let \( \{\hat{x}_i\} \) be the standardized dataset derived from \( \{x_i\} \), which has \(N\) items. Prove that the vector \( \left\langle \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right\rangle \) has unit length.
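
Several problems above rely on standardization, the correlation coefficient, and prediction from correlation. The sketch below illustrates those operations on made-up numbers that are unrelated to any problem's data. It assumes the population (divide by \(N\)) definition of standard deviation; check this against the textbook's definition. The function names `standardize`, `correlation`, and `predict` are hypothetical helpers, not part of any library. The unit-length check at the end is a numerical illustration only, not a proof.

```python
import math
import statistics


def standardize(xs):
    """Return (x - mean) / std for each x, using the population (1/N) std."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)  # assumption: std divides by N, not N - 1
    return [(x - mean) / std for x in xs]


def correlation(xs, ys):
    """Correlation coefficient as the mean of products of standardized values."""
    x_hat, y_hat = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(x_hat, y_hat)) / len(xs)


def predict(xs, ys, x0):
    """Predict y at x0: standardize x0, multiply by r, map back to y units."""
    r = correlation(xs, ys)
    x0_hat = (x0 - statistics.mean(xs)) / statistics.pstdev(xs)
    return statistics.mean(ys) + r * x0_hat * statistics.pstdev(ys)


# Hypothetical data chosen for illustration only.
xs = [2.0, 4.0, 4.0, 6.0]
ys = [1.0, 3.0, 2.0, 6.0]

x_hat = standardize(xs)

# Scale the standardized values by 1 / sqrt(N) and measure the vector's length.
scaled = [v / math.sqrt(len(xs)) for v in x_hat]
length = math.sqrt(sum(v * v for v in scaled))

r = correlation(xs, ys)

print("standardized xs:", x_hat)
print("length of scaled vector:", length)
print("correlation:", r)
print("prediction at x0 = 5:", predict(xs, ys, 5.0))
```

Evaluating `predict` at the smallest and largest horizontal values gives two points, and the prediction line is the straight line through them; any plotting library can then draw it over the scatter plot.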