Homework 2

Due Monday Sept. 16 at 11:59pm

Homework policies and submission instructions

Problems

  1. (10 points) Suppose you have a dataset \( \{ \mathbf{x}\} = \{ x_1, x_2, x_3, x_4 \}\) consisting of 4 items. You know that \( x_1=25 \) and \( x_2=-15 \) and that after standardization \( \hat{x}_1=0 \) and \( \hat{x}_2=-1 \).
    1. Find \( \text{mean}\{ \mathbf{x}\} \) and \( \text{std}\{ \mathbf{x}\} \)
    2. Find \( x_3 \) and \( x_4 \) given that \( x_3 \leq x_4 \)
  2. (10 points) Textbook problem 2.1
  3. (10 points) Textbook problem 2.2
  4. (10 points) Textbook problem 2.8 (data). Note that US state abbreviations were not standardized until 1963. This data is from 1960, so NE=Nevada and NB=Nebraska.
  5. (10 points) Download the daily adjusted closing stock prices for the current year for the Coca-Cola Company (KO) and PepsiCo (PEP).
    1. Use this data to find the correlation coefficient between the stock prices of these two corporations
    2. Make a scatter plot with KO prices on the horizontal axis and PEP prices on the vertical axis
    3. Add a prediction line to your plot that shows predictions of PEP prices from KO prices
  6. (Extra credit: 2 points) Let \( \{\hat{x}_i\} \) be the standardized dataset derived from \( \{x_i\} \), which has \(N\) items. Prove that the vector \( \left\langle \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right\rangle \) has unit length.
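
Several problems above rely on standardization, the correlation coefficient, and a prediction line. The following is a minimal sketch of those three operations in Python with NumPy, not a solution to any problem. It assumes the standard deviation is the population version (dividing by \(N\), which is NumPy's default `ddof=0`); if your course definition divides by \(N-1\), adjust accordingly. The function names `standardize`, `correlation`, and `predict` are hypothetical helpers, and the arrays `a` and `b` are made-up illustrative values, not stock data.

```python
import numpy as np


def standardize(x):
    """Return (x - mean) / std, using the population std (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # np.std defaults to ddof=0


def correlation(x, y):
    """Correlation as the mean of products of standardized coordinates."""
    return float(np.mean(standardize(x) * standardize(y)))


def predict(x_train, y_train, x_new):
    """Predict y at x_new: in standardized coordinates, y_hat = r * x_hat."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    r = correlation(x_train, y_train)
    # Standardize x_new with the training mean and std, then map back to y units.
    x_new_hat = (x_new - x_train.mean()) / x_train.std()
    return y_train.mean() + r * y_train.std() * x_new_hat


# Made-up illustrative data (an assumption, not real prices).
a = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
b = np.array([2.0, 3.5, 3.0, 8.0, 9.5])

a_hat = standardize(a)
print("mean of standardized a:", a_hat.mean())
print("std of standardized a:", a_hat.std())
print("correlation(a, b):", correlation(a, b))
print("predicted b at a = 5:", predict(a, b, 5.0))
```

To draw the prediction line on a scatter plot, one option is to evaluate `predict` at the smallest and largest horizontal-axis values and connect the two resulting points.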