Loading dataset

The first step will be to load the dataset to a variable to access easily to the inside data.

dataset <- ChickWeight

a) Calculate the mean and variance of variables weight and Time.

weight <- dataset$weight
time <- dataset$Time

Mean

The formula of the mean can be represented as:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Where:

  • \(\bar{x}\) = Mean.
  • \(n\) = Number of elements.
  • \(x_i\) = Value of the ith element.

Implementing the previous formula in R would look like this:

my_mean <- function(data) {
  x <- 0
  n <- 0
  for (x_i in data) {
    x <- x + x_i
    n <- n + 1
  }
  return(x / n)
}

Now the values ​​of the means can be calculated using the previous function:

my_mean(weight)
## [1] 121.8183
my_mean(time)
## [1] 10.71799

Variance

The formula of the variance can be represented as:

\[ \sigma^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

Where:

  • \(\sigma^2\) = Variance.
  • \(n\) = Number of elements.
  • \(x_i\) = Value of the ith element.
  • \(\bar{x}\) = Mean of the elements.

Implementing the previous formula in R would look like this:

variance <- function(data) {
  mean <- my_mean(data)
  n <- 0
  sigma_sq <- 0
  for (x_i in data) {
    sigma_sq <- sigma_sq + (x_i - mean)^2
    n <- n + 1
  }
  return(sigma_sq / (n - 1))
}

Now the values ​​of the means can be calculated using the previous function:

variance(weight)
## [1] 5051.223
variance(time)
## [1] 45.67597

b) Calculate the covariance between weight and Time. Interpret the result.

Covariance

The formula of the covariance can be represented as:

\[S_{xy} = \frac{\sum_{i=1}^{n} {(x_i - \bar{x}) * (y_i - \bar{y})}}{n - 1}\]

Where:

  • \(S_{xy}\) = Covariance.
  • \(\bar{x}\) = Mean of the x elements.
  • \(x_i\) = Value of the ith x element.
  • \(\bar{y}\) = Mean of the y elements.
  • \(y_i\) = Value of the ith y element.
  • \(n\) = Number of elements; \(|x|=|y|\).

Implementing the previous formula in R would look like this:

covariance <- function(x, y) {
  mean_1 <- my_mean(x)
  mean_2 <- my_mean(y)
  n <- 0
  s_xy <- 0
  for (d in x) {
    s_xy <- s_xy + (x[n + 1] - mean_1) * (y[n + 1] - mean_2)
    n <- n + 1
  }
  return(s_xy / (n - 1))
}

Now the values ​​of the means can be calculated using the previous function:

covariance(weight, time)
## [1] 402.0873

Result interpretation

As can be seen, when executing the covariance between weight and time, a positive covariance is obtained, specifically a strong positive relationship. What does this mean? This means that both variables are likely to increase or decrease together. In other words, when time or weight increases the other variable is also likely to increase and vice versa, when one decreases, the other is most likely will also do so.

Verifying results

To check if the functions are implemented correctly they can be compared to the built-in Rfunctions:

digits <- 5 # Number of digits to round up

Mean

round(mean(weight), digits) == round(my_mean(weight), digits)
## [1] TRUE
round(mean(time), digits) == round(my_mean(time), digits)
## [1] TRUE

Variance

round(var(weight), digits) == round(variance(weight), digits)
## [1] TRUE
round(var(time), digits) == round(variance(time), digits)
## [1] TRUE

Covariance

round(cov(weight, time), digits) == round(covariance(weight, time), digits)
## [1] TRUE