The first step will be to load the dataset to a variable to access easily to the inside data.
dataset <- ChickWeight
weight <- dataset$weight
time <- dataset$Time
The formula of the mean can be represented as:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Where:
Implementing the previous formula in R
would look like
this:
my_mean <- function(data) {
x <- 0
n <- 0
for (x_i in data) {
x <- x + x_i
n <- n + 1
}
return(x / n)
}
Now the values of the means can be calculated using the previous function:
my_mean(weight)
## [1] 121.8183
my_mean(time)
## [1] 10.71799
The formula of the variance can be represented as:
\[ \sigma^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Where:
Implementing the previous formula in R
would look like
this:
variance <- function(data) {
mean <- my_mean(data)
n <- 0
sigma_sq <- 0
for (x_i in data) {
sigma_sq <- sigma_sq + (x_i - mean)^2
n <- n + 1
}
return(sigma_sq / (n - 1))
}
Now the values of the means can be calculated using the previous function:
variance(weight)
## [1] 5051.223
variance(time)
## [1] 45.67597
The formula of the covariance can be represented as:
\[S_{xy} = \frac{\sum_{i=1}^{n} {(x_i - \bar{x}) * (y_i - \bar{y})}}{n - 1}\]
Where:
Implementing the previous formula in R
would look like
this:
covariance <- function(x, y) {
mean_1 <- my_mean(x)
mean_2 <- my_mean(y)
n <- 0
s_xy <- 0
for (d in x) {
s_xy <- s_xy + (x[n + 1] - mean_1) * (y[n + 1] - mean_2)
n <- n + 1
}
return(s_xy / (n - 1))
}
Now the values of the means can be calculated using the previous function:
covariance(weight, time)
## [1] 402.0873
As can be seen, when executing the covariance between weight and time, a positive covariance is obtained, specifically a strong positive relationship. What does this mean? This means that both variables are likely to increase or decrease together. In other words, when time or weight increases the other variable is also likely to increase and vice versa, when one decreases, the other is most likely will also do so.
To check if the functions are implemented correctly they can be
compared to the built-in R
functions:
digits <- 5 # Number of digits to round up
round(mean(weight), digits) == round(my_mean(weight), digits)
## [1] TRUE
round(mean(time), digits) == round(my_mean(time), digits)
## [1] TRUE
round(var(weight), digits) == round(variance(weight), digits)
## [1] TRUE
round(var(time), digits) == round(variance(time), digits)
## [1] TRUE
round(cov(weight, time), digits) == round(covariance(weight, time), digits)
## [1] TRUE