I wrote this program to estimate the mean squared error (MSE) and the mean absolute percentage error (MAPE):
Is everything all right with this?
pune is a .csv file with 22 data points.
# Read the 22 observations (semicolon-separated, comma as decimal mark)
pune <- read.csv("C:/Users/ervis/Desktop/Te dhenat e konsum energji/pune.csv", header = TRUE, dec = ",", sep = ";")
pune <- data.matrix(pune, rownames.force = NA)
# Grid of 10,000 candidate constant values
m1 <- seq(from = 14274.19, to = 14458.17, length.out = 10000)
# MSE of the data around each candidate value m1[i]
MSE1 <- numeric(length = 10000)
for (i in seq_along(MSE1)) {
  MSE1[i] <- mean((pune - m1[i])^2)
}
# MAPE (as a fraction, not a percentage) for each candidate value m1[i]
MAPE1 <- numeric(length = 10000)
for (i in seq_along(MAPE1)) {
  MAPE1[i] <- mean(abs((pune - m1[i]) / pune))
}
Am I right?
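The two loops can also be written without an explicit loop. A minimal sketch, assuming a hypothetical numeric vector x with made-up values in place of the pune data (the CSV file is not available here); vapply() applies the same formula to every candidate value in m1:

```r
# Hypothetical stand-in for the 22 observations in pune.csv (made-up values)
set.seed(1)
x <- rnorm(22, mean = 14366, sd = 150)

# Grid of candidate constant values, as in the question
m1 <- seq(from = 14274.19, to = 14458.17, length.out = 10000)

# One MSE and one MAPE (as a fraction) per candidate value m
MSE1  <- vapply(m1, function(m) mean((x - m)^2), numeric(1))
MAPE1 <- vapply(m1, function(m) mean(abs((x - m) / x)), numeric(1))

head(MSE1)
head(MAPE1)
```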
Mean squared error seems to have different meanings in different contexts.
For a random sample taken from a population, the MSE of the sample mean (as an estimator of the population mean) is just the variance divided by the sample size, i.e.,
# x is the sample itself (not its mean); as.numeric() flattens a one-column matrix
mse <- function(x) var(as.numeric(x)) / length(x)
mse(pune)
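This relationship can be checked by simulation. A minimal sketch, assuming normally distributed data with made-up parameters mu, sigma and n: the empirical MSE of many sample means around the true mean should be close to sigma^2 / n, which mse() estimates from a single sample as var(x) / length(x).

```r
set.seed(42)
mu <- 100; sigma <- 15; n <- 22  # hypothetical population parameters and sample size

# Draw 20,000 samples of size n and keep each sample mean
sample_means <- replicate(20000, mean(rnorm(n, mean = mu, sd = sigma)))

# Empirical MSE of the sample mean as an estimator of mu
empirical_mse <- mean((sample_means - mu)^2)
# Theoretical value: sigma^2 / n = 225 / 22, about 10.23
theoretical_mse <- sigma^2 / n

# Estimate from a single sample, as mse() does
x <- rnorm(n, mean = mu, sd = sigma)
one_sample_estimate <- var(x) / length(x)

c(empirical = empirical_mse, theoretical = theoretical_mse,
  one_sample = one_sample_estimate)
```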
For regressions, MSE means the sum of squared residuals divided by the degrees of freedom of those residuals.
mse.lm <- function(lm_model) sum(residuals(lm_model) ^ 2) / lm_model$df.residual
#or
mse.lm <- function(lm_model) summary(lm_model)$sigma ^ 2
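The two versions of mse.lm() should agree. A quick check, using R's built-in mtcars data and an arbitrary model (mpg on wt) chosen only for illustration:

```r
# Arbitrary example regression on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# Residual sum of squares divided by the residual degrees of freedom (n - p)
mse_rss <- sum(residuals(fit)^2) / fit$df.residual
# Squared residual standard error reported by summary()
mse_sigma <- summary(fit)$sigma^2

all.equal(mse_rss, mse_sigma)  # TRUE
fit$df.residual                # 30 (32 observations, 2 coefficients)
```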
This seems like a lot of code for a simple calculation. Here is how I would do it for a data vector a:
a <- 1:10
mse_a <- sum((a - mean(a))^2) / length(a)
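The same idea extends to comparing observed values with forecasts: each criterion is a single number for the whole dataset. A minimal sketch; the helper names mse_forecast() and mape_forecast() are hypothetical, and MAPE is multiplied by 100 to express it in percent, which is a common convention rather than something the original code does:

```r
# Hypothetical helpers: one MSE / MAPE value for a vector of forecasts
mse_forecast  <- function(actual, predicted) mean((actual - predicted)^2)
mape_forecast <- function(actual, predicted) 100 * mean(abs((actual - predicted) / actual))

a <- 1:10
# Using the mean of a as a constant forecast reproduces mse_a
mse_forecast(a, mean(a))   # 8.25
mape_forecast(a, mean(a))  # about 90.07 (percent)
```

A scalar predicted value is recycled against every observation, so a constant forecast and a full vector of forecasts both work.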
From what I can see, your formula for MSE is correct, but there should be only one value for the whole dataset, not multiple values.
If your data contain only 22 points, I can't see why you need to create a 10,000-element vector, whether or not you use loops.
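For MSE specifically, searching a grid of constants has a closed-form answer: the constant c that minimises mean((x - c)^2) is the sample mean, because the derivative -2 * mean(x - c) is zero only at c = mean(x). A sketch with hypothetical made-up data shows the grid minimiser landing next to mean(x):

```r
set.seed(7)
x <- runif(22, min = 14300, max = 14430)  # hypothetical data

# Same grid as in the question; step is about 0.018
grid <- seq(from = 14274.19, to = 14458.17, length.out = 10000)
grid_mse <- vapply(grid, function(m) mean((x - m)^2), numeric(1))

best_on_grid <- grid[which.min(grid_mse)]
closed_form  <- mean(x)

# The grid minimiser is the grid point nearest mean(x),
# so the two differ by at most half a grid step
c(grid = best_on_grid, mean = closed_form)
```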