开发者

How to test time series model?

开发者 https://www.devze.com 2023-02-06 22:20 出处:网络
I am wondering what the good approach for testing time series model would be. Suppose I have a time series in a time domain t1,t2,...tN. I have inputs, say, zt1, zt2,...ztN and output x1,x2...xN.

I am wondering what the good approach for testing time series model would be. Suppose I have a time series in a time domain t1,t2,...tN. I have inputs, say, zt1, zt2,...ztN and output x1,x2...xN.

Now, if that were a classical data mining problem, I could go with known approaches like cross-validation, leave-one-out, 70-30 or something else.

But how should I approach the problem of testing my model with time series? Should I build the model on the first t1,t2,...t(N-k) inputs and test it on the last k inputs? But what if we want to maximise the prediction for p steps ahead and not k (where p < k). I am looking for a robust solution which I can apply t开发者_C百科o my specific case.


With timeseries fitting, you need to be careful about not using your Out-of-sample data until after you've developed your model. The main problem with modelling is that it's simply easy to overfit.

Typically what we do is to use 70% for in-sample modelling, 30% of out-of-sample testing/validation. And when we use the model in production, the data we collect day-to-day becomes true-out-of-sample data : the data you have never seen or used.

Now, if you have enough data points, I'd suggest trying rolling window fitting approach. For each time step in your in-sample, you look back N time steps to fit your model and see how the parameters in your model varies over time. For example, let's say your model is linear regression with Y = B0 + B1*X1 + B2*X2. You'd do regression N - window_size time over the sample. This way, you understand how sensitive your Betas are in relation to time, among other things.


It sounds like you have a choice between

  1. Using the first few years of data to create the model, then seeing how well it predicts the remaining years.

  2. Using all the years of data for some subset of input conditions, then seeing how well it predicts using the remaining input conditions.

0

精彩评论

暂无评论...
验证码 换一张
取 消