Suppose I have an ARIMA(p,d,q) model with d > 0 that I estimate using statsmodels. (See the ARIMA class here.) For in-sample forecasts at t >= d, I get exactly what I expect: Y_fitted(t) uses exactly the observed values Y(t-1), Y(t-2), ... as well as the estimated innovations (computed via the innovations algorithm on the differenced series). (I will index the series starting at zero.)
For example, when d = 3, I get exactly

`Y_fitted(3) = 3 * Y(2) - 3 * Y(1) + Y(0) + estimated innovations(3)`
This exactly reproduces the statsmodels `fit.fittedvalues` values for any t >= d.
I am confused about the predictions for t < d, though. From simulating a bunch of ARIMA processes, estimating, and regressing the fitted values on lagged observations, it seems that:
- `Y_fitted(0) = 0` (which makes sense!)
- `Y_fitted(1) = (d+1)/2 * Y(0)` (not sure where that comes from!)
- `Y_fitted(2) =`
    - `2.5 * Y(1) - 1.667 * Y(0)` when `d = 3`
    - `3.0 * Y(1) - 2.500 * Y(0)` when `d = 4`
    - `3.5 * Y(1) - 3.500 * Y(0)` when `d = 5`
    - ...
- `Y_fitted(3) =`
    - `3.5 * Y(2) - 4.2 * Y(1) + 1.75 * Y(0)` when `d = 4`
    - ...
For the life of me, despite googling and trying to reverse engineer a heuristic, I cannot figure out where these coefficients come from. I don't think they correspond to the best linear predictors given the values observed up to time t (but I am not sure about that either). Trying to step through the statsmodels code is too complicated. (The statsmodels documentation does say that the initial residuals are funky, but not how they are actually computed. They aren't just padding with zeros; they are doing something.)
Does anyone have any idea? Any guidance would be appreciated. I am happy to add code if people think that would be helpful, but I think the question is more conceptual than about the code itself.