Building an ARIMA Model to Predict Future Sales Using Python

In this study, we will create an ARIMA model to predict the future sales values of a market using Python.

Required libraries:

At this stage, we have loaded the data.

When we look at df, we see that the dates in the “Month” column are not in a consistent format. We need to clean them up with some data manipulation.

With the for loop and the steps that follow it, we cleaned up the data in the “Month” column.

Our date data now looks as follows:

We simplified the name of the column showing the sales and assigned the Month column to the index.
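
The renaming and indexing step might look like the sketch below. The original column names and values are not shown here, so the frame, the long column name, and the short name `Sales` are all placeholders chosen for illustration.

```python
import pandas as pd

# Placeholder frame: column names and values are invented for illustration.
df = pd.DataFrame({
    "Month": ["2001-01", "2001-02", "2001-03", "2001-04"],
    "Monthly sales of product": [100.0, 120.0, 90.0, 130.0],
})

# Parse the month strings into real timestamps.
df["Month"] = pd.to_datetime(df["Month"], format="%Y-%m")

# Give the sales column a shorter name and use the dates as the index.
df = df.rename(columns={"Monthly sales of product": "Sales"})
df = df.set_index("Month")

print(df.head())
```

With a date index in place, the series can be sliced by time and passed directly to time series functions.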

We plotted a graph to see how sales change over time:

Statistical Test

We can use the code below to test whether the data is stationary. Stationarity in a time series means that its mean and variance stay constant over time.

For now, looking at the p-value is enough.
If p < 0.05, the data is stationary.
If p > 0.05, the data is not stationary.

Output:

We see that our data is not stationary. To make it stationary, we set the “d” parameter of the ARIMA model to 1, so that the model works on the first difference of the series.
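
Setting d = 1 corresponds to modelling the first difference of the series, x(t) - x(t-1), instead of the series itself. A small sketch with invented numbers (not the sales data) shows how differencing removes a steady upward trend:

```python
import numpy as np

# Invented series with a steady upward trend.
series = np.array([10.0, 13.0, 16.0, 19.0, 22.0, 25.0])

# First difference: x(t) - x(t-1). The result is one element shorter.
diff1 = np.diff(series, n=1)

print(diff1)  # [3. 3. 3. 3. 3.]
```

The differenced values no longer grow over time, which is the property the ARIMA model relies on when d is 1.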

ARIMA Model

An ARIMA model takes three parameters, in this order: p, d and q.

p: how many past values (lags) of the series are used to estimate the value at time x(t).
d: the degree of differencing applied to make the data stationary.
q: how many past estimation errors are included in the moving average term when estimating x(t).

The auto_arima function uses a measure called the AIC (Akaike Information Criterion) score to decide how good a particular prediction model is. It searches over candidate parameters and keeps the model with the lowest AIC score.

Output:

The function reported (1,1,2) as the ARIMA parameters with the best score.

ARIMA(1,1,2) means that the response variable (Y) is differenced once and then modelled by combining a 1st-order autoregressive (AR) model with a 2nd-order moving average (MA) model.

Splitting the data set into training and test sets:
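
Written out, one common form of this model is shown below. The symbols are notation assumed here, not taken from the original: y'_t is the once-differenced series, c a constant, \phi_1 the AR coefficient, \theta_1 and \theta_2 the MA coefficients, and \varepsilon_t the error at time t.

```latex
\begin{align}
y'_t &= y_t - y_{t-1} \\
y'_t &= c + \phi_1 y'_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
\end{align}
```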
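
For time series the split should respect chronological order, with the most recent observations held out for testing. The split size used in this study is not stated, so the sketch below assumes a synthetic 36-month series with the last 12 months as the test set.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the sales data.
index = pd.date_range("2001-01-01", periods=36, freq="MS")
sales = pd.Series(np.arange(36, dtype=float), index=index, name="Sales")

# Keep chronological order: the last 12 months become the test set.
n_test = 12
train, test = sales.iloc[:-n_test], sales.iloc[-n_test:]

print(len(train), len(test))  # 24 12
print(train.index.max() < test.index.min())  # True
```

Shuffling before splitting, as is common with non-temporal data, would leak future information into the training set.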

Fitting the ARIMA Model:

Output:

RMSE
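
RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it is expressed in the same unit as the sales figures. A minimal sketch with invented numbers; the helper name `rmse` is hypothetical:

```python
import numpy as np


def rmse(actual, predicted):
    """Square root of the mean squared difference between two sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Invented values: the errors are 3, -4 and 0.
actual = [100.0, 110.0, 120.0]
predicted = [97.0, 114.0, 120.0]

print(rmse(actual, predicted))  # sqrt(25 / 3), about 2.89
```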

We compared the actual values with the predicted values. The RMSE was 90,986. Now let’s look at the forecast and actual values on a graph. Actual values are shown with a blue line and predicted values with a red line.
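
A plot of this kind could be produced as in the sketch below, assuming matplotlib. The two arrays are invented placeholders for the test values and the model forecast.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Invented placeholder values for illustration.
actual = np.array([120.0, 135.0, 128.0, 150.0, 160.0, 155.0])
predicted = np.array([118.0, 130.0, 133.0, 145.0, 152.0, 158.0])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(actual, color="blue", label="Actual")
ax.plot(predicted, color="red", label="Forecast")
ax.set_xlabel("Period")
ax.set_ylabel("Sales")
ax.legend()
```

In an interactive session, drop the backend line and call `plt.show()`, or save the figure with `fig.savefig(...)`.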

Output:

In the output image, the red line is our model’s forecast for the future periods and the blue line is what actually happened. The forecast does not overlap the real values exactly. The important point, however, is that the upward and downward movements of the predicted values over time follow a pattern consistent with the real values. A model that tracks the trend without reproducing every fluctuation can be considered a good result in terms of avoiding over-fitting.
