"Those who have knowledge, don't predict. Those who predict, don't have knowledge. "--Lao Tzu, 6th Century BC Chinese Poet
A thought-provoking statement.
Wikipedia states that Forecasting is the process of making predictions of the future based on past and present data and analysis of trends. A commonplace example might be estimation of some variable of interest at some specified future date. Prediction is a similar, but more general term. Both might refer to formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods. Usage can differ between areas of application: for example, in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.
Today I'm going to discuss time series forecasting. Many experts have written about this topic, my favorite being Professor Rob Hyndman: http://robjhyndman.com/hyndsight/
I'll be working in R, which is pretty neat, and its graphical capabilities will aid visualisation as we go along.
Time Series Forecasting:
Forecasting is almost always done alongside a time series. This is because forecasting algorithms depend on data that captures the trend of the relevant metric over a time slice such as day, week or month (and the list is long).
Let's see how to use time series forecasting methods to predict oil prices.
Some thoughts before we proceed.
1. The metric we want to forecast should have a time-slice attached to it.
2. Forecast methods in R use the following components: seasonality, trend and randomness. It is therefore advisable to have at least 48 data points to achieve decent accuracy in your prediction.
3. Even though there is no restriction on the time slice, accuracy generally starts improving when the data is at month level. That said, you can still experiment with week- or day-level data.
Getting down to business.
Step 1: You need the following packages to proceed with forecasting.
If you don't have the packages you can install them using the following command.
After installing, you will have to load them into the environment using the library command above.
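A minimal sketch of this step, assuming the package in question is Rob Hyndman's `forecast` package (the one credited at the end of this post). The vector name `needed` is only illustrative, and the install line is left commented out so the snippet runs without network access.

```r
# Assumption: "forecast" is the package this walkthrough relies on.
needed <- c("forecast")

# requireNamespace() returns TRUE/FALSE without attaching the package.
missing_pkgs <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  # Uncomment to install from CRAN (run once, needs network access):
  # install.packages(missing_pkgs)
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach each package to the current session.
  for (pkg in needed) library(pkg, character.only = TRUE)
}
```

The later sketches in this post stick to functions that ship with base R (`ts`, `decompose`, `HoltWinters`, `predict`), so they run even if `forecast` is not installed.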
Step 2: It's my habit to set the working directory before proceeding with any analysis. This way I ensure that all my relevant work is stored in the same folder.
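As a rough illustration, setting the working directory could look like the following. The folder name `oil-forecast` is hypothetical, and a temporary directory stands in for a real project path.

```r
# Hypothetical project folder; replace with your own path.
project_dir <- file.path(tempdir(), "oil-forecast")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() switches directory and returns the previous one invisibly,
# which makes it easy to switch back later with setwd(old_dir).
old_dir <- setwd(project_dir)
getwd()
```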
Step 3: Create the time series as follows. Below, I'm creating a time series based on the column price, with an explicit declaration of start and end.
The syntax for each is c(year, month). I knew my data ran from 1-Jan-1986 to 1-Nov-2015. You can edit these according to your dataset.
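Here is a sketch of what such a call can look like. The real oil price file is not reproduced here, so a randomly generated `price` column stands in for it; the data frame name `oil` and the object name `oil_ts` are assumptions for illustration only.

```r
# Stand-in data: 359 monthly values, one per month from Jan 1986 to Nov 2015.
# (Random numbers, NOT real oil prices.)
set.seed(1)
oil <- data.frame(price = round(20 + cumsum(rnorm(359)), 2))

# start and end are given as c(year, month);
# frequency = 12 tells R there are 12 observations per year.
oil_ts <- ts(oil$price, start = c(1986, 1), end = c(2015, 11), frequency = 12)

start(oil_ts)   # first (year, month) of the series
end(oil_ts)     # last (year, month) of the series
length(oil_ts)  # number of observations
```

If the number of rows does not match the span between start and end, `ts()` will silently truncate or recycle the values, so it is worth checking `length()` against what you expect.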
Step 4: As I said earlier, we will now try to visualize the components: seasonality, trend and randomness.
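One way to get at these components is base R's `decompose()`. The sketch below uses R's built-in monthly `co2` series as a stand-in for the oil price series, which is an assumption made purely so the example runs on its own.

```r
# Stand-in series: monthly CO2 readings that ship with R (not oil prices).
series <- co2

# Classical decomposition; the default is additive:
# series = trend + seasonal + random
parts <- decompose(series)
names(parts)

# Draw the observed, trend, seasonal and random panels
# (only when running in an interactive session).
if (interactive()) plot(parts)
```

The trend is estimated with a moving average, so the first and last few values of `parts$trend` and `parts$random` are `NA`.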
Step 5: I am attempting to remove the seasonality factor now.
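A minimal sketch of seasonal adjustment, assuming an additive decomposition and again using the built-in `co2` series as a stand-in for the oil prices:

```r
# Stand-in series: monthly CO2 readings that ship with R (not oil prices).
series <- co2
parts <- decompose(series)

# For an additive decomposition, subtracting the seasonal component
# leaves trend + randomness.
adjusted <- series - parts$seasonal

if (interactive()) plot(adjusted, main = "Seasonally adjusted series")
```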
Step 6: If you like to experiment with the smoothing parameters, you can play around with the alpha, beta and gamma values in the HoltWinters function. By default the function estimates these values itself by minimizing the squared one-step-ahead prediction error, so tweak them only if you want to see how your data responds.
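To make the difference concrete, here is a sketch comparing an automatic fit with a hand-tuned one. The `co2` series is a stand-in for the oil prices, and the manual values 0.2, 0.1 and 0.1 are arbitrary picks for illustration.

```r
# Stand-in series: monthly CO2 readings that ship with R (not oil prices).
series <- co2

# Let HoltWinters() estimate alpha, beta and gamma on its own.
fit_auto <- HoltWinters(series)
c(alpha = unname(fit_auto$alpha),
  beta  = unname(fit_auto$beta),
  gamma = unname(fit_auto$gamma))

# Fix the smoothing parameters by hand (arbitrary illustrative values).
fit_manual <- HoltWinters(series, alpha = 0.2, beta = 0.1, gamma = 0.1)

# SSE is the sum of squared one-step-ahead errors for each fit.
c(auto = fit_auto$SSE, manual = fit_manual$SSE)
```

Comparing the two SSE values is a quick way to see how much a manual choice costs or gains relative to the estimated parameters.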
Step 7: Finally generate forecasts.
In the resulting graph, the forecast values are shown as a blue line, with the 80% and 95% prediction intervals in two different colors.
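A sketch of generating forecasts with both interval levels. Note that this swaps in base R's `predict()` on a `HoltWinters` fit rather than the `forecast` package, and uses the built-in `co2` series as a stand-in; the 12-month horizon is an arbitrary choice.

```r
# Stand-in series: monthly CO2 readings that ship with R (not oil prices).
series <- co2
fit <- HoltWinters(series)

# Forecast 12 months ahead with 80% and 95% prediction intervals.
# Each result has the columns "fit", "upr" and "lwr".
fc80 <- predict(fit, n.ahead = 12, prediction.interval = TRUE, level = 0.80)
fc95 <- predict(fit, n.ahead = 12, prediction.interval = TRUE, level = 0.95)

head(fc95)

# Plot the fitted model together with the 95% forecast band.
if (interactive()) plot(fit, fc95)
```

The 95% band is always at least as wide as the 80% band, since it has to cover more of the possible outcomes.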
Step 8: Now that we have generated the forecast, let's zoom in and take a closer look.
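One way to take a closer look is to print the forecast table and plot only the most recent stretch of history next to it. This is a sketch under the same assumptions as before: base R's `predict()` instead of the `forecast` package, the built-in `co2` series as a stand-in, and an arbitrary cut-off of January 1995 for the zoomed window.

```r
# Stand-in series: monthly CO2 readings that ship with R (not oil prices).
series <- co2
fit <- HoltWinters(series)
fc <- predict(fit, n.ahead = 12, prediction.interval = TRUE, level = 0.95)

# The forecast values themselves, rounded for readability.
print(round(fc, 2))

# Keep only recent history so the forecast is not dwarfed by decades of data.
recent <- window(series, start = c(1995, 1))

if (interactive()) {
  ts.plot(recent, fc[, "fit"], fc[, "lwr"], fc[, "upr"],
          gpars = list(col = c("black", "blue", "grey", "grey"),
                       lty = c(1, 1, 2, 2)))
}
```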
Validation of Quality of Forecast:
There are two ways to measure accuracy: one is reactive and the other is proactive.
Measuring your forecast value against the actual value once it arrives is reactive.
Ex: Let's say you have forecast a profit of x for April 2016. You then have to wait until April to see its accuracy. [Reactive, not a good idea.]
The proactive route is to validate the forecasting method against data you already have. There are multiple methods for doing this; I prefer MAD (Mean Absolute Deviation).
See below:
##  16.40478
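The figure above comes from the oil price analysis. As a sketch of how a MAD value can be computed proactively, the example below holds back the last 12 months of a series, forecasts them from the rest, and averages the absolute errors. The helper `mean_abs_dev` is a hypothetical name, the built-in `co2` series is a stand-in for the oil prices, and the holdout approach is one reasonable option rather than necessarily the method used for the number above.

```r
# Mean absolute deviation between actual and forecast values.
mean_abs_dev <- function(actual, predicted) mean(abs(actual - predicted))

# Stand-in series: monthly CO2 readings that ship with R (not oil prices).
series <- co2

# Hold back the final 12 months as a test set.
train <- window(series, end = c(1996, 12))
test  <- window(series, start = c(1997, 1))

# Fit on the training data only, then forecast the held-back months.
fit <- HoltWinters(train)
fc  <- predict(fit, n.ahead = length(test))

mad <- mean_abs_dev(as.numeric(test), as.numeric(fc))
mad
```

MAD is expressed in the same units as the data, so a value only means something relative to the scale of the series being forecast.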
Link for dataset and code
Once again my thanks to
Professor Rob Hyndman for the forecast packages
http://www.indexmundi.com for the dataset.