My Experiments with Data Science: Time Series Forecasting with R

"Those who have knowledge, don't predict. Those who predict, don't have knowledge."

--Lao Tzu, 6th Century BC Chinese philosopher

A thought-provoking statement.

Wikipedia describes forecasting as the process of making predictions of the future based on past and present data and analysis of trends. A commonplace example might be estimation of some variable of interest at some specified future date. Prediction is a similar, but more general term. Both might refer to formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods. Usage can differ between areas of application: for example, in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.

https://en.wikipedia.org/wiki/Forecasting

Today I'm going to discuss time series forecasting. Many experts have written about this topic; my favorite is Professor Rob Hyndman: http://robjhyndman.com/hyndsight/

R is pretty neat with its graphical capabilities to aid visualisation as we go along.

Time series Forecasting:

Forecasting is almost always done on a time series. The forecasting algorithms depend on data that records the relevant metric at regular time slices (day, week, month, and so on), so that its trend over time is visible.

Let's see how to use time series forecasting methods to predict oil prices.

Some thoughts before we proceed.

1. The metric we want to forecast should have a time slice attached to it.
2. Forecast methods in R use the following components.

   a. Seasonality
   b. Randomness
   c. Trend

Seasonality can only be estimated from several complete cycles, so it is advisable to have at least 48 data points (four years of monthly data) to achieve decent accuracy in your prediction.

3. There is no restriction on the time slice, but accuracy generally starts improving when the data is at month level. Having said that, you can still experiment with week- or day-level data.
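
To make the three components concrete, here is a minimal sketch on made-up data (not the oil data). It builds a toy monthly series as trend + seasonality + randomness, four years long, and then shows what happens when the series is too short. All names and numbers are my own illustrative assumptions.

```r
set.seed(42)                                   # reproducible noise
n <- 48                                        # four years of monthly points
trend_part    <- seq(50, 70, length.out = n)   # slow upward drift
seasonal_part <- rep(5 * sin(2 * pi * (1:12) / 12), times = n / 12)
random_part   <- rnorm(n, sd = 1)
toy <- ts(trend_part + seasonal_part + random_part,
          start = c(2012, 1), frequency = 12)

# Four full seasonal cycles: decompose() can split the series.
toy_parts <- decompose(toy)
names(toy_parts)

# Only one and a half cycles: decompose() stops with an error.
too_short    <- ts(1:18, frequency = 12)
short_result <- try(decompose(too_short), silent = TRUE)
inherits(short_result, "try-error")
```

Calling plot(toy_parts) draws the same four panels you will see for the oil data in Step 4.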

Getting down to business.

Step 1: You need the following packages to proceed with forecasting.

# My favorite reference:
# http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html
# Libraries
library(TTR)
library(forecast)

If you don't have the packages, you can install them using the following command (the package name must be quoted).

install.packages("<packagename>")

You then have to load them into the environment using the library command above.
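
If you are not sure which packages are already installed, a small check like the following can help. This is only a sketch: the variable names are mine, and the install line is left commented out so that nothing gets installed without you deciding to.

```r
# Packages this walkthrough loads; edit the vector to suit your own script.
needed <- c("TTR", "forecast")

# requireNamespace() returns FALSE (instead of stopping) when a package is absent.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to_install   <- needed[!is_installed]

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)   # uncomment to install from CRAN
} else {
  message("All packages are available.")
}
```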

Step 2: It's my habit to set the working directory before proceeding with any analysis. This way I ensure that all my relevant work is stored in the same folder.

#set working directory
setwd("D:/DataScience/Exercises/TimeSeriesForecasting/OilPrice")

# I used oil price data from the following web page.
# http://www.indexmundi.com/
# Please like the page on Facebook.
# Don't worry, I have attached the data set that I used at the end of this blog.

#get data.
oil <- read.csv("MonthlyOilPrice.csv")

#head - My habit to take a quick glance at data.
head(oil)

Step 3: Create the time series as follows. Below I create a time series from the Price column, explicitly declaring start and end; frequency=12 tells R the data is monthly.

The syntax for start and end is c(year, month). I knew my data ran from 1-Jan-1986 to 1-Nov-2015; edit the values to match your dataset.

#create timeseries
oilts <- ts(oil$Price, start=c(1986, 1), end=c(2015, 11), frequency=12)

# A quick plot
plot.ts(oilts)
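
Before trusting start and end, it is worth checking that they agree with the number of rows you actually have. The sketch below uses a made-up vector in place of oil$Price (an assumption for illustration) and shows two things: ts() can work out the end by itself, and if you supply an end that your data does not reach, the values are recycled without any warning.

```r
# Made-up stand-in for oil$Price: 359 monthly values (Jan 1986 to Nov 2015).
prices <- seq(20, 45, length.out = 359)

# Give only the start; ts() works out the end from the number of values.
demo_ts <- ts(prices, start = c(1986, 1), frequency = 12)
start(demo_ts)   # first period as c(year, month)
end(demo_ts)     # last period as c(year, month)

# Supply an end the data cannot reach: ts() silently repeats the values.
short_ts <- ts(prices[1:100], start = c(1986, 1), end = c(2015, 11), frequency = 12)
length(short_ts)             # same length as demo_ts
short_ts[101] == prices[1]   # value 101 is a repeat of value 1
```

So a quick comparison of nrow(oil) with length(oilts) is a cheap way to catch a wrong start or end.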

Step 4: As I said earlier, we will now visualize the components: seasonality, trend and randomness.

#decompose to get seasonality,observed,random and trend
oiltscomp <- decompose(oilts)

plot(oiltscomp)
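
To see what decompose() actually returns, here is a small sketch on co2, a monthly series that ships with R and stands in for oilts here (my substitution, so that the code runs without the CSV file). By default the model is additive, so the three parts should add back up to the observed series.

```r
# co2 is a monthly series shipped with R; it stands in for oilts here.
parts   <- decompose(co2)                        # additive model by default
rebuilt <- parts$trend + parts$seasonal + parts$random

# The moving-average trend is undefined at both ends of the series,
# so compare only the months where all three parts exist.
complete <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[complete]), as.numeric(co2[complete]))
sum(!complete)   # months lost to the moving average
```

If the seasonal swings in your data grow with the level of the series, decompose(x, type = "multiplicative") may be the better fit.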

Step 5: I am attempting to remove the seasonality factor now.

# removing seasonality - easy, isn't it?
oiltscompadjust <- oilts - oiltscomp$seasonal

plot(oiltscompadjust)
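
What exactly gets subtracted? In the additive case the seasonal component is one fixed pattern of 12 monthly effects, repeated every year and centred on zero. The sketch below again uses the built-in co2 series as a stand-in for oilts (my assumption) and checks that removing the pattern leaves every full-year total unchanged.

```r
parts <- decompose(co2)
parts$figure                        # the 12 monthly seasonal effects
sum(parts$figure)                   # effectively zero: the effects are centred

adjusted <- co2 - parts$seasonal    # same operation as in Step 5

# Subtracting a zero-sum pattern leaves each calendar year's total unchanged.
yearly_raw <- aggregate(co2, nfrequency = 1, FUN = sum)
yearly_adj <- aggregate(adjusted, nfrequency = 1, FUN = sum)
all.equal(as.numeric(yearly_raw), as.numeric(yearly_adj))
```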

Step 6: If you like to play with the smoothing parameters, you can pass your own alpha, beta and gamma values to the HoltWinters function. By default the function chooses them itself by minimising the squared one-step-ahead prediction error, so tweak them only if you want to see how your data responds.

# Forecasting, including smoothing

# Apply HoltWinters smoothing

# oiltsforecast <- HoltWinters(logoilts)   # variant; logoilts is not defined in this post
oiltsforecast <- HoltWinters(oilts)
plot(oiltsforecast)
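
If you want to look at the parameters the function picked, or compare them with your own, something like this should do. It is a sketch on the built-in co2 series (my stand-in for oilts), and the hand-picked values are arbitrary.

```r
# Let HoltWinters() choose alpha, beta and gamma by minimising squared error.
fit_auto <- HoltWinters(co2)
c(alpha = unname(fit_auto$alpha),
  beta  = unname(fit_auto$beta),
  gamma = unname(fit_auto$gamma))
fit_auto$SSE      # sum of squared one-step-ahead errors

# Hand-picked values, chosen arbitrarily for comparison.
fit_manual <- HoltWinters(co2, alpha = 0.5, beta = 0.1, gamma = 0.1)
fit_manual$SSE
```

A manual setting will typically give a larger SSE than the fitted one, which is why I leave the defaults alone unless I am experimenting.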

Step 7: Finally generate forecasts.


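# Use h to set the number of periods to forecast.
# Use level to set the prediction intervals. By default it is 80% and 95%.
# Newer versions of the forecast package may not export forecast.HoltWinters,
# so call the generic forecast() instead.
oiltsforecast2 <- forecast(oiltsforecast, level = c(80, 95), h = 12)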

# In the plot the blue line shows the forecast, the darker band the 80% prediction interval
# and the lighter band the 95% prediction interval.
# plot(oiltsforecast2)
plot(oiltsforecast2, type = "h", main = "Oil Price Forecasting", xlab = "Year-Month", ylab = "$s per barrel")

In the resulting graph, the forecast values are shown as a blue line, with the 80% and 95% prediction intervals in two different colors.
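
If you would rather see the numbers behind the bands, base R can produce them without the forecast package. This is a sketch using predict() on a HoltWinters fit of the built-in co2 series (my stand-in for oilts); it is an alternative route, not the call used above.

```r
fit <- HoltWinters(co2)

# 12 periods ahead with a 95% prediction interval.
ahead <- predict(fit, n.ahead = 12, prediction.interval = TRUE, level = 0.95)
colnames(ahead)      # point forecast plus upper and lower bounds
ahead[1:3, ]         # first three forecast months

# The interval gets wider the further ahead we look.
width <- ahead[, "upr"] - ahead[, "lwr"]
width[12] > width[1]
```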

Step 8: Now that we have generated the forecasts, let's zoom in on them.

# Focus on just the forecast values by setting the include argument to 0
plot(oiltsforecast2, include = 0, type = "h", main = "Oil Price Forecasting", xlab = "Year-Month", ylab = "$s per barrel")

Validation of Quality of Forecast:

There are two ways to measure the accuracy: one is reactive and the other is proactive.

Measuring your forecast value against the actual value once it becomes available is reactive.

Ex: Let's say you have forecasted a profit of x for the month of Apr 2016. You will then have to wait until then to see its accuracy. [ Reactive, not a good idea ].

The proactive way is to check, right now, how closely the method reproduces data you already have. There are multiple measures for this; I prefer using MAD (Mean Absolute Deviation), the average absolute gap between actual and fitted values.

See below :

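# Mean Absolute Deviation method to validate deviation of forecast
# Note: you will have to fit the forecast before validating.
# Base R's mad() is the (scaled) *median* absolute deviation of whatever it is given,
# so mad(fitted(oiltsforecast2)), which returns 16.40478 for this data, describes the
# spread of the fitted values rather than the forecast error.
# The mean absolute deviation of the errors is:

mean(abs(oiltsforecast2$residuals), na.rm = TRUE)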
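
A stricter proactive check is to hide the last few months from the model and see how well it predicts them. Here is a minimal sketch of that idea, again on the built-in co2 series rather than the oil data (my assumption); the 12-month holdout is an arbitrary choice.

```r
# Keep the final 12 months aside; the model never sees them.
train   <- window(co2, end = c(1996, 12))
holdout <- window(co2, start = c(1997, 1))

fit  <- HoltWinters(train)
pred <- predict(fit, n.ahead = length(holdout))

# Mean absolute deviation on the data the model was fitted to ...
in_sample_errors <- train - fit$fitted[, "xhat"]
in_sample_mad    <- mean(abs(in_sample_errors))

# ... and on the months it has not seen.
holdout_mad <- mean(abs(as.numeric(holdout) - as.numeric(pred)))

c(in_sample = in_sample_mad, holdout = holdout_mad)
```

The holdout figure is usually the more honest of the two, because the smoothing parameters were tuned on the training months.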

Link for dataset and code

https://drive.google.com/folderview?id=0Bw4afn-u-hxjYjFfVV9MSW1XbU0&usp=sharing

Once again my thanks to

Professor Rob Hyndman for the forecast packages
http://www.indexmundi.com for the dataset.

Thursday, December 17, 2015

Time Series Forecasting with R - Oil Price Prediction