Time Series Analysis for Data Analysts

What Is Time Series Data?

A time series is a sequence of observations recorded at successive, usually equally spaced, points in time. Unlike cross-sectional data, which captures a snapshot of many subjects at one moment, time series data tracks one or more variables over an extended period. The defining characteristic is that the order of observations matters: the value at time t is typically correlated with the values at t-1, t-2, and so on. Examples include daily active users, monthly revenue, hourly website traffic, quarterly inventory levels, and real-time sensor readings.

Core Components of a Time Series

Component	Definition	Example	How to Identify
Trend	Long-term directional movement (increasing or decreasing)	Growing monthly revenue over three years	Apply a moving average; the slope reveals trend direction
Seasonality	Regular, predictable fluctuations that repeat at fixed intervals	Higher e-commerce sales every December	Plot by day-of-week or month; use seasonal decomposition
Cyclicality	Longer, irregular fluctuations driven by economic or business cycles (not a fixed period)	Business hiring slowing during economic downturns	Requires multi-year data and domain knowledge; harder to model than seasonality
Noise (Residual)	Random, unexplained variation remaining after removing trend, seasonality, and cyclicality	Day-to-day random fluctuations in web traffic	Residuals from decomposition; should resemble white noise with no systematic pattern

Decomposing a Time Series

Classical decomposition separates a series into its constituent components. There are two main models — additive (components add together) and multiplicative (components multiply).

Model Type	Formula	When to Use	Example
Additive	Y = Trend + Seasonality + Residual	Seasonal swings are roughly constant in absolute size regardless of trend level	Website traffic with steady ±15,000 visits per week
Multiplicative	Y = Trend × Seasonality × Residual	Seasonal swings grow proportionally as the trend increases	Retail revenue where December spikes grow larger as the business grows
Log-transformed Additive	log(Y) = Trend + Seasonality + Residual	Alternative to multiplicative when methods assume additive structure	Exponentially growing data where variance increases with level

Stationarity: Why It Matters

Most classical forecasting models (ARIMA and related methods) assume the series is stationary, at least after differencing: its mean, variance, and autocorrelation structure do not change over time. Modelling non-stationary data directly can produce spurious relationships. Use the Augmented Dickey-Fuller (ADF) test to check: a p-value below 0.05 rejects the null hypothesis that the series has a unit root, which is evidence of stationarity rather than proof of it.

Common transformations to achieve stationarity:

First differencing: subtract the previous value — removes linear trends
Seasonal differencing: subtract the value from the same period last cycle — removes seasonal patterns
Log transformation: stabilises variance when it grows with the level

Moving Averages and Smoothing

Technique	How It Works	Best For	Limitation
Simple Moving Average (SMA)	Average of the last n observations; each window point has equal weight	Identifying trend direction; reducing noise in dashboards	Lags behind actual data; slow to react to recent changes
Exponential Moving Average (EMA)	Weighted average giving exponentially more weight to recent observations	When recent data should carry more influence; financial and operational metrics	Still a lagging indicator; smoothing parameter (alpha) must be chosen
Rolling Standard Deviation	Standard deviation over a rolling window; tracks volatility over time	Identifying periods of instability or anomalous variance	Window length determines sensitivity; too short = noisy, too long = sluggish
LOESS / LOWESS	Locally weighted polynomial regression fitted over sliding windows	Flexible trend extraction when the relationship is non-linear	Computationally heavier; not easily interpretable as a single formula
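
The first three techniques in the table map directly onto pandas rolling and ewm calls. The sketch below uses a synthetic daily series; the 7-day window and alpha=0.3 are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
import pandas as pd

# Synthetic daily metric (illustrative only): upward drift plus noise
rng = np.random.default_rng(1)
idx = pd.date_range("2024-01-01", periods=120, freq="D")
daily = pd.Series(1000 + 5 * np.arange(120) + rng.normal(0, 50, 120), index=idx)

sma_7 = daily.rolling(window=7).mean()           # equal weights over the last 7 days
ema = daily.ewm(alpha=0.3, adjust=False).mean()  # s_t = 0.3 * y_t + 0.7 * s_(t-1)
vol_7 = daily.rolling(window=7).std()            # rolling sample standard deviation

smoothed = pd.DataFrame({"raw": daily, "sma_7": sma_7, "ema": ema, "vol_7": vol_7})
print(smoothed.tail())
```

The first six SMA and rolling-deviation values are NaN because a full 7-day window is not yet available; the EMA has a value from the first observation onwards.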

Forecasting with ARIMA

ARIMA (AutoRegressive Integrated Moving Average) is one of the most widely used classical forecasting frameworks. It has three parameters: p (autoregressive order: how many past values to use), d (differencing degree: how many times to difference to achieve stationarity), and q (moving average order: how many past error terms to use).

# Python example using statsmodels
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd

# Load monthly sales data
df = pd.read_csv('monthly_sales.csv', parse_dates=['date'], index_col='date')

# Fit ARIMA(1,1,1)
model = ARIMA(df['sales'], order=(1, 1, 1))
result = model.fit()

# Forecast next 12 months
forecast = result.get_forecast(steps=12)
mean_forecast = forecast.predicted_mean
conf_int = forecast.conf_int(alpha=0.05)  # 95% prediction interval
print(result.summary())

Seasonal Forecasting Models

Model	Key Idea	Seasonal Period	Notes
Holt-Winters Additive	Triple exponential smoothing; separates trend, seasonality, level	Any (e.g., s=12)	Good for constant-seasonal-amplitude data
Holt-Winters Multiplicative	Seasonal amplitude scales with trend level	Any (e.g., s=12)	Better when seasonal swings grow with the trend
SARIMA(p,d,q)(P,D,Q,s)	Extends ARIMA with seasonal AR, differencing, and MA components at lag s	Any (e.g., s=12 for monthly)	More parameters; can overfit on short series
Prophet	Additive model fitting trend + seasonal Fourier terms + holidays	Multiple simultaneously	Analyst-friendly; handles missing data and holidays well

# Using Facebook Prophet
from prophet import Prophet
import pandas as pd

df = pd.read_csv('monthly_sales.csv')
df = df.rename(columns={'date': 'ds', 'sales': 'y'})

model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=False,
    changepoint_prior_scale=0.05  # controls trend flexibility
)
model.fit(df)

# Forecast 24 months ahead
# 'MS' = month start; assumes the dates in the CSV fall on the first of each month
future = model.make_future_dataframe(periods=24, freq='MS')
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(24))

Evaluating Forecast Accuracy

Metric	Formula	Interpretation	When to Use
MAE (Mean Absolute Error)	Average of |actual − forecast|	Average absolute deviation in original units; less sensitive to outliers than RMSE	When all errors should be penalised equally
RMSE (Root Mean Squared Error)	Square root of average squared errors	Same units as original data; penalises large errors more than MAE	When large forecast misses are disproportionately costly
MAPE	Average of |actual − forecast| / |actual| × 100	Scale-independent; expresses error as a percentage of actuals	Comparing accuracy across series of different scales; undefined when actual = 0
MASE	MAE of model / in-sample MAE of the naive (or seasonal naive) forecast	MASE < 1 means the model beats the naive forecast; scale-free	Comparing accuracy across multiple series, including series that contain zeros
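
The four metrics can be written in a few lines of NumPy. This is a sketch with toy numbers: the function names and the m argument (the seasonal lag used for the naive baseline in MASE, 1 for a plain previous-value forecast) are choices made for this example.

```python
import numpy as np

def mae(actual, forecast):
    return np.mean(np.abs(actual - forecast))

def rmse(actual, forecast):
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mape(actual, forecast):
    # Undefined if any actual value is 0
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def mase(actual, forecast, train, m=1):
    # Scale by the in-sample MAE of the naive forecast y[t-m] on the training data
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return mae(actual, forecast) / scale

# Toy numbers (illustrative only)
train = np.array([100.0, 110.0, 105.0, 115.0, 120.0])
actual = np.array([125.0, 130.0, 128.0])
forecast = np.array([122.0, 133.0, 126.0])

print(f"MAE:  {mae(actual, forecast):.2f}")          # 2.67
print(f"RMSE: {rmse(actual, forecast):.2f}")         # 2.71
print(f"MAPE: {mape(actual, forecast):.2f}%")        # 2.09%
print(f"MASE: {mase(actual, forecast, train):.2f}")  # 0.36
```

RMSE is slightly larger than MAE here because squaring gives the two larger misses more weight.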

Walk-Forward Validation

Unlike random train/test splits, time series data must be validated in temporal order. Walk-forward validation (also called time-series cross-validation) simulates real forecasting: the model is trained on past data and tested on the immediately following period, then the window advances.

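# Walk-forward validation with an expanding training window
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def walk_forward_validation(series, n_test, order=(1, 1, 1)):
    # Work on a plain array so indexing is positional regardless of the index type
    values = np.asarray(series, dtype=float)
    history = list(values[:-n_test])
    predictions, actuals = [], []
    for i in range(n_test):
        result = ARIMA(history, order=order).fit()  # refit on all data seen so far
        yhat = result.forecast(steps=1)[0]          # one-step-ahead forecast
        predictions.append(yhat)
        actual = values[len(values) - n_test + i]
        actuals.append(actual)
        history.append(actual)  # expand window
    mae = np.mean(np.abs(np.array(predictions) - np.array(actuals)))
    return predictions, mae

# Example usage (same file and column names as the ARIMA example above)
df = pd.read_csv('monthly_sales.csv', parse_dates=['date'], index_col='date')
series = df['sales']
n_test = 12
predictions, mae = walk_forward_validation(series, n_test)

# Always compare against naive baseline (forecast = previous observation)
naive_mae = np.mean(np.abs(np.diff(series.to_numpy()[-n_test - 1:])))
print(f'Model MAE: {mae:.2f} | Naive MAE: {naive_mae:.2f}')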

Practical Workflow for Time Series Analysis

Visualise — Plot the raw series; inspect for trend, seasonality, and outliers
Handle missing values — Choose between forward fill, interpolation, or exclusion based on gap size and cause
Decompose — Use STL or classical decomposition to separate trend, seasonal, and residual components
Test stationarity — Apply ADF test; difference or transform until stationary (p < 0.05)
Select and fit model — Choose ARIMA, ETS, Prophet, or ML based on data characteristics; use walk-forward validation
Evaluate and iterate — Measure MAE, RMSE, MAPE on hold-out data; compare to naive baseline
Communicate uncertainty — Always report prediction intervals alongside point forecasts; stakeholders making decisions need the range, not just the centre
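
For the missing-value step, the two most common fills look like this in pandas. The series is a toy example with made-up values; which fill is appropriate still depends on the gap size and cause, as noted above.

```python
import numpy as np
import pandas as pd

# Toy daily series with gaps (illustrative only)
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([10.0, np.nan, np.nan, 16.0, np.nan, 20.0], index=idx)

filled_ffill = s.ffill()                      # carry the last observation forward
filled_interp = s.interpolate(method="time")  # linear in time between known points

print(pd.DataFrame({"raw": s, "ffill": filled_ffill, "interp": filled_interp}))
```

Forward fill keeps the last known level flat across the gap, while interpolation draws a straight line between the surrounding observations.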

Detecting Anomalies in Time Series

Method	Approach	Best For
3-sigma rule	Flag points that exceed mean ± 3 standard deviations of the residuals	Simple anomaly detection on stable, normally distributed series
Residual analysis	Fit a model; treat large residuals outside prediction intervals as anomalies	Detecting unexpected deviations after removing trend and seasonality
Isolation Forest	Unsupervised ML method that isolates observations by random partitioning	Multivariate time series with complex anomaly patterns
Changepoint detection (PELT, BOCPD)	Identifies points where the statistical properties of the series shift significantly	Detecting regime shifts, product launches, or system failures
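
A minimal sketch of the 3-sigma rule applied to residuals. It assumes a centred rolling median is an acceptable stand-in for a fitted model; the series, the injected spike, and the 15-day window are all invented for the example.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative only) with one injected spike
rng = np.random.default_rng(3)
idx = pd.date_range("2024-01-01", periods=200, freq="D")
traffic = pd.Series(5000 + rng.normal(0, 100, 200), index=idx)
traffic.iloc[150] += 1500  # injected anomaly

# Residual = observation minus a centred 15-day rolling median
baseline = traffic.rolling(window=15, center=True, min_periods=1).median()
resid = traffic - baseline

# 3-sigma rule on the residuals
threshold = 3 * resid.std()
anomalies = traffic[(resid - resid.mean()).abs() > threshold]
print(anomalies)
```

Large anomalies inflate the standard deviation they are measured against, so a robust scale estimate such as the median absolute deviation may be preferable on noisier data.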

Summary

Time series analysis is essential for any data analyst working with metrics that evolve over time, which in practice means almost every business KPI. Mastering decomposition helps you understand what is driving a metric; stationarity testing ensures your models have valid foundations; and selecting the right forecasting method depends on the data characteristics, the forecast horizon, and the business need. Equally important is quantifying forecast uncertainty: point forecasts alone can mislead, and stakeholders who understand prediction intervals make better decisions. Always benchmark against a naive baseline before declaring a sophisticated model successful.