What Is Time Series Data?
A time series is a sequence of observations recorded at successive points in time, usually at equally spaced intervals. Unlike cross-sectional data, which captures a snapshot of many subjects at one moment, time series data tracks one or more variables over an extended period. The defining characteristic is that the order of observations matters: the value at time t is typically correlated with values at t-1, t-2, and so on (autocorrelation). Examples include daily active users, monthly revenue, hourly website traffic, quarterly inventory levels, and real-time sensor readings.
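To see why order matters, one quick check is the lag-1 autocorrelation: the correlation between each value and the one before it. The sketch below is illustrative only. It simulates a series in which each value depends on the previous one (an assumed AR(1) process with coefficient 0.8, not real data), then shuffles the same values into random order. The ordered series shows strong lag-1 autocorrelation, while the shuffled copy shows roughly none, even though its values, mean, and variance are identical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated series where each value depends on the previous one:
# y[t] = 0.8 * y[t-1] + noise  (an illustrative assumption, not real data)
n = 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()

ordered = pd.Series(y)
shuffled = pd.Series(rng.permutation(y))  # same values, random order

# Lag-1 autocorrelation: correlation between the series and itself shifted by one step
print(f"ordered lag-1 autocorrelation:  {ordered.autocorr(lag=1):.2f}")
print(f"shuffled lag-1 autocorrelation: {shuffled.autocorr(lag=1):.2f}")
```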
Core Components of a Time Series
Component | Definition | Example | How to Identify |
|---|---|---|---|
Trend | Long-term directional movement (increasing or decreasing) | Growing monthly revenue over three years | Apply a moving average; the slope reveals trend direction |
Seasonality | Regular, predictable fluctuations that repeat at fixed intervals | Higher e-commerce sales every December | Plot by day-of-week or month; use seasonal decomposition |
Cyclicality | Longer irregular fluctuations driven by economic or business cycles (not a fixed period) | Business hiring slowing during economic downturns | Requires multi-year data and domain knowledge; harder to model than seasonality |
Noise (Residual) | Random, unexplained variation remaining after removing trend, seasonality, and cyclicality | Day-to-day random fluctuations in web traffic | Residuals from decomposition; should resemble white noise with no systematic pattern |
Decomposing a Time Series
Classical decomposition separates a series into its constituent components. There are two main models — additive (components add together) and multiplicative (components multiply).
Model Type | Formula | When to Use | Example |
|---|---|---|---|
Additive | Y = Trend + Seasonality + Residual | Seasonal swings are roughly constant in absolute size regardless of trend level | Website traffic with steady ±15,000 visits per week |
Multiplicative | Y = Trend × Seasonality × Residual | Seasonal swings grow proportionally as the trend increases | Retail revenue where December spikes grow larger as the business grows |
Log-transformed Additive | log(Y) = Trend + Seasonality + Residual | Alternative to multiplicative when methods assume additive structure | Exponentially growing data where variance increases with level |
Stationarity: Why It Matters
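Many classical forecasting models assume the series is stationary: its mean, variance, and autocorrelation structure do not change over time. ARMA models require this directly, and ARIMA satisfies it by differencing the series d times before fitting. Modelling non-stationary data without accounting for this can produce spurious relationships, for example two unrelated trending series that appear strongly correlated. Use the Augmented Dickey-Fuller (ADF) test to check: its null hypothesis is that the series has a unit root (is non-stationary), so a p-value below 0.05 rejects that null and is taken as evidence of stationarity, not proof of it.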
Common transformations to achieve stationarity:
First differencing: subtract the previous value — removes linear trends
Seasonal differencing: subtract the value from the same period last cycle — removes seasonal patterns
Log transformation: stabilises variance when it grows with the level
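The three transformations map directly onto pandas and NumPy calls. This sketch uses a made-up monthly series (a linear trend plus a fixed 12-month pattern) and a separate series growing 5% per month, so the effect of each transformation is exact and easy to check.

```python
import numpy as np
import pandas as pd

# Illustrative monthly series: linear trend (slope 2) + repeating 12-month pattern
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
t = np.arange(36)
pattern = np.tile([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3], 3)
y = pd.Series(50 + 2.0 * t + pattern, index=idx)

# First differencing: y[t] - y[t-1]; the linear trend reduces to its slope (2),
# but the seasonal pattern still shows up in the differences
first_diff = y.diff()

# Seasonal differencing: y[t] - y[t-12]; the repeating pattern cancels,
# leaving only the trend's change over 12 months (12 x 2 = 24)
seasonal_diff = y.diff(12)

# Log transform: for exponential growth, log differences become constant
growth = pd.Series(100 * 1.05 ** t, index=idx)  # 5% growth per month
log_growth_diff = np.log(growth).diff()         # each value is log(1.05)
```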
Moving Averages and Smoothing
Technique | How It Works | Best For | Limitation |
|---|---|---|---|
Simple Moving Average (SMA) | Average of the last n observations; each window point has equal weight | Identifying trend direction; reducing noise in dashboards | Lags behind actual data; does not adapt to recent changes |
Exponential Moving Average (EMA) | Weighted average giving exponentially more weight to recent observations | When recent data should carry more influence; financial and operational metrics | Still a lagging indicator; smoothing parameter (alpha) must be chosen |
Rolling Standard Deviation | Standard deviation over a rolling window; tracks volatility over time | Identifying periods of instability or anomalous variance | Window length determines sensitivity; too short = noisy, too long = sluggish |
LOESS / LOWESS | Locally weighted polynomial regression fitted over sliding windows | Flexible trend extraction when the relationship is non-linear | Computationally heavier; not easily interpretable as a single formula |
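The first three techniques are one-liners in pandas, and LOWESS is available in statsmodels. The toy values and the simulated sine signal below are assumptions chosen so the arithmetic can be checked by hand; `window`, `alpha`, and `frac` are the tuning knobs the Limitation column refers to.

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

sma = s.rolling(window=3).mean()             # equal weights over the last 3 points; NaN until the window fills
ema = s.ewm(alpha=0.5, adjust=False).mean()  # ema[t] = 0.5 * s[t] + 0.5 * ema[t-1]
roll_std = s.rolling(window=3).std()         # sample standard deviation over each 3-point window

# LOWESS on a simulated noisy non-linear signal; frac is the share of points used in each local fit
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200)
noisy = np.sin(x) + rng.normal(0, 0.3, x.size)
smooth = lowess(noisy, x, frac=0.2, return_sorted=False)  # fitted values in the original order
```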
Forecasting with ARIMA
ARIMA (AutoRegressive Integrated Moving Average) is one of the most widely used classical forecasting frameworks. It has three parameters: p (autoregressive order: how many past values to use), d (differencing degree: how many times to difference the series to achieve stationarity), and q (moving average order: how many past forecast errors to use).
```python
# Python example using statsmodels
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd

# Load monthly sales data
df = pd.read_csv('monthly_sales.csv', parse_dates=['date'], index_col='date')

# Fit ARIMA(1,1,1)
model = ARIMA(df['sales'], order=(1, 1, 1))
result = model.fit()

# Forecast next 12 months
forecast = result.get_forecast(steps=12)
mean_forecast = forecast.predicted_mean
conf_int = forecast.conf_int(alpha=0.05)  # 95% prediction interval
print(result.summary())
```
Seasonal Forecasting Models
Model | Key Idea | Seasonal Period | Notes |
|---|---|---|---|
Holt-Winters Additive | Triple exponential smoothing; separates trend, seasonality, level | Any (e.g., s=12) | Good for constant-seasonal-amplitude data |
Holt-Winters Multiplicative | Seasonal amplitude scales with trend level | Any (e.g., s=12) | Better when seasonal swings grow with the trend |
SARIMA(p,d,q)(P,D,Q,s) | Extends ARIMA with seasonal AR, differencing, and MA components at lag s | Any (e.g., s=12 for monthly) | More parameters; can overfit on short series |
Prophet | Additive model fitting trend + seasonal Fourier terms + holidays | Multiple simultaneously | Analyst-friendly; handles missing data and holidays well |
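```python
# Using Facebook Prophet
from prophet import Prophet
import pandas as pd

# Prophet expects a 'ds' (datestamp) column and a 'y' (value) column
df = pd.read_csv('monthly_sales.csv')
df = df.rename(columns={'date': 'ds', 'sales': 'y'})

model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=False,
    changepoint_prior_scale=0.05  # trend flexibility (0.05 is the default); higher = more flexible
)
model.fit(df)

# Forecast 24 months ahead. Match freq to how your dates are stamped:
# 'MS' for month-start dates; month-end is 'ME' in pandas >= 2.2 ('M' in older versions)
future = model.make_future_dataframe(periods=24, freq='MS')
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(24))
```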
Evaluating Forecast Accuracy
Metric | Formula | Interpretation | When to Use |
|---|---|---|---|
MAE (Mean Absolute Error) | Mean of abs(actual − forecast) | Average absolute deviation in original units; less sensitive to outliers than RMSE | When errors should be penalised in proportion to their size
RMSE (Root Mean Squared Error) | Square root of the mean of squared errors | Same units as original data; penalises large errors more heavily than MAE | When large forecast misses are disproportionately costly
MAPE (Mean Absolute Percentage Error) | Mean of abs(actual − forecast) / abs(actual) × 100 | Scale-independent; expresses error as a percentage of actuals | Comparing accuracy across series of different scales; undefined when any actual = 0 and unstable when actuals are near zero
MASE (Mean Absolute Scaled Error) | MAE of model / in-sample MAE of a naive (or seasonal naive) forecast | MASE < 1 means the model's errors are smaller, on average, than the naive baseline's; scale-free | Comparing accuracy across multiple series, including series with zeros where MAPE breaks down
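All four metrics fit in a few lines of NumPy. In this sketch, `forecast_metrics` is a hypothetical helper, not a library function, and MASE is scaled by the in-sample MAE of a naive forecast on the training data, with `m` selecting the last-value (m=1) or seasonal naive (e.g. m=12) baseline. The small input arrays are made up so the results can be checked by hand: MAE 1.0 and MASE 0.5.

```python
import numpy as np

def forecast_metrics(actual, forecast, train, m=1):
    """Hypothetical helper returning MAE, RMSE, MAPE (%) and MASE for one forecast.

    `train` is the in-sample history used to scale MASE; `m` is the seasonal
    period of the naive baseline (m=1: previous value, m=12: same month last year).
    """
    actual, forecast, train = (np.asarray(a, dtype=float) for a in (actual, forecast, train))
    errors = actual - forecast
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    mape = np.mean(np.abs(errors) / np.abs(actual)) * 100  # undefined if any actual == 0
    naive_mae = np.mean(np.abs(train[m:] - train[:-m]))   # in-sample MAE of the naive forecast
    mase = mae / naive_mae
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mase}

scores = forecast_metrics(actual=[10, 12, 14, 16], forecast=[11, 12, 13, 18], train=[2, 4, 6, 8])
print({k: round(float(v), 3) for k, v in scores.items()})
```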
Walk-Forward Validation
Unlike random train/test splits, time series data must be validated in temporal order. Walk-forward validation (also called time-series cross-validation) simulates real forecasting: the model is trained on past data and tested on the immediately following period, then the window advances.
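```python
# Walk-forward validation (expanding window, one-step-ahead forecasts)
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def walk_forward_validation(series, n_test, order=(1, 1, 1)):
    values = np.asarray(series, dtype=float)  # positional indexing for lists, arrays or pandas Series
    history = list(values[:-n_test])          # initial training window
    predictions, actuals = [], []
    for i in range(n_test):
        result = ARIMA(history, order=order).fit()
        yhat = result.forecast(steps=1)[0]    # forecast only the next period
        predictions.append(yhat)
        actual = values[-(n_test - i)]        # the next held-out observation
        actuals.append(actual)
        history.append(actual)                # expand window before the next step
    mae = np.mean(np.abs(np.array(predictions) - np.array(actuals)))
    return predictions, mae

# Example: hold out the last 12 months of the sales data loaded in the ARIMA example
series = df['sales']
n_test = 12
predictions, mae = walk_forward_validation(series, n_test)

# Always compare against a naive baseline: predict each test point with the previous observation
values = np.asarray(series, dtype=float)
naive_mae = np.mean(np.abs(np.diff(values[-n_test - 1:])))
print(f'Model MAE: {mae:.2f} | Naive MAE: {naive_mae:.2f}')
```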
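If you are validating scikit-learn style models, `sklearn.model_selection.TimeSeriesSplit` generates the same kind of ordered, expanding-window splits, though it tests on blocks of observations rather than one step at a time. A minimal sketch, assuming scikit-learn is installed and using 12 index positions as a stand-in for real data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(12)  # stand-in for 12 ordered observations

# Each split trains on an expanding block of the past and tests on the block that follows
tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(y))
for train_idx, test_idx in splits:
    print(f"train {train_idx.min()}-{train_idx.max()} -> test {test_idx.min()}-{test_idx.max()}")
```

Because no test index ever precedes a training index, the model is never evaluated on data older than what it was trained on, which is the property a random split would break.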
Practical Workflow for Time Series Analysis
Visualise — Plot the raw series; inspect for trend, seasonality, and outliers
Handle missing values — Choose between forward fill, interpolation, or exclusion based on gap size and cause
Decompose — Use STL or classical decomposition to separate trend, seasonal, and residual components
Test stationarity — Apply ADF test; difference or transform until stationary (p < 0.05)
Select and fit model — Choose ARIMA, ETS, Prophet, or ML based on data characteristics; use walk-forward validation
Evaluate and iterate — Measure MAE, RMSE, MAPE on hold-out data; compare to naive baseline
Communicate uncertainty — Always report prediction intervals alongside point forecasts; stakeholders making decisions need the range, not just the centre
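For the missing-values step, pandas covers the usual options. The sketch below uses a made-up daily series with a one-day gap and a three-day gap. It contrasts forward fill with time-based interpolation and measures each gap's length, since gap size is what decides between filling and excluding. The grouping trick used to measure gaps is one possible approach, not a built-in pandas feature.

```python
import numpy as np
import pandas as pd

# Hypothetical daily metric with a 1-day gap and a 3-day gap
idx = pd.date_range("2024-01-01", periods=7, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0], index=idx)

# Forward fill: carry the last observed value forward (suits values that persist until updated)
filled_ffill = s.ffill()

# Time-based linear interpolation: straight line between the surrounding observations
filled_interp = s.interpolate(method="time")

# Measure each gap so long gaps can be excluded or investigated instead of filled
is_na = s.isna()
gap_id = (~is_na).cumsum()  # NaNs share the id of the observation just before them
gap_lengths = is_na.groupby(gap_id).sum()
gap_lengths = gap_lengths[gap_lengths > 0]
print(gap_lengths.tolist())
```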
Detecting Anomalies in Time Series
Method | Approach | Best For |
|---|---|---|
3-sigma rule | Flag points that fall outside mean ± 3 standard deviations of the residuals | Simple anomaly detection on stable series with roughly normally distributed residuals |
Residual analysis | Fit a model; treat large residuals outside prediction intervals as anomalies | Detecting unexpected deviations after removing trend and seasonality |
Isolation Forest | Unsupervised ML method that isolates observations by random partitioning | Multivariate time series with complex anomaly patterns |
Changepoint detection (PELT, BOCPD) | Identifies points where the statistical properties of the series shift significantly | Detecting regime shifts, product launches, or system failures |
Summary
Time series analysis is essential for any data analyst working with metrics that evolve over time, which in practice means almost every business KPI. Mastering decomposition helps you understand what is driving a metric; stationarity testing ensures your models rest on valid assumptions; and the right forecasting method depends on the data characteristics, the forecast horizon, and the business need. Equally important is quantifying forecast uncertainty: a point forecast on its own hides how wide the range of plausible outcomes is, and prediction intervals let stakeholders plan for that range. Always benchmark against a naive baseline before declaring a sophisticated model successful.