Customer Segmentation and Clustering

What Is Customer Segmentation?

Customer segmentation is the process of dividing a customer base into distinct groups of customers who share similar characteristics, such as behaviour, demographics, purchasing patterns, or needs. Unlike predictive modelling, which forecasts a single outcome per customer, segmentation is a descriptive technique: it organises the customer base into meaningful groups so that analysts, marketers, and product teams can tailor strategies to each one. Effective segmentation enables personalised marketing, targeted product features, differentiated pricing, and better prioritisation of customer success resources, all of which can improve conversion, retention, and lifetime value.

Approaches to Segmentation

Approach	How It Works	Examples	Best For
Rule-based (heuristic)	Analyst defines segments manually using business logic and thresholds	SMB / Mid-Market / Enterprise by employee count; Active / At-risk / Churned by last login date	When business rules are clear and stable; easy to explain to stakeholders
RFM analysis	Scores customers on Recency, Frequency, and Monetary value; combines scores into segments	Champions (high R, F, M); At-Risk (high F/M but low R); Lost (low on all three)	E-commerce and subscription businesses with transaction data
Clustering (unsupervised ML)	Algorithm discovers natural groupings in the data without predefined labels	K-Means, DBSCAN, Hierarchical clustering applied to behavioural or demographic features	Exploratory discovery of unknown patterns; large feature spaces
Persona-based	Qualitative research combined with quantitative data to define archetypes	"The Power User", "The Occasional Visitor", "The Price-Sensitive Buyer"	Product design and UX; when motivations and goals matter as much as behaviour
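To make the rule-based row concrete, here is a minimal sketch of the two example rule sets from the table (size bands by employee count, activity status by last login). The thresholds (100 and 1,000 employees; 30 and 90 days) and the sample customers are hypothetical and chosen only for illustration; in practice the business defines the cut-offs.

```python
from datetime import date

# Hypothetical thresholds for illustration only; real cut-offs come from the business.
SIZE_BANDS = [(1000, "Enterprise"), (100, "Mid-Market"), (0, "SMB")]
ACTIVE_DAYS = 30    # logged in within the last 30 days -> Active
CHURNED_DAYS = 90   # no login for more than 90 days -> Churned


def size_segment(employee_count: int) -> str:
    """Assign SMB / Mid-Market / Enterprise from employee count."""
    for threshold, label in SIZE_BANDS:
        if employee_count >= threshold:
            return label
    raise ValueError("employee_count must be non-negative")


def activity_segment(last_login: date, today: date) -> str:
    """Assign Active / At-risk / Churned from days since last login."""
    days = (today - last_login).days
    if days <= ACTIVE_DAYS:
        return "Active"
    if days <= CHURNED_DAYS:
        return "At-risk"
    return "Churned"


today = date(2024, 6, 30)
customers = [
    {"id": "A", "employees": 12, "last_login": date(2024, 6, 25)},
    {"id": "B", "employees": 450, "last_login": date(2024, 4, 20)},
    {"id": "C", "employees": 5000, "last_login": date(2024, 1, 2)},
]
for c in customers:
    print(c["id"], size_segment(c["employees"]), activity_segment(c["last_login"], today))
# Output:
# A SMB Active
# B Mid-Market At-risk
# C Enterprise Churned
```

Because every rule is an explicit threshold, stakeholders can read and challenge the logic directly, which is the main advantage of this approach over clustering.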

RFM Segmentation in Practice

Dimension	Definition	How to Calculate	Why It Matters
Recency (R)	How recently did the customer last purchase or engage?	Days since last transaction; score 1–5 (5 = most recent)	Recent customers are more likely to respond to campaigns; a high R score tends to signal continued retention
Frequency (F)	How often does the customer purchase or engage within a period?	Count of transactions in the past 12 months; score 1–5 (5 = most frequent)	Frequent buyers tend to be more loyal, and retaining them is usually cheaper than acquiring new customers
Monetary (M)	How much does the customer spend in total or on average?	Total spend or average order value over the same period; score 1–5 (5 = highest spend)	Identifies high-value customers who deserve premium service and retention investment
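The scoring described above can be sketched in plain Python. This assumes quintile-based scoring (each score covers roughly a fifth of customers, one common convention but not the only one) and hypothetical naming rules derived from the Champions / At-Risk / Lost descriptions earlier; the customer data is invented for illustration.

```python
def quintile_scores(values, higher_is_better=True):
    """Score values 1-5 by rank so that each score covers roughly a fifth of customers.
    Ties are broken by input order, which is a simplification."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (5 * rank) // n     # worst rank -> 1, best rank -> 5
    return scores


def rfm_segment(r, f, m):
    """Hypothetical naming rules based on the segment descriptions above."""
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    if r <= 2 and (f >= 4 or m >= 4):
        return "At-Risk"
    if r <= 2 and f <= 2 and m <= 2:
        return "Lost"
    return "Other"


# customer id -> (days since last purchase, orders in past 12 months, total spend)
customers = {
    "c1": (5, 24, 1800.0),
    "c2": (200, 20, 2500.0),
    "c3": (300, 1, 40.0),
    "c4": (30, 6, 400.0),
    "c5": (90, 3, 150.0),
}
ids = list(customers)
# Fewer days since last purchase is better, so recency is scored in reverse.
r_scores = quintile_scores([customers[c][0] for c in ids], higher_is_better=False)
f_scores = quintile_scores([customers[c][1] for c in ids])
m_scores = quintile_scores([customers[c][2] for c in ids])

segments = {}
for cid, r, f, m in zip(ids, r_scores, f_scores, m_scores):
    segments[cid] = (r, f, m, rfm_segment(r, f, m))
    print(cid, r, f, m, segments[cid][3])
# Output:
# c1 5 5 4 Champions
# c2 2 4 5 At-Risk
# c3 1 1 1 Lost
# c4 4 3 3 Other
# c5 3 2 2 Other
```

With pandas, the same rank-based quintiles are often computed with `pandas.qcut` applied to `Series.rank(method="first")`, which avoids duplicate bin edges when many customers share a value.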

K-Means Clustering: The Core Algorithm

Step	What Happens	Analyst Decision
1. Choose k	Specify the number of clusters to find	Use the Elbow method (plot inertia vs. k) or Silhouette score to select optimal k
2. Initialise centroids	Place k initial cluster centres in feature space, either at random or with k-means++, which spreads the starting centres apart	Keep the scikit-learn default (init="k-means++") for better starting centres; set random_state so results are reproducible
3. Assign points	Each data point is assigned to the nearest centroid by Euclidean distance	Ensure features are scaled; Euclidean distance is distorted by different units/magnitudes
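4. Update centroids	Each centroid is recalculated as the mean of the points assigned to it	Steps 3 and 4 repeat until the centroids stop moving (within tol) or max_iter is reached; because K-Means finds a local optimum, run several initialisations (n_init) and keep the lowest-inertia result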
5. Evaluate	Assess cluster quality with Silhouette score (−1 to 1; higher is better) and business interpretability	A statistically valid cluster that has no business meaning is useless; always sense-check with domain experts
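The steps in the table can be traced in a from-scratch NumPy sketch. This is an illustration of the mechanics only, not scikit-learn's implementation (in practice `sklearn.cluster.KMeans` and `sklearn.metrics.silhouette_score` do this work). It uses k-means++ seeding, Lloyd's assign/update loop, several restarts keeping the lowest inertia, and a silhouette score for comparing k; the three-blob dataset is synthetic.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """Step 2, k-means++ seeding: first centre is a random point; each later centre is
    drawn with probability proportional to its squared distance from the nearest centre
    chosen so far. Assumes X has at least k distinct points."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centres)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)


def lloyd(X, centres, max_iter=100, tol=1e-6):
    """Alternate assignment (step 3) and update (step 4) until centres stop moving."""
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                      # step 3: nearest centre
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(len(centres))])  # step 4: cluster means
        moved = np.abs(new - centres).max()
        centres = new
        if moved < tol:                                 # converged
            break
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(X)), labels].sum())  # within-cluster sum of squares
    return labels, centres, inertia


def kmeans(X, k, n_init=10, seed=0):
    """Run several k-means++ starts and keep the lowest-inertia solution."""
    rng = np.random.default_rng(seed)                   # fixed seed -> reproducible
    runs = [lloyd(X, kmeans_pp_init(X, k, rng)) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) per point, where a is the mean distance to the rest of
    its own cluster and b the mean distance to the nearest other cluster.
    Needs at least two clusters; points in singleton clusters score 0."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, own].sum() / (own.sum() - 1)           # excludes the zero self-distance
        b = min(D[i, labels == j].mean() for j in np.unique(labels) if j != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Synthetic data: three well-separated blobs, standardised because step 3 needs scaled features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [6, 0], [3, 6])])
X = (X - X.mean(axis=0)) / X.std(axis=0)

sil = {}
for k in range(2, 6):
    labels, centres, inertia = kmeans(X, k)
    sil[k] = silhouette(X, labels)
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil[k]:.3f}")
```

Plotting the printed inertia against k gives the elbow curve, while the silhouette column gives a second, independent view; on data this cleanly separated both should point to k = 3, but real customer data rarely gives such a clear answer.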

Choosing and Evaluating Clustering Algorithms

Algorithm	How It Works	Strengths	Weaknesses	When to Use
K-Means	Assigns each point to the nearest centroid; minimises within-cluster variance	Fast; scalable; easy to interpret	Requires k to be specified; sensitive to outliers; assumes spherical clusters	Default choice for behavioural segmentation with numeric features
DBSCAN	Groups densely connected points; marks sparse points as outliers	Does not require k; detects outliers naturally; finds non-spherical clusters	Sensitive to epsilon and min_samples parameters; struggles with varying density	Anomaly detection combined with clustering; geographic clustering
Hierarchical (Agglomerative)	Builds a tree (dendrogram) by merging closest clusters bottom-up	No need to specify k upfront; dendrogram reveals structure at multiple granularities	Does not scale to large datasets; computationally expensive (O(n²) or worse)	Small to medium datasets where you want to explore cluster hierarchy
Gaussian Mixture Model (GMM)	Models each cluster as a Gaussian distribution; uses soft (probabilistic) assignments	Provides cluster membership probabilities; handles elliptical clusters	Requires the number of components to be chosen (e.g. by comparing BIC); more complex to tune; can converge to local optima	When customers can belong partially to multiple segments
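The DBSCAN row is easiest to understand by watching it leave an isolated customer unassigned instead of forcing it into a cluster, as K-Means would. The sketch below is a minimal from-scratch version for illustration; scikit-learn's `sklearn.cluster.DBSCAN` takes the same two parameters (`eps`, `min_samples`) and also labels noise as -1. The data points are invented.

```python
import numpy as np

NOISE = -1


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN sketch. Returns one label per point; -1 marks noise."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Neighbourhood of each point within eps (includes the point itself).
    neighbours = [np.flatnonzero(row <= eps) for row in dist]
    is_core = [len(nb) >= min_samples for nb in neighbours]
    labels = np.full(len(X), NOISE)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != NOISE or not is_core[i]:
            continue                       # already assigned, or cannot seed a cluster
        labels[i] = cluster
        stack = [i]
        while stack:                       # expand the cluster through core points
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster    # core or border point joins the cluster
                    if is_core[q]:
                        stack.append(q)    # only core points extend the cluster further
        cluster += 1
    return labels


X = np.array([
    [0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [0.3, 0.3],   # dense group A
    [5.0, 5.0], [5.0, 5.3], [5.3, 5.0], [5.3, 5.3],   # dense group B
    [10.0, 0.0],                                      # isolated customer
])
labels = dbscan(X, eps=0.5, min_samples=3)
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters (two) falls out of the data rather than being specified, but raising `min_samples` or shrinking `eps` can turn every point into noise, which is the parameter sensitivity noted in the table.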

Feature Engineering for Segmentation

Feature Type	Examples	Preprocessing Required
Demographic	Age, company size, industry, geography, account tier	Encode categoricals (one-hot for nominal fields such as industry; ordinal for ordered fields such as account tier); scale numerics
Behavioural	Login frequency, feature usage counts, pages viewed, actions taken per session	Aggregate to customer level (e.g. mean, sum, max over time window); log-transform skewed counts
Transactional	Purchase frequency, average order value, product category mix, discount usage	Calculate over the same time window for every customer; handle zero-inflation from inactive customers
Engagement	Email open rate, NPS response, support ticket count, referral activity	Express rates on a common [0, 1] scale; consider recency weighting so that older engagement signals count less
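A short NumPy sketch of the preprocessing column, assuming one hypothetical feature from each family (industry, 90-day logins, average order value, email open rate). It one-hot encodes the categorical field, log-transforms the skewed count with `log1p` (which keeps zeros at zero), and z-scores the numeric columns so no single unit dominates Euclidean distance.

```python
import numpy as np

# Hypothetical customer-level features, already aggregated over one time window.
industry = ["retail", "saas", "retail", "finance"]        # demographic, categorical
logins_90d = np.array([2, 150, 35, 0], dtype=float)       # behavioural, right-skewed count
avg_order_value = np.array([40.0, 310.0, 95.0, 0.0])      # transactional, 0 = no orders
email_open_rate = np.array([0.10, 0.55, 0.30, 0.05])      # engagement, already in [0, 1]

# One-hot encode the categorical column: one 0/1 column per category.
categories = sorted(set(industry))
one_hot = np.array([[1.0 if v == c else 0.0 for c in categories] for v in industry])

# log1p compresses the long right tail of count data and maps 0 to 0.
log_logins = np.log1p(logins_90d)

numeric = np.column_stack([log_logins, avg_order_value, email_open_rate])
# z-score each numeric column (mean 0, standard deviation 1).
# Assumes no column is constant; a zero standard deviation would divide by zero.
scaled = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)

features = np.hstack([scaled, one_hot])
print(categories)      # ['finance', 'retail', 'saas']
print(features.shape)  # (4, 6)
```

The one-hot columns are left unscaled here; how much weight categorical fields should carry relative to the standardised numeric ones is a design choice worth revisiting once the clusters are profiled.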

Turning Clusters into Actionable Segments

Step	What to Do	Common Pitfall
Profile each cluster	Calculate mean/median of each feature per cluster; identify what makes each cluster distinct	Reporting raw cluster numbers (Cluster 0, Cluster 1) with no interpretation
Name the segments	Give each cluster a descriptive business name based on its characteristics	Overly technical names that stakeholders cannot relate to or act on
Validate with business stakeholders	Present segment profiles to sales, marketing, and product teams for gut-check validation	Treating clustering output as ground truth without qualitative validation
Define actions per segment	Specify what marketing message, product feature, or outreach strategy applies to each segment	Spending weeks on clustering without any action plan for what to do with the segments
Monitor segment drift	Re-run segmentation periodically; track how customers move between segments over time	Using static segment labels that become stale as customer behaviour evolves
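The profiling and drift-monitoring steps can be sketched with the standard library alone. Everything here is hypothetical: the cluster labels, features, and segment names are invented, and the drift count assumes the re-run's labels already refer to the same segments (for example, because customers were assigned to the previously saved centroids). Raw K-Means cluster numbers are not stable across independent runs, so they must be aligned before transitions mean anything.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical clustering output: (customer id, cluster, customer-level features).
rows = [
    ("c1", 0, {"logins": 120, "spend": 900}),
    ("c2", 0, {"logins": 95, "spend": 1100}),
    ("c3", 1, {"logins": 3, "spend": 60}),
    ("c4", 1, {"logins": 5, "spend": 45}),
    ("c5", 2, {"logins": 40, "spend": 2400}),
]

# Profile: median of each feature per cluster (the median is robust to outliers).
by_cluster = defaultdict(lambda: defaultdict(list))
for _, cluster, feats in rows:
    for name, value in feats.items():
        by_cluster[cluster][name].append(value)
profile = {c: {f: median(v) for f, v in feats.items()} for c, feats in sorted(by_cluster.items())}
for c, stats in profile.items():
    print(c, stats)

# Business names chosen by people after reading the profiles (illustrative only).
names = {0: "Power Users", 1: "Occasional Visitors", 2: "Big Spenders"}

# Drift: count movements between segments across two aligned runs.
previous = {"c1": 0, "c2": 0, "c3": 1, "c4": 1, "c5": 2}
current = {"c1": 0, "c2": 2, "c3": 1, "c4": 0, "c5": 2}   # hypothetical later run
moves = Counter((names[previous[cid]], names[current[cid]]) for cid in previous)
for (old, new), n in sorted(moves.items()):
    print(f"{old} -> {new}: {n}")
```

Tracking the off-diagonal counts (customers who changed segment) over successive runs is a simple way to spot when the segment definitions themselves need revisiting.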

Summary

Customer segmentation is one of the highest-impact analytical techniques available to data analysts because it directly informs how businesses allocate marketing spend, design products, and prioritise service. The choice between rule-based, RFM, and clustering approaches depends on the data available, the business question, and the stakeholders who need to act on the results. K-Means is an excellent starting point for most behavioural segmentation tasks, but the real value comes not from the algorithm itself but from the quality of the features, the interpretability of the resulting clusters, and the concrete actions that each segment enables.