Determine the appropriate number of segments
The goal of Cluster Analysis is to group respondents (e.g., consumers) into segments based on needs, benefits, and/or behaviors. The tool tries to achieve this goal by looking for respondents that are similar, putting them together in a cluster or segment, and separating them from other, dissimilar, respondents. The researcher compares the segments and provides a descriptive label for each.
First, go to the Data > Manage tab, select examples from the Load data of type dropdown, and press the Load button. Then select the toothpaste dataset. The dataset contains information from 60 consumers who were asked to respond to six questions to determine their attitudes towards toothpaste. The scores shown for variables v1-v6 indicate the level of agreement with statements on a 7-point scale where 1 = strongly disagree and 7 = strongly agree.
We first establish the number of segments/clusters in the data using Hierarchical Cluster Analysis. Ward’s method with Squared Euclidean distance is often used to determine how (dis)similar individuals are. These are the default values in Radiant, but they can be changed if desired. The most important information from this analysis is provided by the plots, so we will focus our attention there.
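Outside of Radiant, these defaults can be approximated in base R. The sketch below is only an illustration, not Radiant's own code: it uses simulated data as a stand-in for the toothpaste survey (the names `centers`, `grp`, and `dat` are made up for this example), standardizes the variables, and applies `hclust` from the stats package with Ward's method on squared Euclidean distances.

```r
# Simulated stand-in for a 60 x 6 survey dataset (NOT the toothpaste data):
# three groups of 20 respondents with different response patterns
set.seed(1234)
centers <- rbind(
  c(6, 6, 2, 2, 2, 2),
  c(2, 2, 6, 6, 2, 2),
  c(2, 2, 2, 2, 6, 6)
)
grp <- rep(1:3, each = 20)
dat <- centers[grp, ] + matrix(rnorm(60 * 6, sd = 0.5), ncol = 6)
colnames(dat) <- paste0("v", 1:6)

# Standardize, then compute squared Euclidean distances between respondents
d <- dist(scale(dat), method = "euclidean")^2

# "ward.D" applied to squared distances corresponds to Ward's criterion
hc <- hclust(d, method = "ward.D")
print(hc)
```

The resulting `hc` object stores the merge order and the height (heterogeneity) of every merge, which is the information the plots discussed in this section are built from.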
Select variables v1 through v6 in the Variables box and click the Estimate button or press CTRL-enter (CMD-enter on mac) to generate results. Note that Hierarchical Cluster Analysis can be time-consuming and memory intensive for large datasets. If your dataset has more than 5,000 observations, make sure to increase the value in the Max cases input to the appropriate number. The Dendrogram shown below provides information to help you determine the most appropriate number of clusters (or segments).
Hierarchical cluster analysis starts with many segments, as many as there are respondents, and in a stepwise (i.e., hierarchical) process joins the most similar respondents or groups until only one segment remains. To determine the appropriate number of segments, look for a jump along the vertical axis of the plot. At that point two dissimilar segments have been joined. The measure along the vertical axis indicates the level of heterogeneity within the segments that have been formed. The purpose of clustering is to create homogeneous groups and to avoid segments with heterogeneous characteristics, needs, etc. Since the most obvious jump in heterogeneity occurs when we go from 3 to 2 segments, we choose 3 segments (i.e., we avoid creating a heterogeneous segment).
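The "look for the jump" rule can also be expressed numerically. The following is a minimal sketch under the same assumptions as any base-R reproduction (simulated stand-in data, `hclust` from the stats package, made-up object names); it is a heuristic for illustration and not a replacement for inspecting the dendrogram.

```r
# Simulated stand-in data (NOT the toothpaste data): 3 groups of 20 respondents
set.seed(1234)
centers <- rbind(c(6, 6, 2, 2, 2, 2), c(2, 2, 6, 6, 2, 2), c(2, 2, 2, 2, 6, 6))
dat <- centers[rep(1:3, each = 20), ] + matrix(rnorm(360, sd = 0.5), ncol = 6)
hc <- hclust(dist(scale(dat))^2, method = "ward.D")

n <- nrow(dat)

# hc$height[i] is the heterogeneity at which merge i happens,
# leaving n - i segments
jumps <- diff(hc$height)

# jumps[i] is the increase caused by merge i + 1; stopping just before
# that merge keeps n - i segments
i <- which.max(jumps)
k <- n - i
cat("Largest jump occurs when going from", k, "to", k - 1, "segments\n")
```

In real data the largest jump is often the very last merge (2 to 1 segments), so a rule like this should be read together with the plot rather than applied mechanically.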
Another plot that can be used to determine the number of segments is a scree-plot. This is a plot of the within-cluster heterogeneity on the vertical axis and the number of segments on the horizontal axis. Again, Hierarchical cluster analysis starts with many segments and groups respondents together until only one segment is left. The scree plot is created by selecting it from the Plot(s) dropdown menu. If Plot cutoff is set to 0 we see results for all possible cluster solutions. To make the plot easier to evaluate, we can set Plot cutoff to, for example, 0.05 (i.e., show only solutions that have Within-cluster heterogeneity above 5%).
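To make the idea of a plot cutoff concrete, the sketch below builds a scree table by hand. It assumes that heterogeneity is the merge height rescaled so that the one-segment solution equals 1; Radiant's exact scaling may differ, and the data and object names (`scree`, `cutoff`) are hypothetical.

```r
# Simulated stand-in data (NOT the toothpaste data): 3 groups of 20 respondents
set.seed(1234)
centers <- rbind(c(6, 6, 2, 2, 2, 2), c(2, 2, 6, 6, 2, 2), c(2, 2, 2, 2, 6, 6))
dat <- centers[rep(1:3, each = 20), ] + matrix(rnorm(360, sd = 0.5), ncol = 6)
hc <- hclust(dist(scale(dat))^2, method = "ward.D")

# Row j holds the (rescaled) height of the merge that produced j segments
scree <- data.frame(
  segments = seq_along(hc$height),
  heterogeneity = rev(hc$height) / max(hc$height)
)

# Increase in heterogeneity when moving from (segments + 1) to segments;
# with every respondent in their own segment the heterogeneity is 0
scree$change <- scree$heterogeneity - c(scree$heterogeneity[-1], 0)

# Keep only solutions above the cutoff (here 5%)
cutoff <- 0.05
scree <- scree[scree$heterogeneity > cutoff, ]
print(scree)
```

A large value in the `change` column for a given row signals that reducing the solution to that number of segments forced two dissimilar segments together.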
Reading the plot from left-to-right we see that within-segment heterogeneity increases sharply when we move from 3 to 2 segments. This is also clear from the Change in within-cluster heterogeneity plot (i.e., Change). To avoid creating a heterogeneous segment we, again, choose 3 segments. Now that we have determined the appropriate number of segments to extract we can use either Cluster > Hierarchical or Cluster > K-clustering to generate the final cluster solution.
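Once the number of segments is chosen, a hierarchical solution can be cut at that number and the segments compared. The sketch below shows one way to do this in base R with `cutree` and `aggregate`; the data are simulated and the names `segment` and `seg_means` are made up for this example.

```r
# Simulated stand-in data (NOT the toothpaste data): 3 groups of 20 respondents
set.seed(1234)
centers <- rbind(c(6, 6, 2, 2, 2, 2), c(2, 2, 6, 6, 2, 2), c(2, 2, 2, 2, 6, 6))
dat <- centers[rep(1:3, each = 20), ] + matrix(rnorm(360, sd = 0.5), ncol = 6)
colnames(dat) <- paste0("v", 1:6)
hc <- hclust(dist(scale(dat))^2, method = "ward.D")

# Assign each respondent to one of 3 segments
segment <- cutree(hc, k = 3)
print(table(segment))

# Compare segments on the original (unstandardized) variables
seg_means <- aggregate(
  as.data.frame(dat),
  by = list(segment = segment),
  FUN = mean
)
print(round(seg_means, 2))
```

The table of segment means is what a researcher would inspect to give each segment a descriptive label.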
To download the plots click the download button on the top-right of the screen.
Standardize box is un-checked
Number of clusters, then provide a name for the variable that will contain cluster assignment information, and finally, press the
gower distance will automatically be selected. For more information on the gower distance and R-package see the package vignette
Add code to Report > Rmd to (re)create the analysis by clicking the icon on the bottom left of your screen or by pressing ALT-enter on your keyboard.
If a plot was created it can be customized using ggplot2 commands or with gridExtra. See example below and Data > Visualize for details.
plot(result, plots = "change", custom = TRUE) + labs(caption = "Data used from ...")
To add, for example, a sub-title to a dendrogram plot use title(sub = "Data used from ..."). See the graphics documentation for additional information.
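As an illustration of the base-graphics approach, the sketch below draws a dendrogram from a plain `hclust` object and then adds a sub-title with `title()`. It does not use Radiant's `result` object; the data are simulated, and the optional `rect.hclust` call that marks a three-segment solution is an addition for this example.

```r
# Simulated stand-in data (NOT the toothpaste data): 3 groups of 20 respondents
set.seed(1234)
centers <- rbind(c(6, 6, 2, 2, 2, 2), c(2, 2, 6, 6, 2, 2), c(2, 2, 2, 2, 6, 6))
dat <- centers[rep(1:3, each = 20), ] + matrix(rnorm(360, sd = 0.5), ncol = 6)
hc <- hclust(dist(scale(dat))^2, method = "ward.D")

# Avoid writing a plot file when the script is run non-interactively
if (!interactive()) pdf(NULL)

# Draw the dendrogram without the default sub-title, then add our own
plot(hc, labels = FALSE, main = "Dendrogram", sub = "", xlab = "")
title(sub = "Data used from ...")

# Optional: outline the 3-segment solution on the dendrogram
boxes <- rect.hclust(hc, k = 3, border = "red")
```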
For an overview of related R-functions used by Radiant to conduct cluster analysis see Multivariate > Cluster
The key function from the stats package used in the hclus tool is hclust.