K-clustering

kclus(dataset, vars, fun = "kmeans", hc_init = TRUE,
  distance = "sq.euclidian", method = "ward.D", seed = 1234,
  nr_clus = 2, standardize = TRUE, lambda = NULL, data_filter = "",
  envir = parent.frame())

Arguments

dataset

Dataset

vars

Vector of variables to include in the analysis

fun

Use either "kmeans" or "kproto" for clustering

hc_init

Use centers from hclus as the starting point

distance

Distance for hclus

method

Method for hclus

seed

Random seed to use for k-clustering if hc_init is FALSE

nr_clus

Number of clusters to extract

standardize

Standardize data (TRUE or FALSE)

lambda

Parameter > 0 to trade off between Euclidean distance of numeric variables and simple matching coefficient between categorical variables. A vector of variable-specific factors can also be supplied, in which case the order must correspond to the order of the variables in the data and each variable's distance is multiplied by its corresponding lambda value.

data_filter

Expression entered in, e.g., Data > View to filter the dataset in Radiant. The expression should be a string (e.g., "price > 10000")

envir

Environment to extract data from
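The role of hc_init, distance, and method can be illustrated with base R alone. The block below is a minimal sketch, not radiant's internal code: it assumes that the hierarchical step uses squared Euclidean distances with Ward linkage (matching the defaults in the signature) and that the starting centers are the per-cluster means of the standardized variables. The `toy` data frame and all object names are hypothetical.

```r
# Sketch only: a base R stand-in for the hc_init idea, not radiant's implementation
set.seed(1234)

# Hypothetical toy data: two groups of 10 observations each
toy <- data.frame(
  x = c(rnorm(10, mean = 0), rnorm(10, mean = 5)),
  y = c(rnorm(10, mean = 0), rnorm(10, mean = 5))
)
toy_std <- scale(toy) # mirrors standardize = TRUE

# Hierarchical clustering on squared Euclidean distances with Ward linkage
hc <- hclust(dist(toy_std)^2, method = "ward.D")
hc_groups <- cutree(hc, k = 2)

# Average each variable within the hierarchical clusters to get starting centers
init_centers <- apply(toy_std, 2, function(col) tapply(col, hc_groups, mean))

# Run k-means from those centers instead of randomly chosen ones
km <- kmeans(toy_std, centers = init_centers)
table(km$cluster)
```

Because the starting centers are fixed by the hierarchical solution, the result does not depend on the random seed, which is consistent with seed only being used when hc_init is FALSE.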

Value

A list of all variables used in kclus as an object of class kclus

Details

See https://radiant-rstats.github.io/docs/multivariate/kclus.html for an example in Radiant
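The trade-off that lambda controls when fun = "kproto" can be sketched as a single distance calculation. This is an illustrative helper, not a function from radiant or clustMixType; `mixed_dist` and its arguments are hypothetical names, and it assumes the common k-prototypes formulation in which the numeric part is a squared Euclidean distance and the categorical part counts mismatches, with a scalar lambda.

```r
# Sketch of a mixed-type distance weighted by a scalar lambda (hypothetical helper)
mixed_dist <- function(x_num, c_num, x_cat, c_cat, lambda) {
  num_part <- sum((x_num - c_num)^2) # squared Euclidean distance on numeric variables
  cat_part <- sum(x_cat != c_cat)    # simple matching: number of mismatched categories
  num_part + lambda * cat_part
}

# Hypothetical observation and cluster prototype
d <- mixed_dist(
  x_num = c(1, 2), c_num = c(0, 2),
  x_cat = c("a", "b"), c_cat = c("a", "c"),
  lambda = 0.5
)
d # numeric part is 1, one categorical mismatch, so 1 + 0.5 * 1 = 1.5
```

A larger lambda makes categorical mismatches count more heavily relative to numeric differences; a smaller lambda lets the numeric variables dominate the cluster assignment.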

See also

summary.kclus to summarize results

plot.kclus to plot results

store.kclus to add cluster membership to the selected dataset

Examples

kclus(shopping, c("v1:v6"), nr_clus = 3) %>% str()
#> List of 19
#>  $ nr_obs     : int 20
#>  $ clus_means :'data.frame': 3 obs. of 6 variables:
#>   ..$ v1: num [1:3] 5.75 1.67 3.5
#>   ..$ v2: num [1:3] 3.62 3 5.83
#>   ..$ v3: num [1:3] 6 1.83 3.33
#>   ..$ v4: num [1:3] 3.12 3.5 6
#>   ..$ v5: num [1:3] 1.88 5.5 3.5
#>   ..$ v6: num [1:3] 3.88 3.33 6
#>  $ clus_names : chr [1:3] "Cluster 1" "Cluster 2" "Cluster 3"
#>  $ km_out     :List of 9
#>   ..$ cluster     : int [1:20] 1 2 1 3 2 1 1 1 2 3 ...
#>   ..$ centers     : num [1:3, 1:6] 1 -1.149 -0.184 -0.337 -0.78 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:3] "1" "2" "3"
#>   .. .. ..$ : chr [1:6] "v1" "v2" "v3" "v4" ...
#>   ..$ totss       : num 114
#>   ..$ withinss    : num [1:3] 11.98 7.72 10.02
#>   ..$ tot.withinss: num 29.7
#>   ..$ betweenss   : num 84.3
#>   ..$ size        : int [1:3] 8 6 6
#>   ..$ iter        : int 1
#>   ..$ ifault      : int 0
#>   ..- attr(*, "class")= chr "kmeans"
#>  $ clus_var   : int [1:20] 1 2 1 3 2 1 1 1 2 3 ...
#>  $ center_calc:function (x, prop = FALSE)
#>  $ max_freq   :function (x)
#>  $ df_name    : chr "shopping"
#>  $ dataset    :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20 obs. of 6 variables:
#>   ..$ v1: int [1:20] 6 2 7 4 1 6 5 7 2 3 ...
#>   ..$ v2: int [1:20] 4 3 2 6 3 4 3 3 4 5 ...
#>   ..$ v3: int [1:20] 7 1 6 4 2 6 6 7 3 3 ...
#>   ..$ v4: int [1:20] 3 4 4 5 2 3 3 4 3 6 ...
#>   ..$ v5: int [1:20] 2 5 1 3 6 3 3 1 6 4 ...
#>   ..$ v6: int [1:20] 3 4 3 6 4 4 4 4 3 6 ...
#>   ..- attr(*, "description")= chr "## Shopping attitudes\n\n### Description\n\n20 consumers were asked to respond to six questions to determine th"| __truncated__
#>  $ vars       : chr [1:6] "v1" "v2" "v3" "v4" ...
#>  $ fun        : chr "kmeans"
#>  $ hc_init    : logi TRUE
#>  $ distance   : chr "sq.euclidian"
#>  $ method     : chr "ward.D"
#>  $ seed       : num 1234
#>  $ nr_clus    : num 3
#>  $ standardize: logi TRUE
#>  $ lambda     : NULL
#>  $ data_filter: chr ""
#>  - attr(*, "class")= chr [1:2] "kclus" "list"
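
Instead of inspecting the raw structure with str(), the returned object can be passed to the methods listed under See also. The lines below are a hedged usage sketch: they assume the radiant.multivariate package is installed and provides the shopping dataset, and they call summary and plot with default arguments only, since the full signatures of those methods are not shown on this page.

```r
# Assumes radiant.multivariate is installed and provides the shopping data
library(radiant.multivariate)

# Same call as in the example above, stored for reuse
result <- kclus(shopping, vars = "v1:v6", nr_clus = 3)

summary(result) # dispatches to summary.kclus
plot(result)    # dispatches to plot.kclus, using its default plot settings
```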