| == cluster ==
| | |
| | |
| '''Purpose'''
| |
| | |
| Agglomerative and K-means cluster analysis with dendrograms.
| |
| | |
| '''Synopsis'''
| |
| | |
| :[results,fig] = cluster(data,labels,options)
| |
| | |
| :[results,fig] = cluster(data,options)
| |
| | |
| :options = cluster('options')
| |
| | |
| '''Description'''
| |
| | |
| ''cluster(data)'' performs a cluster analysis using either one of six different agglomerative
| |
| methods (including K-Nearest-Neighbor (KNN), furthest neighbor, and Ward's
| |
| method) or the K-means clustering algorithm, and plots a dendrogram. The input is ''data'' (class double or
| |
| dataset).
| |
| | |
| Optional input ''labels'' can be used to put labels on the
| |
| dendrogram plots. If ''data'' is ''M'' by ''N'', ''labels'' must be a
| |
| character array with ''M'' rows. When ''labels'' is not specified and data is class “double”, the
| |
| dendrogram is plotted using sample numbers. When ''labels'' is not specified
| |
| and ''data'' is class
| |
| “dataset”, the dendrogram is plotted using sample labels. If the labels field is empty,
| |
| sample numbers are used instead.
| |
| | |
| The output is a dendrogram showing the sample distances.
| |
| | |
| Note: Calling ''cluster'' with no inputs starts the graphical user interface (GUI) for this analysis
| |
| method.
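A minimal sketch of the labeled call described above. It assumes PLS_Toolbox is installed and on the MATLAB path; the data values and label strings are invented for illustration, and only the call forms shown in the Synopsis are used.

```matlab
% Made-up data: M = 4 samples (rows) by N = 2 variables (columns).
data   = [1.0 1.1; 1.2 0.9; 5.0 5.2; 5.1 4.9];

% labels must be a character array with M rows, one row per sample.
labels = char('A1','A2','B1','B2');

% Start from the default options and run the analysis.
options = cluster('options');
[results,fig] = cluster(data,labels,options);   % dendrogram uses the labels above
```

Omitting ''labels'' for class double data should give a dendrogram labeled with sample numbers instead, as described above.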
| |
| | |
| OUTPUTS:
| |
| | |
| The outputs are ''results'', a structure containing the results of
| |
| the clustering (defined below), and ''fig'', the handle to any plot created. The
| |
| ''results'' structure contains the following fields:
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">dist</span>: the distance threshold at which each
| |
| cluster forms.
| |
| | |
| <p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier"> class
| |
| </span>: the classes of each sample (columns of class) for each distance
| |
| (rows of class).
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">order</span>: the
| |
| order of the samples which locates similar samples nearest to each other (this
| |
| is the order used for the plots).
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">linkage</span>: a
| |
| table of linkages where each row indicates a linkage of one group to another.
| |
| Each row in the matrix represents one group. The first two columns indicate the
| |
| sample or group numbers which were linked to form the group. The final column
| |
| indicates the distance between linked items. Group numbers start at m+1 (where
| |
| m is the number of samples in the input data matrix); thus, row j of this matrix
| |
| is group number m+j. This matrix can be used with the Statistics Toolbox
| |
| dendrogram function.
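The group-numbering rule can be illustrated with a small hand-made table. This is only a sketch: the matrix `lnk` and its values are hypothetical stand-ins for a linkage table with m = 4 samples, not output from the function.

```matlab
% Hypothetical linkage table for m = 4 samples (values invented for illustration).
m = 4;
lnk = [1 2 0.5;    % row 1 -> group 5: samples 1 and 2 linked at distance 0.5
       3 4 0.8;    % row 2 -> group 6: samples 3 and 4 linked at distance 0.8
       5 6 1.7];   % row 3 -> group 7: groups 5 and 6 linked at distance 1.7

% members{k} lists the original samples contained in item k
% (items 1..m are the samples themselves).
members = num2cell(1:m);
for j = 1:size(lnk,1)
    a = lnk(j,1);
    b = lnk(j,2);
    members{m+j} = [members{a} members{b}];   % row j defines group number m+j
end

group5 = members{m+1};   % samples in the first group formed
group7 = members{m+3};   % samples in the last group formed
```

Walking the rows in order works because each row can only refer to samples or to groups defined by earlier rows.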
| |
| | |
| The (results.class) matrix can be used with the
| |
| (results.dist) matrix to determine clusters of samples for any distance using:
| |
| | |
| <p class="MATLABCommand">
| |
| | |
| <p class="MATLABCommand">results = cluster(data); %do cluster
| |
| | |
| <p class="MATLABCommand">ind = max(find(results.dist<threshold)); %user-desired threshold
| |
| | |
| <p class="MATLABCommand">thisclass = results.class(ind,:); %grab arbitrary classes
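The threshold lookup can be tried without running an analysis by using a made-up structure. In this sketch the `results` fields are hypothetical values laid out as described above (one row of `class` per entry of `dist`, with `dist` assumed to be in ascending order); `find(...,1,'last')` is used as an equivalent of `max(find(...))`.

```matlab
% Hypothetical stand-in for the structure returned by cluster (values invented).
results.dist  = [0 0.5 0.8 1.7];
results.class = [1 2 3 4;    % distance 0: every sample is its own class
                 1 1 2 3;    % distance 0.5: samples 1 and 2 merged
                 1 1 2 2;    % distance 0.8: samples 3 and 4 merged
                 1 1 1 1];   % distance 1.7: all samples in one class

threshold = 1.0;                                    % user-desired threshold
ind = find(results.dist < threshold, 1, 'last');    % last row formed below the threshold
thisclass = results.class(ind,:);                   % class of each sample at that distance
nclasses  = numel(unique(thisclass));               % number of clusters at this threshold
```

If no entry of `dist` lies below the threshold, `ind` is empty and `thisclass` comes back empty, so a real script may want to check for that case.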
| |
| | |
| <p class="Ref2">Options
| |
| | |
| <p class="optionsbody"> ''options'' = a structure array with the following fields:
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">plots</span>: ['none' | {'final'}] Governs plotting. When set to 'none', the
| |
| distance/cluster matrix is returned; 'final' returns a dendrogram plot showing
| |
| sample distances.
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">algorithm</span>: [] clustering algorithm,
| |
| | |
| <p class="optionsbody"> 'knn' {DEFAULT}:
| |
| K-Nearest Neighbor
| |
| | |
| <p class="optionsbody"> 'fn' : Furthest Neighbor
| |
| | |
| <p class="optionsbody"> 'avgpair' : Average
| |
| Paired Distance
| |
| | |
| <p class="optionsbody"> 'med' : Median
| |
| | |
| <p class="optionsbody"> 'cnt' : Centroid
| |
| | |
| <p class="optionsbody"> 'ward' : Ward's Method
| |
| | |
| <p class="optionsbody"> 'kmeans' : K-means
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">preprocessing</span>: {[]} Preprocessing structure
| |
| or keyword (see PREPROCESS),
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">pca</span>: [{'off'} | 'on'] if ‘on’ then CLUSTER performs PCA first and clustering on the
| |
| scores,
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">ncomp</span>: [] number of PCA factors to use {default = [], the user is
| |
| prompted to select the number of factors from the SSQ table},
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">mahalanobis</span>: [{'off'} | 'on'] if ‘on’
| |
| then a Mahalanobis distance on the scores is used,
| |
| | |
| <p class="optionsbody"> <span style="font-size: 10.0; font-family: Monaco,Courier">slack</span>: [0] integer number indicating how many samples can be
| |
| "overridden" when two class branches merge. If the smaller of the two
| |
| classes has no more than this number of samples, the branch will be absorbed
| |
| into the larger class. This feature is only valid when classes are supplied in
| |
| the input data. A value of 0 (zero) disables this feature.
| |
| | |
| <p class="optionsbody">
| |
| | |
| The default options can be retrieved using: options = cluster('options');.
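As a sketch of how the option fields listed above might be combined, the following starts from the defaults and changes a few of them. It assumes PLS_Toolbox is installed; the data matrix is random and purely illustrative, and the chosen settings are examples, not recommendations.

```matlab
% Illustrative data: 20 samples by 5 variables (random values).
data = rand(20,5);

options = cluster('options');    % retrieve the default options structure
options.algorithm = 'ward';      % Ward's method instead of the default 'knn'
options.pca       = 'on';        % perform PCA first and cluster on the scores
options.ncomp     = 2;           % fix the number of PCA factors so no prompt is needed
options.plots     = 'none';      % suppress the dendrogram plot

results = cluster(data,options); % results.dist, results.class, etc. as described above
```

Setting ''ncomp'' explicitly is useful in scripts, since the default of [] prompts the user to choose the number of factors.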
| |
| | |
| <p class="Ref2">See Also
| |
| <span style="font-size: 10.0; font-family: Monaco,Courier"> agcluster, [analysis.html analysis], [corrmap.html corrmap], dendrogram, [gcluster.html gcluster], [simca.html simca]
| |
| </span>
| |