This toolbox was developed to provide a standard, continuously extensible tool that any Matlab user can adapt to their own purposes. Chapter 1 of the related documentation, which is available for download, gives a theoretical introduction covering the algorithms, the definitions of the validity measures, and the visualization tools; it helps in understanding the programmed Matlab files. Chapter 2 describes the files and the individual algorithms and illustrates them with simple examples. In Chapter 3 the whole Toolbox is tested on real data sets by solving three clustering problems: comparing and selecting algorithms; estimating the optimal number of clusters; and examining multidimensional data sets.
About the Toolbox
The Fuzzy Clustering and Data Analysis Toolbox is a collection of Matlab functions. The toolbox provides five categories of functions:
- Clustering algorithms. These functions group the given data set into clusters using different approaches: Kmeans and Kmedoid are hard partitioning methods, while FCMclust, GKclust and GGclust are fuzzy partitioning methods with different distance norms.
- Evaluation with cluster prototypes. Based on the clustering results for a data set, this set of functions can calculate memberships for "unseen" data sets. In the 2-dimensional case the functions draw a contour map in the data space to visualize the results.
- Validation. The validity function provides cluster validity measures for each partition. It is useful when the number of clusters is not known a priori. The optimal partition can be identified from the extrema of the validation indexes as a function of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (DII).
- Visualization. The Visualization part of this toolbox provides the modified Sammon mapping of the data. This mapping method is a multidimensional scaling method described by Sammon.
- Examples. An example based on an industrial data set that demonstrates the usefulness of this toolbox and its algorithms.
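
To make the "evaluation with cluster prototypes" and "validation" categories above more concrete, here is a minimal sketch of the underlying calculations. It is written in plain Python, not with the toolbox's own Matlab functions, whose signatures are not shown here. The function names (fcm_membership, partition_coefficient, classification_entropy), the Euclidean distance norm, the fuzziness exponent m = 2 and the sample prototypes are all assumptions made for illustration. The formulas are the standard fuzzy c-means membership rule and the usual definitions of PC and CE; the toolbox's implementation may differ in detail.

```python
import math

def fcm_membership(point, prototypes, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    Assumes Euclidean distance and fuzziness exponent m > 1.
    """
    dists = [math.dist(point, v) for v in prototypes]
    # A point that coincides with a prototype belongs fully to that cluster
    # (shared equally if several prototypes coincide with it).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
            for d_i in dists]

def partition_coefficient(U):
    """PC: mean of the squared memberships (rows of U are data points)."""
    return sum(u * u for row in U for u in row) / len(U)

def classification_entropy(U):
    """CE: mean entropy of the memberships; 0 for a crisp partition."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / len(U)

# Hypothetical prototypes from an earlier clustering run, and "unseen" points.
prototypes = [(0.0, 0.0), (4.0, 0.0)]
unseen = [(1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]

U = [fcm_membership(x, prototypes) for x in unseen]
for x, row in zip(unseen, U):
    print(x, [round(u, 3) for u in row])
print("PC =", round(partition_coefficient(U), 4))
print("CE =", round(classification_entropy(U), 4))
```

PC moves toward 1 and CE toward 0 as the partition becomes crisper. When the number of clusters is unknown, one would typically repeat the clustering for several cluster counts and compare how these values change, which is the use of the validation indexes described in the list.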
--------------------------
Janos Abonyi, Ph.D
Head of the Department of Process Engineering
University of Veszprem
P.O.Box 158
H-8200, Veszprem, Hungary
Tel: +36-88-624209 or 36-88-622793
Fax: +36-88-421-709
www.fmt.vein.hu/softcomp