Fuzzy Clustering and Data Analysis Toolbox
Posted:
Apr 21, 2005 1:39 PM


The first release of the toolbox is now available from
http://www.fmt.vein.hu/softcomp/fclusttoolbox/
The toolbox was developed as a standard, continuously extensible tool that any Matlab user can adapt to their own purposes. Chapter 1 of the downloadable documentation gives a theoretical introduction covering the theory of the algorithms, the definitions of the validity measures, and the visualization tools, all of which help in understanding the Matlab files. Chapter 2 describes the files and the individual algorithms, illustrated with simple examples. In Chapter 3 the whole toolbox is tested on real data sets by solving three clustering problems: comparing and selecting algorithms; estimating the optimal number of clusters; and examining multidimensional data sets.
About the Toolbox
The Fuzzy Clustering and Data Analysis Toolbox is a collection of Matlab functions. The toolbox provides five categories of functions:
 Clustering algorithms. These functions partition a given data set into clusters using different approaches: Kmeans and Kmedoid are hard partitioning methods, while FCMclust, GKclust and GGclust are fuzzy partitioning methods with different distance norms.
 Evaluation with cluster prototypes. Based on the clustering results for a data set, this set of functions can calculate membership values for "unseen" data sets. In the 2-dimensional case the functions draw a contour map in the data space to visualize the results.
 Validation. The validity function provides cluster validity measures for each partition, which is useful when the number of clusters is not known a priori. The optimal partition can be determined from the extrema of the validation indexes as a function of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (DII).
 Visualization. The visualization part of the toolbox provides the modified Sammon mapping of the data. Sammon mapping is a multidimensional scaling method described by Sammon.
 Examples. An example based on an industrial data set demonstrates the usefulness of the toolbox and its algorithms.
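
For readers who have not met these methods before, here is a minimal sketch of the idea behind fuzzy partitioning and two of the validity indexes named above (PC and CE). It is plain Python, not the toolbox's Matlab code: the names fcm, partition_coefficient and classification_entropy are made up for this illustration and are not the toolbox's FCMclust or validity functions. It assumes the textbook fuzzy c-means update with the Euclidean norm and the usual definitions PC = (1/N) sum u_ik^2 and CE = -(1/N) sum u_ik ln u_ik; the toolbox documentation remains the reference for what its functions actually compute.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (illustrative; not the toolbox's FCMclust)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial membership matrix (n rows, c columns); rows sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Cluster centers: means of the data weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * data[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        change = 0.0
        for i in range(n):
            # Clamp distances away from zero to avoid division by zero.
            dist = [max(euclid(data[i], ck), 1e-12) for ck in centers]
            new_row = [1.0 / sum((dist[k] / dj) ** (2.0 / (m - 1.0))
                                 for dj in dist)
                       for k in range(c)]
            change = max(change,
                         max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if change < tol:
            break
    return centers, u


def partition_coefficient(u):
    """PC: 1/c for a completely fuzzy partition, 1 for a crisp one."""
    return sum(v * v for row in u for v in row) / len(u)


def classification_entropy(u):
    """CE: 0 for a crisp partition, larger for fuzzier partitions."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / len(u)


# Two well-separated groups of points (made-up data).
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(data, c=2)
print("PC:", partition_coefficient(u))
print("CE:", classification_entropy(u))
```

To look for a suitable number of clusters in the spirit of the Validation item, one would run fcm for several values of c and compare the resulting index values.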
Janos Abonyi, Ph.D.
Head of the Department of Process Engineering
University of Veszprem
P.O.Box 158, H-8200 Veszprem, Hungary
Tel: +3688624209 or 3688622793
Fax: +3688421709
www.fmt.vein.hu/softcomp
You can order our new book (Fuzzy Model Identification for Control) from Birkhauser Boston (Springer-NY)
http://www.springerny.com/detail.tpl?cart=1048164347947749&ISBN=0817642382
or from Amazon.com
http://www.amazon.com/exec/obidos/ASIN/0817642382/



