Topic: Fuzzy Clustering and Data Analysis Toolbox
Replies: 0

 Janos Abonyi Posts: 6 Registered: 12/7/04
Posted: Apr 21, 2005 1:39 PM

Fuzzy Clustering and Data Analysis Toolbox

The first release of the toolbox is now available from

http://www.fmt.vein.hu/softcomp/fclusttoolbox/

The toolbox was developed to provide a standard, continuously
extensible tool that any Matlab user can adapt to their own purposes.
Chapter 1 of the downloadable documentation gives a theoretical
introduction: the theory of the algorithms, the definitions of the
validity measures, and the visualization tools, all of which help in
understanding the Matlab files. Chapter 2 describes the files and the
individual algorithms and illustrates them with simple examples.
In Chapter 3 the whole toolbox is tested on real data sets through
three clustering problems: comparison and selection of algorithms;
estimating the optimal number of clusters; and examining
multidimensional data sets.

The Fuzzy Clustering and Data Analysis Toolbox is a collection of
Matlab functions. The toolbox provides five categories of functions:

- Clustering algorithms. These functions group the given data set
into clusters using different approaches: Kmeans and Kmedoid are
hard partitioning methods, while FCMclust, GKclust, and GGclust are
fuzzy partitioning methods with different distance norms.

- Evaluation with cluster prototypes. Based on the clustering
results for a data set, this set of functions can calculate
memberships for "unseen" data sets. In the 2-dimensional case, the
functions draw a contour map in the data space to visualize the
results.

- Validation. The validity function provides cluster validity
measures for each partition. It is useful when the number of clusters
is unknown a priori. The optimal partition can be determined from the
extrema of the validation indexes as a function of the number of
clusters. The indexes calculated are: Partition Coefficient (PC),
Classification Entropy (CE), Partition Index (SC), Separation Index
(S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative
Dunn Index (DII).

- Visualization. The Visualization part of this toolbox provides the
modified Sammon mapping of the data. This mapping method is a
multidimensional scaling method described by Sammon.

- Examples. An example based on an industrial data set demonstrates
the usefulness of the toolbox and its algorithms.
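
For readers who want a feel for the ideas behind the clustering,
evaluation and validation categories above, here is a minimal sketch
in plain Python. It is not the toolbox's code and does not reproduce
the interfaces of FCMclust or the validity function: the names
memberships, update_centers, fcm, partition_coefficient and
classification_entropy are hypothetical, and the Euclidean norm, the
fuzzifier m = 2 and the toy data are assumptions made for illustration
only. It shows textbook fuzzy c-means, the membership of an "unseen"
point with respect to the fitted prototypes, and the two simplest
indexes from the list (PC and CE).

```python
import math
import random


def memberships(points, centers, m=2.0):
    """Fuzzy membership of each point in each cluster; every row sums to 1."""
    u = []
    for x in points:
        # Squared Euclidean distance from x to every cluster center.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        if any(d == 0.0 for d in d2):
            # x coincides with one or more centers: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            u.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def update_centers(points, u, n_clusters, m=2.0):
    """Each center is the mean of the data weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append([sum(wk * x[j] for wk, x in zip(w, points)) / total
                        for j in range(dim)])
    return centers


def fcm(points, n_clusters, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = random.Random(seed)
    # Assumption: initialize the centers with randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, n_clusters, m)
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def partition_coefficient(u):
    """PC: mean of squared memberships; 1 for a hard partition, 1/c at worst."""
    return sum(v * v for row in u for v in row) / len(u)


def classification_entropy(u):
    """CE: mean entropy of the membership rows; 0 for a hard partition."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0.0) / len(u)


# Toy data (made up for this sketch): two small groups in the plane.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
centers, u = fcm(data, n_clusters=2)
print("centers:", centers)
print("PC:", partition_coefficient(u), "CE:", classification_entropy(u))
# Membership of a point that was not part of the clustered data.
print("unseen point:", memberships([(0.1, 0.1)], centers))
```

To estimate the number of clusters in the spirit of the validation
item, one would run fcm for several values of n_clusters and compare
the indexes, looking for a large PC and a small CE. Both indexes tend
to change monotonically with the number of clusters, which is one
reason several indexes are usually examined together.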

--------------------------
Janos Abonyi, Ph.D

Head of the Department of Process Engineering
University of Veszprem
P.O.Box 158 H-8200, Veszprem, Hungary
Tel: +36-88-624209 or 36-88-622793
Fax: +36-88-421-709
www.fmt.vein.hu/softcomp

You can order our new book (Fuzzy Model Identification for Control)
from Birkhauser Boston (Springer - NY)
http://www.springer-ny.com/detail.tpl?cart=1048164347947749&ISBN=0817642382
or from Amazon.com
http://www.amazon.com/exec/obidos/ASIN/0817642382/