Re: a method for detecting multivariate outliers..
Jun 6, 2006 3:30 PM


nitish wrote:
> dear greg,
> thanks for ur reply.. as per ur information regarding the multivariate
> outlier detection, how would i come to know that clustering, regression
> or classification is to be done if i have data in raw format.
Why are you doing anything with the data (except looking for typos) if you don't know how it is going to be used?
Hope this helps.
Greg
> Greg Heath wrote:
> > nitish wrote:
> > > hi all,
> > > Let me tell u all that i am not interested in doing clustering,
> > > classification or regression. i want to detect outliers out of
> > > multivariate data.
> >
> > You missed the point.
> >
> > Outliers can only be defined with respect to some model of
> > the data. The three tasks of clustering, classification and
> > regression are the most common bases for models and
> > sometimes lead to different models.
> >
> > In order to detect outliers you have to define them first.
> >
> > What will the data be used for?
> >
> > > for example in case of univariate data, we use box
> > > plot or see if all data points lie in st deviation +/- coeff*mean.
> >
> > mean +/- coeff*stdev ?
> >
> > Clearly inappropriate for skewed distributions. Quartiles
> > are preferable.
> >
> > > Std
> > > error can also be used in place of st deviation. points which lie out
> > > of this range are declared as outliers for a particular var (univariate).
> > > i know how to do this analysis with the help of box plots in
> > > Statistica.
> >
> > Obviously, it isn't practical to try to use definitions based on multiple
> > applications of univariate criteria when the dimensionality is high.
> >
> > > similarly i want to know if there is some direct or indirect method to
> > > detect outliers in case of multivariate data.. this is a kind of
> > > exploratory analysis which is required before any specialised
> > > multivariate analysis.. let me tell u all again that i am supposed to do
> > > this analysis in Statistica as i am working as a technical consultant
> > > for Statistica in India
> >
> > You still haven't answered the questions I asked.
> >
> > Hope this helps.
> >
> > Greg
> >
> > > Herman Rubin wrote:
> > > > In article <1149166493.004023.56190@i40g2000cwc.googlegroups.com>,
> > > > Greg Heath <heath@alumni.brown.edu> wrote:
> > > >
> > > > >budhi wrote:
> > > > >> is there any direct/indirect method of detecting outliers from a
> > > > >> multivariate data. I am interested in doing this analysis in Statistica
> > > > >> software. Please do let me know if this analysis can be done in spss or
> > > > >> minitab.
> > > >
> > > > >Size of data matrix?
> > > > >Are IVs and DVs labeled?
> > > > >Are you interested in Clustering, Classification or Regression?
> > > >
> > > > Let me add a BIG one. What is an outlier?
> > > >
> > > > This address is for information only. I do not claim that these views
> > > > are those of the Statistics Department or of Purdue University.
> > > > Herman Rubin, Department of Statistics, Purdue University
> > > > hrubin@stat.purdue.edu Phone: (765)4946054 FAX: (765)4940558
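
The two univariate rules discussed in the quoted exchange (mean plus or minus a multiple of the standard deviation, versus quartile-based box-plot fences) can be compared with a minimal Python sketch. Everything here is an assumption made for illustration and does not come from the thread: the function names `sd_rule` and `quartile_rule`, the sample `data`, the multiplier 3 for the standard-deviation rule, and the factor 1.5 for the box-plot fences. It only shows why the quartile rule behaves better on a skewed sample; it says nothing about the multivariate case.

```python
import statistics


def sd_rule(xs, coeff=3.0):
    """Return the points lying outside mean +/- coeff * stdev."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation
    return [x for x in xs if abs(x - m) > coeff * s]


def quartile_rule(xs, k=1.5):
    """Return the points lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # three quartile cut points
    iqr = q3 - q1
    return [x for x in xs if x < q1 - k * iqr or x > q3 + k * iqr]


# Hypothetical right-skewed sample with two large values.
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 40, 45]

print(sd_rule(data))        # the two large values inflate the stdev: nothing is flagged
print(quartile_rule(data))  # the quartile fences flag both 40 and 45
```

On this sample the two large values pull the mean and standard deviation up far enough that neither exceeds mean + 3*stdev, while the quartiles are barely affected and the fences catch both.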



