Newsgroup: comp.soft-sys.sas
Date: 10-16-2006, 09:34 PM
From: David L Cassell
Subject: Re: To statisticians : Cluster Analysis

job.alerte@GMAIL.COM wrote back:
>
>Dear experts !
>
>Here are a little more precisions on my data and the aim of the study
>(it' a little more complex):



Yes, they're always a *little* more complex. Where 'little' is somewhere
between 'this will add a month to the job' and 'this is going to take the
ghost of Paul Erdős to work out the theory for it'. :-)


>- I defined two responsiveness criteria to treatment, say A and B, both
>binary Yes / No: one derived from a continuous measurement, the other
>one derived from a quality of life scale. Then, A and B are known for
>all of the patients.



I think that you might be better off if you did NOT turn these into
categorical variables here. Use as much information as possible, and
use continuous variables when you're doing cluster analysis.
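
To make the information loss concrete, here is a minimal Python sketch. The
three measurements and the cutoff of 50 are invented for illustration and are
not taken from the trial. It only shows how a cutoff can make near neighbours
look different and distant patients look identical.

```python
# Hypothetical continuous measurements for three patients (invented values).
scores = {"p1": 51.0, "p2": 95.0, "p3": 49.0}
CUTOFF = 50.0  # assumed responder threshold, purely illustrative


def dichotomize(x, cutoff=CUTOFF):
    """Return 1 for 'responder' (x >= cutoff), else 0."""
    return 1 if x >= cutoff else 0


def dist(a, b):
    """Absolute difference, i.e. Euclidean distance in one dimension."""
    return abs(a - b)


binary = {name: dichotomize(value) for name, value in scores.items()}

# On the continuous scale p1 is close to p3 and far from p2 ...
continuous_d = {
    ("p1", "p2"): dist(scores["p1"], scores["p2"]),
    ("p1", "p3"): dist(scores["p1"], scores["p3"]),
}
# ... but after dichotomizing, p1 matches p2 exactly and differs from p3.
binary_d = {
    ("p1", "p2"): dist(binary["p1"], binary["p2"]),
    ("p1", "p3"): dist(binary["p1"], binary["p3"]),
}

print("continuous:", continuous_d)
print("binary:    ", binary_d)
```

A clustering routine fed the binary version can only work with the second set
of distances, so it never sees that p1 and p3 were nearly the same patient.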


>- I've got about 150 patients, on which X1, X2, .... , X15 were
>measured, all of them either categorical or categorized to facilitate
>odds-ratios estimations and further classification.



Okay, categorizing can make sense when you need to interpret
ORs. And particularly when you need to explain ORs to other
people.

For the cluster analysis, I would go back to the uncategorized
variables whenever possible.


>- I've already planned 2 logistic regression in order to determine
>which of these factors improve responsiveness, which obstruct it and
>which have no impact: one model for each responsiveness criterion.
>
>- This drug had already shown efficience for A, but not for B in
>previous studies.
>
>- The aim of the cluster analysis is to identify a subgroup of patients
>who are best likely to show correct responsiveness for both A and B,
>then to describe them. I agree that, with only one responsiveness
>criterion, a cluster analysis would not be of interest because groups
>were already formed since I defined a "diagnostic" variable "Responsive
>/ Not responsive". But here, I've got two criteria and I think it makes
>more sense.
>
>I put in the cluster analysis the predictors I put in the regression
>models, regardless of the significance of effect.
>
>I'm somewhat restricted because these analyses were required by the
>protocol of the trial and we're too short with deadline to amend it
>now. But if I had to decide on my own, I would have plan a multivariate
>regression, and component / correspondance analyses, ...
>
>What is your opinion ?



Well, a cluster analysis might help you here. A factor analysis or
principal component analysis might help also. If you want a single descriptor of
'correct A and B' vs. not, a principal component may be a lot more
useful than the cluster analysis. Or else you're going to end up
classifying points as 'correct A and B' vs. not, and then performing another
analysis to describe that behavior.
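
As a rough sketch of the 'single descriptor' idea, the Python below computes
the first principal component of two standardized variables. It assumes you
have continuous versions of A and B. The data values and the function name
first_pc_scores are made up for illustration. For two standardized variables
the correlation matrix is [[1, r], [r, 1]], so the component can be written
down in closed form without a linear algebra library.

```python
import math
from statistics import mean, pstdev


def first_pc_scores(a, b):
    """First principal component of two standardized variables.

    The 2x2 correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and
    1 - r, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
    """
    mean_a, sd_a = mean(a), pstdev(a)
    mean_b, sd_b = mean(b), pstdev(b)
    za = [(x - mean_a) / sd_a for x in a]
    zb = [(y - mean_b) / sd_b for y in b]
    # Pearson correlation as the mean product of population z-scores.
    r = mean([x * y for x, y in zip(za, zb)])
    sign = 1.0 if r >= 0 else -1.0  # choose the larger eigenvalue
    scores = [(x + sign * y) / math.sqrt(2) for x, y in zip(za, zb)]
    explained = (1 + abs(r)) / 2  # share of total variance on this component
    return scores, r, explained


# Hypothetical continuous responses for five patients (invented values).
resp_a = [1.0, 2.0, 3.0, 4.0, 5.0]
resp_b = [2.0, 1.0, 4.0, 3.0, 5.0]

pc1, r, explained = first_pc_scores(resp_a, resp_b)
print("correlation:", round(r, 3))
print("variance explained by PC1:", round(explained, 3))
print("PC1 scores:", [round(s, 3) for s in pc1])
```

When A and B are positively correlated, a high PC1 score means 'high on both',
which is the one-number summary described above. With real data you would let
your statistics package do this, on more than two variables.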

You might want to start out with a simple plot, with A and B on your axes,
and the patients plotted out. You should be able to see which patients
are ending up in the right quadrant of your graph. Then you can check that
your analysis is giving you meaningful results. And if there is nothing
useful in this graph, then you may not be able to get useful information out
of an analysis. One of my mottos is:

"If you can see it in a graph, you should be able to find it in an analysis.
If you can find it in an analysis, you should be able to see it in a graph."
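
As a numeric companion to that plot, here is a small Python sketch that counts
how many patients land in each quadrant. It assumes continuous versions of A
and B where higher means better response. The patient values and both cutoffs
are invented, and the quadrant labels are just a convention chosen here.

```python
from collections import Counter


def quadrant(a, b, a_cut, b_cut):
    """Label a patient by which side of each cutoff they fall on."""
    a_side = "A+" if a >= a_cut else "A-"
    b_side = "B+" if b >= b_cut else "B-"
    return a_side + "/" + b_side


# Hypothetical (A, B) measurements, one pair per patient (invented values).
patients = [(62.0, 71.0), (48.0, 80.0), (55.0, 40.0), (30.0, 35.0), (70.0, 90.0)]
A_CUT, B_CUT = 50.0, 60.0  # assumed responsiveness cutoffs, illustrative only

counts = Counter(quadrant(a, b, A_CUT, B_CUT) for a, b in patients)

for label in ("A+/B+", "A+/B-", "A-/B+", "A-/B-"):
    print(label, counts[label])
```

If the 'A+/B+' cell is nearly empty, or the points in it do not hang together
on the graph, that is a warning that no clustering method is going to pull out
a clean 'responsive on both' subgroup.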


>Is the Ward's method performant for a set of categorical descriptors ?
>Which other distance could suit ? I was very surprised to not find the
>Chi2 distance in the methods proposed by Proc Cluster (option method =
>) whereas it seems a quite simple and natural similarity measurement...



Ward's is not designed for categorical descriptors. It basically
(implicitly) assumes multivariate normality. I don't recommend it.
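
To see why, it helps to look at what Ward's method computes at each step. The
Python sketch below is not how Proc Cluster is implemented. It is only the
textbook merge criterion, with made-up points. The cost of joining two
clusters is the increase in within-cluster sum of squares, which depends only
on the cluster sizes and the squared Euclidean distance between centroids.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def within_ss(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    na, nb = len(cluster_a), len(cluster_b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


# Two small hypothetical clusters in two dimensions (invented values).
cluster_a = [[0.0, 0.0], [2.0, 0.0]]
cluster_b = [[5.0, 0.0]]

cost = ward_merge_cost(cluster_a, cluster_b)
direct = within_ss(cluster_a + cluster_b) - within_ss(cluster_a) - within_ss(cluster_b)
print("merge cost via centroids:", cost)
print("merge cost via sums of squares:", direct)
```

Everything here runs on means and squared differences. If a variable is a
nominal code such as 1 = 'site X', 2 = 'site Y', 3 = 'site Z', the centroid
of those codes has no meaning, and neither does the merge cost built on it.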

What do you mean by the 'Chi2' distance? Do you mean simple Euclidean
distance?

>Other questions :
>
>1 / What about when descriptor include both categorical and continuous
>variables ?
>
>Do we have to define a distance for each continuous - continuous /
>continuous - categorical / categorical -categorical type of combination
>and carry out the cluster analysis from a table of distances ?
>
>2 / What about the QoL scales ? Should they be analysed as continuous
>variables ?
>
>Thanks in advance for your lights !
>
>Catherine.



You can use a mixture of continuous and discrete variables, but you
will be happier if you step back to as many continuous variables as
possible.

If you have a mixture of these, then there is no simple way to define
one distance for one class of variables and another distance for a second
class of variables. Cluster analysis doesn't work that way.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
