job.alerte@GMAIL.COM wrote back:
>Dear experts !
>
>Here are a little more precisions on my data and the aim of the study
>(it's a little more complex):

Yes, they're always a *little* more complex. Where 'little' is anywhere from 'this will add a month to the job' to 'this is going to take the ghost of Paul Erdos to work out the theory for it'. :-)

>- I defined two responsiveness criteria to treatment, say A and B, both
>binary Yes / No: one derived from a continuous measurement, the other
>one derived from a quality of life scale. Then, A and B are known for
>all of the patients.

I think that you might be better off if you did NOT turn these into categorical variables here. Use as much information as possible, and use continuous variables when you're doing cluster analysis.

>- I've got about 150 patients, on which X1, X2, .... , X15 were
>measured, all of them either categorical or categorized to facilitate
>odds-ratios estimations and further classification.

Okay, categorizing can make sense when you need to interpret ORs, and particularly when you need to explain ORs to other people. For the cluster analysis, I would go back to the uncategorized variables whenever possible.

>- I've already planned 2 logistic regression in order to determine
>which of these factors improve responsiveness, which obstruct it and
>which have no impact: one model for each responsiveness criterion.
>
>- This drug had already shown efficience for A, but not for B in
>previous studies.
>
>- The aim of the cluster analysis is to identify a subgroup of patients
>who are best likely to show correct responsiveness for both A and B,
>then to describe them. I agree that, with only one responsiveness
>criterion, a cluster analysis would not be of interest because groups
>were already formed since I defined a "diagnostic" variable "Responsive
>/ Not responsive". But here, I've got two criteria and I think it makes
>more sense.
>
>I put in the cluster analysis the predictors I put in the regression
>models, regardless of the significance of effect.
>
>I'm somewhat restricted because these analyses were required by the
>protocol of the trial and we're too short with deadline to amend it
>now. But if I had to decide on my own, I would have planned a multivariate
>regression, and component / correspondence analyses, ...
>
>What is your opinion ?

Well, a cluster analysis might help you here. A factor analysis or principal component analysis might help also. If you want a single descriptor of 'correct A and B' vs. not, a principal component may be a lot more useful than the cluster analysis. Otherwise you're going to end up classifying points as 'correct A and B' vs. not, and then performing another analysis to describe that behavior.

You might want to start out with a simple plot, with A and B on your axes, and the patients plotted out. You should be able to see which patients are ending up in the quadrant you want (responsive on both A and B). Then you can check that your analysis is giving you meaningful results. And if there is nothing useful in this graph, then you may not be able to get useful information out of an analysis.

One of my mottos is: "If you can see it in a graph, you should be able to find it in an analysis. If you can find it in an analysis, you should be able to see it in a graph."

>Is the Ward's method performant for a set of categorical descriptors ?
>Which other distance could suit ? I was very surprised to not find the
>Chi2 distance in the methods proposed by Proc Cluster (option method =
>) whereas it seems a quite simple and natural similarity measurement...

Ward's is not designed for categorical descriptors. It basically (implicitly) assumes multivariate normality. I don't recommend it.

What do you mean by the 'Chi2' distance? Do you mean simple Euclidean distance?

>Other questions :
>
>1 / What about when descriptor include both categorical and continuous
>variables ?
>
>Do we have to define a distance for each continuous - continuous /
>continuous - categorical / categorical -categorical type of combination
>and carry out the cluster analysis from a table of distances ?
>
>2 / What about the QoL scales ? Should they be analysed as continuous
>variables ?
>
>Thanks in advance for your lights !
>
>Catherine.

You can use a mixture of continuous and discrete variables, but you will be happier if you step back to as many continuous variables as possible. If you have a mixture of these, then there is no simple way to define one distance for one class of variables and another distance for a second class of variables. Cluster analysis doesn't work that way.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
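As a rough illustration of the A-versus-B plot suggested in the reply, here is a minimal Python sketch (not SAS) that does the same check numerically: it assigns each patient to a quadrant and counts them. The patient IDs, scores, and both cutoffs are invented for illustration and do not come from the trial; the function names are hypothetical.

```python
# Hypothetical data: (patient_id, continuous measurement behind A, QoL score behind B).
patients = [
    ("p01", 12.0, 55.0),
    ("p02", 30.5, 72.0),
    ("p03", 28.0, 40.0),
    ("p04", 9.5, 80.0),
    ("p05", 35.0, 90.0),
]

# Assumed cutoffs that define "responsive" on each criterion.
A_CUTOFF = 25.0
B_CUTOFF = 60.0


def quadrant(a, b, a_cut=A_CUTOFF, b_cut=B_CUTOFF):
    """Label a patient by which side of each cutoff the two scores fall on."""
    a_ok = a >= a_cut
    b_ok = b >= b_cut
    if a_ok and b_ok:
        return "A and B"
    if a_ok:
        return "A only"
    if b_ok:
        return "B only"
    return "neither"


def count_quadrants(rows):
    """Count how many patients land in each of the four quadrants."""
    counts = {"A and B": 0, "A only": 0, "B only": 0, "neither": 0}
    for _pid, a, b in rows:
        counts[quadrant(a, b)] += 1
    return counts


counts = count_quadrants(patients)
# Patients responsive on both criteria: the subgroup the cluster analysis is after.
both = [pid for pid, a, b in patients if quadrant(a, b) == "A and B"]
print(counts)
print(both)
```

If the "A and B" quadrant is nearly empty, or the points show no visible grouping when plotted, that is a hint that a cluster analysis on the same data may not return much either.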
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Vedr.: Re: Vedr.: Re: Cluster analysis in a geographical setting | Lars Thomassen | Newsgroup comp.soft-sys.sas | 0 | 07-26-2005 11:30 AM |
| Re: Vedr.: Re: Cluster analysis in a geographical setting | Talbot Michael Katz | Newsgroup comp.soft-sys.sas | 0 | 07-22-2005 04:22 PM |
| Re: Cluster analysis for binary data | Dennis G. Fisher | Newsgroup comp.soft-sys.sas | 0 | 07-07-2005 05:49 PM |
| Re: Cluster analysis for binary data | Wensui Liu | Newsgroup comp.soft-sys.sas | 0 | 07-07-2005 03:39 PM |
| Hierarchical cluster analysis vs Twostep cluster analysis in SPSSwith dicotomized data | Magnus Alderling | Newsgroup comp.soft-sys.sas | 1 | 01-24-2005 05:51 PM |