On Tue, 17 May 2005 isaac.neuhaus@BMS.COM wrote:
> I am trying to understand how the different types of sum of squares are
> calculated and I haven't been able to understand how SAS calculates the
> type III sum of squares when there are empty cells in the design. I
> looked at Donald Macnaughton SS.sas code which has been an excellent
> teaching resource but when it comes to a design with empty cells the
> type III sum of squares (HTI) do not agree with those produced by proc
> glm in SAS. Can anybody point to where I can learn how to manually
> calculate them or even better explain how?

Isaac,

First, you should read Chapters 13-15 of Milliken and Johnson, "Analysis
of Messy Data", Vol 1. They provide more detail and information than I
could hope to in this short note.

Here is a brief GLM analysis (though one could perhaps do even better with
MIXED) that is based on the approach from the above reference:

Proc tabulate shows the nature of the empty cells:

----------------------------
|      |         b         |
|      |-------------------|
|      |    1    |    2    |
|      |---------+---------|
|      | N |Mean | N |Mean |
|------+---+-----+---+-----|
|a     |   |     |   |     |
|1     |  1|  7.0|  2|  9.5|
|2     |  1|  1.0|  .|    .|
|3     |  1|  4.0|  2|  5.0|
|4     |  .|    .|  1|  5.0|
----------------------------

I've recoded A and B into one variable called lvl which is

   lvl = 10*a + b;

so the cell means layout is now:

------------------
|      | N |Mean |
|------+---+-----|
|lvl i |   |     |
|11  1 |  1|  7.0|
|12  2 |  2|  9.5|
|21  3 |  1|  1.0|
|31  4 |  1|  4.0|
|32  5 |  2|  5.0|
|42  6 |  1|  5.0|
------------------

The variable i is the index found in the ESTIMATE and CONTRAST statements
in the second GLM step:

With empty cells compare the TYPE III and TYPE IV sums of Squares:

proc glm data=one;
class a b;
model y = a b a*b / ss3 ss4;
run; quit;

Source             DF        Squares    Mean Square   F Value   Pr > F

Model               5    57.00000000    11.40000000     45.60   0.0216
Error               2     0.50000000     0.25000000
Corrected Total     7    57.50000000


Source             DF    Type III SS    Mean Square   F Value   Pr > F

a                   3    36.30000000    12.10000000     48.40   0.0203
b                   1     4.08333333     4.08333333     16.33   0.0561
a*b                 1     0.75000000     0.75000000      3.00   0.2254

Source             DF     Type IV SS    Mean Square   F Value   Pr > F

a                   3*   28.80000000     9.60000000     38.40   0.0255
b                   1*    4.08333333     4.08333333     16.33   0.0561
a*b                 1     0.75000000     0.75000000      3.00   0.2254

* NOTE: Other Type IV Testable Hypotheses exist which may yield different SS.

Now how does one reproduce these Type IV SS?

Use the cell means model with GLM:

proc glm data=one;
class lvl;

model y = lvl / ss4;

/* Sums of Squares for A:  assume the following contrasts
   A 1 vs 2 in b=1
   A 1 vs 3 in b=1,2
   A 3 vs 4 in b=2
*/
contrast 'A' lvl 1 0 0 -1  0  0,
             lvl 1 1 0 -1 -1  0,
             lvl 0 0 0  0  1 -1;
ESTIMATE 'A 1 vs 2 in B=1'   lvl 1 0 0 -1  0  0;
ESTIMATE 'A 1 vs 3 in B=1,2' lvl 1 1 0 -1 -1  0 / DIVISOR=2;
ESTIMATE 'A 3 vs 4 in B=2'   lvl 0 0 0  0  1 -1;

/* one contrast for B= 1 vs 2 in A=1,3 */

CONTRAST 'B' lvl 1 -1 0 1 -1 0 ;
ESTIMATE 'B' lvl 1 -1 0 1 -1 0 /DIVISOR=2;

/* Interaction comes from A=1,3 in B=1,2 */

contrast 'AB' lvl 1 -1 0 -1 1 0;
ESTIMATE 'AB' lvl 1 -1 0 -1 1 0 /DIVISOR=2;

run; quit;

< edited output >

The GLM Procedure

Class Level Information

Class    Levels    Values
lvl           6    11 12 21 31 32 42


Dependent Variable: y

                          Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F

Model               5    57.00000000    11.40000000     45.60   0.0216
Error               2     0.50000000     0.25000000
Corrected Total     7    57.50000000

Source             DF     Type IV SS    Mean Square   F Value   Pr > F
lvl                 5    57.00000000    11.40000000     45.60   0.0216

Contrast           DF    Contrast SS    Mean Square   F Value   Pr > F

A                   3    28.80000000     9.60000000     38.40   0.0255
B                   1     4.08333333     4.08333333     16.33   0.0561
AB                  1     0.75000000     0.75000000      3.00   0.2254

NOTE: the contrast results are the same as the TYPE IV when the model
y = a + b + ab was entered in the first GLM step

You can also compute estimates from the above contrasts:

                                       Standard
Parameter                Estimate         Error   t Value   Pr > |t|

A 1 vs 2 in B=1        3.00000000    0.70710678      4.24     0.0513
A 1 vs 3 in B=1,2      3.75000000    0.43301270      8.66     0.0131
A 3 vs 4 in B=2       -0.00000000    0.61237244     -0.00     1.0000
B                     -1.75000000    0.43301270     -4.04     0.0561
AB                    -0.75000000    0.43301270     -1.73     0.225


Robin High ("sometimes known to also have missing cells")
Univ. of Oregon
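
As a cross-check on the 1-df contrasts in the post above, the contrast sum of squares can be recomputed from the cell counts and cell means in the tables alone, using SS = (sum of c_i * mean_i)**2 / (sum of c_i**2 / n_i). The sketch below is not part of the original post. It assumes SAS/IML is licensed, and the names n, ybar, ss1df, cB and cAB are illustrative only.

```sas
proc iml;
  /* cell counts and cell means in index order i = 1..6
     (lvl = 11 12 21 31 32 42), copied from the cell means table */
  n    = {1, 2, 1, 1, 2, 1};
  ybar = {7, 9.5, 1, 4, 5, 5};

  /* sum of squares for one row of contrast coefficients c (1 x 6) */
  start ss1df(c, ybar, n);
    est = c * ybar;                /* contrast estimate (before any DIVISOR) */
    v   = sum((c##2) / t(n));      /* sum of c_i**2 / n_i */
    return (est##2 / v);
  finish;

  cB  = {1 -1 0  1 -1 0};          /* B: 1 vs 2 in A=1,3 */
  cAB = {1 -1 0 -1  1 0};          /* A*B within A=1,3 and B=1,2 */

  ssB  = ss1df(cB,  ybar, n);
  ssAB = ss1df(cAB, ybar, n);
  print ssB ssAB;
quit;
```

Worked by hand, the B contrast is 7 - 9.5 + 4 - 5 = -3.5 with divisor 1 + 1/2 + 1 + 1/2 = 3, so SS = 12.25 / 3 = 4.0833; the AB contrast is -1.5 with the same divisor, so SS = 0.75. Both agree with the Contrast SS column above. The DIVISOR= option rescales the estimate only, so it does not enter the sum of squares.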
Robin High wrote:
> [...]

Thank you for the explanation. Is there a typo in the following contrast?

/* Sums of Squares for A:  assume the following contrasts
   A 1 vs 2 in b=1
   A 1 vs 3 in b=1,2
   A 3 vs 4 in b=2
*/
contrast 'A' lvl 1 0 0 -1  0  0,
             lvl 1 1 0 -1 -1  0,
             lvl 0 0 0  0  1 -1;

Shouldn't it be:

contrast 'A' lvl 1 0 -1  0  0  0,
             lvl 1 1  0 -1 -1  0,
             lvl 0 0  0  0  1 -1;

Do you also have an executive summary like this one for the type III sum
of squares?

Thanks again,

Isaac
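
One way to compare the two versions of the 'A' contrast is to compute the 3-df contrast sum of squares for each directly from the cell counts and means in the thread, using SS = (L*ybar)` * inv(L * D**-1 * L`) * (L*ybar) with D = diag(n). This is a sketch under the assumption that SAS/IML is available; it is not from either poster, and the names ssq, L_posted and L_alt are illustrative only.

```sas
proc iml;
  /* cell counts and cell means in index order i = 1..6
     (lvl = 11 12 21 31 32 42), copied from the cell means table */
  n    = {1, 2, 1, 1, 2, 1};
  ybar = {7, 9.5, 1, 4, 5, 5};
  Dinv = diag(1 / n);              /* diagonal matrix of 1/n_i */

  /* sum of squares for a multi-row contrast matrix L (rows x 6) */
  start ssq(L, ybar, Dinv);
    est = L * ybar;                /* one estimate per contrast row */
    return (t(est) * inv(L * Dinv * t(L)) * est);
  finish;

  /* coefficients as originally posted */
  L_posted = {1 0  0 -1  0  0,
              1 1  0 -1 -1  0,
              0 0  0  0  1 -1};

  /* coefficients as proposed in the reply */
  L_alt    = {1 0 -1  0  0  0,
              1 1  0 -1 -1  0,
              0 0  0  0  1 -1};

  ss_posted = ssq(L_posted, ybar, Dinv);
  ss_alt    = ssq(L_alt,    ybar, Dinv);
  print ss_posted ss_alt;
quit;
```

Worked by hand from the cell table, the posted coefficients give 28.8, which matches the printed Type IV and Contrast SS for A, while the proposed coefficients give about 26.68. The first posted row evaluates to 7.0 - 4.0 = 3.0, the estimate printed under the label 'A 1 vs 2 in B=1', so that row compares lvl 11 with lvl 31 (A=1 vs A=3 in b=1). On this arithmetic the mismatch appears to be between the label and the coefficients, and changing the coefficients would no longer reproduce the printed Type IV SS.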