estimate by weighted proportion in SAS, compared to R emmean's weights = "prop"

I am trying to replicate the R emmeans package's weight="prop" in SAS. However, only reasonable method I could find was to explicitly state the average of one variable and the proportion of another.

for example in R,

test <- data.frame(
  y = c(14, 31, 13, 50, 100),    
  x1 = c(117, 115, 113, 111, 132),    
  x2 = c("t01", "t01", "t02", "t02", "t03")    
)

mod <- lm(y ~ x1 + x2, data = test)
summary(emmeans(mod, "x1", weights="prop"))
    
x1   emmean SE   df lower.CL upper.CL
118  41.6   4.47  1    -15.2     98.4

wherea in SAS:

data example;
 input y x1 x2 $;
 datalines;
 14 117 t01
 31 115 t01
 13 113 t02
 50 111 t02
 100 132 t03
 ;
run;

proc mixed data = example;
            class x2;
            model y = x1 x2 /s;
            estimate 'mean CHG' intercept 1 x1 117.6 x2
            0.4
            0.4
            0.2 / e cl;
run;

where 117.6 is the average of variable x1 and 0.4, 0.4, 0.2 being the proportion of elements in x2

There has to be a smarter way to specify the weight in proportion to the frequencies than explicitly writing them out.

Thank you.

for example in R,

test <- data.frame(
  y = c(14, 31, 13, 50, 100),    
  x1 = c(117, 115, 113, 111, 132),    
  x2 = c("t01", "t01", "t02", "t02", "t03")    
)

mod <- lm(y ~ x1 + x2, data = test)
summary(emmeans(mod, "x1", weights="prop"))
    
x1   emmean SE   df lower.CL upper.CL
118  41.6   4.47  1    -15.2     98.4

wherea in SAS:

data example;
 input y x1 x2 $;
 datalines;
 14 117 t01
 31 115 t01
 13 113 t02
 50 111 t02
 100 132 t03
 ;
run;

proc mixed data = example;
            class x2;
            model y = x1 x2 /s;
            estimate 'mean CHG' intercept 1 x1 117.6 x2
            0.4
            0.4
            0.2 / e cl;
run;

where 117.6 is the average of variable x1 and 0.4, 0.4, 0.2 being the proportion of elements in x2

There has to be a smarter way to specify the weight in proportion to the frequencies than explicitly writing them out.

Thank you.

Share Improve this question asked yesterday aiorr 5991 gold badge4 silver badges13 bronze badges

Are you asking how you could use code generation to create the estimate statement for you? – Tom Commented 19 hours ago

Add a comment |

2 Answers 2

Sorted by: Reset to default 1

It is not hard to get the numbers you need for the ESTIMATE statement from PROC SUMMARY.

proc summary data=example chartype ;
  class x2 ;
  types () x2 ;
  var x1 ;
  output out=means n=n mean= ;
run;

You could then easily use that information to write the estimate statement to a text file that could then be added to your PROC MIXED step with %INCLUDE statement.

You could even adjust it to handle any number of continuous variables or class variables.

%let ds=example;
%let vars=x1;
%let class=x2 ;
filename code temp;

proc summary data=&ds chartype ;
  class &class ;
  types () &class ;
  var &vars ;
  output out=means n=n mean= ;
run;

data _null_;
  file code;
  set means end=eof;
  by _type_;
  array vars &vars;
  array class &class;
  if _n_=1 then do;
     put 'estimate "emmeans" intercept 1 ';
     do i=1 to dim(vars);
       _name_ = vname(vars[i]);
       put _name_ vars[i] ;
     end;
     total=n ;
     retain total;
  end;
  else do;
    _name_=vname(class[indexc(_type_,'1')]);
    if first._type_ then put _name_ @ ;
    percent = n/total;
    put percent @;
    if last._type_ then put ;
  end;
  if eof then put '/ e cl; ' ;
run;

Now you can run your model and include the generated ESTIMATE statement.

proc mixed data = &ds ;
  class &class;
  model y = &vars &class/s;
%include code / source2;
run;

To make it more robust you might need to exclude observations with missing values from the PROC SUMMARY step so that the right denominator is used when calculating the percentages. And if you want to use it when you don't have both continuous and class variables you will probably also need to adjust.

There is an easier alternative in the obsmargin= option for the LSMEANS statement. The issue then is that you need a class variable to call that statement on, I don't think you can get the overall mean without a bit of trickery in your specific case. Still, pretty short and sweet:

data example;
 input y x1 x2 $;
 dummy = 1; /* For dummy overall class, can be whatever */
 datalines;
 ...
 ;
run;

proc mixed data = example;
            class x2 dummy;
            model y = x1 x2 dummy / s;
            lsmeans dummy / om=example at means;
run;

%*                Standard                           ;
%*   Estimate      Error     t Value    Pr > |t|     ;
%*   41.6000      4.4721       9.30      0.0682      ;

Note that there are disadvantages to this approach which might necessitate subsetting of your data or manual calculation as @Tom showed, and this might well be shared behaviour with emmeans. Specifically, om= and at means will use the margins & means from the data as they are provided: if you have repeated measures or hierarchical data every row will still be counted as a unique value (e.g. a subject appearing 3 times will contribute 3 times to that margin/mean). This is quite often not exactly what you want in that case, but the procedure is ignorant to that.

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

estimate by weighted proportion in SAS, compared to R emmean's weights = "prop" - Stack Overflow

2 Answers 2

与本文相关的文章

评论列表(0)