最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

estimate by weighted proportion in SAS, compared to R emmean's weights = "prop" - Stack Overflow

programmeradmin5浏览0评论

I am trying to replicate the R emmeans package's weight="prop" in SAS. However, only reasonable method I could find was to explicitly state the average of one variable and the proportion of another.

for example in R,

test <- data.frame(
  y = c(14, 31, 13, 50, 100),    
  x1 = c(117, 115, 113, 111, 132),    
  x2 = c("t01", "t01", "t02", "t02", "t03")    
)

mod <- lm(y ~ x1 + x2, data = test)
summary(emmeans(mod, "x1", weights="prop"))
    
x1   emmean SE   df lower.CL upper.CL
118  41.6   4.47  1    -15.2     98.4

wherea in SAS:

data example;
 input y x1 x2 $;
 datalines;
 14 117 t01
 31 115 t01
 13 113 t02
 50 111 t02
 100 132 t03
 ;
run;

proc mixed data = example;
            class x2;
            model y = x1 x2 /s;
            estimate 'mean CHG' intercept 1 x1 117.6 x2
            0.4
            0.4
            0.2 / e cl;
run;

where 117.6 is the average of variable x1 and 0.4, 0.4, 0.2 being the proportion of elements in x2

There has to be a smarter way to specify the weight in proportion to the frequencies than explicitly writing them out.

Thank you.

I am trying to replicate the R emmeans package's weight="prop" in SAS. However, only reasonable method I could find was to explicitly state the average of one variable and the proportion of another.

for example in R,

test <- data.frame(
  y = c(14, 31, 13, 50, 100),    
  x1 = c(117, 115, 113, 111, 132),    
  x2 = c("t01", "t01", "t02", "t02", "t03")    
)

mod <- lm(y ~ x1 + x2, data = test)
summary(emmeans(mod, "x1", weights="prop"))
    
x1   emmean SE   df lower.CL upper.CL
118  41.6   4.47  1    -15.2     98.4

wherea in SAS:

data example;
 input y x1 x2 $;
 datalines;
 14 117 t01
 31 115 t01
 13 113 t02
 50 111 t02
 100 132 t03
 ;
run;

proc mixed data = example;
            class x2;
            model y = x1 x2 /s;
            estimate 'mean CHG' intercept 1 x1 117.6 x2
            0.4
            0.4
            0.2 / e cl;
run;

where 117.6 is the average of variable x1 and 0.4, 0.4, 0.2 being the proportion of elements in x2

There has to be a smarter way to specify the weight in proportion to the frequencies than explicitly writing them out.

Thank you.

Share Improve this question asked yesterday aiorraiorr 5991 gold badge4 silver badges13 bronze badges 1
  • Are you asking how you could use code generation to create the estimate statement for you? – Tom Commented 19 hours ago
Add a comment  | 

2 Answers 2

Reset to default 1

It is not hard to get the numbers you need for the ESTIMATE statement from PROC SUMMARY.

proc summary data=example chartype ;
  class x2 ;
  types () x2 ;
  var x1 ;
  output out=means n=n mean= ;
run;

You could then easily use that information to write the estimate statement to a text file that could then be added to your PROC MIXED step with %INCLUDE statement.

You could even adjust it to handle any number of continuous variables or class variables.

%let ds=example;
%let vars=x1;
%let class=x2 ;
filename code temp;

proc summary data=&ds chartype ;
  class &class ;
  types () &class ;
  var &vars ;
  output out=means n=n mean= ;
run;

data _null_;
  file code;
  set means end=eof;
  by _type_;
  array vars &vars;
  array class &class;
  if _n_=1 then do;
     put 'estimate "emmeans" intercept 1 ';
     do i=1 to dim(vars);
       _name_ = vname(vars[i]);
       put _name_ vars[i] ;
     end;
     total=n ;
     retain total;
  end;
  else do;
    _name_=vname(class[indexc(_type_,'1')]);
    if first._type_ then put _name_ @ ;
    percent = n/total;
    put percent @;
    if last._type_ then put ;
  end;
  if eof then put '/ e cl; ' ;
run;

Now you can run your model and include the generated ESTIMATE statement.

proc mixed data = &ds ;
  class &class;
  model y = &vars &class/s;
%include code / source2;
run;

To make it more robust you might need to exclude observations with missing values from the PROC SUMMARY step so that the right denominator is used when calculating the percentages. And if you want to use it when you don't have both continuous and class variables you will probably also need to adjust.

There is an easier alternative in the obsmargin= option for the LSMEANS statement. The issue then is that you need a class variable to call that statement on, I don't think you can get the overall mean without a bit of trickery in your specific case. Still, pretty short and sweet:

data example;
 input y x1 x2 $;
 dummy = 1; /* For dummy overall class, can be whatever */
 datalines;
 ...
 ;
run;

proc mixed data = example;
            class x2 dummy;
            model y = x1 x2 dummy / s;
            lsmeans dummy / om=example at means;
run;

%*                Standard                           ;
%*   Estimate      Error     t Value    Pr > |t|     ;
%*   41.6000      4.4721       9.30      0.0682      ;

Note that there are disadvantages to this approach which might necessitate subsetting of your data or manual calculation as @Tom showed, and this might well be shared behaviour with emmeans. Specifically, om= and at means will use the margins & means from the data as they are provided: if you have repeated measures or hierarchical data every row will still be counted as a unique value (e.g. a subject appearing 3 times will contribute 3 times to that margin/mean). This is quite often not exactly what you want in that case, but the procedure is ignorant to that.

发布评论

评论列表(0)

  1. 暂无评论