I am trying to replicate the R emmeans
package's weight="prop"
in SAS.
However, only reasonable method I could find was to explicitly state the average of one variable and the proportion of another.
for example in R,
test <- data.frame(
y = c(14, 31, 13, 50, 100),
x1 = c(117, 115, 113, 111, 132),
x2 = c("t01", "t01", "t02", "t02", "t03")
)
mod <- lm(y ~ x1 + x2, data = test)
summary(emmeans(mod, "x1", weights="prop"))
x1 emmean SE df lower.CL upper.CL
118 41.6 4.47 1 -15.2 98.4
wherea in SAS:
data example;
input y x1 x2 $;
datalines;
14 117 t01
31 115 t01
13 113 t02
50 111 t02
100 132 t03
;
run;
proc mixed data = example;
class x2;
model y = x1 x2 /s;
estimate 'mean CHG' intercept 1 x1 117.6 x2
0.4
0.4
0.2 / e cl;
run;
where 117.6 is the average of variable x1
and 0.4, 0.4, 0.2 being the proportion of elements in x2
There has to be a smarter way to specify the weight in proportion to the frequencies than explicitly writing them out.
Thank you.
I am trying to replicate the R emmeans
package's weight="prop"
in SAS.
However, only reasonable method I could find was to explicitly state the average of one variable and the proportion of another.
for example in R,
test <- data.frame(
y = c(14, 31, 13, 50, 100),
x1 = c(117, 115, 113, 111, 132),
x2 = c("t01", "t01", "t02", "t02", "t03")
)
mod <- lm(y ~ x1 + x2, data = test)
summary(emmeans(mod, "x1", weights="prop"))
x1 emmean SE df lower.CL upper.CL
118 41.6 4.47 1 -15.2 98.4
wherea in SAS:
data example;
input y x1 x2 $;
datalines;
14 117 t01
31 115 t01
13 113 t02
50 111 t02
100 132 t03
;
run;
proc mixed data = example;
class x2;
model y = x1 x2 /s;
estimate 'mean CHG' intercept 1 x1 117.6 x2
0.4
0.4
0.2 / e cl;
run;
where 117.6 is the average of variable x1
and 0.4, 0.4, 0.2 being the proportion of elements in x2
There has to be a smarter way to specify the weight in proportion to the frequencies than explicitly writing them out.
Thank you.
Share Improve this question asked yesterday aiorraiorr 5991 gold badge4 silver badges13 bronze badges 1- Are you asking how you could use code generation to create the estimate statement for you? – Tom Commented 19 hours ago
2 Answers
Reset to default 1It is not hard to get the numbers you need for the ESTIMATE statement from PROC SUMMARY.
proc summary data=example chartype ;
class x2 ;
types () x2 ;
var x1 ;
output out=means n=n mean= ;
run;
You could then easily use that information to write the estimate statement to a text file that could then be added to your PROC MIXED step with %INCLUDE statement.
You could even adjust it to handle any number of continuous variables or class variables.
%let ds=example;
%let vars=x1;
%let class=x2 ;
filename code temp;
proc summary data=&ds chartype ;
class &class ;
types () &class ;
var &vars ;
output out=means n=n mean= ;
run;
data _null_;
file code;
set means end=eof;
by _type_;
array vars &vars;
array class &class;
if _n_=1 then do;
put 'estimate "emmeans" intercept 1 ';
do i=1 to dim(vars);
_name_ = vname(vars[i]);
put _name_ vars[i] ;
end;
total=n ;
retain total;
end;
else do;
_name_=vname(class[indexc(_type_,'1')]);
if first._type_ then put _name_ @ ;
percent = n/total;
put percent @;
if last._type_ then put ;
end;
if eof then put '/ e cl; ' ;
run;
Now you can run your model and include the generated ESTIMATE statement.
proc mixed data = &ds ;
class &class;
model y = &vars &class/s;
%include code / source2;
run;
To make it more robust you might need to exclude observations with missing values from the PROC SUMMARY step so that the right denominator is used when calculating the percentages. And if you want to use it when you don't have both continuous and class variables you will probably also need to adjust.
There is an easier alternative in the obsmargin=
option for the LSMEANS
statement. The issue then is that you need a class variable to call that statement on, I don't think you can get the overall mean without a bit of trickery in your specific case. Still, pretty short and sweet:
data example;
input y x1 x2 $;
dummy = 1; /* For dummy overall class, can be whatever */
datalines;
...
;
run;
proc mixed data = example;
class x2 dummy;
model y = x1 x2 dummy / s;
lsmeans dummy / om=example at means;
run;
%* Standard ;
%* Estimate Error t Value Pr > |t| ;
%* 41.6000 4.4721 9.30 0.0682 ;
Note that there are disadvantages to this approach which might necessitate subsetting of your data or manual calculation as @Tom showed, and this might well be shared behaviour with emmeans
. Specifically, om=
and at means
will use the margins & means from the data as they are provided: if you have repeated measures or hierarchical data every row will still be counted as a unique value (e.g. a subject appearing 3 times will contribute 3 times to that margin/mean). This is quite often not exactly what you want in that case, but the procedure is ignorant to that.