最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - Extracting model coefficients for formula terms - Stack Overflow

programmeradmin2浏览0评论

If I fit an lm or glm with factors as explanatory variables, I get a model with estimates and SEs for each level of the factor (minus the baseline level). How can I reliably extract that info for estimates relating to each term in the formula?

Example:

> hec = reshape2::melt(HairEyeColor)
> head(hec)
   Hair   Eye  Sex value
1 Black Brown Male    32
2 Brown Brown Male    53
3   Red Brown Male    10

Then fit a model with two factors:

m = glm(value ~ -1 + Hair + Eye, data=hec)

The fitted coefficients are available in a named vector (and as a full table with these rownames via summary(m)$coef):

> coef(m)
HairBlack HairBrown   HairRed HairBlond   EyeBlue  EyeHazel  EyeGreen 
   22.500    44.750    17.875    24.875    -0.625   -15.875   -19.500 

but now I want to get the first four out since they relate to Hair, and the next three as they relate to Eye.

I could do this using string pattern matching (which I have been doing for specific examples, but I'm doing this often enough I want to get past this) but I am hoping there's a package or a function or an easy way to do this without recourse to string matching, which can possibly break if you construct your data perversely enough and get identical matching coefficient names...

(In the following example, both foo and foob terms generate a coefficient named "foobaz".)

> d = data.frame(foo=sample(c("bar","baz","qux","fnord"),100,TRUE), foob=sample(c("ar","az","sqq","moo"),100,TRUE), y=rnorm(100))
> coef(glm(y~foo+foob, data=d))
(Intercept)      foobaz    foofnord      fooqux      foobaz     foobmoo 
-0.37880701  0.40920067  0.10645248  0.19536840  0.21929754 -0.05912854 
    foobsqq 
 0.37137127 

This is also fragile for interaction terms, where the order of interacted factors depends on the order the terms appear in the formula. You could have X:Y in your formula and get a term out with Y:X if Y has already appeared in the formula. Great.

I've looked at the fitted model object for clues as to how outputs relate to formula terms, and I think something could be done using the $terms component, then using that to see how many levels are in each factor, and getting that many elements from the coefficients at a time, but also that requires compensating for intercept terms, and then interaction terms add another level of complexity.

So before I code this all up, has it been done?

If I fit an lm or glm with factors as explanatory variables, I get a model with estimates and SEs for each level of the factor (minus the baseline level). How can I reliably extract that info for estimates relating to each term in the formula?

Example:

> hec = reshape2::melt(HairEyeColor)
> head(hec)
   Hair   Eye  Sex value
1 Black Brown Male    32
2 Brown Brown Male    53
3   Red Brown Male    10

Then fit a model with two factors:

m = glm(value ~ -1 + Hair + Eye, data=hec)

The fitted coefficients are available in a named vector (and as a full table with these rownames via summary(m)$coef):

> coef(m)
HairBlack HairBrown   HairRed HairBlond   EyeBlue  EyeHazel  EyeGreen 
   22.500    44.750    17.875    24.875    -0.625   -15.875   -19.500 

but now I want to get the first four out since they relate to Hair, and the next three as they relate to Eye.

I could do this using string pattern matching (which I have been doing for specific examples, but I'm doing this often enough I want to get past this) but I am hoping there's a package or a function or an easy way to do this without recourse to string matching, which can possibly break if you construct your data perversely enough and get identical matching coefficient names...

(In the following example, both foo and foob terms generate a coefficient named "foobaz".)

> d = data.frame(foo=sample(c("bar","baz","qux","fnord"),100,TRUE), foob=sample(c("ar","az","sqq","moo"),100,TRUE), y=rnorm(100))
> coef(glm(y~foo+foob, data=d))
(Intercept)      foobaz    foofnord      fooqux      foobaz     foobmoo 
-0.37880701  0.40920067  0.10645248  0.19536840  0.21929754 -0.05912854 
    foobsqq 
 0.37137127 

This is also fragile for interaction terms, where the order of interacted factors depends on the order the terms appear in the formula. You could have X:Y in your formula and get a term out with Y:X if Y has already appeared in the formula. Great.

I've looked at the fitted model object for clues as to how outputs relate to formula terms, and I think something could be done using the $terms component, then using that to see how many levels are in each factor, and getting that many elements from the coefficients at a time, but also that requires compensating for intercept terms, and then interaction terms add another level of complexity.

So before I code this all up, has it been done?

Share Improve this question edited Apr 1 at 13:19 Darren Tsai 36.3k5 gold badges25 silver badges57 bronze badges asked Apr 1 at 8:37 SpacedmanSpacedman 94.5k12 gold badges147 silver badges230 bronze badges 0
Add a comment  | 

2 Answers 2

Reset to default 10

Example:

d <- data.frame(foo = sample(c("bar","baz","qux","fnord"), 100, TRUE),
                foob = sample(c("ar","az","sqq","moo"), 100, TRUE),
                y = rnorm(100))
mod <- glm(y ~ foo + foob, data = d)

model.matrix() is a function that generates a design matrix for a model. Apart from this, it also contains an attribute "assign", an integer vector with an entry for each column in the matrix giving the term in the formula which gave rise to the column.

attr(model.matrix(mod), "assign")
# [1] 0 1 1 1 2 2 2

Value 0 corresponds to the intercept (if any), and positive values to terms in the order given by the "term.labels" attribute of the terms structure corresponding to the model.

attr(terms(mod),  "term.labels") # or labels(terms(mod))
# [1] "foo"  "foob"

By combining the above, we can write a function that extracts the coefficients of specific term(s) from a model.

getCoef <- function(term, model) {
  ind <- attr(model.matrix(model), "assign")
  lab <- labels(terms(model))
  coef(model)[ind %in% match(term, lab)]
}
getCoef("foo", mod)
#      foobaz    foofnord      fooqux
#  0.12634573  0.27871612 -0.04792015

getCoef("foob", mod)
#     foobaz    foobmoo    foobsqq
# -0.1596896 -0.1452948 -0.0532562

getCoef(c("foo", "foob"), mod)
#      foobaz    foofnord      fooqux      foobaz     foobmoo     foobsqq
#  0.12634573  0.27871612 -0.04792015 -0.15968959 -0.14529477 -0.05325620
# (the two foobaz's from `foo` and `foob` are all extracted)

If a nonexistent term is provided, ...

getCoef("bar", mod)
# named numeric(0)

Try dummy.coef . It will give a named list, one component per factor, with class "dummy_coef" having its own print method. It will include all levels of the factors including ones omitted by coef. When using treatment contrasts, as in the example, the extra level will always be zero. In this example the Brown level of Eye has no coef component but dum$Eye does have a zero dummy.coef component.

hec <- reshape2::melt(HairEyeColor)
m <- glm(value ~ -1 + Hair + Eye, data=hec)
dum <- dummy.coef(m); dum

## Full coefficients are 
##                                         
## Hair:      Black   Brown     Red   Blond
##           22.500  44.750  17.875  24.875
## Eye:       Brown    Blue   Hazel   Green
##           0.000  -0.625 -15.875 -19.500

dum$Hair
##  Black  Brown    Red  Blond 
## 22.500 44.750 17.875 24.875 

dum$Eye
##   Brown    Blue   Hazel   Green 
##   0.000  -0.625 -15.875 -19.500 
发布评论

评论列表(0)

  1. 暂无评论