18 messages in org.r-project.r-sig-mixed-models: Re: [R-sig-ME] lme and prediction intervals

From                   Sent On
D Chaws                Feb 18, 2010  9:25 am
walmes zeviani         Feb 18, 2010  1:02 pm
D Chaws                Feb 20, 2010  9:01 am
Ben Bolker             Feb 20, 2010  1:27 pm
D Chaws                Feb 25, 2010  9:31 pm
D Chaws                Apr  3, 2010 10:12 pm
Douglas Bates          Apr  4, 2010  5:38 am
Ben Bolker             Apr  4, 2010  6:53 pm
Douglas Bates          Apr  5, 2010  7:39 am
D Chaws                Apr  5, 2010  7:48 pm
Jarrod Hadfield        Apr  6, 2010  9:59 am
David Hsu              Apr  6, 2010  1:59 pm
D Chaws                Apr  6, 2010  9:17 pm
John Maindonald        Apr  6, 2010 10:47 pm
Douglas Bates          Apr  7, 2010  6:53 am
Douglas Bates          Apr  7, 2010  7:14 am
Emmanuel Charpentier   Apr  7, 2010  2:36 pm
D Chaws                Apr  8, 2010 12:10 pm
Subject: Re: [R-sig-ME] lme and prediction intervals
From: Douglas Bates (bat@stat.wisc.edu)
Date: Apr 5, 2010 7:39:39 am
List: org.r-project.r-sig-mixed-models

On Sun, Apr 4, 2010 at 8:54 PM, Ben Bolker <bol@ufl.edu> wrote:

Douglas Bates wrote:

On Sat, Apr 3, 2010 at 11:12 PM, D Chaws <cat.@gmail.com> wrote:

Ok, issue solved for the most straightforward random effects cases. Not sure about nested random effects or more complex cases.

Assuming that you can make sense of lsmeans in such a case.  You may notice that lsmeans are not provided in base and recommended R packages.  That isn't an oversight.  Try to explain what lsmeans are in terms of the probability model.

Anyway, if you are happy with it, then go for it.  I'll just give you a warning from a professional statistician that they are a nonsensical construction.

lsmeans may not make sense in general (I don't really know; I have a somewhat weird background that mostly doesn't include SAS), but there's nothing wrong with wanting predictions and standard errors of predictions. These should be definable (?) if one can specify (a) whether a given random effect is set to zero or included at its conditional mean/mode value (or, for a simulation, chosen from a normal distribution with the appropriate variance-covariance structure), and (b) whether random effects not included in the prediction (and the residual error) are included in the SE or not.  I agree that specifying all this is not as easy as specifying "level", but can't one in principle do this by specifying which random effects are in/out of the prediction or the SE?
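For concreteness, here is a minimal sketch of the two kinds of prediction described above, using lme4's accessors and its sleepstudy data; the model is purely illustrative and not taken from this thread.

library(lme4)

## illustrative fit: random intercept and slope for Subject (sleepstudy ships with lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

X <- getME(fm, "X")   # fixed-effects design matrix
Z <- getME(fm, "Z")   # random-effects design matrix
b <- getME(fm, "b")   # conditional means (modes) of the random effects

## (a) random effects set to zero: population-level prediction
pred_pop <- as.numeric(X %*% fixef(fm))

## (a') random effects at their conditional means: group-level prediction (equals fitted(fm))
pred_grp <- as.numeric(X %*% fixef(fm) + Z %*% b)

## (b) an SE reflecting only the uncertainty in the fixed effects; whether to add the
## random-effect and residual variances is exactly the choice being discussed
se_fix <- sqrt(rowSums((X %*% as.matrix(vcov(fm))) * X))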

Your first sentence is absolutely right - those whose backgrounds do not include an introduction to SASspeak have difficulty in understanding what lsmeans are, and that group includes me. Once again, I have phrased my objections poorly. What I should have said is that I do not understand what lsmeans are. I have tried to read the documentation on them in various SAS publications and also in some books and I still can't make sense of them.

I have a strong suspicion that, for most users, the definition of lsmeans is "the numbers that I get from SAS when I use an lsmeans statement". My suggestion for obtaining such numbers is to buy a SAS license and use SAS to fit your models.

Those who have read Bill Venables' unpublished paper, "Exegeses on Linear Models" (just put the title into a search engine), will recognize this situation. Insightful, or whatever their name was at the time, had important customers (read "pharmaceutical companies") who wanted them to change S-PLUS so that it created both Type I and Type III sums of squares. They consulted with statisticians who knew S-PLUS well, who told them "don't do that, it doesn't make sense". Of course the marketing folks won out and the company proceeded to ignore this advice and implement (poorly) the Type X sums of squares, where X is the number that means "give me something that I will regard as a marginal sum of squares for a factor in the presence of non-ignorable interactions". Apparently the fact that such a concept doesn't make sense is not an adequate reason to avoid emulating SAS in producing these numbers.

I should have phrased my objection as a deficiency in my background. I don't know what lsmeans are and therefore cannot advise anyone on how to calculate them. If you or anyone else can explain to me - in terms of the random variables Y and B and the model parameters - what you wish to calculate then I can indicate how it could be calculated. I think that lsmeans are, in some sense, elements of the mean of Y but I don't know if they are conditional on a value of B or not. If they are means of Y then there must be a parameter vector beta specified. This is where I begin to lose it. Many people believe that they can specify an incomplete parameter vector and evaluate something that represents means. In other words, many people believe that when there are multiple factors in the fixed-effects formula they can evaluate the mean response for levels of factor A in some way that is marginal with respect to the levels of the other factors or numerical covariates. I can't understand how that can be done.

So I need to know what the values of the fixed-effects parameters, beta, should be and whether you want to condition on a particular value, B = b, or evaluate the mean of the marginal distribution of Y, in the sense of integrating with respect to the distribution of B. If the latter, then you need to specify the parameters that determine the distribution of B. If the former, then I imagine that you wish to evaluate the mean of the distribution of Y conditional on B at the BLUPs. As you know I prefer to use the term "conditional means" or, for more general models like GLMMs, "conditional modes", instead of BLUPs. The values returned by ranef for a linear mixed model are the conditional means of B given the observed value Y = y evaluated at the parameter estimates. To say that you want the conditional mean of Y given B = the conditional mean of B given the observed y is a bit too intricate for me to understand. I really don't know how to interpret such a concept.
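In terms of the illustrative sleepstudy fit sketched above, the pieces being discussed look like this: ranef() returns the conditional means of B given the observed Y = y at the parameter estimates, and the conditional mean of Y given B evaluated at those values is what fitted() reports for a model fit with lmer.

cond_means <- ranef(fm)$Subject    # conditional means of B, one row per Subject
mu_given_b <- getME(fm, "X") %*% fixef(fm) + getME(fm, "Z") %*% getME(fm, "b")
all.equal(as.numeric(mu_given_b), as.numeric(fitted(fm)))   # TRUE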

My hope is that, after building code for a reasonable number of examples, the general principles will become sufficiently clear that a method with an appropriate interface can then be written (note use of the passive voice).  The hardest part I have found in doing this with existing lme and lme4 objects is recalculating the random-effects design matrix appropriately when a new set of data (with different random-effects factor structure) is specified ...

You may find the sparse.model.matrix function in the Matrix package helpful.
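For example, a small sketch on made-up data (the grouping factor g and covariate x here are invented for illustration):

library(Matrix)

## made-up data: a grouping factor g and a covariate x
d <- data.frame(g = gl(3, 4, labels = c("a", "b", "c")), x = rnorm(12))

## sparse indicator matrix for the grouping factor (random-intercept columns)
Zi <- sparse.model.matrix(~ 0 + g, d)

## crossing g with x adds the random-slope columns
Zs <- sparse.model.matrix(~ 0 + g + g:x, d)

## same entries as the dense version, but stored as a sparse dgCMatrix
all.equal(as.matrix(Zi), model.matrix(~ 0 + g, d), check.attributes = FALSE)

Getting the columns of such a matrix into the order expected by an already-fitted model's random-effects vector is still left to the user, which is the fiddly part mentioned above.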