Simple Linear Regression
Fitting Linear Models
Description
lm is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these).
Usage
lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
Arguments
formula 
an object of class 
data 
an optional data frame, list or environment (or object
coercible by 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
weights 
an optional vector of weights to be used in the fitting
process. Should be 
na.action 
a function which indicates what should happen
when the data contain 
method 
the method to be used; for fitting, currently only

model, x, y, qr 
logicals. If 
singular.ok 
logical. If 
contrasts 
an optional list. See the 
offset 
this can be used to specify an a priori known
component to be included in the linear predictor during fitting.
This should be 
... 
additional arguments to be passed to the low level regression fitting functions (see below). 
Details
Models for lm are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
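The operator rules above can be checked by expanding a formula into its term labels with terms(). This is only a minimal sketch: the names y, a and b are placeholders chosen for illustration and are never evaluated.

```r
# Expand each formula into its term labels; no data are needed for this.
labels_cross <- attr(terms(y ~ a * b), "term.labels")
labels_long  <- attr(terms(y ~ a + b + a:b), "term.labels")
labels_dup   <- attr(terms(y ~ a + b + a), "term.labels")

print(labels_cross)                          # terms produced by a * b
print(identical(labels_cross, labels_long))  # is a * b the same as a + b + a:b?
print(labels_dup)                            # the repeated term a appears once
```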
If the formula includes an offset, this is evaluated and subtracted from the response.
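A small illustration of the offset rule, assuming simulated data (the data frame d, its columns and the fixed multiplier 3 are invented for this sketch): fitting with an offset term should give the same coefficients as subtracting that quantity from the response by hand.

```r
set.seed(1)
# Simulated data; the column names x, z and y are illustrative only.
d <- data.frame(x = rnorm(20), z = rnorm(20))
d$y <- 1 + 2 * d$x + 3 * d$z + rnorm(20)

# Offset given as a term in the formula ...
fit_offset <- lm(y ~ x + offset(3 * z), data = d)
# ... versus subtracting the same quantity from the response manually.
fit_manual <- lm(I(y - 3 * z) ~ x, data = d)

print(all.equal(unname(coef(fit_offset)), unname(coef(fit_manual))))
```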
If response is a matrix, a linear model is fitted separately by least-squares to each column of the matrix.
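The matrix-response case might look like the following sketch, which uses simulated data (x, y1 and y2 are made-up names) and compares one column of the joint fit with a separate single-response fit.

```r
set.seed(2)
# Two simulated responses sharing one predictor; all names are illustrative.
x  <- rnorm(15)
y1 <- 1 + 2 * x + rnorm(15)
y2 <- -1 + 0.5 * x + rnorm(15)

fit_both <- lm(cbind(y1, y2) ~ x)  # matrix response: one fit per column
fit_y1   <- lm(y1 ~ x)             # separate fit for the first column

print(class(fit_both))
print(all.equal(unname(coef(fit_both)[, "y1"]), unname(coef(fit_y1))))
```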
See model.matrix for some further details. The terms in the formula will be reordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula (see aov and demo(glm.vr) for an example).
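The reordering described above can be seen by comparing term labels, as in this sketch. It assumes the keep.order argument of terms() is an acceptable way to build the terms object, and it uses simulated data with invented names (d, a, b, y).

```r
f <- y ~ a:b + a + b

# Default behaviour: main effects are moved ahead of the interaction.
labels_default <- attr(terms(f), "term.labels")

# A terms object built with keep.order = TRUE preserves the written order.
kept        <- terms(f, keep.order = TRUE)
labels_kept <- attr(kept, "term.labels")

print(labels_default)
print(labels_kept)

# Passing the terms object as the formula (simulated data, names illustrative).
set.seed(3)
d   <- data.frame(a = rnorm(12), b = rnorm(12), y = rnorm(12))
fit <- lm(kept, data = d)
print(names(coef(fit)))
```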
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.
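As a brief sketch with simulated data (x and y are illustrative names), the two ways of dropping the intercept can be compared directly.

```r
set.seed(4)
x <- rnorm(20)
y <- 2 * x + rnorm(20)

fit_minus <- lm(y ~ x - 1)  # intercept removed with "- 1"
fit_zero  <- lm(y ~ 0 + x)  # intercept removed with "0 +"
fit_int   <- lm(y ~ x)      # implied intercept kept

print(names(coef(fit_minus)))
print(names(coef(fit_int)))
print(all.equal(coef(fit_minus), coef(fit_zero)))
```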
Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).
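The "mean of w_i unit-weight observations" reading can be illustrated with a sketch on simulated data: fitting the raw observations without weights should give the same coefficients as fitting the group means with the group sizes as weights. The object names (x_raw, y_raw, agg) are invented for this example.

```r
set.seed(5)
# Raw data: several unit-weight observations at each of five x values.
x_raw <- rep(1:5, times = c(3, 1, 4, 2, 5))
y_raw <- 2 + 0.5 * x_raw + rnorm(length(x_raw))
fit_raw <- lm(y_raw ~ x_raw)

# Summarized data: one row per x value, holding the mean response and the count.
agg <- data.frame(
  x    = sort(unique(x_raw)),
  ybar = as.numeric(tapply(y_raw, x_raw, mean)),
  n    = as.numeric(tapply(y_raw, x_raw, length))
)
fit_wtd <- lm(ybar ~ x, data = agg, weights = n)

print(all.equal(unname(coef(fit_raw)), unname(coef(fit_wtd))))
```

Only the coefficient estimates are compared here; the residual degrees of freedom differ between the two fits because the summarized data have fewer rows.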
lm calls the lower level functions lm.fit, etc, see below, for the actual numerical computations. For programming only, you may consider doing likewise.
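A minimal sketch of calling the lower level function directly, assuming simulated data and a hand-built design matrix X (an illustrative name). Unlike lm, lm.fit takes the model matrix and response rather than a formula.

```r
set.seed(6)
x <- rnorm(25)
y <- 3 - x + rnorm(25)

# Design matrix with an explicit intercept column.
X <- cbind(Intercept = 1, x = x)

low  <- lm.fit(x = X, y = y)  # numerical fit only, no formula handling
high <- lm(y ~ x)             # formula interface

print(all.equal(unname(low$coefficients), unname(coef(high))))
```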
All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.
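To illustrate the evaluation rule above, the following sketch refers to columns of data in subset and weights without any d$ prefix. The data frame d and its columns are simulated and named only for this example.

```r
set.seed(7)
d <- data.frame(
  x = rnorm(30),
  g = rep(c("a", "b"), 15),
  w = runif(30, 0.5, 2)
)
d$y <- 1 + d$x + rnorm(30)

# g and w are looked up in data first, like the variables in the formula.
fit_sub <- lm(y ~ x, data = d, subset = g == "a", weights = w)

# Reference fit on a manually subsetted data frame.
d_a     <- d[d$g == "a", ]
fit_ref <- lm(y ~ x, data = d_a, weights = w)

print(all.equal(coef(fit_sub), coef(fit_ref)))
```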
Value
lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm").
The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.
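A short sketch of the functions named above, applied to a fit on simulated data (x, y and fit are illustrative names).

```r
set.seed(8)
x   <- rnorm(20)
y   <- 1 + 2 * x + rnorm(20)
fit <- lm(y ~ x)

print(summary(fit))       # coefficient table, residual standard error, R-squared
print(anova(fit))         # analysis of variance table
print(coefficients(fit))  # named coefficient vector

# Fitted values plus residuals reproduce the response.
recon <- unname(fitted.values(fit) + residuals(fit))
print(all.equal(recon, y))
```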
An object of class "lm" is a list containing at least the following components:
coefficients 
a named vector of coefficients 
residuals 
the residuals, that is response minus fitted values. 
fitted.values 
the fitted mean values. 
rank 
the numeric rank of the fitted linear model. 
weights 
(only for weighted fits) the specified weights. 
df.residual 
the residual degrees of freedom. 
call 
the matched call. 
terms 
the 
contrasts 
(only where relevant) the contrasts used. 
xlevels 
(only where relevant) a record of the levels of the factors used in fitting. 
offset 
the offset used (missing if none were used). 
y 
if requested, the response used. 
x 
if requested, the model matrix used. 
model 
if requested (the default), the model frame used. 
na.action 
(where relevant) information returned by

In addition, non-null fits will have components assign, effects and (unless not requested) qr relating to the linear fit, for use by extractor functions such as summary and effects.
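The components listed above can be inspected directly on a fitted object, as in this sketch with simulated data. Here x = TRUE and y = TRUE are passed so that the optional x and y components are stored; the variable names are illustrative.

```r
set.seed(9)
x   <- rnorm(20)
y   <- 1 + 2 * x + rnorm(20)
fit <- lm(y ~ x, x = TRUE, y = TRUE)

print(names(fit))        # component names of the "lm" list
print(fit$rank)          # numeric rank of the fitted model
print(fit$df.residual)   # residual degrees of freedom
print(dim(fit[["x"]]))   # model matrix, stored because x = TRUE
print(fit$call)          # the matched call
```

Extractor functions are generally preferable to reaching into the list; direct access is shown here only to connect the component names to a concrete object.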
Using time series
Considerable care is needed when using lm with time series.
Unless na.action = NULL, the time series attributes are stripped from the variables before the regression is done. (This is necessary as omitting NAs would invalidate the time series attributes, and if NAs are omitted in the middle of the series the result would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to line up series, so that the time shift of a lagged or differenced regressor would be ignored. It is good practice to prepare a data argument by ts.intersect(..., dframe = TRUE), then apply a suitable na.action to that data frame and call lm with na.action = NULL so that residuals and fitted values are time series.
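The recommended workflow might be sketched as follows, assuming two simulated annual series (y and x are invented names, and the lagged regressor is called xlag for illustration). The simulated data contain no NAs, so the step of applying an na.action to the data frame is omitted here.

```r
set.seed(10)
y <- ts(cumsum(rnorm(30)), start = 2000)
x <- ts(rnorm(30), start = 2000)

# Align the response with the lagged regressor on their common time span.
d <- ts.intersect(y, xlag = stats::lag(x, -1), dframe = TRUE)
print(nrow(d))

# na.action = NULL asks lm not to strip the time series attributes.
fit <- lm(y ~ xlag, data = d, na.action = NULL)
print(is.ts(residuals(fit)))  # check whether the residuals kept their ts class
```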
Note
Offsets specified by offset will not be included in predictions by predict.lm, whereas those specified by an offset term in the formula will be.
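The formula half of this note can be checked with a small sketch on simulated data (d, new and by_hand are illustrative names): with an offset term in the formula, the prediction for new data should equal the fitted linear part plus the new offset value. The offset-argument case is deliberately not shown.

```r
set.seed(11)
d <- data.frame(x = rnorm(20), z = rnorm(20))
d$y <- 1 + 2 * d$x + d$z + rnorm(20)

# Offset supplied as a term in the formula.
fit <- lm(y ~ x + offset(z), data = d)

# New data must supply the offset variable as well.
new  <- data.frame(x = 1, z = 10)
pred <- predict(fit, newdata = new)

# Linear part from the coefficients, plus the offset value for the new row.
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * 1 + 10
print(all.equal(unname(pred), by_hand))
```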
Author(s)
The design was inspired by the S function of the same name described in Chambers (1992). The implementation of model formula by Ross Ihaka was based on Wilkinson & Rogers (1973).
References
Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models for analysis of variance. Applied Statistics, 22, 392–9.
Examples
require(graphics)

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1) # omitting intercept

anova(lm.D9)
summary(lm.D90)

opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1)      # Residuals, Fitted, ...
par(opar)

### less simple examples in "See Also" above