LINEST

The LINEST function returns an array of the statistics for a straight line that best fits the given data using the least squares method.

LINEST(known-y-values, known-x-values, nonzero-y-intercept, more-stats)

known-y-values: A collection containing the known y values. known-y-values must contain number values. If there is only one collection of known x values, known-y-values can be any size. If there is more than one collection of known x values, known-y-values can be either one column containing the values or one row containing the values, but not both.

known-x-values: An optional collection containing the known x values. known-x-values must contain number values. If omitted, it will be assumed to be a set of the same size as known-y-values beginning with 1—for example, 1, 2, 3 if there are three known-y-values. If there is only one set of known x values, known-x-values, if specified, should be the same size as known-y-values. If there is more than one set of known x values, each row/column of known-x-values is considered to be one set and the size of each row/column must be the same as the size of the row/column of known-y-values.

nonzero-y-intercept: An optional modal value specifying how the y intercept (constant b) should be calculated.

normal (1, TRUE, or omitted): The value of the y intercept (constant b) should be calculated normally.

force 0 value (0, FALSE): The value of the y intercept (constant b) should be forced to be 0.

more-stats: An optional modal value specifying whether additional statistical information should be returned.

no additional stats (0, FALSE, or omitted): Do not return additional regression statistics in the returned array.

additional stats (1, TRUE): Return additional regression statistics in the returned array.

Notes

Examples

Given the following table of known-x-values (cells A2:A6) and known-y-values (cells B2:B6):

A

B

1

X

Y

2

0

-1

3

8

10

4

9

12

5

4

5

6

1

3

=INDEX(LINEST(A2:A6, B2:B6, 1, 0), 1) returns approximately 0.752707581227437, given a normal (1) value for non-zero-y-intercept. This is the best-fit line slope because we specified we wanted the first value from the array returned by INDEX and we only specified one set of known-x-values.

=INDEX(LINEST(A2:A6, B2:B6, 1, 0), 2) returns approximately 0.0342960288808646, which is b, the intercept for the best-fit line. The intercept was returned because we specified we wanted the second value from the array returned by INDEX, which would be the second value because we only specified one set of known-x-values.

Contents of the array of additional statistics

LINEST can include additional statistical information in the array returned by the function. For purposes of the following discussion, suppose that there are five sets of known x values, in addition to the known y values. Suppose further that the known-x-values are in five table rows or five table columns. Based on this, the array returned by LINEST would contain the following values.

1

2

3

4

5

6

1

S5

S4

S3

S2

S1

b

2

SE5

SE4

SE3

SE2

SE1

SEb

3

C

SEy

4

F

DF

5

R1

R2

Row 1, column 1 contains S5 (the slope for the fifth set of known-x-values) continuing through column 5, which would contain S1 (the slope for the first set of known-x-values). Note that the slope related to each of the sets of known-x-values are returned in reverse order.

The last cell in row 1 contains b, the y intercept for the known x values. In our example, this would be row 1 column 6.

Row 2, column 1 contains SE5 (the standard error for the coefficient associated with the fifth set of known-x-values) continuing through column 5, which would contain SE1 (the standard error coefficient for the first set of known-x-values). These values are returned in reverse order; that is, if there are five known x value sets, the value for the fifth set is returned first in the array. This is the same way the slope values are returned.

The last cell in row 2 contains SEb, the standard error associated with the y-intercept value (b). In our example, this would be row 2 column 6.

Row 3, column 1 contains C, the coefficient of determination. This statistic compares estimated and actual y values. If it is 1, there is no difference between the estimated y value and the actual y value. This is known as perfect correlation. If the coefficient of determination is 0, there is no correlation and the given regression equation is not helpful in predicting a y value.

Row 3, column 2 contains SEy, the standard error associated with the y value estimate.

Row 4, column 1 contains F, the F observed value. The F observed value can be used to help determine whether the observed relationship between the dependent and independent variables occurs by chance.

Row 4, column 2 contains DF, the degrees of freedom. Use the degrees of freedom statistic to help determine a confidence level.

Row 5, column 1 contains R1, the regression sum of squares.

Row 5, column 2 contains R2, the residual sum of squares.

Here are some things to keep in mind about the array of additional statistics: