# Correlation and Regression Formulas

Speed up your calculations with the correlation and regression formulas collected here. The list runs from basic to advanced results. Use them to simplify problems instead of working through lengthy calculations, and refer to the formula sheet below to solve problems faster.

## Correlation and Regression Formulae Sheet

Use the formula list below to solve problems on correlation and regression. Working through the sheet is also a simple way to learn the concepts, and applying the formulas directly saves time compared with lengthy calculations.

**1. Covariance**

If two variables x and y take the values \(x_{1}, x_{2}, x_{3}, \ldots, x_{n}\) and \(y_{1}, y_{2}, y_{3}, \ldots, y_{n}\), their covariance is defined as

\(\operatorname{Cov}(x, y) = \frac{\Sigma(x-\bar{x})(y-\bar{y})}{n}\)

where \(\bar{x}\) and \(\bar{y}\) are the means of the x and y series respectively.
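As a quick check of the definition, the sketch below computes the covariance with divisor n, exactly as in the formula above. The function name `covariance` and the five-point data set are hypothetical, chosen only for illustration.

```python
def covariance(xs, ys):
    """Covariance with divisor n: mean of the products of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


# Hypothetical data used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))  # products of deviations sum to 6, so 6 / 5 = 1.2
```

Python's standard-library `statistics.covariance` (Python 3.10+) divides by n − 1 instead, so it would return 1.5 for the same data; use the divisor that matches the formula you are applying.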

**2. Coefficient of Correlation**

Karl Pearson gave the following formula for the correlation coefficient between two variables x and y:

\(r_{xy} = \frac{\Sigma(x-\bar{x})(y-\bar{y})}{\sqrt{\Sigma(x-\bar{x})^{2}\,\Sigma(y-\bar{y})^{2}}}\) or, equivalently, \(r_{xy} = \frac{\operatorname{Cov}(x,y)}{\sigma_{x}\sigma_{y}} = \frac{\operatorname{Cov}(x,y)}{\sqrt{\operatorname{Var}(x)\cdot\operatorname{Var}(y)}}\)

where \(\sigma_{x}\) and \(\sigma_{y}\) are the standard deviations of x and y.
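The two expressions for \(r_{xy}\) agree because the factors of n in the covariance and the variances cancel. The sketch below computes r both ways on hypothetical data; the function names are illustrative, not from any library.

```python
import math


def pearson_r_deviations(xs, ys):
    """r from sums of deviations: Σ dx·dy / sqrt(Σ dx² · Σ dy²)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / math.sqrt(s_xx * s_yy)


def pearson_r_covariance(xs, ys):
    """r as Cov(x, y) / (σx σy), every quantity using divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(pearson_r_deviations(x, y), pearson_r_covariance(x, y))  # both ≈ 0.7746
```

Both functions raise `ZeroDivisionError` if either series is constant, since r is undefined in that case.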

**3. Rank Correlation**

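Rank correlation is the correlation between the ranks (or grades) assigned to individuals on two characteristics. Spearman's rank correlation coefficient is given by

\(\rho = 1 - \frac{6 \Sigma d^{2}}{n\left(n^{2}-1\right)}\)

where \(d_{i} = x_{i} - y_{i}\) is the difference between the two ranks of the i-th pair, \(\Sigma d^{2} = \sum_{i=1}^{n} d_{i}^{2}\) is the sum of the squares of these differences, and n is the number of pairs of observations. Because both rank series have the same mean \((n+1)/2\), \(\Sigma d^{2}\) can also be written as \(\sum_{i=1}^{n}\left\{\left(x_{i}-\bar{x}\right)-\left(y_{i}-\bar{y}\right)\right\}^{2}\). The formula assumes there are no tied ranks.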
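A minimal sketch of the rank-correlation formula, assuming no tied values (the function raises an error otherwise). Rank 1 is given to the smallest value; the helper names and the marks below are hypothetical.

```python
def ranks(values):
    """Assign ranks 1..n, with rank 1 for the smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(xs, ys):
    """Rank correlation 1 - 6Σd² / (n(n² - 1)), valid only when there are no ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two items")
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("tied values present; this formula assumes distinct ranks")
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical marks of five students in two subjects.
maths = [35, 23, 47, 17, 10]
physics = [30, 33, 45, 23, 8]
print(ranks(maths), ranks(physics))  # [4, 3, 5, 2, 1] [3, 4, 5, 2, 1]
print(spearman_rho(maths, physics))
```

Here the rank differences are 1, −1, 0, 0, 0, so Σd² = 2 and ρ = 1 − 12/120 = 0.9.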

**4. Properties of Correlation Coefficient (r)**

(a) r lies between −1 and +1, i.e. \(-1 \le r \le 1\).

(b) The correlation is

- perfect and positive if r = + 1
- perfect and negative if r = – 1
- uncorrelated (no linear relationship) if r = 0
- positive if r > 0
- negative if r < 0

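(c) It is independent of change of origin and scale: if \(u = \frac{x-a}{h}\) and \(v = \frac{y-b}{k}\) with h and k of the same sign, then \(r_{uv} = r_{xy}\). If h and k have opposite signs, only the sign of r changes.

(d) It is a pure number and hence unitless.

(e) If x and y are independent, then r = 0. The converse need not hold: r = 0 only rules out a linear relationship.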
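These properties are easy to confirm numerically. The sketch below, using hypothetical data and an illustrative `pearson_r` helper, checks that r stays within [−1, 1], is unchanged by a change of origin and positive scale, flips sign under a negative scale factor on one variable, and can be 0 even when y is completely determined by x (y = x² on values symmetric about zero).

```python
import math


def pearson_r(xs, ys):
    """Pearson's r = Σ dx·dy / sqrt(Σ dx² · Σ dy²)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Change of origin and positive scale: u = (x - 3) / 2, v = (y - 4) * 10.
u = [(a - 3) / 2 for a in x]
v = [(b - 4) * 10 for b in y]
r_uv = pearson_r(u, v)

# A negative scale factor on y alone reverses the sign of r.
r_neg = pearson_r(x, [-b for b in y])

# y = x² depends entirely on x, yet r = 0 when x is symmetric about 0.
s = [-2, -1, 0, 1, 2]
r_square = pearson_r(s, [a * a for a in s])

print(r, r_uv, r_neg, r_square)
```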

**5. Line of Regression**

(i) Line of regression of y on x

\((y-\bar{y})=\frac{\operatorname{Cov}(x,y)}{\sigma_{x}^{2}}(x-\bar{x})\) or \((y-\bar{y})=r\,\frac{\sigma_{y}}{\sigma_{x}}(x-\bar{x})\)

(ii) Line of regression of x on y

\((x-\bar{x})=\frac{\operatorname{Cov}(x,y)}{\sigma_{y}^{2}}(y-\bar{y})\) or \((x-\bar{x})=r\,\frac{\sigma_{x}}{\sigma_{y}}(y-\bar{y})\)
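Both forms of each regression line give the same equation. The sketch below builds the two lines from the covariance form on hypothetical data, with every quantity using divisor n, and returns each line as a (slope, intercept) pair; the function name and return layout are illustrative choices.

```python
def regression_lines(xs, ys):
    """Return ((m, c) for y = m*x + c, (m2, c2) for x = m2*y + c2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    # y on x: (y - ȳ) = Cov/σx² · (x - x̄)  =>  y = slope*x + (ȳ - slope*x̄)
    slope_yx = cov / var_x
    y_on_x = (slope_yx, y_bar - slope_yx * x_bar)
    # x on y: (x - x̄) = Cov/σy² · (y - ȳ)  =>  x = slope*y + (x̄ - slope*ȳ)
    slope_xy = cov / var_y
    x_on_y = (slope_xy, x_bar - slope_xy * y_bar)
    return y_on_x, x_on_y


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
(m, c), (m2, c2) = regression_lines(x, y)
print("y on x:", round(m, 2), round(c, 2))    # y on x: 0.6 2.2
print("x on y:", round(m2, 2), round(c2, 2))  # x on y: 1.0 -1.0
```

For this data the lines are y = 0.6x + 2.2 and x = y − 1. To predict y for a given x, use the line of y on x; for example, x = 6 gives y = 5.8.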

**6. Regression Coefficient**

(i) The Regression Coefficient of y on x is denoted by \(b_{yx}\) and is given by

\(b_{yx} = r\cdot\frac{\sigma_{y}}{\sigma_{x}} = \frac{\operatorname{Cov}(x,y)}{\sigma_{x}^{2}}\)

This represents the change in the value of y corresponding to a unit change in x.

(ii) The Coefficient of Regression of x on y is denoted by \(b_{xy}\) and is given by

\(b_{xy} = r\cdot\frac{\sigma_{x}}{\sigma_{y}} = \frac{\operatorname{Cov}(x,y)}{\sigma_{y}^{2}}\)

This represents the change in the value of x corresponding to a unit change in y.
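The r-based and covariance-based expressions for each regression coefficient are algebraically identical, since \(r = \operatorname{Cov}(x,y)/(\sigma_x\sigma_y)\). The sketch below evaluates both on hypothetical data; the helper name is illustrative.

```python
import math


def regression_coefficients(xs, ys):
    """Return b_yx and b_xy, each as a pair (via r·σ-ratio, via Cov/σ²)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = cov / (sd_x * sd_y)
    b_yx = (r * sd_y / sd_x, cov / sd_x ** 2)
    b_xy = (r * sd_x / sd_y, cov / sd_y ** 2)
    return b_yx, b_xy


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)
print(b_yx, b_xy)  # each pair agrees up to rounding: about 0.6 and about 1.0
```

Here \(b_{yx} = 0.6\): on average, y rises by 0.6 for each unit increase in x.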

**7. Properties of Regression Coefficient**

(i) \(r = \pm\sqrt{b_{yx} \cdot b_{xy}}\), taking the sign common to \(b_{yx}\) and \(b_{xy}\); i.e. the magnitude of the coefficient of correlation is the Geometric Mean of the two Regression Coefficients.

(ii) Since \(b_{yx} \cdot b_{xy} = r^{2} \le 1\), if \(|b_{yx}| > 1\) then \(|b_{xy}| < 1\); i.e. if one of the Regression Coefficients is greater than unity in magnitude, the other must be less than unity.

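(iii) \(b_{yx}\) is the slope of the regression line of y on x, and \(b_{xy}\) is the slope of the regression line of x on y when y is treated as the independent variable. Plotted in the usual xy-plane, the line of x on y has gradient \(1/b_{xy}\).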

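(iv) By the AM-GM inequality, \(\frac{|b_{yx}| + |b_{xy}|}{2} \ge \sqrt{b_{yx} \cdot b_{xy}} = |r|\), with equality only when \(b_{yx} = b_{xy}\). For positive coefficients this reads \(b_{yx} + b_{xy} \ge 2r\), i.e. the Arithmetic Mean of the regression coefficients is at least the Correlation Coefficient.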

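(v) When both lines are drawn in the xy-plane, their gradients are \(r\frac{\sigma_{y}}{\sigma_{x}}\) and \(\frac{\sigma_{y}}{r\,\sigma_{x}}\), so the product of the gradients of the lines of regression is \(\frac{\sigma_{y}^{2}}{\sigma_{x}^{2}}\).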

(vi) If θ is the angle between the lines of regression, then \(\tan\theta = \left(\frac{1-r^{2}}{r}\right) \cdot \left(\frac{\sigma_{x}\sigma_{y}}{\sigma_{x}^{2}+\sigma_{y}^{2}}\right)\). Replacing r by |r| gives the acute angle between the lines.

(vii) \(b_{yx}\), \(b_{xy}\) and r always have the same sign: if both \(b_{yx}\) and \(b_{xy}\) are positive, r is positive, and if both are negative, r is negative.
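The sketch below checks properties (i), (iv), (v) and (vi) numerically on hypothetical data. It treats the x-on-y line in the xy-plane, where its gradient is \(1/b_{xy}\), and compares tan θ computed from the two gradients with the closed-form expression. It assumes r ≠ 0 (otherwise the divisions by r and \(b_{xy}\) fail); all names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
var_x = sum((a - x_bar) ** 2 for a in x) / n
var_y = sum((b - y_bar) ** 2 for b in y) / n
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

r = cov / (sd_x * sd_y)
b_yx, b_xy = cov / var_x, cov / var_y

# (i) |r| is the geometric mean of b_yx and b_xy; r takes their common sign.
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# (iv) The AM of |b_yx| and |b_xy| is at least |r|.
am = (abs(b_yx) + abs(b_xy)) / 2

# (v) Gradients in the xy-plane: y on x is b_yx, x on y is 1 / b_xy.
m1, m2 = b_yx, 1 / b_xy
gradient_product = m1 * m2  # should equal var_y / var_x

# (vi) tan θ from the two gradients vs. the closed-form expression.
tan_from_gradients = (m2 - m1) / (1 + m1 * m2)
tan_closed_form = (1 - r ** 2) / r * (sd_x * sd_y) / (var_x + var_y)

print(r, r_from_b, am, gradient_product, var_y / var_x)
print(tan_from_gradients, tan_closed_form)  # both ≈ 0.25 for this data
```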

**8. Important points on Regression lines**

- If r = 0, then tan θ is not defined, i.e. θ = π/2. Thus, if two variables are uncorrelated, the lines of regression are perpendicular to each other.
- If r = ±1, then tan θ = 0, i.e. θ = 0. Since both lines pass through \((\bar{x}, \bar{y})\), they coincide.
- If the regression lines are y = ax + b and x = cy + d, then, because both lines pass through \((\bar{x}, \bar{y})\), solving them simultaneously gives \(\bar{x}=\frac{bc+d}{1-ac}\) and \(\bar{y}=\frac{ad+b}{1-ac}\).
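Because both regression lines pass through \((\bar{x}, \bar{y})\), the means are the solution of the two line equations. A minimal sketch, assuming the lines are supplied as y = ax + b (y on x) and x = cy + d (x on y); the function name is hypothetical. It also uses the fact that \(ac = b_{yx} b_{xy} = r^2\), which must lie between 0 and 1, to catch lines that were assigned the wrong way round.

```python
import math


def means_from_regression_lines(a, b, c, d):
    """Solve y = a*x + b and x = c*y + d for (x̄, ȳ); also return r = ±sqrt(a*c)."""
    r_squared = a * c  # a = b_yx and c = b_xy, so their product is r²
    if r_squared < 0:
        raise ValueError("a and c must have the same sign")
    if r_squared > 1:
        raise ValueError("a*c = r^2 exceeds 1: the two lines are probably swapped")
    if r_squared == 1:
        raise ValueError("r = ±1: the lines coincide, so the means are not determined")
    x_bar = (b * c + d) / (1 - a * c)
    y_bar = (a * d + b) / (1 - a * c)
    return x_bar, y_bar, math.copysign(math.sqrt(r_squared), a)


# Hypothetical lines: y = 0.6x + 2.2 (y on x) and x = y - 1 (x on y).
x_bar, y_bar, r = means_from_regression_lines(0.6, 2.2, 1.0, -1.0)
print(x_bar, y_bar, r)  # approximately 3.0, 4.0 and 0.7746
```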

**9. Standard error of Prediction**

The standard error of prediction (also called the standard error of estimate) measures how far the observed values typically deviate from the values predicted by the regression line. It is defined as

\(S_{y} = \sqrt{\frac{\Sigma\left(y-y_{p}\right)^{2}}{n}}\), where y is the actual value and \(y_{p}\) is the predicted value.

For the least-squares regression lines (with standard deviations computed using divisor n), it can be written in terms of the correlation coefficient:

- Standard error of estimate of x: \(S_{x} = \sigma_{x}\sqrt{1-r^{2}}\)
- Standard error of estimate of y: \(S_{y} = \sigma_{y}\sqrt{1-r^{2}}\)
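For the least-squares line of y on x, with σ computed using divisor n, the root-mean-square residual equals \(\sigma_y\sqrt{1-r^2}\) exactly. The sketch below computes \(S_y\) both ways on hypothetical data; the function name is illustrative.

```python
import math


def standard_error_of_y(xs, ys):
    """Return S_y computed directly from residuals and via σy·sqrt(1 - r²)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    b_yx = cov / var_x
    # Predicted values y_p from the regression line of y on x.
    predicted = [y_bar + b_yx * (x - x_bar) for x in xs]
    direct = math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predicted)) / n)
    r = cov / math.sqrt(var_x * var_y)
    via_r = math.sqrt(var_y) * math.sqrt(1 - r ** 2)
    return direct, via_r


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(standard_error_of_y(x, y))  # both ≈ 0.6928, i.e. sqrt(0.48)
```

Swapping the arguments, `standard_error_of_y(y, x)`, gives \(S_x = \sigma_x\sqrt{1-r^2}\) for the line of x on y.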