Scatter Plots and the Correlation Coefficient

Scatter plots are useful when we want to determine whether there is a relationship between two quantitative variables. Paired measurements of two quantitative variables are referred to as bivariate data. This is usually written in the form (xi, yi) where i = 1, 2, 3, …, n. Here x is our independent variable and y is our dependent variable.

Our bivariate data can be represented graphically with a scatter plot. Additionally, it can be represented numerically using the following numerical summaries:

  • The mean of the x values and the mean of the y values give us the centre of the data.
  • The horizontal spread is given by the standard deviation of the x values, where most data will fall within 2 SD either side of the mean.
  • The vertical spread is likewise given by the standard deviation of the y values, where most data will fall within 2 SD either side of the mean.
  • The clustering about the line provides us with the final piece of the puzzle: this is measured by the correlation coefficient, r, which tells us how closely the data cluster around a straight line.
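As a rough illustration, the five summaries above can be computed with Python's standard library. This is only a sketch: the function name `bivariate_summary` and the sample data are made up for the example, and it assumes the population standard deviation (dividing by n) and r computed as the average product of the values in standard units.

```python
from statistics import mean, pstdev


def bivariate_summary(xs, ys):
    """Return (mean_x, mean_y, sd_x, sd_y, r) for paired data.

    Assumes population SDs (divide by n) and at least some spread
    in both variables, so neither SD is zero.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # r is the average of the products of x and y in standard units.
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    r = mean(products)
    return mean_x, mean_y, sd_x, sd_y, r


# Hypothetical data lying exactly on a straight line, so r should be 1
# (up to floating-point rounding).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
mean_x, mean_y, sd_x, sd_y, r = bivariate_summary(xs, ys)
print(mean_x, mean_y, sd_x, sd_y, r)
```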

The correlation coefficient ranges from -1 to 1. A positive value means that as one variable increases, the other tends to increase, whereas a negative value means that as one variable increases, the other tends to decrease. The closer the value is to -1 or 1, the stronger the linear trend, with zero meaning there is no noticeable linear trend.
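To make the sign and size of r concrete, here is a small sketch with three invented data sets: one rising, one falling, and one curved. The helper `correlation` is a hypothetical name and uses population SDs. The curved case is worth noting because r only measures linear clustering, so a clear curved pattern can still give a value of zero.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Correlation coefficient r as the mean product of standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


xs = [-2, -1, 0, 1, 2]
rising = [1, 3, 5, 7, 9]    # y goes up as x goes up
falling = [9, 7, 5, 3, 1]   # y goes down as x goes up
curved = [4, 1, 0, 1, 4]    # y = x squared: a pattern, but not a linear one

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_curved = correlation(xs, curved)
print(r_rising, r_falling, r_curved)
```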

To model our data we can use the SD line; however, it does not use the r value, so it has its limitations. The SD line passes through the point of averages (the mean of x, the mean of y) and through the point one standard deviation further along in each direction (the mean of x + the standard deviation of x, the mean of y + the standard deviation of y). Because this model does not account for the clustering about the line, for a positive association it tends to overestimate y on the right-hand side of the plot and underestimate it on the left-hand side.

(x̄, ȳ) to (x̄ + SDx, ȳ + SDy)
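The two points that define the SD line can be turned into a slope and an intercept. The sketch below assumes a positive association (so the slope is SDy / SDx) and population SDs; the function name `sd_line` and the data are invented for illustration.

```python
from statistics import mean, pstdev


def sd_line(xs, ys):
    """Slope and intercept of the SD line, assuming a positive association.

    The line runs from (mean_x, mean_y) to (mean_x + SDx, mean_y + SDy),
    so it rises by SDy for every SDx of horizontal movement.
    """
    mean_x, mean_y = mean(xs), mean(ys)
    slope = pstdev(ys) / pstdev(xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data with an upward trend but imperfect clustering.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
sd_slope, sd_intercept = sd_line(xs, ys)
print(sd_slope, sd_intercept)
```

Note that r never appears in this calculation, which is exactly the limitation described above.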

A better way of modelling the data is to use the regression line. This accounts for the r value:

(x̄, ȳ) to (x̄ + SDx, ȳ + r × SDy)

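This straight line is in the form y = mx + b, where the slope is m = r × SDy / SDx and the line passes through (x̄, ȳ). However, it can only be used to predict y from a given x value; it cannot be used in reverse to predict x from y, which requires a separate regression of x on y.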
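A minimal sketch of the regression line, and of why it should not be run in reverse, might look like this. The function name `regression_line` and the data are invented for the example, and population SDs are assumed. Swapping the arguments gives the regression of x on y, which generally disagrees with simply rearranging y = mx + b.

```python
from statistics import mean, pstdev


def regression_line(xs, ys):
    """Slope and intercept for predicting ys from xs.

    Slope is r * SDy / SDx and the line passes through (mean_x, mean_y).
    """
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    r = mean([
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ])
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data with an upward trend but imperfect clustering.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

m, b = regression_line(xs, ys)          # predicts y from x
m_rev, b_rev = regression_line(ys, xs)  # predicts x from y

y0 = 5
x_by_rearranging = (y0 - b) / m         # misusing the y-on-x line in reverse
x_by_x_on_y = m_rev * y0 + b_rev        # using the proper x-on-y line
print(m, b, x_by_rearranging, x_by_x_on_y)
```

The two estimates of x differ whenever r is not exactly 1 or -1, which is why the y-on-x line is only suitable for predicting y.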
