Scatter Plots and the Correlation Coefficient

Scatter plots are useful when we want to determine whether there is a relationship between two quantitative variables. The two quantitative variables together are referred to as bivariate data. This is usually written in the form (xi, yi), where i = 1, 2, 3, …, n. Here x is our independent variable and y is our dependent variable.

Our bivariate data can be represented graphically with a scatter plot. It can also be represented numerically using the following numerical summaries:

  • The mean of the x data set and the mean of the y data set give us the centre of the data.
  • The horizontal spread is given by the standard deviation (SD) of the x data set, where most data will fall within 2 SD either side of the mean.
  • The vertical spread is likewise given by the standard deviation of the y data set, where most data will fall within 2 SD either side of the mean.
  • The clustering about the line provides us with the final piece of the puzzle. This is measured by the correlation coefficient, r, which tells us how closely the data follow a straight line.
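The summaries listed above can be computed directly. The following is a minimal sketch, assuming population standard deviations (dividing by n) and using made-up data; the function name `summarise` and the data values are illustrative and not from the original post.

```python
from statistics import mean, pstdev

def summarise(xs, ys):
    """Return (mean_x, mean_y, sd_x, sd_y, r) for paired data."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    # Assumption: population SD (divide by n); a sample SD would divide by n - 1
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # r is the average product of the x and y deviations, each measured in SD units
    r = sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
            for x, y in zip(xs, ys)) / n
    return mean_x, mean_y, sd_x, sd_y, r

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]

mean_x, mean_y, sd_x, sd_y, r = summarise(xs, ys)
print("centre:", (mean_x, mean_y))
print("horizontal spread (SD of x):", round(sd_x, 3))
print("vertical spread (SD of y):", round(sd_y, 3))
print("correlation coefficient r:", round(r, 3))
```

For this data set the centre is (3, 3), both standard deviations are about 1.414, and r is about 0.8.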

The correlation coefficient ranges from -1 to 1. A positive value means that as one variable increases, the other tends to increase, whereas a negative value means that as one variable increases, the other tends to decrease. The closer the value is to 1 or -1, the stronger the linear trend, with a value of zero meaning there is no noticeable linear trend.
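As a quick illustration of how the sign of r behaves, here is a small sketch with three made-up data sets: one rising, one falling, and one with no linear trend. The helper name `correlation` is an assumption for this example only.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4]
r_up = correlation(xs, [2, 4, 6, 8])    # y rises with x: r close to 1
r_down = correlation(xs, [8, 6, 4, 2])  # y falls as x rises: r close to -1
r_flat = correlation(xs, [1, 2, 2, 1])  # no linear trend: r close to 0
print(r_up, r_down, r_flat)
```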

To model our data we can use the SD line. However, it does not use the r value, so it has its limitations. To find the SD line, plot the point (mean of x, mean of y) and the point (mean of x + SD of x, mean of y + SD of y), then draw the straight line through both. Because this model does not account for the clustering about the line, for a positive association it tends to overestimate y on the right-hand side (RHS) of the plot and underestimate y on the left-hand side (LHS).

(x̄, ȳ) to (x̄ + SDx, ȳ + SDy)
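The two-point construction of the SD line can be sketched as follows. This assumes population standard deviations and the positive-association case described in the text; the function name `sd_line` and the data are made up for illustration.

```python
from statistics import mean, pstdev

def sd_line(xs, ys):
    """Slope and intercept of the SD line through the two points in the text."""
    # First point: the centre of the data
    x1, y1 = mean(xs), mean(ys)
    # Second point: one SD to the right and one SD up (positive association)
    x2, y2 = x1 + pstdev(xs), y1 + pstdev(ys)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]

sd_slope, sd_intercept = sd_line(xs, ys)
print("SD line: y =", round(sd_slope, 3), "* x +", round(sd_intercept, 3))
```

Note that r never appears in this calculation, which is the limitation the text describes.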

A better way of modelling the data is to use the regression line. This accounts for the r value:

(x̄, ȳ) to (x̄ + SDx, ȳ + r × SDy)

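This straight line is in the form y = mx + b, with slope m = r × SDy / SDx and intercept b = ȳ − m × x̄. However, it should only be used to predict y from a given x value. Rearranging it to find x from y does not give the best prediction of x, which needs a separate regression line of x on y.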
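To see why the regression line only works in one direction, here is a rough sketch that predicts y from x and then tries to go back again. It assumes population standard deviations; the function name `regression_line` and the data are made up for illustration.

```python
from statistics import mean, pstdev

def regression_line(xs, ys):
    """Slope and intercept of the regression line for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    r = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    # One SD across in x gives a rise of r SDs in y
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]

m, b = regression_line(xs, ys)       # line for predicting y from x
y_hat = m * 5 + b                    # predicted y when x = 5

# Going backwards needs its own line: swap the roles of x and y
m_rev, b_rev = regression_line(ys, xs)
x_hat = m_rev * y_hat + b_rev        # best prediction of x from that y
x_rearranged = (y_hat - b) / m       # just rearranging y = mx + b

print("y = %.2f x + %.2f" % (m, b))
print("predicted y at x = 5:", round(y_hat, 2))
print("x predicted from that y:", round(x_hat, 2))
print("x from rearranging the line:", round(x_rearranged, 2))
```

With this data the line is y = 0.8x + 0.6, so x = 5 gives a predicted y of 4.6. Predicting x back from y = 4.6 with the x-on-y line gives about 4.28, not 5, while simply rearranging the original line returns 5. The two answers disagree whenever r is not exactly 1 or -1.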
