The normal curve is arguably one of the most important curves in statistics and data science. Many natural phenomena follow it at least approximately. For example, human height and weight, weather measurements such as rainfall, and exam scores are often modeled with the normal distribution.

A standard normal curve has a mean of 0 and a standard deviation of 1.

A general normal curve has a mean and a standard deviation that are determined by the data.

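The normal curve is drawn from the following formula, where mean and sd are the mean and standard deviation of the distribution:

f(x) = 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))

In practice, R evaluates this for us and generates the plot.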
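As a rough sketch of how this looks in practice, the snippet below writes the normal density out by hand, checks it against R's built-in dnorm(), and draws the curve with curve(). The helper name normal_density, the x range of -4 to 4, and the second curve's mean of 1 and SD of 1.5 are arbitrary choices for illustration.

```r
# Density of a normal curve written out by hand (the standard textbook formula)
normal_density <- function(x, mean = 0, sd = 1) {
  1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
}

# R's built-in dnorm() computes the same thing
all.equal(normal_density(1.2), dnorm(1.2))   # TRUE

# Draw the standard normal curve (mean 0, SD 1)
curve(dnorm(x), from = -4, to = 4, ylab = "density", main = "Normal curves")

# Overlay a general normal curve; mean 1 and SD 1.5 are arbitrary example values
curve(dnorm(x, mean = 1, sd = 1.5), add = TRUE, lty = 2)
```

The dashed curve is shifted to the right and flatter, which is what a larger mean and a larger SD do to the shape.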

Since the normal curve describes how data points are distributed, the area under the curve over a range of values gives the probability of observing a value in that range.

To calculate the lower tail, type this in R:

# this will tell you the probability of getting a value less than x on the standard normal curve

> pnorm(x)

To find the upper tail, type this in R:

# this will tell you the probability of getting a value greater than x

> pnorm(x, lower.tail = FALSE)

To find the probability of a value falling in a certain interval:

# this will tell you the probability of getting a value greater than xa and less than xb

> pnorm(xb) - pnorm(xa)
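
To make the three calculations above concrete, here is a small sketch on the standard normal curve. The cut-off values -1 and 2 are arbitrary, picked only for illustration.

```r
# Example cut-offs on the standard normal curve (arbitrary values)
xa <- -1
xb <- 2

lower <- pnorm(xb)                       # P(X < 2)
upper <- pnorm(xb, lower.tail = FALSE)   # P(X > 2)
between <- pnorm(xb) - pnorm(xa)         # P(-1 < X < 2)

lower + upper   # the two tails always add up to 1
between         # roughly 0.82
```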

To find the probability under your own normal distribution, supply its mean and standard deviation:

# this will tell you the probability of getting a value less than x

> pnorm(x, mean, sd)
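
As an illustration with a non-standard curve, suppose exam scores are normally distributed with a mean of 70 and an SD of 10. These numbers are made up for the example and do not come from real data.

```r
# Hypothetical exam scores: mean 70, SD 10 (made-up numbers)
exam_mean <- 70
exam_sd <- 10

# P(score < 85)
p_below_85 <- pnorm(85, mean = exam_mean, sd = exam_sd)

# P(score > 85)
p_above_85 <- pnorm(85, mean = exam_mean, sd = exam_sd, lower.tail = FALSE)

# P(60 < score < 80)
p_60_to_80 <- pnorm(80, exam_mean, exam_sd) - pnorm(60, exam_mean, exam_sd)

p_below_85   # roughly 0.93
p_60_to_80   # roughly 0.68
```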

Within one SD of the mean of a normal distribution you will find about 68% of the results, within two SD about 95%, and within three SD about 99.7%.
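
These percentages can be checked with pnorm() by taking the area between -k and k on the standard normal curve. A minimal sketch, where the variable names are only for illustration:

```r
# Area within 1, 2 and 3 SD of the mean on the standard normal curve
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)

round(within_k_sd, 4)   # 0.6827 0.9545 0.9973
```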

The z score tells you how many SDs a point lies from the mean. To find it, use the following formula:

(data point – mean) / sd.

If you plot the z score on the x axis and the original value of the variable on the y axis, the points fall on a straight line, because the z score is a linear function of the original value.
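
The following sketch computes z scores for a handful of values and plots them against the originals. The heights are invented for the example; scale() is R's built-in way of doing the same calculation.

```r
# Made-up heights in cm, for illustration only
heights <- c(150, 160, 165, 170, 175, 180, 190)

# z score: (data point - mean) / sd
z <- (heights - mean(heights)) / sd(heights)

# scale() does the same calculation
all.equal(z, as.vector(scale(heights)))   # TRUE

# z is a linear function of the raw values, so the points fall on a straight line
plot(z, heights, xlab = "z score", ylab = "height (cm)")
abline(a = mean(heights), b = sd(heights))   # intercept = mean, slope = SD
```

The line has the mean as its intercept and the SD as its slope, since each height equals mean + sd * z.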

When measurements are taken, the errors in those measurements also tend to follow the normal distribution. This randomness in the measurement is known as chance error. If the machine taking the measurements is not finely calibrated, a systematic error occurs in which every measurement is off the true value by the same amount. This is called bias, whether intentional or accidental.

The individual measurement can be summarized with the following formula.

Individual measurement = exact value + chance error + bias
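
A small simulation can show how the two kinds of error behave. This is only a sketch: the exact value of 100, the bias of 0.5, and the chance-error SD of 0.2 are hypothetical numbers chosen for illustration.

```r
set.seed(42)          # fixed seed so the run is repeatable

exact_value <- 100    # hypothetical true weight in grams
bias <- 0.5           # hypothetical miscalibration, the same for every reading
n <- 10000            # number of simulated measurements

chance_error <- rnorm(n, mean = 0, sd = 0.2)   # random, centered on zero
measurement <- exact_value + chance_error + bias

mean(measurement) - exact_value   # close to the bias: chance error averages out
sd(measurement)                   # close to the SD of the chance error
```

Averaging many measurements shrinks the chance error but leaves the bias untouched, which is why bias has to be removed by calibration rather than by repetition.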