In our previous post, we talked about the Normal Distribution and its properties. In this post, we extend those ideas and discuss the Standard Normal Distribution in detail.
What is a Standard Normal Distribution?
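A Normal Distribution with mean 0 and standard deviation 1 is called a Standard Normal Distribution. Mathematically, its probability density function is given below. \begin{equation} f(z) = \dfrac{1}{\sqrt{2\pi}} e^{-z^2/2} \end{equation}
For comparison, have a look at the Normal Probability Distribution Function. \begin{equation} f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \end{equation} If you substitute a mean of 0 and a standard deviation of 1, you obtain the standard normal probability distribution function.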
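The substitution can also be checked numerically. The sketch below is only an illustration: the helper names `normal_pdf` and `standard_normal_pdf` and the grid of z values are assumptions made for this example, not part of the original notebook.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """General normal density written out from its formula (hypothetical helper)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def standard_normal_pdf(z):
    """Standard normal density: the general formula with mu=0 and sigma=1."""
    return np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

z = np.linspace(-3, 3, 7)  # illustrative grid of z values
general = normal_pdf(z, mu=0.0, sigma=1.0)
standard = standard_normal_pdf(z)

print(np.allclose(general, standard))      # the two formulas agree
print(np.allclose(standard, norm.pdf(z)))  # and match scipy's default norm (loc=0, scale=1)
```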
Need for a standard normal probability distribution function
We need to extract probability information about events that we are interested in. To do this, we first convert any normal random variable x into a standard normal random variable z, using the equation below. \begin{equation} z = \dfrac{x-\mu}{\sigma} \end{equation}
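As a rough sketch of this conversion, the snippet below standardizes a made-up normal sample and checks that the result has mean 0 and standard deviation 1. The sample parameters (50 and 8), the seed, and the sample size are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed for repeatability
x = rng.normal(loc=50.0, scale=8.0, size=10_000)  # made-up normal sample

mu = x.mean()
sigma = x.std()        # population standard deviation (ddof=0)
z = (x - mu) / sigma   # standardized values

# After standardizing, the mean is ~0 and the standard deviation is ~1.
print(abs(z.mean()) < 1e-9, abs(z.std() - 1.0) < 1e-9)
```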
How to interpret z values in a standard normal distribution function?
A z value represents how many standard deviations $\sigma$ (positive or negative) the random variable x lies from the mean $\mu$. For a random variable x that is one standard deviation $\sigma$ above the mean $\mu$: \begin{equation} \begin{aligned} z &= \dfrac{x-\mu}{\sigma} \\[10pt] x &= \mu+\sigma \\[10pt] z &= \dfrac{\mu+\sigma -\mu}{\sigma} \\[10pt] z &= 1 \end{aligned} \end{equation} Similarly, for a normal random variable x equal to the mean $\mu$, we get z=0. In other words, x is 0 standard deviations away from the mean.
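The same reasoning extends to any number of standard deviations. Here is a minimal sketch, assuming a hypothetical mean of 100 and standard deviation of 15: values placed k standard deviations from the mean come back with z equal to k.

```python
import numpy as np

mu, sigma = 100.0, 15.0          # hypothetical mean and standard deviation
k = np.array([-2, -1, 0, 1, 2])  # standard deviations away from the mean
x = mu + k * sigma               # 70, 85, 100, 115, 130

z = (x - mu) / sigma             # recovers k
print(z)
```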
Calculate Cumulative Normal Distribution Function in Python
How do you calculate the probabilities of events for your data if you assume it is normally distributed? We will work through a dataset to understand this better. The dataset is linked here and is taken from Kaggle. It gives the medical cost incurred by each patient along with demographic and other information such as BMI, age and charges incurred. Go ahead and download the dataset to follow this section.
Let's import all libraries and load the dataset.
import pandas as pd
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
from scipy import stats
from scipy.stats import norm
df = pd.read_csv("..path_to_dataset/insurance.csv")
df.head()
Let's look at how BMI values are distributed across people.
fig, ax = plt.subplots()
s = df["bmi"]
s.plot.kde(ax = ax,legend = False, title = "Distribution of BMI")
s.plot.hist(density=True, ax=ax)
ax.set_xlabel('BMI')
ax.set_ylabel('Probability density')
ax.grid(axis='y')
##Mean and Standard Deviation of BMI###
np.mean(df["bmi"]), np.std(df["bmi"])
--->(30.66339686098655, 6.0959076415894256)
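Note that `np.std` computes the population standard deviation by default (`ddof=0`), whereas pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`). The sketch below shows the difference on a small made-up array; the values are illustrative and are not taken from the insurance dataset.

```python
import numpy as np
import pandas as pd

arr = np.array([22.0, 27.5, 30.1, 33.4, 41.0])  # made-up BMI-like values

population_sd = np.std(arr)      # ddof=0: divides by n
sample_sd = pd.Series(arr).std() # pandas default ddof=1: divides by n - 1

print(population_sd, sample_sd)
print(np.isclose(np.std(arr, ddof=1), sample_sd))  # same value once ddof matches
```

For a dataset with many rows the two values are very close, so the choice has little effect on the probabilities computed in this post.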
What is the probability of finding a BMI less than or equal to 20? The corresponding z value for this event can be calculated as below.
\begin{equation} z = \dfrac{x-\mu}{\sigma} = \dfrac{20-30.66}{6.10} \approx -1.75 \end{equation} z turns out to be -1.75. So the question can be reformulated as: what is the probability of finding a standard normal random variable less than or equal to -1.75, that is, at least 1.75 standard deviations below the mean?
##Deriving z value###
x= 20
mean = np.mean(df["bmi"])
stdev= np.std(df["bmi"])
z = (x-mean)/stdev
print(z)
--->-1.7492713945065959
###Probability of finding a normal random variable less than or equal to 20
norm.cdf(20, loc= mean, scale = stdev)*100
--->4.012205908022238
So, there is a 4% probability of finding someone with a BMI of 20 or less.
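To see that the z-based formulation and the direct call give the same answer, here is a small sketch that uses the mean and standard deviation reported above as hard-coded constants (so it runs without the dataset). The variable names `p_via_z` and `p_direct` are introduced only for this example.

```python
from scipy.stats import norm

mean = 30.66339686098655    # BMI mean reported above
stdev = 6.0959076415894256  # BMI standard deviation reported above

z = (20 - mean) / stdev
p_via_z = norm.cdf(z) * 100                            # standard normal CDF at z
p_direct = norm.cdf(20, loc=mean, scale=stdev) * 100   # same probability, no manual z

print(round(z, 2), round(p_via_z, 2), round(p_direct, 2))
```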
What is the probability of finding someone with a BMI between 25 and 35? The corresponding z values are -0.93 and 0.71, so we can reformulate the problem as: what is the probability of finding a standard normal random variable between 0.93 standard deviations below and 0.71 standard deviations above the mean? Note that since we pass the mean and standard deviation to norm.cdf, we do not need to derive z values ourselves. We just have to feed the random variable x, the mean and the standard deviation to the norm.cdf function.
(norm.cdf(35,loc=mean, scale= stdev)- norm.cdf(25,loc=mean, scale= stdev))*100
--->58.51486537490964
So, there is a 58.5% probability of finding someone with a BMI between 25 and 35.
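The z-value route gives the same interval probability. The sketch below again hard-codes the mean and standard deviation reported earlier; `z_low`, `z_high` and `p_between` are names chosen only for this illustration.

```python
from scipy.stats import norm

mean = 30.66339686098655    # BMI mean reported earlier
stdev = 6.0959076415894256  # BMI standard deviation reported earlier

z_low = (25 - mean) / stdev
z_high = (35 - mean) / stdev

# P(z_low <= Z <= z_high) for a standard normal Z, as a percentage
p_between = (norm.cdf(z_high) - norm.cdf(z_low)) * 100
print(round(z_low, 2), round(z_high, 2), round(p_between, 2))
```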
Finally, what is the probability of finding someone with a BMI of 45 or more?
It turns out that the z value is 2.35. We can now reformulate the problem as: what is the probability of finding a standard normal random variable that is 2.35 or more standard deviations above the mean?
(1- norm.cdf(45,loc=mean,scale=stdev))*100
--->0.9340388925257126
So, there is a 0.93% probability of finding someone with a BMI of 45 or more. A very rare event indeed.
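As an alternative to `1 - norm.cdf(...)`, scipy also provides the survival function `norm.sf`, which returns the upper-tail probability directly and can be more accurate far out in the tail. A minimal sketch, assuming the mean and standard deviation reported earlier:

```python
from scipy.stats import norm

mean = 30.66339686098655    # BMI mean reported earlier
stdev = 6.0959076415894256  # BMI standard deviation reported earlier

p_tail_cdf = (1 - norm.cdf(45, loc=mean, scale=stdev)) * 100
p_tail_sf = norm.sf(45, loc=mean, scale=stdev) * 100   # survival function: P(X > 45)

print(round(p_tail_cdf, 2), round(p_tail_sf, 2))
```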
Calculate inverse of normal cumulative distribution function in Python
Often, we are given probability information and are expected to come up with the corresponding range of the normal random variable. An example will help here.
What is the range of BMI values for people in the lowest 10% BMI category? Can I say anyone with a BMI less than or equal to 35 belongs to the lowest 10% category? Or is it less than or equal to 30? Or 15? Let's find out.
norm.ppf(0.1,loc=mean,scale=stdev)
--->22.85117687949233
Roughly, people with a BMI of 23 or less belong to the lowest 10% category. This can be visualized in the figure below.
x = np.arange(np.min(df['bmi']), np.max(df['bmi']), 0.05)
y = norm.pdf(x,loc=mean,scale=stdev)
plt.plot(x,y)
plt.fill_between(x,y,where=(x<=22.85))
plt.title("Bottom 10% BMI")
plt.show()
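The cutoff can also be recovered by inverting the z formula: find the standard normal z for the 10th percentile, then map it back with x = mean + z * stdev. This is a sketch under the same assumption as before, with the reported mean and standard deviation hard-coded; the names `z_10`, `cutoff_via_z` and `cutoff_direct` are illustrative.

```python
from scipy.stats import norm

mean = 30.66339686098655    # BMI mean reported earlier
stdev = 6.0959076415894256  # BMI standard deviation reported earlier

z_10 = norm.ppf(0.1)                # standard normal 10th percentile, about -1.28
cutoff_via_z = mean + z_10 * stdev  # invert z = (x - mean) / stdev
cutoff_direct = norm.ppf(0.1, loc=mean, scale=stdev)

print(round(z_10, 2), round(cutoff_via_z, 2), round(cutoff_direct, 2))
```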
What is the range of BMI values for people in the highest 5% BMI category? Can I say anyone with a BMI of 35 or more belongs to the highest 5% category? Or is it 32 or more? Or 45 or more? Let's find out.
norm.isf(0.05,loc=mean,scale=stdev)
--->40.69027265481611
Alternatively, we can do this as well
norm.ppf(1-0.05,loc=mean,scale=stdev)
--->40.69027265481611
Roughly, people with a BMI of 40.7 or above belong to the highest 5% category. This can be visualized in the figure below.
x = np.arange(np.min(df['bmi']), np.max(df['bmi']), 0.05)
y = norm.pdf(x,loc=mean,scale=stdev)
plt.plot(x,y)
plt.fill_between(x,y,where=(x>=40.69))
plt.title("Top 5% BMI")
plt.show()
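A quick sanity check on both cutoffs is to feed them back into the forward functions: `norm.cdf` of the lower cutoff should return 0.10 and `norm.sf` of the upper cutoff should return 0.05. The sketch below assumes the mean and standard deviation reported earlier; `low_cut` and `high_cut` are names used only here.

```python
from scipy.stats import norm

mean = 30.66339686098655    # BMI mean reported earlier
stdev = 6.0959076415894256  # BMI standard deviation reported earlier

low_cut = norm.ppf(0.10, loc=mean, scale=stdev)   # bottom 10% cutoff
high_cut = norm.isf(0.05, loc=mean, scale=stdev)  # top 5% cutoff

# Round trip: the cutoffs should give back the probabilities we started from.
p_low = norm.cdf(low_cut, loc=mean, scale=stdev)
p_high = norm.sf(high_cut, loc=mean, scale=stdev)
print(round(p_low, 4), round(p_high, 4))
```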
This concludes our discussion on Standard Normal Distribution function. The Colab link is provided for anyone to play with this.