Understanding Normal Distribution and its Properties using Python

A Normal (or Gaussian) distribution is used to model continuous random variables. Many natural phenomena around us, such as the height or BMI of people, tend to follow a Normal distribution. Its density has the shape of a bell curve. Let's see this in action.

A normal distribution is defined by 2 parameters: the mean and the standard deviation. This is how you can define this distribution using the stats module from SciPy.


import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm

x = np.linspace(0,700,1000000)  ##Create evenly spaced numbers from 0 to 700
r1 = norm.rvs(loc=350,scale=50,size=1000000)  ###Create samples with mean=350 and stdev=50

Notice the rvs method of norm. We will talk about it in a while. Let's see what the plot for this distribution looks like.


fig, ax = plt.subplots(1, 1)
ax.set_xlim([0, 700])
ax.hist(r1, density=True, bins='auto', histtype='stepfilled', alpha=0.2)
ax.plot(x, norm.pdf(x,loc=350,scale=50),
       'r-', lw=5, alpha=0.6)
plt.show()
Fig 1: Normal Distribution with Mean 350 and Stdev 50

Mathematically, the probability density function of a normal distribution with mean μ and standard deviation σ is given below.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The rvs method is kind of tricky to explain. My understanding is that it draws a number of random samples x that follow the distribution under study. In this case, we will find many values of x close to 350 and far fewer values out in the tails, say below 200 or above 500. When we plot a histogram of these values, it follows a nice bell curve.
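
To make the idea of sampling a little more concrete, here is a minimal sketch that draws a smaller sample and summarizes it. The `random_state=0` argument and the sample size of 100,000 are assumptions added here so the run is repeatable and quick; they are not part of the original setup.

```python
import numpy as np
from scipy.stats import norm

# Assumption: a fixed random_state so the draw is repeatable.
samples = norm.rvs(loc=350, scale=50, size=100_000, random_state=0)

# The sample mean and standard deviation should land close to 350 and 50.
sample_mean = samples.mean()
sample_std = samples.std()

# Share of samples within one standard deviation of the mean (roughly 68%).
near_mean = np.mean(np.abs(samples - 350) <= 50)

# Share of samples more than three standard deviations away (well under 1%).
far_tails = np.mean((samples < 200) | (samples > 500))

print(sample_mean, sample_std, near_mean, far_tails)
```

Most of the draws cluster around 350 and only a tiny share falls in the far tails, which is what gives the histogram its bell shape.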

Let's see if we can derive some of these values on our own without relying on scipy.stats. We will define our own normal density function.


mean=350
stdev=50
def normal(x,mean,stdev):
  ##Exponent is -(x-mean)^2 / (2*stdev^2)
  return 1/(stdev*np.sqrt(2*np.pi)) * np.exp(-((x-mean)**2)/(2*stdev**2))

normal(x[500000:500000+10],mean,stdev)   ##Our definition
--->array([0.00797885, 0.00797885, 0.00797885, 0.00797885, 0.00797885,
       0.00797885, 0.00797885, 0.00797885, 0.00797885, 0.00797885])
       
n1 = norm.pdf(x,mean,stdev)  ###Stats function
n1[500000:500000+10]
--->array([0.00797885, 0.00797885, 0.00797885, 0.00797885, 0.00797885,
       0.00797885, 0.00797885, 0.00797885, 0.00797885, 0.00797885])
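
The slice above sits right at the mean, where the exponential term is essentially 1, so it cannot reveal a mistake in the exponent. A slightly broader check, sketched below, compares a hand-written density against `norm.pdf` across the whole range. The helper name `normal_pdf` and the coarse grid are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mean, stdev):
    """Normal density written out from its formula (hypothetical helper name)."""
    coeff = 1.0 / (stdev * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mean) ** 2) / (2.0 * stdev ** 2))

# Assumption: a coarse grid over the same 0 to 700 range is enough for a check.
grid = np.linspace(0, 700, 1001)

ours = normal_pdf(grid, 350, 50)
theirs = norm.pdf(grid, loc=350, scale=50)

# True when the two curves agree everywhere on the grid, tails included.
matches = np.allclose(ours, theirs)
print(matches)
```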

As we can see, we can easily reproduce the values of a normal density on our own. Let's create 2 more normal distributions with the same mean but different standard deviations.


r1 = norm.rvs(loc=350,scale=50,size=1000000)
r2 = norm.rvs(loc=350,scale=75,size=1000000) ###Create a number of samples which follows the normal distribution
r3 = norm.rvs(loc=350,scale=125,size=1000000) ###Create a number of samples which follows the normal distribution
fig, ax = plt.subplots(1, 1)
ax.set_xlim([0, 700])
ax.hist(r1, density=True, bins='auto', histtype='stepfilled', alpha=0.2)
ax.hist(r2, density=True, bins='auto', histtype='stepfilled', alpha=0.2)
ax.hist(r3, density=True, bins='auto', histtype='stepfilled', alpha=0.2)
ax.plot(x, norm.pdf(x,loc=350,scale=50),
       'r-', lw=5, alpha=0.6, label='stdev 50')
ax.plot(x, norm.pdf(x,loc=350,scale=75),
       'b-', lw=5, alpha=0.6, label='stdev 75')
ax.plot(x, norm.pdf(x,loc=350,scale=125),
       'g-', lw=5, alpha=0.6, label='stdev 125')
ax.legend(loc='best', frameon=False)
plt.show()
Fig 2: Three Normal Distributions with the same mean and different standard deviations

We see that the standard deviation determines the shape of the distribution. The distribution with stdev 50 is narrower and taller, while the one with stdev 125 is flatter and wider.
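
The "taller versus flatter" observation can be put in numbers: the density at the mean equals 1/(stdev*sqrt(2*pi)), so the peak shrinks as the standard deviation grows. The short sketch below, assuming the same mean of 350 and the three standard deviations used above, prints the peak height for each.

```python
import numpy as np
from scipy.stats import norm

mean = 350
stdevs = (50, 75, 125)

# Height of each curve at its mean, taken from scipy.
peaks = [norm.pdf(mean, loc=mean, scale=s) for s in stdevs]

# The same height from the closed form 1 / (stdev * sqrt(2 * pi)).
closed_form = [1 / (s * np.sqrt(2 * np.pi)) for s in stdevs]

for s, p, c in zip(stdevs, peaks, closed_form):
    print(s, p, c)
```

Because the area under every density is 1, a lower peak has to come with a wider spread.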

Important Properties of Normal Distribution

  1. A normal distribution is defined by 2 parameters: mean and standard deviation.
  2. The highest point of the distribution is at its mean. The median and mode are the same as the mean.
  3. The larger the standard deviation, the more variation in the data and the flatter the curve.
  4. A normal distribution is symmetric: the shape of the curve to the left of the mean mirrors the shape of the curve to the right. In effect, skewness is 0.

    mean,variance,skewness,kurtosis = norm.stats(loc=350,scale=50,moments="mvsk")
    mean, np.sqrt(variance),skewness,kurtosis
    --->(array(350.), 50.0, array(0.), array(0.))
    ###Skewness is 0##

  5. 68.3% of the values of a normal random variable lie within 1 standard deviation of the mean.
    95.4% of the values of a normal random variable lie within 2 standard deviations of the mean.
    99.7% of the values of a normal random variable lie within 3 standard deviations of the mean.

    ##Within 1 standard deviation###
    len(np.where((r1>=mean-1*stdev) & (r1<=mean+1*stdev))[0])/len(r1)*100
    --->68.3155

    ##Within 2 standard deviations###
    len(np.where((r1>=mean-2*stdev) & (r1<=mean+2*stdev))[0])/len(r1)*100
    --->95.4419

    ##Within 3 standard deviations###
    len(np.where((r1>=mean-3*stdev) & (r1<=mean+3*stdev))[0])/len(r1)*100
    ---> 99.7363
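
The percentages above come from random samples, so they wobble slightly from run to run. As a cross-check, the exact values can be read off the cumulative distribution function. This is a minimal sketch using `norm.cdf` on a frozen distribution; the `coverage` dictionary is just a name chosen for this illustration.

```python
from scipy.stats import norm

mean, stdev = 350, 50
dist = norm(loc=mean, scale=stdev)  # frozen distribution with fixed parameters

coverage = {}
for k in (1, 2, 3):
    # P(mean - k*stdev <= X <= mean + k*stdev), expressed as a percentage
    coverage[k] = (dist.cdf(mean + k * stdev) - dist.cdf(mean - k * stdev)) * 100
    print(k, round(coverage[k], 2))

# Prints approximately:
# 1 68.27
# 2 95.45
# 3 99.73
```

These exact figures do not depend on the mean or the standard deviation, only on how many standard deviations wide the interval is.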
    

You can verify the same for r2 and r3, using their own standard deviations of 75 and 125. We have covered most of the points about Normal Distribution in this post. We will talk about standard normal distribution in a later post. I have linked a Colab Notebook for anyone who wants to play with this.
