Continuous Uniform Probability Distribution with Python

In an earlier post, we discussed random variables and what a continuous random variable is. An extension of those ideas is the Continuous Uniform Probability Distribution. With a continuous probability distribution, we are not interested in exact values, because the probability that a continuous random variable takes any single exact value is zero. For example, we would not ask for the chance that a flight lasts exactly 180 minutes or exactly 177 minutes. We would rather know the chance that the flight duration falls between 150 and 185 minutes. Intervals, not single points, carry the probability information in this case.

The simplest continuous probability distribution is the uniform probability distribution. It is also called the rectangular distribution because its density function has a rectangular shape.

Let's assume that a flight from Bangalore to Delhi takes about 200 minutes on average, and that the duration always falls between 150 and 250 minutes. The flight may be shorter because the ATC has given early clearance, or longer because congestion at the airport delays take-off. We further assume that every duration between 150 and 250 minutes is equally likely. Let's see all of this in action using Python.

Let's import the required libraries first. We will primarily be using the stats functionality from SciPy.


import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import uniform
import scipy 

Let's assume that the flight duration from Bangalore to Delhi is uniformly distributed between 150 and 250 minutes.


x = np.linspace(0, 400, 1000000)  # evenly spaced points from 0 to 400, used for plotting
r = uniform.rvs(loc=150, scale=100, size=10000)  # 10,000 samples drawn from the uniform distribution

uniform.rvs draws random samples that follow the uniform distribution. Think of loc as the starting point where the probability density becomes nonzero and scale as the width of the distribution, so the samples lie between loc and loc + scale.
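As a small illustration of how loc and scale map to the interval endpoints, the sketch below wraps the conversion in a helper. The function name uniform_from_bounds is hypothetical (it is not part of SciPy); it only assumes that the lower bound a corresponds to loc and the width b - a corresponds to scale.

```python
from scipy.stats import uniform

def uniform_from_bounds(a, b):
    """Hypothetical helper: frozen uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("upper bound must be greater than lower bound")
    # loc is the left end of the interval, scale is its width
    return uniform(loc=a, scale=b - a)

flight = uniform_from_bounds(150, 250)
print(flight.pdf(200))  # density inside the interval
print(flight.pdf(100))  # density outside the interval
```

Freezing the distribution this way avoids repeating loc and scale in every later call.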

We can verify that the samples were created as intended: there should be no samples below 150 or above 250 minutes.


np.sum(r>250), np.sum(r<150)  # count samples above 250 and below 150
--->(0,0)
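If repeatable results are needed, the sampling can be seeded. The sketch below assumes a NumPy Generator passed through the random_state argument of rvs; the seed value 42 is arbitrary and the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.stats import uniform

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for repeatability
samples = uniform.rvs(loc=150, scale=100, size=10000, random_state=rng)

# All samples should fall inside [150, 250]
print(samples.min() >= 150, samples.max() <= 250)
# The sample mean should be close to the midpoint of the interval, 200
print(samples.mean())
```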

We can see what the distribution looks like below.

  
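fig, ax = plt.subplots(1, 1)
ax.set_xlim([0, 400])
# Histogram of the samples, normalized so that its total area is 1
ax.hist(r, density=True, bins='auto', histtype='stepfilled', alpha=0.2)
# Theoretical density of the uniform distribution on [150, 250]
ax.plot(x, uniform.pdf(x,loc=150,scale=100),
       'r-', lw=5, alpha=0.6, label='uniform pdf')
ax.legend()  # display the 'uniform pdf' label
plt.show()
Fig 1: Continuous Uniform Probability Distribution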

We see that the distribution has a nice rectangular shape. Formally, it is described mathematically as shown below.

Fig 2: Mathematical Representation of Continuous Uniform Probability Distribution
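For reference, the standard textbook form of the density of a continuous uniform distribution on an interval [a, b] is given below. The symbols a and b are introduced here for illustration; in the SciPy parameterization used in this post, a = loc and b = loc + scale.

```latex
f(x) =
\begin{cases}
\dfrac{1}{b - a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
```

With a = 150 and b = 250, the height of the rectangle is 1 / (250 - 150) = 0.01.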

What would be the area of such a rectangle? No prizes for guessing!


height = 0.01  # density inside the interval: 1 / (250 - 150)
width = 250-150
print(height * width)
---> 1.0

The total area under any probability density function is always 1.
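The same fact can be checked numerically. The sketch below uses scipy.integrate.quad to integrate the density over the plotted range from 0 to 400. Passing the jump locations 150 and 250 through the points argument is a precaution added here for the discontinuities, not something the original example requires.

```python
from scipy.integrate import quad
from scipy.stats import uniform

# Integrate the uniform density over [0, 400]; it is nonzero only on [150, 250].
# args supplies loc=150 and scale=100 to uniform.pdf.
area, abs_error = quad(uniform.pdf, 0, 400, args=(150, 100), points=[150, 250])
print(area)
```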

Cumulative Distribution Function (CDF)

Let's stick with the same example. The CDF gives the probability that the random variable takes a value less than or equal to a given value. Let's say we want to find the probability of observing a flight duration less than or equal to 200 minutes.


uniform.cdf(x=200,loc=150,scale=100)
---> 0.5  
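The value can also be worked out by hand. Assuming the standard closed form of the uniform CDF on [a, b], with a = 150 and b = 250 (symbols introduced here for illustration):

```latex
F(x) = \frac{x - a}{b - a} \quad \text{for } a \le x \le b,
\qquad
F(200) = \frac{200 - 150}{250 - 150} = \frac{50}{100} = 0.5
```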

There is a 50% chance that the flight duration is less than or equal to 200 minutes. What is the probability of a flight duration between 200 and 225 minutes?


uniform.cdf(225,150,100) - uniform.cdf(200,150,100)
--->0.25
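The subtraction above generalizes to any interval. The sketch below wraps it in a helper whose name, interval_probability, is hypothetical, and compares the result with the closed form (upper - lower) / scale, which holds when both endpoints lie inside the support.

```python
from scipy.stats import uniform

def interval_probability(lower, upper, loc=150, scale=100):
    """Hypothetical helper: P(lower <= X <= upper) for a uniform distribution."""
    return uniform.cdf(upper, loc, scale) - uniform.cdf(lower, loc, scale)

p_cdf = interval_probability(200, 225)
p_closed_form = (225 - 200) / 100  # interval width divided by total width
print(p_cdf, p_closed_form)

# The interval mentioned in the introduction: between 150 and 185 minutes
p_intro = interval_probability(150, 185)
print(p_intro)
```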

There is a 25% chance of a flight duration between 200 and 225 minutes. What does the CDF look like for this scenario?


plt.plot(x,uniform.cdf(x,150,100))
plt.xlabel('X')
plt.ylabel('Cumulative Probability')
plt.show()  
Fig 3: Cumulative Distribution Function

Finally, the mean, variance and standard deviation of this distribution can be calculated as follows.

  
uniform.mean(loc=150,scale=100), uniform.var(loc=150,scale=100), uniform.std(loc=150,scale=100)

---> (200.0, 833.3333333333333, 28.867513459481287)
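These values agree with the standard closed-form expressions for a uniform distribution on [a, b]: mean (a + b) / 2 and variance (b - a)^2 / 12. The sketch below, with a and b introduced here for illustration, recomputes them by hand and compares them with SciPy.

```python
import math
from scipy.stats import uniform

a, b = 150, 250  # interval endpoints: a = loc, b = loc + scale

mean_by_hand = (a + b) / 2
var_by_hand = (b - a) ** 2 / 12
std_by_hand = math.sqrt(var_by_hand)

print(mean_by_hand, var_by_hand, std_by_hand)
print(uniform.mean(loc=a, scale=b - a),
      uniform.var(loc=a, scale=b - a),
      uniform.std(loc=a, scale=b - a))
```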

Here is the Colab Link if you want to play with this.
