In an earlier post, we discussed random variables and what a continuous random variable is. An extension of those ideas is the continuous uniform probability distribution. With a continuous probability distribution, we are not interested in exact values, because the probability of any single exact value is zero. For example, we would not ask for the chance that a flight lasts exactly 180 minutes or exactly 177 minutes. We would rather know the chance that the flight duration falls between 150 and 185 minutes. Intervals, not single points, carry the probability information in this case.
The simplest continuous probability distribution is the uniform distribution. It is also called the rectangular distribution because its density function has a rectangular shape.
Let's assume that a flight from Bangalore to Delhi takes about 200 minutes on average, and that the duration always falls between 150 and 250 minutes. The spread comes from scenarios such as the flight getting early clearance from ATC, or congestion at the airport delaying take-off. For this example, we assume that every duration from 150 to 250 minutes is equally likely. Let's see all of this in action using Python.
Let's import the required libraries first. We will primarily be using the stats functionality from SciPy for this.
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import uniform
import scipy
Let's assume that the flight duration from Bangalore to Delhi is uniformly distributed between 150 and 250 minutes.
x = np.linspace(0, 400, 1000000)  # evenly spaced numbers from 0 to 400
r = uniform.rvs(loc=150, scale=100, size=10000)  # 10,000 samples drawn from the uniform distribution
uniform.rvs draws random samples from the uniform distribution. Think of loc as the starting point where the probability density becomes non-zero and scale as the width of the distribution, so the samples lie between loc and loc + scale.
We can verify that the samples were created as intended: there should be no samples below 150 or above 250 minutes.
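Since loc and scale are easy to confuse with the two endpoints, here is a minimal sketch of a helper that converts a lower and upper bound into SciPy's parameterization. The name uniform_between is hypothetical and not part of SciPy; it only wraps the uniform(loc, scale) call used in this post.

```python
from scipy.stats import uniform


def uniform_between(lower, upper):
    """Return a frozen uniform distribution on [lower, upper].

    Hypothetical helper: SciPy expects loc (the start) and
    scale (the width), not the two endpoints.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return uniform(loc=lower, scale=upper - lower)


flight = uniform_between(150, 250)
# The 0% and 100% quantiles recover the two endpoints.
print(flight.ppf(0), flight.ppf(1))
# The density inside the interval is 1 / width.
print(flight.pdf(200))
```

A frozen distribution like flight remembers its loc and scale, so later calls such as flight.cdf(200) do not need to repeat them.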
np.sum(r > 250), np.sum(r < 150)  # count samples outside [150, 250]
--->(0,0)
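Because the samples are random, each run produces different values. If repeatable results are wanted, one option is to pass a seeded generator through the random_state argument of rvs. The sketch below assumes an arbitrary seed of 42 and uses the variable names rng and samples purely for illustration.

```python
import numpy as np
from scipy.stats import uniform

# Seed chosen arbitrarily for this illustration.
rng = np.random.default_rng(42)
samples = uniform.rvs(loc=150, scale=100, size=10000, random_state=rng)

# Both extremes should lie within [150, 250].
print(samples.min(), samples.max())
# The sample mean should be close to 200.
print(samples.mean())
```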
Let's look at what the distribution looks like.
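fig, ax = plt.subplots(1, 1)
ax.set_xlim([0, 400])
# Histogram of the samples, normalised so the total area is 1
ax.hist(r, density=True, bins='auto', histtype='stepfilled', alpha=0.2)
# Theoretical density drawn on top of the histogram
ax.plot(x, uniform.pdf(x, loc=150, scale=100),
        'r-', lw=5, alpha=0.6, label='uniform pdf')
ax.legend()
plt.show()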
We see that the distribution has a nice rectangular shape. The density is constant at 1/(250 - 150) = 0.01 between 150 and 250 minutes and zero everywhere else.
What would be the area of such a rectangle? No prizes for guessing!
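The standard density of a continuous uniform distribution on an interval can be written as follows. The symbols $a$ and $b$ are introduced here for the lower and upper bounds; in SciPy's parameterization, $a$ corresponds to loc and $b$ to loc + scale.

```latex
f(x) =
\begin{cases}
\dfrac{1}{b - a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
```

With $a = 150$ and $b = 250$, the height of the rectangle is $1/100 = 0.01$.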
height = 0.01
width = 250-150
print(height * width)
---> 1.0
The area under a probability density function is always 1.
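The same area can also be checked numerically instead of by multiplying height and width. The sketch below uses scipy.integrate.quad to integrate the density over the interval where it is non-zero; the variable names area and abs_error are chosen for this example.

```python
from scipy.integrate import quad
from scipy.stats import uniform

loc, scale = 150, 100

# Integrate the density over its support [loc, loc + scale].
# quad passes the extra args to uniform.pdf as (loc, scale).
area, abs_error = quad(uniform.pdf, loc, loc + scale, args=(loc, scale))
print(area)
```

Integrating only over the support avoids asking the integrator to locate the jumps at 150 and 250 on its own.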
Cumulative Distribution Function (CDF)
Let's stick with the same example. The CDF gives the probability of observing a value less than or equal to a given value. Let's say we want to find the probability of observing a flight duration less than or equal to 200 minutes.
uniform.cdf(x=200,loc=150,scale=100)
---> 0.5
There is a 50% chance of a flight duration less than or equal to 200 minutes. What is the probability of a flight duration between 200 and 225 minutes?
uniform.cdf(225,150,100) - uniform.cdf(200,150,100)
--->0.25
There is a 25% chance of a flight duration between 200 and 225 minutes. How does the CDF look for this scenario?
plt.plot(x,uniform.cdf(x,150,100))
plt.xlabel('X')
plt.ylabel('Cumulative Probability')
plt.show()
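The straight ramp in the CDF plot reflects a simple closed form: the CDF is 0 below the lower bound, rises linearly across the interval, and is 1 above the upper bound. As a rough sketch, it can be written by hand and compared with SciPy. The function names uniform_cdf and interval_probability are hypothetical helpers for this illustration.

```python
import numpy as np
from scipy.stats import uniform


def uniform_cdf(x, lower, upper):
    """Closed-form CDF of a uniform distribution on [lower, upper]."""
    # Linear ramp from 0 to 1 across the interval, clipped outside it.
    ramp = (np.asarray(x, dtype=float) - lower) / (upper - lower)
    return np.clip(ramp, 0.0, 1.0)


def interval_probability(start, end, lower, upper):
    """P(start <= X <= end) as a difference of two CDF values."""
    return uniform_cdf(end, lower, upper) - uniform_cdf(start, lower, upper)


print(uniform_cdf(200, 150, 250))
print(interval_probability(200, 225, 150, 250))

# Compare the hand-written CDF with SciPy on a grid of points.
grid = np.linspace(0, 400, 9)
matches = np.allclose(uniform_cdf(grid, 150, 250),
                      uniform.cdf(grid, loc=150, scale=100))
print(matches)
```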
Finally, this is how we calculate the mean, variance and standard deviation of this distribution.
uniform.mean(loc=150,scale=100), uniform.var(loc=150,scale=100), uniform.std(loc=150,scale=100)
---> (200.0, 833.3333333333333, 28.867513459481287)
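These values follow from the standard closed-form expressions for a uniform distribution: the mean is the midpoint of the interval and the variance is the squared width divided by 12. A minimal sketch, with lower and upper as illustrative names for the two bounds:

```python
import math
from scipy.stats import uniform

lower, upper = 150, 250

mean = (lower + upper) / 2            # midpoint of the interval
variance = (upper - lower) ** 2 / 12  # squared width divided by 12
std = math.sqrt(variance)             # square root of the variance

print(mean, variance, std)
```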
Here is the Colab Link if you want to play with this.