Random Variables and Probability Distributions

In this blog, I will cover the basic concepts of random variables and their probability distributions.

An experiment is any activity that can be repeated an infinite number of times and has a well-defined set of possible outcomes. The set of all possible outcomes is the sample space of the experiment, and an event is a subset of the sample space.

Consider rolling a die. Rolling a die is an experiment that can be repeated any number of times, and there are only six possible outcomes. So the sample space is {1, 2, 3, 4, 5, 6}. An event could be getting an even number when you roll the die; this event is a subset of the sample space, in this case {2, 4, 6}.

When you roll a die, even though the set of possible outcomes is known, the outcome itself is random, and each outcome is associated with a certain probability. Informally, a random variable is the outcome along with the probability of that outcome. In our example, the random variable can take on any of the values in the sample space, and each outcome has a probability of 1/6.

Mathematically, a random variable is a function that assigns a real number to each outcome in the sample space of a random experiment. A random variable is denoted by an uppercase letter such as X. After an experiment is conducted, the measured value of the random variable is denoted by a lowercase letter such as x.

Now that we’ve learnt what a random variable is, let’s understand the different types of random variables and their distributions.

A discrete random variable is a random variable with a finite or countably infinite range.
Examples: The set of outcomes of rolling a die is finite. The set of all natural numbers, on the other hand, is countably infinite, but a random variable ranging over it is still discrete. Another example of a countably infinite range would be the number of stars in the universe.

We already know that a random variable has a certain probability associated with it. The probability distribution of a random variable X gives us the probabilities associated with each of the possible values X can take. In the case of rolling a die, the probability of each value X can take is the same. So the probability distribution in this case will be:
P(X=1) = 1/6, P(X=2) = 1/6 and so on.

Note that x ranges over every possible value, and the probabilities sum to 1.
Mathematically, this can be written as f(x) = P(X = x). The set of ordered pairs (x, f(x)) is called the probability function, probability mass function, or probability distribution function of the discrete random variable X. f(x) is considered a probability mass function if it satisfies the following conditions:

1. f(x) ≥ 0 for every x
2. Σ_x f(x) = 1
3. P(X = x) = f(x)
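To make this concrete, here is a minimal Python sketch (standard library only) that represents the PMF of a fair die as a dictionary and checks the conditions above:

```python
# PMF of a fair six-sided die: each outcome has probability 1/6.
pmf = {x: 1/6 for x in range(1, 7)}

# Condition 1: every probability is non-negative.
assert all(p >= 0 for p in pmf.values())

# Condition 2: the probabilities sum to 1 (allowing for float rounding).
assert abs(sum(pmf.values()) - 1) < 1e-9

# Probability of the event "even number" = P(X=2) + P(X=4) + P(X=6).
p_even = sum(pmf[x] for x in (2, 4, 6))
print(p_even)  # 0.5
```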

However, we may often wish to compute the probability that the random variable X is less than or equal to some real number x. Writing
F(x) = P(X ≤ x) for every real number x, we define F(x) to be the cumulative distribution function of the random variable X.
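Continuing the die example, a short sketch of how the CDF accumulates the PMF:

```python
# Cumulative distribution function of a fair die:
# F(x) = P(X <= x) = sum of f(k) over all k <= x.
pmf = {x: 1/6 for x in range(1, 7)}

def cdf(x):
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(3))    # 0.5   -> P(X <= 3)
print(cdf(6))    # 1.0   -> the CDF reaches 1 at the largest value
print(cdf(2.5))  # ~0.333 -> F is defined for every real x
```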

When the range of a random variable is on a continuous scale, it is called a continuous random variable.
Examples: Height, Weight, Amount of rainfall, etc.

The probability of a continuous random variable assuming exactly any one of its values is 0. Hence, the probability distribution for a continuous random variable cannot be given in tabular form. The probability for a continuous random variable is always computed over an interval: P(a ≤ X ≤ b).

The probability distribution of a continuous random variable can be stated as a formula, and f(x) is called the probability density function, or simply the density function, of X. In the case of a continuous random variable, f(x) is considered a probability density function if it satisfies the following conditions:

1. f(x) ≥ 0 for all real x
2. ∫ f(x) dx = 1, integrating over (−∞, ∞)
3. P(a ≤ X ≤ b) = ∫ f(x) dx, integrating from a to b

And the cumulative distribution function F(x) of a continuous random variable is given by:

F(x) = P(X ≤ x) = ∫ f(t) dt, integrating from −∞ to x
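As a sketch, these conditions can be checked numerically with scipy for an assumed density f(x) = 2x on [0, 1] (the density is purely illustrative):

```python
from scipy.integrate import quad

# An example density: f(x) = 2x on [0, 1], 0 elsewhere.
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

# The density integrates to 1 over its support.
total, _ = quad(f, 0, 1)
print(total)  # 1.0

# P(0.25 <= X <= 0.5) is the area under f between 0.25 and 0.5.
p, _ = quad(f, 0.25, 0.5)
print(p)  # 0.1875

def F(x):
    # CDF: integral of f from -inf up to x (f is 0 outside [0, 1]).
    return quad(f, 0, min(max(x, 0.0), 1.0))[0]

print(F(0.5))  # 0.25
```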

Until now, we considered the outcomes of an experiment to have values assumed by a single random variable. But in real life scenarios, we might have to record outcomes based on several random variables.

For example, in a study to predict whether a person is susceptible to diabetes, we might need to consider multiple variables like weight, family history, level of activity, etc.

If X and Y are two discrete random variables, the probability distribution for their simultaneous occurrence is represented as f(x, y). This function is referred to as the joint probability distribution function of X and Y.

The function f(x, y) is a joint probability distribution or mass function if:

1. f(x, y) ≥ 0 for all (x, y)
2. Σ_x Σ_y f(x, y) = 1
3. P(X = x, Y = y) = f(x, y)

Given the joint probability distribution of X and Y, we can obtain the probability distribution of X alone by summing f(x,y) over the values of Y; and similarly we can obtain the probability distribution of Y alone by summing f(x,y) over the values of X. These distributions are called the marginal probability distributions of X and Y respectively.
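A small numpy sketch, using a hypothetical joint PMF stored as a 2D table, shows how the marginals fall out of row and column sums:

```python
import numpy as np

# A hypothetical joint PMF of X (rows) and Y (columns).
joint = np.array([
    [0.10, 0.20, 0.10],   # f(x=0, y=0..2)
    [0.05, 0.30, 0.25],   # f(x=1, y=0..2)
])
assert np.isclose(joint.sum(), 1.0)  # probabilities sum to 1

# Marginal of X: sum f(x, y) over the values of Y (columns).
g_x = joint.sum(axis=1)   # [0.40, 0.60]
# Marginal of Y: sum f(x, y) over the values of X (rows).
h_y = joint.sum(axis=0)   # [0.15, 0.50, 0.35]
print(g_x, h_y)
```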

When X and Y are continuous random variables, the joint density function f(x, y) satisfies:

1. f(x, y) ≥ 0 for all (x, y)
2. ∫∫ f(x, y) dx dy = 1, integrating over the entire plane
3. P[(X, Y) ∈ A] = ∫∫ f(x, y) dx dy, integrating over the region A

And the marginal distributions of X and Y are given by:

g(x) = Σ_y f(x, y) (discrete) or g(x) = ∫ f(x, y) dy (continuous)
h(y) = Σ_x f(x, y) (discrete) or h(y) = ∫ f(x, y) dx (continuous)

We might also want to calculate the conditional probability: the probability of event X occurring given that event Y has already occurred.
Example: If we have already drawn an Ace from a pack of cards, what is the probability of drawing an Ace again on the second draw?

In such cases, the conditional probability can be calculated using the joint probability and the marginal probability. If X and Y are two random variables (discrete or continuous):

The conditional distribution of Y given that X = x is given by:

f(y | x) = f(x, y) / g(x), provided g(x) > 0

And similarly, the conditional distribution of X given Y = y is:

f(x | y) = f(x, y) / h(y), provided h(y) > 0
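Reusing the same hypothetical joint table, a sketch of the conditional distributions as ratios of joint to marginal probabilities:

```python
import numpy as np

# The same hypothetical joint PMF as above.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.30, 0.25],
])
g_x = joint.sum(axis=1)   # marginal of X
h_y = joint.sum(axis=0)   # marginal of Y

# Conditional distribution of Y given X = 1: f(1, y) / g(1).
f_y_given_x1 = joint[1] / g_x[1]
print(f_y_given_x1)                         # [0.0833..., 0.5, 0.4166...]
assert np.isclose(f_y_given_x1.sum(), 1.0)  # a valid distribution

# Conditional distribution of X given Y = 2: f(x, 2) / h(2).
f_x_given_y2 = joint[:, 2] / h_y[2]
print(f_x_given_y2)                         # [0.2857..., 0.7142...]
```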

Just like we can calculate the average of a given set of points, we can calculate the expected value (the center, or mean) of a random variable X, also called the mean of the probability distribution of X. This is denoted by E(X) or the Greek letter μ.

For a discrete random variable:

E(X) = Σ_x x f(x)

For a continuous random variable:

E(X) = ∫ x f(x) dx, integrating over (−∞, ∞)

In the case of joint distributions, the expectation of a function g(X, Y) of the two variables is given by:
For a discrete case:

E[g(X, Y)] = Σ_x Σ_y g(x, y) f(x, y)

For a continuous case:

E[g(X, Y)] = ∫∫ g(x, y) f(x, y) dx dy

In order to understand the variability of the distribution, we have to compute the variance. This will tell us the spread of values around the mean.

The variance is given by:

Var(X) = σ² = E[(X − μ)²] = E(X²) − [E(X)]²

We already know how to calculate E(X); E(X²) is calculated the same way, by taking the expectation of X² instead of X.
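As a sketch, computing the mean and variance of the fair-die distribution from these formulas:

```python
# E(X), E(X^2), and Var(X) for a fair six-sided die.
pmf = {x: 1/6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())        # E(X)   = 3.5
mean_sq = sum(x**2 * p for x, p in pmf.items())  # E(X^2) = 15.1666...
variance = mean_sq - mean**2                     # Var(X) = E(X^2) - [E(X)]^2
print(mean, variance)                            # 3.5 2.9166...
```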

So far we have learnt about Discrete and Continuous Random Variables and their probability distributions. Often, the outcomes of experiments can be observed to have the same general type of behavior. This behavior can prove to be important in describing several real life random phenomena. So let’s go over some of the important discrete and continuous distributions.

The uniform distribution is the simplest: the random variable has an equal probability for all of its outcomes.
Examples: Rolling a fair die, where the probability of each number is 1/6.
Tossing a fair coin, where the probability of getting heads equals the probability of getting tails.
Therefore, for a discrete uniform distribution, f(x) = 1/n, where n is the number of possible outcomes x₁, x₂, …, xₙ. The mean and variance are given by:

μ = (Σ_i x_i) / n
σ² = (Σ_i (x_i − μ)²) / n

If each trial of an experiment can have only two outcomes (usually success or failure), then the number of successes in n independent trials follows a binomial distribution. The most obvious example of a single trial would be the outcome of a test, whether you test positive or negative for cancer.

The mean and variance of the binomial distribution are given by:

μ = np
σ² = npq, where q = 1 − p
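A quick sketch with scipy.stats; the parameters n = 10 and p = 0.2 are purely illustrative:

```python
from scipy.stats import binom

n, p = 10, 0.2          # 10 independent trials, success probability 0.2
X = binom(n, p)

print(X.pmf(3))         # P(exactly 3 successes) ~ 0.2013
print(X.mean())         # np  = 2.0
print(X.var())          # npq = 1.6
```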

Now, if you were to conduct a sequence of such trials, where the outcome of each trial can have only two values, success or failure, and the probability of success remains the same for each trial, this would constitute a Bernoulli process.

Consider a binomial experiment, and let us repeat it until we get a certain number of successes. We are now interested in the kth success occurring on the xth trial. Experiments of this kind follow a negative binomial distribution.

Let us understand this better with the help of an example. Consider a study on the success of a drug. Say we are interested in the probability that the 3rd patient to respond positively to the drug is the 10th patient to receive it. In this case, we care about the trial on which a given success occurs, not just the total number of successes.

The probability depends on the number of successes as well as the probability of success on a given trial. If repeated independent trials result in a success with probability p and a failure with probability q = 1 − p, then the probability distribution of the random variable X, the number of the trial on which the kth success occurs, is:

b*(x; k, p) = C(x−1, k−1) p^k q^(x−k), for x = k, k+1, k+2, …

where C(x−1, k−1) is the binomial coefficient.
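Using the drug example with an assumed response probability p = 0.3, the probability that the 3rd success occurs with the 10th patient can be computed from this formula. Note that scipy's nbinom counts the failures before the kth success, so the trial number x maps to x − k failures:

```python
from math import comb
from scipy.stats import nbinom

p, q = 0.3, 0.7   # assumed probability that a patient responds to the drug
k, x = 3, 10      # 3rd success on the 10th trial

# Direct formula: C(x-1, k-1) * p^k * q^(x-k)
direct = comb(x - 1, k - 1) * p**k * q**(x - k)
print(direct)                   # ~0.0800

# scipy's nbinom is parameterized by the number of failures (x - k).
print(nbinom.pmf(x - k, k, p))  # same value
```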

Taking the same drug study as an example, say we are only interested in when the first success occurs. In this case, we can use the geometric distribution, a special case of the negative binomial distribution with k = 1. The probability distribution is given by:

g(x; p) = p q^(x−1), for x = 1, 2, 3, …

The mean and variance of a random variable following the geometric distribution are given by:

μ = 1/p
σ² = (1 − p)/p²
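A quick scipy sketch (with an assumed p = 0.3) confirming these formulas; scipy's geom uses the same convention as above, with support x = 1, 2, 3, …:

```python
from scipy.stats import geom

p = 0.3          # assumed per-trial success probability
X = geom(p)      # P(X = x) = p * q^(x-1)

print(X.pmf(1))  # first success on trial 1: 0.3
print(X.pmf(4))  # first success on trial 4: 0.7^3 * 0.3 ~ 0.1029
print(X.mean())  # 1/p       = 3.333...
print(X.var())   # (1-p)/p^2 = 7.777...
```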

When we are interested in the number of outcomes that occur during a given time interval, we can use the Poisson distribution. The time interval can be of any length: a day, a week, a month, a year, or more. The random variable X represents the number of outcomes observed in the interval.
Examples: The number of people dying in a year due to an infectious disease, or the number of calls received in a call center on a particular day.

The probability distribution of a Poisson random variable X is given by:

P(X = x) = e^(−λt) (λt)^x / x!, for x = 0, 1, 2, …

where t is the length of the time interval and λ (lambda) is the average number of outcomes per unit time.

For the Poisson random variable, both the expectation and the variance are equal to λt.
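A sketch of the call-center example with assumed values λ = 4 calls per hour over a t = 2-hour window:

```python
from scipy.stats import poisson

lam, t = 4, 2             # assumed: 4 calls per hour, over a 2-hour window
X = poisson(lam * t)      # Poisson parameter is lambda * t = 8

print(X.pmf(10))          # P(exactly 10 calls) ~ 0.0993
print(X.mean(), X.var())  # both equal lambda * t = 8.0
```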

Similar to the discrete case, one of the simplest continuous distributions is the uniform distribution. We know that the probability for a continuous random variable is always defined over an interval. For a uniform continuous random variable, the density is constant over a closed interval, say [A, B]. The density function is given by:

f(x; A, B) = 1/(B − A) for A ≤ x ≤ B, and 0 elsewhere

Applications of the uniform distribution are not as abundant as those of other distributions, as it rests on the assumption that the probability of X falling in an interval of fixed length is constant.

The normal distribution is one of the most important distributions in the field of statistics. Its curve, called the normal curve, is bell-shaped, and the distribution is symmetric. It has many applications, such as rainfall studies and meteorological studies. The normal distribution is often referred to as the Gaussian distribution.

The density of the normal random variable is given by:

n(x; μ, σ) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)), for −∞ < x < ∞

(Figure: the normal curve)

A normal distribution with mean 0 and variance 1 is called the standard normal distribution. We can transform all the observations of any normal random variable X into a new set of observations of a normal random variable Z with mean 0 and variance 1, using Z = (X − μ)/σ.

Once we transform the observations of X to Z, we can use the Z tables to look up the probability.
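A sketch of the standardization step, with assumed values μ = 100 and σ = 15; norm.cdf plays the role of the Z table:

```python
from scipy.stats import norm

mu, sigma = 100, 15  # assumed: test scores with mean 100, sd 15
x = 120

# Standardize: Z = (X - mu) / sigma, then look up the standard normal CDF.
z = (x - mu) / sigma
print(norm.cdf(z))             # P(X <= 120) ~ 0.9088
print(norm(mu, sigma).cdf(x))  # same probability without standardizing
```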

The exponential distribution is a special case of the gamma distribution. Both have a large number of applications: the time between arrivals at a service facility and the time to failure of a component, for example, can be modeled by the exponential distribution.

The density function for the gamma distribution is given by:

f(x; α, β) = (1 / (β^α Γ(α))) x^(α−1) e^(−x/β) for x > 0, and 0 elsewhere

where α (alpha) is the shape parameter and β (beta) is the scale parameter.

When α = 1, the gamma distribution reduces to the exponential distribution.

The density function for the exponential distribution:

f(x; β) = (1/β) e^(−x/β) for x > 0, and 0 elsewhere

The chi-squared distribution is another special case of the gamma distribution, obtained by letting α = v/2 and β = 2, where v is called the degrees of freedom.

The density function is given by:

f(x; v) = (1 / (2^(v/2) Γ(v/2))) x^(v/2 − 1) e^(−x/2) for x > 0, and 0 elsewhere
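A sketch verifying both special cases numerically with scipy (the values of β and v are illustrative); scipy's gamma uses the same shape/scale parameterization as above:

```python
import numpy as np
from scipy.stats import gamma, expon, chi2

x = np.linspace(0.1, 5, 5)  # a few test points
beta, v = 2.0, 4            # illustrative scale and degrees of freedom

# alpha = 1 turns the gamma density into the exponential density.
assert np.allclose(gamma.pdf(x, a=1, scale=beta), expon.pdf(x, scale=beta))

# alpha = v/2, beta = 2 turns it into the chi-squared density.
assert np.allclose(gamma.pdf(x, a=v/2, scale=2), chi2.pdf(x, df=v))
print("gamma special cases verified")
```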

In this blog, I have tried to explain the idea of random variables and provide a brief overview of the vast world of distributions in statistics. To better understand the applications, the references below will be very useful.

Walpole, Myers, Myers, & Ye. (2012). Probability & Statistics for Engineers & Scientists. Prentice Hall
Montgomery, & Runger. (2003). Applied Statistics & Probability for Engineers. John Wiley & Sons, Inc.
