This is one of the most important theorems, and one needs a clear understanding of it. It took me a while to understand it in a way I could relate to a more realistic example instead of a coin toss or a dice roll. Below is my understanding of this theorem and the statistical paradigms behind it. Please feel free to correct me; I will appreciate your comments and views.
Bayes Theorem: This theorem is closely tied to Bayesian statistics, so first we need to understand what Bayesian statistics is.
Consider a scenario involving a random process. This process produces multiple possible outcomes and depends upon some unknown features/parameters. Usually the process is described in terms of a probability distribution over its outcomes, which in turn depends upon those unknown features.
To visualize this situation, suppose we are organizing a football match between Brazil and Germany. Playing a football match is a process. There are multiple possible outcomes of a football match: it may end in a draw, one team may win and the other lose, and some total number of goals will be scored during the match.
One way of describing the upcoming football match is by the probability distribution of the number of goals scored by both teams together. This distribution gives the probability of scoring 0 goals, 1 goal, 2 goals, 3 goals, 4 goals, 5 goals and so on for the match in question.
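To make this concrete, here is a minimal sketch in Python of one such distribution. The Poisson shape and the mean of 2.5 goals per match are assumptions made purely for illustration, not anything derived from real Brazil vs Germany data.

```python
# A sketch of a probability distribution over the total goals in the match.
# The Poisson shape and the 2.5-goal mean are assumed for illustration only.
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k total goals under a Poisson(mean) model."""
    return (mean ** k) * exp(-mean) / factorial(k)

assumed_mean_goals = 2.5  # hypothetical average for a Brazil vs Germany match

for goals in range(6):
    print(f"P(total goals = {goals}) = {poisson_pmf(goals, assumed_mean_goals):.3f}")

# The probability of more than 5 goals is one minus the sum up to 5 goals.
p_more_than_5 = 1 - sum(poisson_pmf(k, assumed_mean_goals) for k in range(6))
print(f"P(total goals > 5) = {p_more_than_5:.3f}")
```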
One way to calculate these probabilities is from the historical data of previous matches played between Germany and Brazil, without considering the parameters/features/factors that impact the total number of goals scored, such as the number of penalty strokes, penalty corners or free hits given in the match. This paradigm of statistics is known as the Frequentist Approach; in this case the parameters that impact the total number of goals scored are ignored. So if we want to know the probability that more than 5 goals will be scored in this match, we look at the total number of matches played between Brazil and Germany and check the number of goals scored in each match (see the sketch after the list below). This approach makes some assumptions:
- The underlying parameters/features are constants, i.e. in our case the assumption is that the same number of penalty corners, penalty strokes or free hits will be awarded in this match as were given in the earlier matches.
- The number of parameters is fixed, so no additional parameters can be considered.
- Playing a football match is a repeatable random process.
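A minimal sketch of this frequentist calculation, assuming a small invented set of historical scorelines, might look like this:

```python
# Frequentist sketch: estimate P(goals > 5) purely from historical results,
# ignoring penalty strokes, penalty corners and free hits entirely.
# The list of past total-goal counts below is invented for illustration.
historical_total_goals = [2, 3, 1, 8, 2, 4, 0, 3, 6, 2]

matches_over_5 = sum(1 for goals in historical_total_goals if goals > 5)
p_over_5 = matches_over_5 / len(historical_total_goals)
print(f"Frequentist estimate of P(goals > 5) = {p_over_5:.2f}")  # 2/10 = 0.20
```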
In the case of Bayesian statistics, we calculate the probability of a given outcome using the given data (the data contains values for the parameters/features). So in the Bayesian approach the data is fixed, and the parameters are described probabilistically. The main advantage of the Bayesian approach is the ability to use prior knowledge. This is also the main reason for its criticism: the understanding and assumptions behind that prior knowledge may differ from person to person, so we may get different results.
In our example of the football match between Brazil and Germany, if we follow the Bayesian approach, the probability of scoring more than 5 goals (also called the posterior) will be calculated by considering different parameters. For instance, we might calculate the probability assuming there will be 2 penalty strokes and 3 penalty corners, and we may want to repeat this calculation for different combinations of penalty strokes and penalty corners. Most importantly, prior knowledge about this process (matches between Germany and Brazil) is used during the calculation.
So Posterior Probability (seeing more than 5 goals) = (Prior Probability * Likelihood of Parameters) / Marginal
So we need to know the prior probability. Prior probabilities are intrinsically subjective: your prior information is different from mine, so knowledge and understanding about prior probabilities can differ from person to person, and different people can get different results from Bayes Theorem.
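As a small illustration of this subjectivity, the sketch below (with invented likelihood values) shows how two analysts with different priors for "more than 5 goals" reach different posteriors from the same data:

```python
# Two different priors, same likelihoods -> different posterior probabilities.
# All numbers below are assumed values for illustration only.
def posterior(prior_h, lik_given_h, lik_given_not_h):
    """Bayes' theorem, with the marginal expanded over H and not-H."""
    marginal = lik_given_h * prior_h + lik_given_not_h * (1 - prior_h)
    return lik_given_h * prior_h / marginal

lik_given_h = 0.30      # assumed P(D | H): chance of this data if H is true
lik_given_not_h = 0.10  # assumed P(D | not H)

for analyst, prior in [("optimistic analyst", 0.20), ("sceptical analyst", 0.05)]:
    print(f"{analyst}: P(H | D) = {posterior(prior, lik_given_h, lik_given_not_h):.3f}")
# Prints roughly 0.429 for the optimistic prior and 0.136 for the sceptical one.
```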
Bayes Theorem:
We need to briefly understand the following terms, considering our football match example:
Hypothesis = the number of goals scored will be greater than 5. Denoted by "H".
Parameters = the number of penalty strokes and penalty corners in the match. Denoted by "D".
Posterior Probability = the probability of the hypothesis being true given the data. In our football match example it is the probability that the number of goals scored during the match is greater than 5, given the values of penalty strokes and penalty corners. This value can be calculated for each possible combination of penalty corners and penalty strokes. Denoted by P(H|D).
Prior Probability = the probability of more than 5 goals being scored, without knowing the number of penalty strokes and corners. This may be drawn by looking at a large sample or may be derived from expert judgment. Denoted by P(H).
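For example, one rough way to get a prior from a large sample (the sample here is invented) is simply the fraction of past matches with more than 5 goals:

```python
# A sketch of deriving the prior P(H) = P(goals > 5) from a larger sample
# of past total-goal counts. The sample is invented for illustration.
large_sample_goals = [2, 3, 1, 8, 2, 4, 0, 3, 6, 2, 1, 5, 2, 4, 3, 2, 1, 4, 2, 0]

prior_h = sum(1 for goals in large_sample_goals if goals > 5) / len(large_sample_goals)
print(f"Prior P(goals > 5) = {prior_h:.2f}")  # 2 of 20 matches -> 0.10
```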
Likelihood = the probability of the parameters taking particular values given that our hypothesis is true. In our example there are two parameters: the number of penalty strokes and the number of penalty corners. So the likelihood is the probability of these two parameter values when the number of goals scored in the upcoming football match is greater than 5 (our hypothesis). To calculate likelihood values we need training data. The training data comprises statistics about the previous matches between Brazil and Germany, with details of the number of penalty corners and strokes and the number of goals scored during those matches. Denoted by P(D|H).
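A minimal sketch of estimating this likelihood by counting, assuming a small invented training set of past matches, could look like this:

```python
# Likelihood sketch: P(2 penalty corners and 3 penalty strokes | goals > 5),
# estimated by counting within the high-scoring matches of the training data.
# Each record is (penalty_corners, penalty_strokes, total_goals); all invented.
training_data = [
    (2, 3, 7),
    (1, 0, 2),
    (2, 3, 6),
    (3, 1, 8),
    (0, 2, 1),
    (2, 3, 3),
]

def likelihood(corners, strokes, data):
    """Estimate P(corners, strokes | goals > 5) from the training data."""
    high_scoring = [m for m in data if m[2] > 5]
    matching = [m for m in high_scoring if m[0] == corners and m[1] == strokes]
    return len(matching) / len(high_scoring) if high_scoring else 0.0

print(likelihood(2, 3, training_data))  # 2 of 3 high-scoring matches -> ~0.67
```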
Marginal/Evidence = the probability of getting these particular parameter values (the values of the number of penalty corners and number of penalty strokes) under all possible hypotheses. In this case there are two hypotheses: (a) our hypothesis, i.e. the total number of goals scored will be greater than 5, and (b) the alternative hypothesis that the number of goals scored during the match will be less than or equal to 5. Denoted by P(D).
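Concretely, the marginal is a weighted sum over both hypotheses. A sketch with assumed numbers (reusing the invented likelihood of about 0.67 from the previous sketch):

```python
# Marginal sketch: P(2 corners, 3 strokes) under both hypotheses, weighted by
# how probable each hypothesis is. All input probabilities are assumed values.
p_h = 0.10              # prior P(goals > 5)
p_d_given_h = 0.67      # P(2 corners, 3 strokes | goals > 5)
p_d_given_not_h = 0.20  # P(2 corners, 3 strokes | goals <= 5)

p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
print(f"Marginal P(D) = {p_d:.3f}")  # 0.067 + 0.180 = 0.247
```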
Now we want to predict the probability that during the upcoming match the number of goals will be greater than 5, if there are 2 penalty corners and 3 penalty strokes awarded in the match.
The prior probability is fixed, depending upon our prior knowledge.
Likelihood: we need the likelihood of the number of penalty corners being 2 and the number of penalty strokes being 3 when the number of goals scored during the match is greater than 5.
Marginal: this is the probability of the number of penalty corners being 2 and the number of penalty strokes being 3, both when the number of goals scored is greater than 5 and when it is less than or equal to 5.
The theorem can be written as below, and shows that the posterior is directly proportional to the prior times the likelihood of the parameters:
P(H|D) = (P(H)*P(D|H))/P(D)
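Putting the assumed numbers from the sketches above into the formula gives a complete, if entirely illustrative, calculation:

```python
# Full Bayes' theorem sketch for P(goals > 5 | 2 penalty corners, 3 penalty
# strokes). All inputs are the assumed values used in the earlier sketches.
p_h = 0.10              # prior P(H)
p_d_given_h = 0.67      # likelihood P(D | H)
p_d_given_not_h = 0.20  # P(D | not H), needed for the marginal

p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)  # marginal P(D)
p_h_given_d = p_d_given_h * p_h / p_d                   # posterior P(H | D)
print(f"P(goals > 5 | 2 corners, 3 strokes) = {p_h_given_d:.3f}")  # ~0.271
```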
Important Assumption: when there are multiple features/parameters/factors, as in our example, they are assumed to be independent of each other, i.e. not impacted by the presence or absence of the other features/parameters/factors, so their likelihoods can simply be multiplied together.
Bayes Rule is a scoring algorithm and provides probabilistic estimates. But it can be converted into a classification algorithm by selecting the hypothesis that is most probable, i.e. the hypothesis for which the posterior value is maximum. This is called the Maximum A Posteriori (MAP) decision rule.
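A small sketch of the MAP rule with the same assumed numbers: since the marginal P(D) is identical for every hypothesis, it can be dropped when we only need the most probable one.

```python
# MAP decision sketch: pick the hypothesis with the largest value of
# P(D | H) * P(H); dividing by the common marginal P(D) would not change
# which hypothesis wins. All numbers are the assumed values used above.
p_h = 0.10              # prior for H: goals > 5
p_d_given_h = 0.67      # P(2 corners, 3 strokes | goals > 5)
p_d_given_not_h = 0.20  # P(2 corners, 3 strokes | goals <= 5)

unnormalised_posteriors = {
    "goals > 5": p_d_given_h * p_h,
    "goals <= 5": p_d_given_not_h * (1 - p_h),
}
decision = max(unnormalised_posteriors, key=unnormalised_posteriors.get)
print(f"MAP decision: {decision}")  # "goals <= 5" with these assumed numbers
```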