Thursday 19 June 2014

Bayes' Theorem and Bayesian Statistics: A Simple Understanding


This is one of the important theorems, and one needs a clear understanding of it. It took me a while to understand it in a way I could relate to a real example instead of a coin toss or a dice roll. Below is my understanding of the theorem and of the statistical paradigms. Please feel free to correct me on this; I will appreciate your comments and views.

Bayes' Theorem: Bayesian statistics is built around this theorem, so first we need to understand what the Bayesian paradigm is.

Consider a random process that produces multiple possible outcomes and depends upon some unknown features/parameters. Usually the process is described in terms of the probability distribution of its outcomes, which in turn depends upon those unknown features.

To visualize this situation, suppose we are organizing a football match between Brazil and Germany. Playing the football match is the process. There are multiple possible outcomes: the match may end in a draw, one side may win or lose, and some total number of goals will be scored during the match.

One way of describing the upcoming football match is by the probability distribution of the total number of goals scored by both teams together. This distribution gives the probability of 0 goals, 1 goal, 2 goals, 3 goals, and so on, for the match in question.
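As a quick sketch, such a distribution could be estimated empirically by counting goal totals from past matches. The numbers below are made up purely for illustration, not real match data:

```python
from collections import Counter

# Hypothetical total-goal counts from past Brazil vs Germany matches
# (made-up numbers, purely for illustration)
past_totals = [2, 3, 1, 4, 2, 5, 3, 2, 6, 3]

counts = Counter(past_totals)
n = len(past_totals)
# Empirical probability of each observed goal total
distribution = {goals: counts[goals] / n for goals in sorted(counts)}
for goals, p in distribution.items():
    print(f"P(total goals = {goals}) = {p:.2f}")
```

With these invented numbers, 2 and 3 goals each get probability 0.30, and the probabilities sum to 1.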

One way to calculate these probabilities is to use only the historical data of the previous matches played between Germany and Brazil, without considering the parameters/features/factors that affect the total number of goals scored, such as the number of penalty strokes, penalty corners, or free hits given in the match. This paradigm of statistics is known as the Frequentist approach, since the parameters that affect the total number of goals are ignored. So if we want to know the probability that more than 5 goals will be scored in this match, we look at all the matches played between Brazil and Germany and check the number of goals scored in each. This approach makes some assumptions:
  • Underlying parameters/features are constants; in our case, the assumption is that the same number of penalty corners, penalty strokes, or free hits will be awarded in this match as in the earlier matches.
  • The number of parameters is fixed, so no additional parameters can be considered.
  • Playing a football match is a repeatable random process.
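A minimal frequentist sketch of this idea, again using made-up historical goal totals rather than real match data:

```python
# Frequentist estimate: P(more than 5 goals) is simply the fraction of
# past matches in which more than 5 goals were scored.
past_totals = [2, 3, 1, 4, 2, 5, 3, 2, 6, 3]  # hypothetical historical totals
p_more_than_5 = sum(1 for g in past_totals if g > 5) / len(past_totals)
print(p_more_than_5)  # 0.1 with this made-up data
```

Note that no per-match parameters (penalty strokes, corners, free hits) enter the calculation at all, which is exactly the point of the assumptions above.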

In Bayesian statistics, on the other hand, we calculate the probability of a given outcome using given data (the data contains values for the parameters/features). So in the Bayesian approach the data is fixed, and the parameters are described probabilistically. The main advantage of the Bayesian approach is the ability to use prior knowledge. This is also a reason for its criticism: the understanding and assumptions behind the prior knowledge may differ from person to person, so different people may get different results.

In our example of the football match between Brazil and Germany, if we follow the Bayesian approach, the probability of scoring more than 5 goals (also called the posterior) is calculated by considering the different parameters. For instance, we might calculate the probability given that there will be 2 penalty strokes and 3 penalty corners, and we may want to repeat the calculation for different combinations of penalty strokes and penalty corners. Most importantly, prior knowledge about the process (previous matches between Germany and Brazil) is used during the calculation.

So Posterior Probability (more than 5 goals) = (Prior Probability * Likelihood of Parameters) / Marginal

So we need to know the prior probability. Prior probabilities are intrinsically subjective: your prior information is different from mine. Since knowledge and understanding of prior probabilities differ from person to person, different people can get different results from Bayes' theorem.

Bayes' Theorem:

We need to briefly understand the following terms:

Considering our football match example:

Hypothesis = The number of goals scored will be greater than 5. Denoted by "H".

Parameters = The number of penalty strokes and penalty corners in the match. Denoted by "D".

Posterior Probability = The probability of the hypothesis being true given the data. In our football example, it is the probability that the number of goals scored during the match is greater than 5, given the number of penalty strokes and penalty corners. This value can be calculated for each possible combination of penalty corners and penalty strokes. Denoted by P(H|D).

Prior Probability = The probability that more than 5 goals are scored, without knowing the number of penalty strokes and corners. This may be drawn from a large sample of past matches or derived from expert judgment. Denoted by P(H).

Likelihood = The probability of the parameter values given that our hypothesis is true. In our example there are two parameters: the number of penalty strokes and the number of penalty corners. So the likelihood is the probability of these two parameter values when the number of goals scored in the match is greater than 5 (our hypothesis). To calculate likelihood values we need training data: statistics about the previous matches between Brazil and Germany, with details of the number of penalty corners, penalty strokes, and goals scored in each. Denoted by P(D|H).
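To make this concrete, here is a rough sketch of estimating P(D|H) from training data. The match records below are invented purely for illustration:

```python
# Each record: (penalty_corners, penalty_strokes, total_goals) -- made-up data
matches = [
    (1, 3, 6), (1, 2, 3), (2, 3, 7), (3, 1, 2),
    (2, 3, 4), (0, 2, 1), (2, 3, 6), (1, 1, 2),
]

# Restrict to matches where the hypothesis held (more than 5 goals)...
h_matches = [m for m in matches if m[2] > 5]
# ...and count how often D (corners = 2, strokes = 3) occurred among them.
likelihood = sum(1 for c, s, g in h_matches if (c, s) == (2, 3)) / len(h_matches)
print(likelihood)  # 2 of the 3 ">5 goal" matches had (2, 3), so 0.666...
```

In practice counting exact parameter combinations is fragile with small samples, but it shows what "probability of the data given the hypothesis" means here.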

Marginal/Evidence = The probability of getting these particular parameter values (the number of penalty corners and penalty strokes) under all possible hypotheses. In this case there are two hypotheses: (a) our hypothesis, that the total number of goals scored will be greater than 5, and (b) the alternative, that the number of goals scored will be less than or equal to 5. Denoted by P(D).

Now we want to predict the probability that more than 5 goals will be scored in the upcoming match, given that 2 penalty corners and 3 penalty strokes will be awarded.

Prior Probability: This will be fixed depending upon our prior knowledge.

Likelihood: We need the likelihood of the number of penalty corners being 2 and the number of penalty strokes being 3 when the number of goals scored in the match is greater than 5.

Marginal: This is the probability of the number of penalty corners being 2 and the number of penalty strokes being 3, both when the number of goals scored is greater than 5 and when it is less than or equal to 5.

The theorem can be written as below, and it shows that the posterior is directly proportional to the prior and the likelihood of the parameters:

           P(H|D) = (P(H)*P(D|H))/P(D)
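Plugging in some numbers makes the theorem tangible. The probabilities below are assumptions chosen for illustration, not estimates from real data:

```python
# H: more than 5 goals will be scored; D: 2 penalty corners, 3 penalty strokes.
p_h = 0.10              # prior P(H), from history or expert judgment (assumed)
p_d_given_h = 0.40      # likelihood P(D|H), from training data (assumed)
p_d_given_not_h = 0.20  # P(D|not H), needed for the marginal (assumed)

# Marginal P(D): sum over both hypotheses (H and not-H)
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

posterior = p_h * p_d_given_h / p_d
print(round(posterior, 3))  # 0.182: seeing D roughly doubles our belief in H
```

Notice how the marginal is built from both hypotheses, exactly as described above, and how the posterior moves away from the prior once the data is taken into account.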

Important Assumption: It is assumed that all the features/parameters/factors are independent, i.e., they are not affected by the presence or absence of any other feature/parameter/factor. (This is the "naive" assumption behind the Naive Bayes classifier.)

Bayes' rule is a scoring algorithm: it provides probabilistic estimates. It can be converted into a classification algorithm by selecting the hypothesis that is most probable, i.e., the hypothesis for which the posterior value is maximum. This is called the Maximum A Posteriori (MAP) decision rule.
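As a final sketch, the MAP rule just compares posteriors across hypotheses. Since P(D) is the same denominator for every hypothesis, comparing P(H) * P(D|H) is enough. The numbers below are the same made-up values used above:

```python
# Made-up priors and likelihoods for the two hypotheses
priors = {"goals > 5": 0.10, "goals <= 5": 0.90}
likelihoods = {"goals > 5": 0.40, "goals <= 5": 0.20}  # P(D|H) for each H

# Score each hypothesis by prior * likelihood (proportional to the posterior)
scores = {h: priors[h] * likelihoods[h] for h in priors}
map_hypothesis = max(scores, key=scores.get)
print(map_hypothesis)  # "goals <= 5": 0.18 beats 0.04
```

So even though seeing D raises the posterior of "more than 5 goals", the MAP rule still classifies this match as a "5 goals or fewer" match, because the prior weighs heavily against high-scoring games.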