Hypothesis testing and Bayesian inference in Data Science

What is Hypothesis testing?

1.  A hypothesis is a supposition or assumption, made on limited evidence, that has not yet been proved and requires further investigation.
2.  Hypothesis testing uses sample data to decide whether the null hypothesis should be rejected or not.
3.  It can be used, for example, to see which advertisement works better or to compare sales distributions.
4.  A test can be 1-tailed or 2-tailed.

Errors in Hypothesis testing?

1.  IF H0 is TRUE and H0 is not rejected = NO ERROR
2.  IF H0 is FALSE and H0 is rejected = NO ERROR
3.  IF H0 is TRUE and H0 is rejected = TYPE 1 ERROR (false positive)
4.  IF H0 is FALSE and H0 is not rejected = TYPE 2 ERROR (false negative)
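
The four cases above can be written as a small lookup. This is only a sketch: the function name `classify_outcome` and its two boolean parameters are made up for illustration and are not part of any library.

```python
def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Map the truth of H0 and the test decision to one of the four cases."""
    if h0_is_true and h0_rejected:
        return "TYPE 1 ERROR"  # a true H0 was rejected
    if not h0_is_true and not h0_rejected:
        return "TYPE 2 ERROR"  # a false H0 was kept
    return "NO ERROR"


# Print all four combinations of truth and decision.
for h0_is_true in (True, False):
    for h0_rejected in (False, True):
        print(h0_is_true, h0_rejected, classify_outcome(h0_is_true, h0_rejected))
```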


5 steps in Hypothesis testing:

STEP 1 : STATE NULL AND ALTERNATIVE HYPOTHESIS 
STEP 2: CHOOSE LEVEL OF SIGNIFICANCE  (α)
STEP 3: FIND THE CRITICAL VALUES 
STEP 4 : FIND TEST STATISTICS 
STEP 5: DRAW CONCLUSION

EXAMPLE 1
The average weight of the population is 100, with a standard deviation of 20. A researcher believes that the value has changed. A sample of 75 people has an average weight of 110. Is there enough evidence that the average has changed?

STEP 1 : STATE NULL AND ALTERNATIVE HYPOTHESIS 

Null Hypothesis
H0: µ = 100 [currently believed to be true]

Alternative Hypothesis
H1: µ ≠ 100 [being claimed by the researcher.]

A 2 tailed test will be performed because H1 (the alternative hypothesis) contains the ‘≠’ sign.

STEP 2: CHOOSE LEVEL OF SIGNIFICANCE  (α)
Let us consider α to be 0.05
The remaining area under the curve will be 1 - 0.05 = 0.95


The 2 tails are symmetrical, so each one has an area of 0.05 / 2 = 0.025


STEP 3: FIND THE CRITICAL VALUES 

We need to look up the z score in the z table.
The critical z values separate the tail areas from the middle area.


For a confidence level of 95 %, the critical values are ±1.96 (from the table)
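
The table lookup can be cross-checked in code. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library. The variable names are illustrative only.

```python
from statistics import NormalDist

alpha = 0.05
tail_area = alpha / 2  # a 2 tailed test splits alpha across both tails

# The upper critical value leaves tail_area to its right,
# so it is the quantile at 1 - tail_area of the standard normal curve.
z_critical = NormalDist().inv_cdf(1 - tail_area)

print(round(tail_area, 3), round(z_critical, 2))  # 0.025 1.96
```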


STEP 4 : FIND TEST STATISTICS 

Z = (x̅ - µ) / (σ / √n)

x̅ = average of the sample
µ = average of the population
σ = standard deviation of the population
n = sample size

Z = (x̅ - µ) / (σ / √n)

= (110 - 100) / (20 / √75) = 4.33
Our test statistic value is 4.33
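
The same arithmetic as a short sketch. The helper name `z_statistic` is an illustrative choice, not a library function.

```python
from math import sqrt


def z_statistic(sample_mean, population_mean, population_sd, n):
    """Z = (sample mean - population mean) / (population sd / sqrt(n))."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error


z = z_statistic(sample_mean=110, population_mean=100, population_sd=20, n=75)
print(round(z, 2))  # 4.33
```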

STEP 5: DRAW CONCLUSION

Critical value = 1.96, which is lower than the test statistic 4.33, so the test statistic falls in the rejection region


Therefore, Null Hypothesis is rejected
Alternative hypothesis is accepted

So, from the above, we can say that there is enough evidence that the average weight has changed.
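
As a rough check of this conclusion, the decision rule for a 2 tailed test can be sketched as below. The p-value line is an addition that the example itself does not use. It gives the same decision from a different angle, again relying on `statistics.NormalDist` (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z_critical = 1.96  # critical value read from the z table in the example
z = (110 - 100) / (20 / sqrt(75))  # test statistic from the example

# 2 tailed rule: reject H0 when the statistic lies beyond either critical value.
reject_h0 = abs(z) > z_critical

# 2 tailed p-value: the area in both tails beyond |z| (not part of the example).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(reject_h0)        # True
print(p_value < alpha)  # True
```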




EXAMPLE 2
The average weight of the population is 100, with a standard deviation of 15. A researcher believes that the value is lower. The weights of 5 random adults are 70, 75, 85, 99 and 109. Is there enough evidence that the average is lower?


STEP 1 : STATE NULL AND ALTERNATIVE HYPOTHESIS 
Null Hypothesis
H0: µ = 100 [currently believed to be true]
Alternative Hypothesis
H1: µ < 100 [being claimed by the researcher.]

A 1 tailed test will be performed because H1 (the alternative hypothesis) contains the ‘<’ sign.

STEP 2: CHOOSE LEVEL OF SIGNIFICANCE  (α)
Let us consider α to be 0.05
The remaining area under the curve will be 1 - 0.05 = 0.95



STEP 3: FIND THE CRITICAL VALUES 

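The area of the rejection region (the left tail) is 0.05
Sample size n = 5
Because the sample is small, the critical value is read from the t table, which needs the degrees of freedom: df = n - 1
So df = 5 - 1 = 4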


So, critical value = -2.132 (t table, df = 4, left tail)



STEP 4 : FIND TEST STATISTICS 

Z = (x̅ - µ) / (σ / √n)

x̅ = average of the sample
µ = average of the population
σ = standard deviation of the population
n = sample size

x̅ = (70 + 75 + 85 + 99 + 109) / 5 = 87.6

Z = (x̅ - µ) / (σ / √n)

= (87.6 - 100) / (15 / √5) = -1.848
Our test statistic value is -1.848
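
A short sketch of this calculation, using only the standard library. Evaluated at full precision and rounded to three decimals, the statistic comes out at about -1.848.

```python
from math import sqrt
from statistics import mean

weights = [70, 75, 85, 99, 109]  # the 5 sampled weights from the example
population_mean = 100
population_sd = 15

sample_mean = mean(weights)
z = (sample_mean - population_mean) / (population_sd / sqrt(len(weights)))

print(sample_mean, round(z, 3))  # 87.6 -1.848
```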

STEP 5: DRAW CONCLUSION

Since the test statistic lies between the critical value and the middle of the curve, it is not in the rejection region
Therefore, the Null Hypothesis is not rejected
There is not enough evidence to support the alternative hypothesis

So, from the above, we retain H0: µ = 100
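
The sketch below replays this decision in code. It hard-codes the critical value -2.132 taken from the t table for df = 4. Two things in it are additions, not part of the example, and are labelled as such in the comments: a t statistic built from the sample standard deviation, and the z table critical value for a left tail of 0.05. The second one matters because σ is given as known: if the z table were used instead of the t table, the same statistic would land in the rejection region, so the conclusion depends on which table is chosen.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

weights = [70, 75, 85, 99, 109]
n = len(weights)
mu_0 = 100          # mean under H0
sigma = 15          # population standard deviation given in the example
t_critical = -2.132  # t table value used in the example (df = 4, left tail 0.05)

x_bar = mean(weights)
stat_known_sigma = (x_bar - mu_0) / (sigma / sqrt(n))

# Assumption, not in the example: the usual t statistic uses the sample
# standard deviation instead of the population value.
stat_sample_sd = (x_bar - mu_0) / (stdev(weights) / sqrt(n))

# Left tailed rule: reject H0 only when the statistic is below the critical value.
for name, stat in (("known sigma", stat_known_sigma), ("sample sd", stat_sample_sd)):
    decision = "reject H0" if stat < t_critical else "fail to reject H0"
    print(name, round(stat, 3), decision)

# Assumption, not in the example: the z table critical value for the same tail.
z_critical = NormalDist().inv_cdf(0.05)
print(round(z_critical, 3), stat_known_sigma < z_critical)
```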


What is Bayesian Inference? 
Bayesian Inference is a method of statistical inference.
It is used to update the probability of a hypothesis as more evidence or information becomes available.
The update is done with Bayes Theorem, which is based on conditional probability.
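
A minimal sketch of one such update, using Bayes Theorem in the form P(H | E) = P(E | H) P(H) / P(E). The function name `bayes_update` and all the numbers (a prior of 0.5 and likelihoods of 0.8 and 0.3) are hypothetical values chosen only for illustration.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence, whether or not H is true.
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


belief = 0.5  # hypothetical starting probability of the hypothesis
for _ in range(3):
    # Each new observation is assumed to be more likely when H is true (0.8)
    # than when it is false (0.3), so the belief moves up with every update.
    belief = bayes_update(belief, 0.8, 0.3)
    print(round(belief, 3))
```

Each pass through the loop takes the previous posterior as the new prior, which is how the probability keeps changing as more information arrives.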
