Basics of Naive Bayes Algorithm in Data Science - Definition, Advantages, Disadvantages, Applications, Basic Implementation

WHAT IS NAIVE BAYES?

1.     It is a classification technique.
2.     It assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
3.     It is based on Bayes' theorem of conditional probability.

For example, consider having a Coca-Cola after eating popcorn.
Let's say P(pop) = probability of eating popcorn.
P(coco) = probability of drinking Coca-Cola.
P(coco|pop) = probability of having a Coca-Cola given that popcorn was eaten.
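To make the notation concrete, here is a minimal sketch of Bayes' theorem applied to this example. The probabilities below are made up purely for illustration, not taken from any data:

#Bayes' theorem: P(coco|pop) = P(pop|coco) * P(coco) / P(pop)
#All numbers here are hypothetical, chosen only to show the calculation
p_pop = 0.4              #probability of eating popcorn
p_coco = 0.3             #probability of drinking Coca-Cola
p_pop_given_coco = 0.6   #probability of popcorn given Coca-Cola
p_coco_given_pop = p_pop_given_coco * p_coco / p_pop
print(p_coco_given_pop)  #0.45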

WHY IS IT CALLED NAIVE?

It is called naive because it naively assumes that all features are independent of each other, which rarely holds exactly in practice, so the output may or may not turn out to be correct.
(You will see this in the basic implementation section.)

TYPES


1. Gaussian Naive Bayes Classifier
It is used for continuous values.
It assumes the features follow a Normal (Gaussian) distribution.

2. Multinomial Naive Bayes Classifier
It is used for discrete counts, such as word counts in text.

3. Bernoulli Naive Bayes Classifier
This classifier is used for binary feature vectors.
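As a quick illustration of the three variants, the sketch below only shows how each classifier is created and fitted in sklearn; the tiny data sets are made up solely to show the kind of input each one expects:

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

#Gaussian: continuous values (e.g. measurements)
GaussianNB().fit([[1.2, 3.4], [2.1, 0.5]], [0, 1])

#Multinomial: discrete counts (e.g. word counts)
MultinomialNB().fit([[3, 0, 1], [0, 2, 4]], [0, 1])

#Bernoulli: binary features (e.g. word present or absent)
BernoulliNB().fit([[1, 0, 1], [0, 1, 1]], [0, 1])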

ADVANTAGES

1.     It is fast.
2.     It is highly scalable.
3.     It can be used for both binary and multi-class classification.
4.     It is a great choice for text classification.
5.     It can be trained easily on small data sets.

DISADVANTAGES

Naive Bayes assumes that the features are independent of each other. However, in the real world, features often depend on each other.

APPLICATIONS

1.     Text classification.
2.     Spam filtering (see the sketch after this list).
3.     Sentiment analysis.
4.     Real-time prediction.
5.     Multi-class prediction.
6.     Recommendation systems.
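For text applications such as spam filtering, the Multinomial variant is usually paired with a word-count representation. The snippet below is only a minimal sketch with made-up messages and labels, not a real spam filter:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

#Hypothetical training messages and labels
messages = ["win a free prize now", "meeting at 10 am",
            "free offer just for you", "see you at lunch"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()              #turns text into word counts
counts = vectorizer.fit_transform(messages)

model = MultinomialNB()
model.fit(counts, labels)

print(model.predict(vectorizer.transform(["free prize offer"])))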

BASIC IMPLEMENTATION OF GAUSSIAN NAIVE BAYES

scikit-learn (sklearn) is the package required to implement Naive Bayes.
Before installing it, you should have the NumPy and SciPy packages on your system.
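If these packages are not already installed, they can usually be added with pip (assuming a standard Python environment):

pip install numpy scipy scikit-learn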

Program

#Import the Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB
import numpy as np

#Assign predictor and target variables
#In this example, various marks are given as inputs to train the model
#If a person scores 10 or below in any subject, the result is labelled fail ('f')
#otherwise it is labelled pass ('p')

marks= np.array([[15,20,10],[17,15,18], [7,0,10], [12,10,20], [22,10,4], [14,20,12], [11,11,11], [17,16,15], [12,20,4], [12,17,23], [12,6,7], 
[12,16,12]]) 
result= np.array(['f','p','f','f','f','p','p','p', 'f','p','f', 'p']) 

#Create a Gaussian Naive Bayes classifier

model = GaussianNB() 

#Train the model using the training data

model.fit(marks,result) 

#Predict Output 

prediction= model.predict([[17,16,7]])
print('if the input is ([[17,16,7]])')
print(prediction) 
prediction= model.predict([[1,6,7]])
print('if the input is ([[1,6,7]])')
print(prediction) 

#NOTE: Even though all the marks below are greater than 10, the model still
#predicts fail. The reason is that we have not given it enough data to
#train the model properly.

prediction= model.predict([[20,15,13]])
print('if the input is ([[20,15,13]])')
print(prediction) 

#However, if you use inputs that were already part of the training data,
#the model predicts pass.

prediction= model.predict([[17,15,18]])
print('if the input is ([[17,15,18]])')
print(prediction) 

prediction= model.predict([[12,16,12]])
print('if the input is ([[12,16,12]])')
print(prediction) 

Output


if the input is ([[17,16,7]])
['f']
if the input is ([[1,6,7]])
['f']
if the input is ([[20,15,13]])
['f']
if the input is ([[17,15,18]])
['p']
if the input is ([[12,16,12]])
['p']
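If you want to see how confident the model is, rather than just the predicted label, GaussianNB also provides predict_proba. A minimal sketch, assuming the model trained above:

print(model.classes_)                     #order of the classes, here ['f' 'p']
print(model.predict_proba([[20,15,13]]))  #probability for each class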



