WHAT DO YOU MEAN BY EXPLORING THE DATA?
1. In data science, the first step is to identify the problem.
2. Once the problem is identified, we need to build models to answer it.
3. The first step in building a model is to explore the data.
4. Data can be one-dimensional (1D), two-dimensional (2D), or multidimensional (nD).
5. 1D data can be a collection of numbers, such as the time spent on a web page or the number of blogs visited.
6. Basic descriptive statistics apply here: minimum, maximum, mean, and standard deviation.
7. Visualizing the data (bar chart, pie chart, histogram, scatter plot, etc.) improves understanding of it.
8. Data visualization can be done using matplotlib in Python.
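The exploration steps above can be sketched in a few lines of Python. This is only a minimal illustration: the `time_on_page` values are made up, `summarize` is a hypothetical helper, and the output file name is an assumption. The statistics come from Python's standard `statistics` module and the histogram is drawn with matplotlib.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical 1D data: seconds spent on a web page by ten visitors.
time_on_page = [12, 45, 33, 8, 27, 45, 60, 19, 33, 45]


def summarize(values):
    """Return basic descriptive statistics for a 1D collection of numbers."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(time_on_page)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

# A histogram shows how the values are distributed.
fig, ax = plt.subplots()
ax.hist(time_on_page, bins=5)
ax.set_xlabel("Time on page (seconds)")
ax.set_ylabel("Number of visitors")
ax.set_title("Distribution of time spent on page")
fig.savefig("time_on_page_hist.png")  # assumed output file name
plt.close(fig)
```

For 2D data, a scatter plot (`ax.scatter`) is usually a better first look than a histogram.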
WHAT IS WEB SCRAPING?
Web scraping was covered in an earlier blog post.
WHAT IS DATA CLEANING?
1. In the real world, there is a lot of data, and much of it is dirty.
2. Data can be bad, missing, or contain errors.
3. Data cleaning is the process of detecting and correcting (or removing) corrupt, inaccurate, incorrect, or incomplete data.
4. After cleaning, the data should be consistent.
5. Data cleaning can be done using the pandas library in Python.
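As a rough sketch of what such cleaning can look like in pandas, the example below starts from a small, made-up table that contains a duplicate row, a missing name, a missing age, an impossible age, and inconsistently written city names. The column names, the valid age range of 0 to 120, and the choice to fill missing ages with the median are all assumptions for illustration, not fixed rules.

```python
import numpy as np
import pandas as pd

# Hypothetical dirty data.
raw = pd.DataFrame(
    {
        "name": ["Alice", "bob", "bob", None, "Dave", "Eve"],
        "age": [29, 41, 41, 35, -5, np.nan],
        "city": ["Pune", " mumbai", " mumbai", "Delhi", "Pune", "delhi"],
    }
)

clean = raw.copy()

# Make text columns consistent: trim spaces and normalize capitalization.
clean["city"] = clean["city"].str.strip().str.title()
clean["name"] = clean["name"].str.title()

# Remove exact duplicate rows and rows without a name.
clean = clean.drop_duplicates()
clean = clean.dropna(subset=["name"])

# Treat ages outside an assumed valid range as missing,
# then fill missing ages with the median of the valid ones.
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan
clean["age"] = clean["age"].fillna(clean["age"].median())

clean = clean.reset_index(drop=True)
print(clean)
```

Whether to drop a bad row or repair it depends on the data and the problem; here rows without a name are dropped, while bad ages are repaired.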
WHAT IS DATA MUNGING?
1. It is also called data wrangling.
2. Data munging means cleaning up messy data.
3. It is the process of transforming and mapping data from one format to another. It is used to clean raw data so that it can be used for analysis.
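To make "mapping data from one format to another" concrete, here is a small sketch that turns messy CSV text into clean, typed records and prints them as JSON. The sample data, the `munge` function, and the output field names are hypothetical; only the standard `csv`, `io`, and `json` modules are used.

```python
import csv
import io
import json

# Hypothetical raw export: messy header, stray spaces, numbers stored as text.
raw_csv = """Page , Visits, Avg Time (s)
 /home ,120, 35.5
/blog, 80 ,48.0
"""


def munge(csv_text):
    """Map raw CSV text to a list of clean, typed records."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the messy header row; we supply our own field names
    records = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        page, visits, avg_time = (cell.strip() for cell in row)
        records.append(
            {"page": page, "visits": int(visits), "avg_time_s": float(avg_time)}
        )
    return records


records = munge(raw_csv)
print(json.dumps(records, indent=2))
```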
WHAT IS DATA MANIPULATION?
Data manipulation means adding, deleting, and modifying data as the requirements demand.
See: NumPy tutorial
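The three operations can be sketched with a NumPy array. The `scores` values are made up for illustration. Note that `np.append` and `np.delete` return new arrays rather than changing the original in place.

```python
import numpy as np

# Hypothetical data: four test scores.
scores = np.array([55, 72, 88, 91])

# Adding: np.append returns a new array with the value added at the end.
scores = np.append(scores, 64)   # [55 72 88 91 64]

# Deleting: np.delete returns a new array without the element at index 0.
scores = np.delete(scores, 0)    # [72 88 91 64]

# Modifying: assign to an index, or use a boolean mask to change many values at once.
scores[1] = 90                   # [72 90 91 64]
scores[scores < 70] = 70         # [72 90 91 70]

print(scores)
```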
DIMENSIONALITY REDUCTION
1. It is the process of reducing the number of random variables (features) under consideration.
2. It is divided into feature selection and feature extraction.
3. Dimensionality reduction decreases the computation time and storage space required.
4. It makes the data easier to visualize.
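As one possible illustration of feature extraction, the sketch below applies principal component analysis (PCA), a common technique chosen here only as an example, to reduce three features to two. The data is randomly generated so that the third feature is almost a copy of the first, and the `pca` helper is a hypothetical function written with plain NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 3 features; the third nearly duplicates the first.
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])


def pca(data, n_components):
    """Project data onto its first n_components principal directions."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained


X_reduced, explained = pca(X, n_components=2)
print(X.shape, "->", X_reduced.shape)
print("explained variance ratio:", explained.round(3))
```

Because the third feature adds almost no new information, two components keep nearly all of the variance. Feature selection, by contrast, would simply keep a subset of the original columns, for example `X[:, :2]`.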