Working with data in Data Science


WHAT DO YOU MEAN BY EXPLORING THE DATA?

1.  In data science, the first step is to identify the problem.
2.  Once the problem is identified, we build models to answer it.
3.  Model building starts with exploring the data.
4.  Data can be one-dimensional (1D), two-dimensional (2D), or multidimensional (nD).
5.  1D data can be a collection of numbers, such as the time spent on a web page or the number of blogs visited.
6.  Summary statistics are applied to such data: minimum, maximum, mean, and standard deviation.
7.  Visualization (bar charts, pie charts, histograms, scatter plots, etc.) helps in understanding the data.
8.  Data visualization can be done using matplotlib in Python.
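
As a rough illustration of points 5 to 8, the sketch below computes the four summary statistics for a small 1D sample and defines a helper that draws a histogram with matplotlib. The sample values, the `time_on_page` name, and the `save_histogram` helper are assumptions made up for this example, not part of any real dataset or library.

```python
import statistics

# Hypothetical 1D sample: seconds spent on a web page by ten visitors.
time_on_page = [12, 45, 33, 8, 51, 29, 40, 17, 60, 25]

summary = {
    "minimum": min(time_on_page),
    "maximum": max(time_on_page),
    "mean": statistics.mean(time_on_page),
    "std_dev": statistics.stdev(time_on_page),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")


def save_histogram(values, path):
    """Draw a histogram of the values and write it to an image file."""
    # Imported here so the summary above still runs without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(values, bins=5)
    ax.set_xlabel("Seconds on page")
    ax.set_ylabel("Number of visitors")
    ax.set_title("Time spent on page")
    fig.savefig(path)
    plt.close(fig)
```

Calling `save_histogram(time_on_page, "time_on_page.png")` would write the chart to a file. A histogram is used here because it suits a single numeric variable; a scatter plot would be the usual choice for 2D data.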


WHAT IS WEB SCRAPING?

Web scraping is covered in an earlier blog post.


WHAT IS DATA CLEANING?

1.  In the real world, there is a lot of data, and much of it is dirty.
2.  It can contain bad, missing, or erroneous values.
3.  Data cleaning is the process of detecting corrupt, inaccurate, incorrect, or incomplete data and then correcting or removing it.
4.  After cleaning, the data should be consistent.
5.  Data cleaning can be done using the pandas library in Python.
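
The points above can be sketched with pandas roughly as follows. The table contents, the column names, and the 0 to 120 age range are assumptions chosen for illustration; real cleaning rules depend on the dataset.

```python
import pandas as pd

# Hypothetical raw records with a missing value, a duplicate row,
# inconsistent text, and an impossible age.
raw = pd.DataFrame({
    "name": ["Asha", "Ben", "Ben", "Chen", "Dia"],
    "city": [" Pune", "delhi", "delhi", "Mumbai ", None],
    "age": [29, 41, 41, -5, 35],
})

# Detect: count missing values per column and flag out-of-range ages.
print(raw.isna().sum())
invalid_age = ~raw["age"].between(0, 120)
print(raw[invalid_age])

# Clean: drop exact duplicate rows, drop rows with invalid ages,
# and fill missing cities with a placeholder.
clean = raw.drop_duplicates()
clean = clean[clean["age"].between(0, 120)].copy()
clean["city"] = clean["city"].fillna("unknown")

# Make consistent: trim stray spaces and use one capitalization style.
clean["city"] = clean["city"].str.strip().str.title()
clean = clean.reset_index(drop=True)
print(clean)
```

Whether a bad row is dropped or repaired is a judgment call; here the invalid age is dropped and the missing city is filled, only to show both options.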

WHAT IS DATA MUNGING?

1.  It is also called data wrangling.
2.  Data munging is essentially cleaning up messy data.
3.  It is the process of transforming and mapping data from one format to another.
4.  It is used to clean raw data so that it can be used for analysis.
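
One possible illustration of transforming and mapping data from one format to another is turning raw CSV text into typed JSON records. The sketch below uses only the Python standard library; the CSV contents and the `to_record` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical raw export: every value arrives as text, with untidy spacing
# and one missing measurement.
raw_csv = """page,visits,avg_seconds
 home ,120,35.5
 blog ,80,
 about ,15,12.0
"""


def to_record(row):
    """Map one raw CSV row (all strings) to a typed record."""
    seconds = row["avg_seconds"].strip()
    return {
        "page": row["page"].strip(),
        "visits": int(row["visits"]),
        "avg_seconds": float(seconds) if seconds else None,
    }


# Transform every row, then serialize the result in the target format.
records = [to_record(row) for row in csv.DictReader(io.StringIO(raw_csv))]
as_json = json.dumps(records, indent=2)
print(as_json)
```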

WHAT IS DATA MANIPULATION?

Data manipulation means adding, deleting, or modifying data as the requirements demand.
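
A minimal sketch of the three operations with pandas might look like this. The `scores` table and its values are invented for the example.

```python
import pandas as pd

# Hypothetical table of exam scores.
scores = pd.DataFrame({
    "student": ["Asha", "Ben", "Chen"],
    "maths": [72, 58, 90],
    "science": [65, 80, 85],
})

# Add: a derived column and a new row.
scores["total"] = scores["maths"] + scores["science"]
new_row = pd.DataFrame(
    [{"student": "Dia", "maths": 77, "science": 69, "total": 146}]
)
scores = pd.concat([scores, new_row], ignore_index=True)

# Modify: correct one value, then recompute the affected totals.
scores.loc[scores["student"] == "Ben", "maths"] = 68
scores["total"] = scores["maths"] + scores["science"]

# Delete: remove a row and a column that are no longer required.
scores = scores[scores["student"] != "Chen"].reset_index(drop=True)
scores = scores.drop(columns=["science"])
print(scores)
```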



DIMENSIONALITY REDUCTION

1.  It is the process of reducing the number of random variables (features) under consideration.
2.  It is divided into feature selection, which keeps a subset of the original variables, and feature extraction, which derives new variables from them.
3.  Dimensionality reduction decreases the computation time and storage space required.
4.  It makes the data easier to visualize.
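
To make the two approaches concrete, the sketch below applies one simple example of each using NumPy: a variance threshold for feature selection and principal component analysis (PCA, computed through the singular value decomposition) for feature extraction. These two techniques, the synthetic data, and the threshold value are choices made for this illustration; many other methods exist.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 4 features.
# Feature 2 is nearly a copy of feature 0, and feature 3 is constant.
base = rng.normal(size=(100, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.01 * rng.normal(size=100),
    np.full(100, 5.0),
])

# Feature selection: keep only original columns whose variance
# exceeds a small threshold, which drops the constant feature.
variances = X.var(axis=0)
selected = X[:, variances > 1e-8]

# Feature extraction: PCA via SVD builds new features as linear
# combinations of the originals and keeps the two strongest ones.
centered = X - X.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
X_2d = centered @ components[:2].T

print(selected.shape)
print(X_2d.shape)
print(explained.round(3))
```

The two-column `X_2d` can be drawn directly as a scatter plot, which is what makes reduced data easier to visualize.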