WHAT IS WEB SCRAPING?
What is web scraping in data science with example programs
- The web contains a large amount of data, which can be structured or unstructured.
- Each website has its own layout and style, so the data does not come in one uniform format.
- Web scraping is a technique used to extract information from websites.
PYTHON LIBRARIES USED
requests
- The requests library fetches the web page.
- Pass the URL of the web page to the requests.get() function, which returns a response object (called page here).
- If the page is downloaded successfully, page.status_code is 200.
- Then use page.text to get the content of the web page along with the HTML tags.
- The raw content is difficult to read because it has no consistent alignment, spacing, or indentation.
- Therefore, the BeautifulSoup library is used to make it readable and to search it.
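The fetching steps above can be sketched as follows. This is a minimal illustration, not a complete scraper: the helper name fetch_html and the 10-second timeout are assumptions made for the example, and any URL you pass in is your own choice.

```python
import requests


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page's HTML, or None if the status is not 200."""
    # requests.get() sends an HTTP GET request and returns a Response object.
    page = requests.get(url, timeout=timeout)

    # A status code of 200 means the page was downloaded successfully.
    if page.status_code == 200:
        # page.text holds the body as a string, HTML tags included.
        return page.text
    return None
```

Calling fetch_html("https://example.com") would return the raw HTML of that page as one long string, which is why the next step hands it to BeautifulSoup.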
BeautifulSoup
- BeautifulSoup pulls data out of HTML and XML files.
- It prettifies the content by giving it proper spacing, alignment, and indentation.
- The prettify() method in BeautifulSoup is used for this task.
- Finally, extract tags by name: the find() method returns the first matching tag, and the find_all() method returns a list of all matching tags.
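The BeautifulSoup steps above can be sketched like this. To keep the example self-contained, it parses a small made-up HTML string instead of a downloaded page; the sample markup and the variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for page.text from a real download.
html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the markup with one tag per line and nested indentation.
print(soup.prettify())

# find() returns only the first matching tag.
first_paragraph = soup.find("p")

# find_all() returns a list of every matching tag.
all_paragraphs = soup.find_all("p")

# get_text() strips the tags and keeps only the text inside them.
paragraph_texts = [p.get_text() for p in all_paragraphs]
print(paragraph_texts)
```

With a real page, the html string would simply be replaced by the text returned from the requests step.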
PROGRAMS
Here are some easy programs on web scraping to get you started.