Web Scraping using Python in Data Science Example program 2

Before moving to the code, learn how web scraping works.

Let's scrape and print the description of Daily Experience.

# Import the requests and BeautifulSoup libraries required for web scraping
import requests
from bs4 import BeautifulSoup

url = 'http://harshanavalkar.blogspot.com'
page = requests.get(url)
status = page.status_code
text = page.text

print('\n Output after status_code \n')
print(status)
print('\n Output after page.text \n')
print(text)

soup = BeautifulSoup(page.text, 'html.parser')

print('\n Output after prettify()\n ')
print(soup.prettify())

f1 = soup.find_all('p')
f2 = f1[0].get_text()

print('\n Output after find all \n')
print(f1)

print('\n Output after get text \n')
print(f2)

# Write the description to wb.txt, then copy it line by line into Dailyblog.txt
with open("wb.txt", 'w') as w1:
    w1.write(f2)
with open("wb.txt", 'r') as r:
    with open("Dailyblog.txt", 'w') as w:
        for line in r:
            w.write(line)

OUTPUT

Output after status_code
200
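
A status code of 200 means the request succeeded. The program above prints the code but does not act on it, so a failed request would still be parsed. A minimal sketch of one way to guard against that is shown below; the function name fetch_html and the 10-second timeout are assumptions made for illustration, not part of the original program.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of url as text.

    fetch_html and the default timeout are hypothetical choices for this sketch.
    """
    # timeout stops the call from waiting forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Calling fetch_html('http://harshanavalkar.blogspot.com') would then return the same text as page.text, or raise an exception instead of handing an error page to BeautifulSoup.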

Output after page.text [The full output of page.text is very long; only an excerpt is shown below.]
g", "Sept", "Oct", "Nov", "Dec"];
var more_text = "View More";
var comments_text = "<span>Post </span>Comment";
var pagenav_prev = "Previous";
var pagenav_next = "Next";
//]]>
</script>
<script type='text/javascript'>
//<![CDATA[
// Plugin: SelectNav.js ~ url: https://github.com/lukaszfiszer/selectnav.js
window.selectnav=function(){"use strict";var e=function(e,t){function c(e){var t;if(!e)e=window.event;if(e.target)t=e.target;else if(e.srcElement)t=e.srcElement;if(t.nodeType===3)t=t.parentNode;if(t.value)window.location.href=t.value}function h(e){var t=e.nodeName.toLowerCase();return t==="ul"||t==="ol"}function p(e){for(var t=1;document.getElementById("selectnav"+t);t++);return e?"selectnav"+t:"selectnav"+(t-1)}function d(e){a++;var t=e.children.length,n="",l="",c=a-1;if(!t){return}if(c){while(c--){l+=o}l+=" "}for(var v=0;v<t;v++){var m=e.children[v].children[0];if(typeof m!=="undefined"){var g=m.innerText||m.textContent;var y="";if(r){y=m.className.search(r)!==-1||m.parentNode.className.search(r)!==-1?f:""}if(i&&!y){y=m.href===document.URL?f:""}n+='<option value="'+m.href+'" '+y+">"+l+g+"</option>";if(s){var b=e.children[v].children[1];if(b&&h(b)){n+=d(b)}}}}if(a===1&&u){n='<option value="">'+u+"</option>"+n}if(a===1){n='<select class="selectnav" id="'+p(true)+'">'+n+"</select>"}a--;return n}e=document.getElementById(e);if(!e){return}if(!h(e)){return}if(!("insertAdjacentHTML"in window.document.documentElement)){return}document.documentElement.className+=" js";var n=t||{},r=n.activeclass||"active",i=typeof n.autoselect==="boolean"?n.autoselect:true,s=typeof n.nested==="boolean"?n.nested:true,o=n.indent||"-",u=n.label||"Menu",a=0,f=" selected ";e.insertAdjacentHTML("afterend",d(e));var l=document.getElementById(p());if(l.addEventListener){l.addEventListener("change",c)}if(l.attachEvent){l.attachEvent("onchange",c)}return l};return function(t,n){e(t,n)}}();

Output after prettify() [The full output of prettify() is very long; only an excerpt is shown below.]

resources.blogblog.com/img/icon18_wrench_allbkg.png" width="18"/>
</a>
</span>
</span>
<div class="clear">
</div>
</div>
</div>
</div>
</div>
<div id="lowerbar-wrapper">
<div class="lowerbar ty-trigger section" id="Footer widget(2)">
<div class="widget FollowByEmail" data-version="1" id="FollowByEmail1">
<h2 class="title">
Newsletter
</h2>
<div class="widget-content">
<div class="follow-by-email-inner">
<form action="https://feedburner.google.com/fb/a/mailverify" method="post" onsubmit='window.open("https://feedburner.google.com/fb/a/mailverify?uri=blogspot/GNWbz", "popupwindow", "scrollbars=yes,width=550,height=520"); return true' target="popupwindow">
<table width="100%">
<tr>
<td>
<input class="follow-by-email-address" name="email" placeholder="Email address..." type="text"/>
</td>
<td width="64px">
<input class="follow-by-email-submit" type="submit" value="Submit"/>
</td>
</tr>
</table>
<input name="uri" type="hidden" value="blogspot/GNWbz"/>
<input name="loc" type="hidden" value="en_US"/>
</form>
</div>
</div>
<span class="item-control blog-admin">
</span>
</div>
</div>
</div>
<div id="lowerbar-wrapper">

Output after find_all

[<p class="description"><span>A personal blog about poems, photography, fiction, collaboration of writing and drawing, travel, art, lifestyle, Engineering blogs and India by Harsha Navalkar</span></p>, <p>
#import requests and BeautifulSoup libraries required for web scrapping import requests from bs4 import BeautifulSoup url='http:/...
</p>, <p></p>, <p></p>, <p></p>, <p></p>, <p></p>, <p class="contact-form-error-message" id="ContactForm1_contact-form-error-message"></p>, <p class="contact-form-success-message" id="ContactForm1_contact-form-success-message"></p>]
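
Several of the paragraphs returned by find_all are empty. If only the paragraphs with text are needed, they can be filtered out. The sketch below runs on a small hand-written HTML string that imitates the output above, so the markup and the variable names are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption), shaped like the find_all output
html = """
<p class="description"><span>A personal blog</span></p>
<p>
 Some post text
</p>
<p></p>
<p></p>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) removes the surrounding whitespace of each paragraph
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

# Keep only the paragraphs that still contain text after stripping
non_empty = [text for text in paragraphs if text]
print(non_empty)
```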

Output after get_text

A personal blog about poems, photography, fiction, collaboration of writing and drawing, travel, art, lifestyle, Engineering blogs and India by Harsha Navalkar
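
Taking find_all('p')[0] works only as long as the description is the first paragraph on the page. Since the find_all output shows that the description paragraph has class="description", it can also be selected by that class. This is a sketch on a shortened copy of that markup, not the method used in the program above.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup shown in the find_all output
html = """
<p class="description"><span>A personal blog about poems</span></p>
<p></p>
<p class="contact-form-error-message" id="ContactForm1_contact-form-error-message"></p>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches
description = soup.find("p", class_="description")

# Fall back to an empty string so a missing tag does not raise an error
text = description.get_text(strip=True) if description is not None else ""
print(text)
```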

Output stored in Dailyblog.txt
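
The program writes the description to wb.txt and then copies it into Dailyblog.txt. If the intermediate file is not needed, the text can be written to the final file in one step. The sketch below uses pathlib from the standard library; the function name save_description and the UTF-8 encoding are assumptions for illustration.

```python
from pathlib import Path


def save_description(text, path="Dailyblog.txt"):
    """Write text straight to path and return the Path object.

    save_description is a hypothetical helper; it skips the wb.txt step.
    """
    target = Path(path)
    # write_text opens the file, writes the string, and closes the file
    target.write_text(text, encoding="utf-8")
    return target
```

Calling save_description(f2) after the scraping code would produce the same Dailyblog.txt without creating wb.txt.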
