Web Scraping using Python in Data Science Example program 2

Before moving to the code, read how web scraping works.
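In short, web scraping has two steps: download a page's HTML over HTTP, then parse that HTML into tags and text so you can pick out the parts you need. When BeautifulSoup is given the 'html.parser' backend, it uses Python's built-in html.parser module for the parsing. The sketch below calls that module directly to show how a parser walks through start tags, text, and end tags and collects the text of every <p> element. ParagraphCollector is a hypothetical helper, and the hard-coded sample string stands in for a live page.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collect the text inside every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts
        self._buffer = None    # text pieces of the <p> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while a <p> is open; other text (e.g. scripts) is ignored.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer).strip())
            self._buffer = None


# Sample HTML standing in for page.text (an assumption, not the real blog markup).
sample_html = """
<html><body>
  <p class="description"><span>A personal blog</span></p>
  <p></p>
  <script>var x = 1;</script>
</body></html>
"""

collector = ParagraphCollector()
collector.feed(sample_html)
print(collector.paragraphs)  # ['A personal blog', '']
```

This sketch does not handle nested or unclosed <p> tags. Those cases, and many others, are the reason to use a library such as BeautifulSoup in practice.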

Let's scrape and print the description of the Daily Experience blog.



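# Import the requests and BeautifulSoup libraries required for web scraping
import requests
from bs4 import BeautifulSoup

url = 'http://harshanavalkar.blogspot.com'
page = requests.get(url, timeout=10)   # HTTP GET; give up after 10 seconds
status = page.status_code              # 200 means the request succeeded
text = page.text                       # raw HTML of the page as a string

print('\n Output after status_code \n')
print(status)
print('\n Output after page.text \n')
print(text)

# Parse the raw HTML into a navigable tree
soup = BeautifulSoup(page.text, 'html.parser')

print('\n Output after prettify()\n ')
print(soup.prettify())                 # the parsed HTML, re-indented for reading

f1 = soup.find_all('p')                # list of every <p> tag on the page
f2 = f1[0].get_text()                  # text of the first <p>: the blog description

print('\n Output after find all \n')
print(f1)

print('\n Output after get text \n')
print(f2)

# Save the description to wb.txt, then copy it line by line into Dailyblog.txt.
# An explicit encoding avoids errors when the text has non-ASCII characters.
with open("wb.txt", 'w', encoding='utf-8') as w1:
    w1.write(f2)
with open("wb.txt", 'r', encoding='utf-8') as r:
    with open("Dailyblog.txt", 'w', encoding='utf-8') as w:
        for line in r:
            w.write(line)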
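The two `with` blocks first write the description to wb.txt and then copy that file line by line into Dailyblog.txt. If you don't need the intermediate file, you can write the text straight to its destination. Here is a minimal sketch, assuming UTF-8 is acceptable for the output file. save_text is a hypothetical helper name.

```python
from pathlib import Path


def save_text(text: str, path: str = "Dailyblog.txt") -> Path:
    """Hypothetical helper: write scraped text to a file in one step."""
    out = Path(path)
    # An explicit encoding avoids relying on the platform's default codec.
    out.write_text(text, encoding="utf-8")
    return out
```

Calling save_text(f2) after the get_text() step does the same job as both `with` blocks and skips wb.txt.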

OUTPUT 

Output after status_code 
200
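A status code of 200 means the request succeeded. Codes such as 404 or 500 mean that page.text holds an error page rather than the blog. The requests library can do this check for you with page.raise_for_status(). The stdlib-only sketch below shows the same idea: look up the code's standard reason phrase, and stop before parsing if the code is not a 2xx success. check_status is a hypothetical helper.

```python
from http import HTTPStatus


def check_status(status_code: int) -> str:
    """Hypothetical helper: return the reason phrase, or raise on non-2xx codes."""
    phrase = HTTPStatus(status_code).phrase  # raises ValueError for unknown codes
    if not 200 <= status_code < 300:
        raise RuntimeError(f"Request failed: {status_code} {phrase}")
    return phrase


print(check_status(200))  # OK
```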

Output after page.text [The entire output of page.text is huge; the following is only an excerpt.]
g", "Sept", "Oct", "Nov", "Dec"];
var more_text = "View More";
var comments_text = "<span>Post </span>Comment";
var pagenav_prev = "Previous";
var pagenav_next = "Next";
//]]>
</script>
<script type='text/javascript'>
//<![CDATA[
// Plugin: SelectNav.js ~ url: https://github.com/lukaszfiszer/selectnav.js
window.selectnav=function(){"use strict";var e=function(e,t){function c(e){var t;if(!e)e=window.event;if(e.target)t=e.target;else if(e.srcElement)t=e.srcElement;if(t.nodeType===3)t=t.parentNode;if(t.value)window.location.href=t.value}function h(e){var t=e.nodeName.toLowerCase();return t==="ul"||t==="ol"}function p(e){for(var t=1;document.getElementById("selectnav"+t);t++);return e?"selectnav"+t:"selectnav"+(t-1)}function d(e){a++;var t=e.children.length,n="",l="",c=a-1;if(!t){return}if(c){while(c--){l+=o}l+=" "}for(var v=0;v<t;v++){var m=e.children[v].children[0];if(typeof m!=="undefined"){var g=m.innerText||m.textContent;var y="";if(r){y=m.className.search(r)!==-1||m.parentNode.className.search(r)!==-1?f:""}if(i&&!y){y=m.href===document.URL?f:""}n+='<option value="'+m.href+'" '+y+">"+l+g+"</option>";if(s){var b=e.children[v].children[1];if(b&&h(b)){n+=d(b)}}}}if(a===1&&u){n='<option value="">'+u+"</option>"+n}if(a===1){n='<select class="selectnav" id="'+p(true)+'">'+n+"</select>"}a--;return n}e=document.getElementById(e);if(!e){return}if(!h(e)){return}if(!("insertAdjacentHTML"in window.document.documentElement)){return}document.documentElement.className+=" js";var n=t||{},r=n.activeclass||"active",i=typeof n.autoselect==="boolean"?n.autoselect:true,s=typeof n.nested==="boolean"?n.nested:true,o=n.indent||"-",u=n.label||"Menu",a=0,f=" selected ";e.insertAdjacentHTML("afterend",d(e));var l=document.getElementById(p());if(l.addEventListener){l.addEventListener("change",c)}if(l.attachEvent){l.attachEvent("onchange",c)}return l};return function(t,n){e(t,n)}}();
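As the excerpt shows, much of page.text is JavaScript and markup rather than readable content. A common way to keep only the visible text is to remove <script> and <style> tags with BeautifulSoup's decompose() and then call get_text(). The sketch below assumes bs4 is installed, as in the program above. It uses a hard-coded sample in place of the real page, and visible_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Hypothetical helper: drop <script>/<style> content, return the remaining text."""
    soup = BeautifulSoup(html, "html.parser")
    # soup([...]) is shorthand for soup.find_all([...]).
    for tag in soup(["script", "style"]):
        tag.decompose()  # remove the tag and everything inside it
    # Strip each text fragment, drop empty ones, and join the rest with spaces.
    return soup.get_text(separator=" ", strip=True)


sample_html = """
<html><head><style>p { color: red; }</style></head>
<body><script>var more_text = "View More";</script>
<p>Daily Experience</p></body></html>
"""
print(visible_text(sample_html))  # Daily Experience
```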

Output after prettify() [The entire output of soup.prettify() is huge; the following is only an excerpt.]

resources.blogblog.com/img/icon18_wrench_allbkg.png" width="18"/>
           </a>
          </span>
         </span>
         <div class="clear">
         </div>
        </div>
       </div>
      </div>
     </div>
     <div id="lowerbar-wrapper">
      <div class="lowerbar ty-trigger section" id="Footer widget(2)">
       <div class="widget FollowByEmail" data-version="1" id="FollowByEmail1">
        <h2 class="title">
         Newsletter
        </h2>
        <div class="widget-content">
         <div class="follow-by-email-inner">
          <form action="https://feedburner.google.com/fb/a/mailverify" method="post" onsubmit='window.open("https://feedburner.google.com/fb/a/mailverify?uri=blogspot/GNWbz", "popupwindow", "scrollbars=yes,width=550,height=520"); return true' target="popupwindow">
           <table width="100%">
            <tr>
             <td>
              <input class="follow-by-email-address" name="email" placeholder="Email address..." type="text"/>
             </td>
             <td width="64px">
              <input class="follow-by-email-submit" type="submit" value="Submit"/>
             </td>
            </tr>
           </table>
           <input name="uri" type="hidden" value="blogspot/GNWbz"/>
           <input name="loc" type="hidden" value="en_US"/>
          </form>
         </div>
        </div>
        <span class="item-control blog-admin">
        </span>
       </div>
      </div>
     </div>
     <div id="lowerbar-wrapper">

Output after find_all

[<p class="description"><span>A personal blog about poems, photography, fiction, collaboration of writing and drawing, travel, art, lifestyle, Engineering blogs and India by Harsha Navalkar</span></p>, <p>
   #import requests and BeautifulSoup libraries required for web scrapping  import requests  from bs4 import BeautifulSoup   url='http:/...
</p>, <p></p>, <p></p>, <p></p>, <p></p>, <p></p>, <p class="contact-form-error-message" id="ContactForm1_contact-form-error-message"></p>, <p class="contact-form-success-message" id="ContactForm1_contact-form-success-message"></p>]
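Most of the <p> tags returned by find_all('p') are empty. The sketch below keeps only the paragraphs that contain text by calling get_text(strip=True) on each tag. The sample HTML is a shortened stand-in for the real page, and non_empty_paragraphs is a hypothetical helper.

```python
from bs4 import BeautifulSoup


def non_empty_paragraphs(html: str) -> list[str]:
    """Hypothetical helper: text of every <p> tag that is not blank."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text(strip=True) for p in soup.find_all("p"))
    return [t for t in texts if t]  # drop empty strings


sample_html = (
    '<p class="description"><span>A personal blog</span></p>'
    "<p></p><p>   </p>"
    '<p class="contact-form-error-message"></p>'
)
print(non_empty_paragraphs(sample_html))  # ['A personal blog']
```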

Output after get_text

A personal blog about poems, photography, fiction, collaboration of writing and drawing, travel, art, lifestyle, Engineering blogs and India by Harsha Navalkar
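The find_all output shows that the description sits in <p class="description">. Indexing with [0] only works while that paragraph stays first on the page. Searching for the class directly with find(..., class_=...) holds up better if the layout changes. The sample markup below is a shortened assumption of the blog's HTML, and get_description is a hypothetical helper.

```python
from bs4 import BeautifulSoup


def get_description(html: str):
    """Hypothetical helper: return the text of <p class="description">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("p", class_="description")  # class_ because 'class' is a keyword
    return tag.get_text(strip=True) if tag is not None else None


sample_html = (
    "<p>Some other paragraph</p>"
    '<p class="description"><span>A personal blog about poems</span></p>'
)
print(get_description(sample_html))  # A personal blog about poems
```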


The get_text() output is stored in Dailyblog.txt.

