Web Scraping using Python in Data Science: Example Program 1

Before moving to the code, learn how web scraping works.



# Import the requests and BeautifulSoup libraries required for web scraping
import requests
from bs4 import BeautifulSoup

# Fetch the page and keep the HTTP status code and the raw HTML
url = 'http://www.wordsforlife.org.uk/songs/jack-and-jill-went-hill'
page = requests.get(url)
status = page.status_code
text = page.text

print('\n Output after status_code \n')
print(status)
print('\n Output after page.text \n')
print(text)

soup = BeautifulSoup(page.text, 'html.parser')

print('\n Output after prettify()\n ')
print(soup.prettify())

# find_all('p') returns every <p> tag; get_text() extracts the text of the first one
f1 = soup.find_all('p')
f2 = f1[0].get_text()

print('\n Output after find all \n') 
print(f1)

print('\n Output after get text \n')
print(f2)


# Save the extracted text, then copy it line by line into a second file
with open("wb.txt", 'w') as w1:
    w1.write(f2)
with open("wb.txt", 'r') as r:
    with open("Newdoc.txt", 'w') as w:
        for line in r:
            w.write(line)

OUTPUT 

Output after status_code 
200
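
A status of 200 means the request succeeded. The program above prints the code but parses the response regardless of its value; a minimal sketch of checking it first might look like this. The function name `fetch_html` and the 10-second timeout are assumptions made for illustration, not part of the original program.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of `url`, raising an exception on failure.

    `fetch_html` and the default timeout are illustrative choices.
    """
    # timeout stops the call from waiting forever on an unresponsive server
    page = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    page.raise_for_status()
    return page.text
```

Calling `fetch_html(url)` with the URL from the program would then either return the page text or fail loudly, instead of handing an error page to BeautifulSoup.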

Output after page.text [The full output of page.text is very long; only an excerpt is shown below.]
<div class="views-field-title-1">
                <span class="field-content"><a href="/hush-little-baby"><a href="/hush-little-baby">Hush, little baby</a> </a></span>
  </div>
</li>
          <li class="views-row views-row-34 views-row-even">  
  <div class="views-field-title-1">
                <span class="field-content"><a href="/songs/i-had-little-nut-tree"><a href="/songs/i-had-little-nut-tree">I Had a Little Nut Tree</a> </a></span>
  </div>
</li>
          <li class="views-row views-row-35 views-row-odd">  
  <div class="views-field-title-1">
                <span class="field-content"><a href="/songs/hear-thunder"><a href="/songs/hear-thunder">I Hear Thunder</a> </a></span>
  </div>
</li>
          <li class="views-row views-row-36 views-row-even">  
  <div class="views-field-title-1">
                <span class="field-content"><a href="/incy-wincy-spider"><a href="/incy-wincy-spider">Incy Wincy Spider</a> </a></span>
  </div>
</li>
          <li class="views-row views-row-37 views-row-odd">  
  <div class="views-field-title-1">
                <span class="field-content"><a href="/its-raining-its-pouring"><a href="/its-raining-its-pouring">It&#039;s raining, it&#039;s pouring</a> </a></span>
  </div>
</li>
          <li class="views-row views-row-38 views-row-even">  
  <div class="views-field-title-1">
                <span class="field-content"><a href="/songs/jack-and-jill-went-hill" class="active"><a href="/songs/jack-and-jill-went-hill" class="active">Jack and Jill went up the hill</a> </a></span>
  </div>

</li>
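
The raw HTML above is mostly a list of links to other songs. As a hedged sketch, the same BeautifulSoup calls can pull those titles and links out of such a fragment. The `html` string below is a shortened copy of the excerpt above, and the helper name `extract_links` is hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened copy of the page.text excerpt shown above
html = """
<li><div class="views-field-title-1">
  <span class="field-content"><a href="/hush-little-baby"><a href="/hush-little-baby">Hush, little baby</a> </a></span>
</div></li>
<li><div class="views-field-title-1">
  <span class="field-content"><a href="/its-raining-its-pouring"><a href="/its-raining-its-pouring">It&#039;s raining, it&#039;s pouring</a> </a></span>
</div></li>
"""


def extract_links(markup):
    """Map each href to its link text (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    links = {}
    for a in soup.find_all('a', href=True):
        # The page nests each <a> inside an identical <a>, so keying the
        # dictionary by href keeps one entry per link.
        links[a['href']] = a.get_text(strip=True)
    return links


for href, title in extract_links(html).items():
    print(href, '->', title)
```

Keying by `href` is one simple way to cope with the doubled `<a>` tags; it assumes that two links with the same target also carry the same title.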

Output after prettify() [The full output of prettify() is very long; only an excerpt is shown below.]

<li class="views-row views-row-82 views-row-even">
                <div class="views-field-title-1">
                 <span class="field-content">
                  <a href="/songs/sing-song-sixpence">
                   <a href="/songs/sing-song-sixpence">
                    Sing a song of sixpence
                   </a>
                  </a>
                 </span>
                </div>
               </li>
               <li class="views-row views-row-83 views-row-odd">
                <div class="views-field-title-1">
                 <span class="field-content">
                  <a href="/songs/sleeping-bunnies">
                   <a href="/songs/sleeping-bunnies">
                    Sleeping bunnies
                   </a>
                  </a>
                 </span>
                </div>
               </li>
               <li class="views-row views-row-84 views-row-even">
                <div class="views-field-title-1">
                 <span class="field-content">
                  <a href="/solomon-grundy">
                   <a href="/solomon-grundy">
                    Solomon Grundy
                   </a>
                  </a>
                 </span>
                </div>

               </li>

Output after find_all

[<p>Jack and Jill went up the hill<br/>
To fetch a pail of water.<br/>
Jack fell down and broke his crown,<br/>
And Jill came tumbling after.</p>, <p>Up Jack got, and home did trot,<br/>
As fast as he could caper,<br/>
He went to bed to mend his head,<br/>

With vinegar and brown paper.</p>, <p></p>, <p></p>, <p></p>, <p><a href="/songs">&lt; Songs and Rhymes page</a></p>, <p><a href="http://www.literacytrust.org.uk" target="_blank"><img alt="National Literacy Trust" height="75" src="/sites/all/themes/blogbuzz/images/NLT_Logo_WfL_small.jpg" width="75"/></a>   National Literacy Trust © 2018          <b><a href="/about" title="About us">About us</a>  |  <a href="/accessibility" title="Accessibility">Accessibility</a> |  <a href="/legalstuff" title="Legal stuff">Legal stuff</a>  |  <a href="/competition-terms" title="Competition terms and conditions">Competition terms and conditions</a></b></p>]
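
The list above mixes the two verses with empty `<p></p>` tags and footer links. One possible way to keep only the paragraphs that hold verse text is sketched below. The `html` string is a reduced stand-in for the real page, not its actual markup, and the rule "a verse contains no link" is an assumption that happens to fit this page.

```python
from bs4 import BeautifulSoup

# Reduced stand-in for the page: one verse, empty paragraphs, and a link
html = (
    "<p>Jack and Jill went up the hill<br/>\n"
    "To fetch a pail of water.</p>"
    "<p></p><p></p>"
    '<p><a href="/songs">&lt; Songs and Rhymes page</a></p>'
)

soup = BeautifulSoup(html, 'html.parser')

# Keep only paragraphs whose stripped text is non-empty
non_empty = [p for p in soup.find_all('p') if p.get_text(strip=True)]

# Of those, keep paragraphs that contain no <a> tag (the verses)
verses = [p.get_text() for p in non_empty if p.find('a') is None]

print(len(soup.find_all('p')), len(non_empty), len(verses))
```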

Output after get_text

Jack and Jill went up the hill
To fetch a pail of water.
Jack fell down and broke his crown,
And Jill came tumbling after.
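
The line breaks in this output come from the newline characters that follow each `<br/>` in the page source, not from the `<br/>` tags themselves. If the markup had no such newlines, the lines would run together. A small sketch of how the `separator` argument of `get_text()` handles that case follows; the `html` string is a made-up one-line variant of the verse.

```python
from bs4 import BeautifulSoup

# Made-up variant of the verse with no newline after <br/>
html = "<p>Jack and Jill went up the hill<br/>To fetch a pail of water.</p>"
p = BeautifulSoup(html, 'html.parser').find('p')

joined = p.get_text()                   # text pieces are concatenated directly
separated = p.get_text(separator='\n')  # a newline is placed between pieces

print(repr(joined))
print(repr(separated))
```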

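Output stored in Newdoc.txt

Newdoc.txt contains the same text as the get_text output above, since the program writes that text to wb.txt and then copies it line by line.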
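
The program copies wb.txt into Newdoc.txt one line at a time. As an alternative sketch, assuming a plain copy is all that is needed, the standard library's `shutil.copyfile` does the same in one call. The temporary directory and the sample text are used here only so the example runs without touching real files.

```python
import shutil
import tempfile
from pathlib import Path

text = "Jack and Jill went up the hill\nTo fetch a pail of water.\n"

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "wb.txt"
    dst = Path(tmp) / "Newdoc.txt"

    # An explicit encoding avoids platform-dependent defaults
    src.write_text(text, encoding="utf-8")

    # Copy the file contents in a single call
    shutil.copyfile(src, dst)

    copied = dst.read_text(encoding="utf-8")

print(copied == text)
```

The line-by-line loop in the program remains the better fit if each line has to be changed or filtered on its way to the second file.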

