Monday 11 June 2012

Using Python and Beautiful Soup

Python is very versatile and has a lot of useful tools. In this post I am going to show you how to scrape a website. There are many reasons to scrape a website; in my case, I am going to scrape the Yahoo Finance site for options data. I am going to use Beautiful Soup and urllib for that. Beautiful Soup is quite versatile. Although it was once thought to be unmaintained, the people behind Beautiful Soup have done an excellent job. I found it easy to use and quite fast too. You can get it from here. The latest version is stable. I have also referred quite a bit to Python Central.

Here I am going to scrape the options for Google, as shown in the figure below:

[Figure: Google options on Yahoo Finance]

We must first set up the page: download it with urllib and parse it with Beautiful Soup.
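The original setup code did not survive, so here is a minimal sketch of this step using Python 3's `urllib.request` and Beautiful Soup 4 (the post predates both, so this is an adaptation, not the author's code). The URL in `OPTIONS_URL` is an assumption: the post's original link is lost, and Yahoo changes its pages often (much of the current page may also be rendered by JavaScript). The helper names `make_soup` and `fetch_soup` are hypothetical.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumption: a Yahoo Finance options URL for Google. The post's original
# link is not preserved, and Yahoo changes its layout and URLs over time.
OPTIONS_URL = "https://finance.yahoo.com/quote/GOOG/options"


def make_soup(html):
    """Parse an HTML string with Python's built-in parser (hypothetical helper)."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url=OPTIONS_URL, timeout=10):
    """Download a page with urllib and return it as a BeautifulSoup tree."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return make_soup(html)
```

Usage would be `soup = fetch_soup()` followed by something like `print(soup.title)`. Keeping parsing in `make_soup` lets you try the later steps on a saved copy of the page without hitting the network.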

The important thing to remember is that Yahoo changes the attributes of its HTML tags from time to time. When it does, be sure to update the tag attributes you search for; in my case, that means changing the attrs values used to select the tags. With the page set up, let's use the soup to do some magic.
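As a sketch of how `attrs` is used, the snippet below selects tables by attribute with Beautiful Soup's `find_all`. The markup in `SAMPLE_HTML` and the class names `calls`/`puts` are hypothetical stand-ins for Yahoo's real (and frequently renamed) attributes; keeping them in one variable, `CALLS_ATTRS`, means there is only one place to edit when Yahoo changes them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Yahoo page; real class names differ.
SAMPLE_HTML = """
<table class="calls"><tr><td>call row</td></tr></table>
<table class="puts"><tr><td>put row</td></tr></table>
"""

# Assumed attribute value: update this whenever Yahoo renames the attribute.
CALLS_ATTRS = {"class": "calls"}


def find_tables(soup, attrs):
    """Return every <table> whose attributes match attrs."""
    return soup.find_all("table", attrs=attrs)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
calls_tables = find_tables(soup, CALLS_ATTRS)
print(len(calls_tables))  # only the "calls" table matches
```

Because `class` is multi-valued in Beautiful Soup, `{"class": "calls"}` also matches a tag that has other classes alongside `calls`.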
     
Use Firebug or a similar tool to find the HTML tags and their attributes. Since the attributes keep changing, I am going to show how it is done step by step, with the results from each step.
First, take a look at the Yahoo Finance site here.
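Alongside Firebug, a quick way to explore step by step is to have Beautiful Soup list every table with its attributes and row count, then pick the one that holds the options. This is a minimal sketch; `describe_tables` and the HTML fragment are hypothetical, used in place of the live page.

```python
from bs4 import BeautifulSoup


def describe_tables(soup):
    """Summarize each <table>: its position, attributes and number of rows."""
    summary = []
    for index, table in enumerate(soup.find_all("table")):
        rows = table.find_all("tr")
        summary.append((index, dict(table.attrs), len(rows)))
    return summary


# Hypothetical page fragment used in place of the live Yahoo Finance page.
html = """
<table id="nav"><tr><td>Menu</td></tr></table>
<table class="quotes"><tr><th>Strike</th></tr><tr><td>550.00</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
for index, attrs, row_count in describe_tables(soup):
    print(index, attrs, row_count)
```

Once the right table is spotted, its attributes (here `class="quotes"`) become the `attrs` you search for.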
     
Finally, we can pull out the values themselves.
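One way to pull the values out is to read the header cells and then the text of each row's cells. The sketch below assumes a table laid out with `<th>` headers and `<td>` data cells; the column names and numbers in the sample are hypothetical, not real quotes, and `table_to_rows` is an illustrative helper rather than the author's code.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Turn a <table> into a header list plus a list of row-value lists."""
    header = [cell.get_text(strip=True) for cell in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip the header row, which has only <th> cells
            rows.append(cells)
    return header, rows


# Hypothetical options table with made-up values.
html = """<table class="calls">
<tr><th>Strike</th><th>Last</th></tr>
<tr><td>550.00</td><td>12.30</td></tr>
<tr><td>560.00</td><td>8.10</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "calls"})
header, rows = table_to_rows(table)
print(header)
print(rows)
```

The values come back as strings; convert them with `float()` only after checking for placeholders such as "N/A" that a live page may contain.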
     
In this exercise we have obtained all the values we need. In my case, I would now like to import them into Excel and use IronSpread to calculate the cell values. That will probably be the subject of another post. I will soon show you how to use Python with Excel and get rid of complicated VBA scripts, and how to write the macros too. It is pretty interesting stuff; if you want to look into it, I suggest taking a look at IronSpread.
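Until that post, a simple way to get the scraped rows into Excel is a CSV file, which Excel opens directly. This swaps in Python's standard `csv` module rather than IronSpread; `write_csv` is a hypothetical helper, and the sample values are made up.

```python
import csv
import io


def write_csv(header, rows, stream):
    """Write a header row and data rows as CSV that Excel can open."""
    writer = csv.writer(stream)
    writer.writerow(header)
    writer.writerows(rows)


# Write to an in-memory buffer to show the output format.
buffer = io.StringIO()
write_csv(["Strike", "Last"], [["550.00", "12.30"]], buffer)
print(buffer.getvalue())
```

To write a real file, open it with `newline=""` as the `csv` documentation recommends, e.g. `with open("goog_options.csv", "w", newline="") as f: write_csv(header, rows, f)` (the filename is just an example).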



