web surfing

This question is oriented around Python, but it is really about client authentication in general. My goal is to parse data from a website that requires a user name and password. Having been unsuccessful there, I decided to see if I could parse QuantNet (specifically the 9841 stats in finance course forum). I have been equally unsuccessful, so I am wondering if there is something wrong with my method. Any help would be greatly appreciated. My code is in Python, but it should be fairly universal to anyone who knows the HTTP protocol:
Python:
import urllib2

my_username = 'blablabla'
my_password = 'ferferfer'

# stats forum
forum = 'http://www.quantnet.com/forum/forumdisplay.php?f=139'

# request object
request = urllib2.Request(forum)

# password manager: maps (realm, URI) to the credentials
manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(realm=None, uri='http://www.quantnet.com',
                     user=my_username, passwd=my_password)

# authentication handler for HTTP Basic auth; it only sends the
# credentials after the server replies with a 401 challenge
handler = urllib2.HTTPBasicAuthHandler(manager)

# build the opener
opener = urllib2.build_opener(handler)

# open the page
page = opener.open(request).read()
I may have misspelled the nightmarishly long class names here, but they are correct in my script. The page opens and I get no error (status code 200, OK). However, the page shows me logged in as a guest, and I cannot view anything that requires a login. I can even see, in the returned page, where my request produced an error because I am not logged in.
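A likely explanation for the 200-as-guest result: urllib2's `HTTPBasicAuthHandler` implements HTTP Basic authentication, and it only sends the stored credentials after the server answers with a `401 Unauthorized` challenge. A forum that serves guests an ordinary 200 page never issues that challenge, so the password manager is never consulted. For comparison, here is a minimal Python 3 sketch (in Python 3, `urllib2` became `urllib.request`) of the header Basic auth sends when it is attached up front. It only helps on sites that actually use Basic auth, and the URL is a placeholder.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def preauthenticated_request(url, user, password):
    """Attach Basic credentials to the first request instead of waiting for a 401."""
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )


if __name__ == "__main__":
    # Placeholder URL; no network call is made here.
    req = preauthenticated_request("http://example.com/protected", "Aladdin", "open sesame")
    print(req.get_header("Authorization"))
```

Sending this header to a site that uses a login form (as QuantNet does) changes nothing; the server simply ignores it.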

Note: I have also changed the User-Agent header from Python's default to a Mozilla string, just in case QuantNet doesn't like robots roaming around.
Many thanks in advance to anyone who can show me the light.
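In Python 3, overriding urllib's default `Python-urllib/x.y` User-Agent only takes a header on the `urllib.request.Request`. This is a minimal sketch; the Mozilla-style string is an arbitrary example, not a value any particular site requires. As the rest of the thread shows, the header by itself does not log you in.

```python
import urllib.request

# Arbitrary browser-like string (an assumption); use whatever the server accepts.
BROWSER_UA = "Mozilla/5.0 (compatible; example-script)"


def browser_like_request(url):
    """Return a Request that identifies itself with a browser-style User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


if __name__ == "__main__":
    req = browser_like_request("http://example.com/")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
```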
 
No need to redo what has been done thousands of times in different programming languages;
google for the mechanize and Beautiful Soup Python modules.
 
Roger,

the issue with your script is a low-level one. QuantNet uses form authentication, so you will need to simulate submitting the login form (in code) to establish a session, keep that session alive (with cookies, headers, or some other mechanism), and then retrieve the data.
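The steps described here (post the login form, keep the session cookie, then fetch pages) can be sketched with Python 3's standard library alone: `http.cookiejar` (formerly `cookielib`) stores the cookies, and an opener built with `HTTPCookieProcessor` sends them back on every later request. The field names are an assumption borrowed from the vBulletin form used later in this thread; a real login form may also need hidden fields, and the login URL you pass in is up to you.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Create an opener that stores and resends cookies, like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Encode form fields as an application/x-www-form-urlencoded POST body."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=body)  # a body makes it a POST


def login(opener, login_url, user, password):
    """Submit the (assumed) login fields; cookies set by the reply land in the jar."""
    req = build_login_request(
        login_url,
        {"vb_login_username": user, "vb_login_password": password},  # assumed names
    )
    with opener.open(req) as resp:
        return resp.read()
```

After `login(...)` returns, keep using the same `opener` for every later page so the session cookie travels with each request.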

BTW, I looked at the mechanize package suggested by joshbaptiste, and it does exactly what you need. However, you will have to figure out which form fields must be sent along with the credentials in order to log in. Here are a couple of links from a Google search:

http://www.palewire.com/tag/python-beautifulsoup-table-parse/

http://wwwsearch.sourceforge.net/mechanize/
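To work out which fields a login form sends, you can list the `<input>` elements of each form in the page source. Below is a minimal sketch using the standard library's `html.parser`; the sample HTML is made up for illustration (the hidden `do` field is hypothetical), and Beautiful Soup would do the same job with less code.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect (form_index, name, type, value) for every named <input> on a page."""

    def __init__(self):
        super().__init__()
        self.form_index = -1  # -1 means "not inside any form seen so far"
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_index += 1
        elif tag == "input" and attrs.get("name"):
            self.fields.append((
                self.form_index,
                attrs["name"],
                attrs.get("type") or "text",  # HTML default input type
                attrs.get("value") or "",
            ))


def list_form_fields(html):
    parser = FormFieldLister()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up login form for illustration only.
SAMPLE = """
<form action="login.php" method="post">
  <input type="text" name="vb_login_username">
  <input type="password" name="vb_login_password">
  <input type="hidden" name="do" value="login">
</form>
"""

if __name__ == "__main__":
    for field in list_form_fields(SAMPLE):
        print(field)
```

Hidden fields are the ones most often missed; whatever they contain usually has to be posted back along with the user name and password.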

Again, this is very low level. IM or email me and we can talk more.
 
Thank you both very much; mechanize is useful. I'm still having trouble with authentication, though. Does anyone know a good source of example code for this? Or, if anyone could write sample code here that shows how to do this (simulating a browser, entering login info, and keeping the session alive to scour around the website), I would be most grateful.


UPDATE: Just received a cease and desist from the data provider that I was trying to grab data from. Something's telling me I probably would not be able to connect to the site even if I knew what I was doing.
 
In most cases, the provider will have technical documentation describing how to retrieve the data in various programming languages. Try that.
I have a Perl script that I used to grab data from our provider, parse it into our local drive, unzip it, etc. Usually you provide your proxy or IP address so they add you to the whitelist.
You can adapt this script to Python. The provider's interface is ASP-based, so we pass a string with all the parameters (login, pass, report type, etc.). In my case, the login ID is sent as plain text, not encoded in any way.

PM me if interested.
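A Python version of the parameter-string approach might look like the sketch below. Every name in it (the endpoint URL and the `login`, `pass`, `report` parameters) is hypothetical; the provider's documentation defines the real ones. `urlencode` takes care of escaping, but credentials in a query string still travel in plain text unless the endpoint uses HTTPS.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real one comes from the provider's documentation.
PROVIDER_URL = "https://data.example.com/export.asp"


def build_report_url(login, password, report_type):
    """Put all request parameters into one query string for an ASP-style endpoint."""
    params = {"login": login, "pass": password, "report": report_type}  # hypothetical names
    return PROVIDER_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_report_url("me", "secret", "daily prices"))
    # -> https://data.example.com/export.asp?login=me&pass=secret&report=daily+prices
```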
 
One way to do it...

With the help of Alain, Andy, Stefan, Pete and joshbaptiste, I finally got access to QN. It's a big, big world out there, but there are few examples of how to access password-protected websites. So, for anyone who's interested, here is one way to do it in Python.
Python:
from mechanize import Browser
from cookielib import CookieJar

#params
user = 'ferferfer'
passwd = 'blebleble' #this is my actual QN login!
page = 'http://www.quantnet.com/'

#get a cookie jar
cj = CookieJar()

#make a browser and set it up
br = Browser()
br.set_cookiejar(cj)

br.open(page) #home page opened.
br.select_form(nr=0) #the first form is the login panel
br['vb_login_username'] = user
br['vb_login_password'] = passwd  #just like mozilla would do.

result = br.submit()  #br just clicked the login button.
#huzzah
When parsing the resulting page, you can verify that QN has signed you in and is redirecting you. I see a lot of potential here for grabbing data off the net.
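One rough way to do that check in code is a string heuristic on the returned HTML: after a successful login the login box should be gone and your user name should appear. Both markers are assumptions about how the page looks, not something the site guarantees. With mechanize, the HTML would come from `result.read()`.

```python
def looks_logged_in(html, username, login_field="vb_login_username"):
    """Heuristic: logged in if the login field is gone and the username is shown.

    Both markers are assumptions about the page layout and may need adjusting.
    """
    return login_field not in html and username in html


if __name__ == "__main__":
    guest_page = '<input name="vb_login_username"> Welcome, guest'
    member_page = "Welcome back, ferferfer"
    print(looks_logged_in(guest_page, "ferferfer"))   # False
    print(looks_logged_in(member_page, "ferferfer"))  # True
```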
 
