Example: Code of the Wikipedia Fetcher
From Wiki of the E-Business and Web Science Research Group
This is a short demo of accessing the Web from Python. It is meant to demonstrate the power of Python for Web mash-ups. Please do not run this program extensively, since it accesses Wikipedia in a way the site does not permit (it poses as a series of ordinary web browsers). The code is just meant to show the power of the urllib and urllib2 libraries.
# fetchWikipedia (Python 2)
import urllib, urllib2, random, time

URI = 'http://en.wikipedia.org/wiki/Special:Random'
VALUES = {}  # additional values for the http request
USER_AGENTS = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
    'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9'
]

# note: passing data (even an empty string) makes urllib2 send a POST request
data = urllib.urlencode(VALUES)

for i in range(100):
    # take random browser type (any entry of USER_AGENTS)
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    # compose request
    req = urllib2.Request(URI, data, headers)
    # fetch page
    response = urllib2.urlopen(req)
    page = response.read()
    # show first 800 characters of HTML
    print i, page[:800]
    # wait a random amount of time (between 0.7 and 1.7 seconds)
    time.sleep(random.random() + 0.7)
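
The script above targets Python 2. In Python 3, urllib2 no longer exists; its functionality lives in urllib.request, and urlencode moved to urllib.parse. The following is a minimal sketch of just the request-composition step under Python 3. The helper name build_request and the shortened USER_AGENTS list are assumptions made for illustration and are not part of the original script. The sketch only builds and inspects the request object; it deliberately sends nothing over the network.

```python
# Python 3 sketch of the request-composition step (illustrative only).
import random
import urllib.parse
import urllib.request

URI = 'http://en.wikipedia.org/wiki/Special:Random'
# Shortened list for the sketch; the original script uses six entries.
USER_AGENTS = [
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',
]


def build_request(uri, values=None, user_agents=USER_AGENTS):
    """Compose a request with a randomly chosen User-Agent header.

    build_request is a hypothetical helper, not part of the original script.
    """
    # In Python 3 the request body must be bytes; None results in a GET.
    data = urllib.parse.urlencode(values).encode('ascii') if values else None
    headers = {'User-Agent': random.choice(user_agents)}
    return urllib.request.Request(uri, data, headers)


if __name__ == '__main__':
    req = build_request(URI)
    # Only inspect the request here; nothing is sent over the network.
    print(req.get_method(), req.full_url)
    print(req.get_header('User-agent'))
```

Passing None instead of an empty body is a design choice in this sketch: with no form values the request stays a GET, whereas any non-None body turns it into a POST. Note that Request stores header names capitalized, so the header is read back as 'User-agent'.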