How do I do web scraping with Python?

Question:

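I'd like to grab daily sunrise/sunset times from a website. Is it possible to scrape web content with Python? Which modules are used? Is there a tutorial available?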

Answer:

Use urllib2 (a Python 2 module) together with the brilliant BeautifulSoup library:

import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://example.com').read())

for row in soup('table', {'class': 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string
    # will print date and sunrise
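
The snippet above targets Python 2 (urllib2, the print statement). For Python 3, a rough equivalent might look like the sketch below. It assumes BeautifulSoup 4 is installed (pip install beautifulsoup4) and that the target page keeps its data in a table with class "spad", as in the answer. The helper names extract_rows and fetch_rows and the sample HTML are hypothetical, made up for illustration.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_rows(html, table_class="spad"):
    """Return (first cell, second cell) text pairs from the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return []
    rows = []
    # Search the whole table rather than table.tbody, so that pages
    # without an explicit <tbody> element are handled as well.
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) >= 2:  # skips header rows, which use <th> cells
            rows.append((tds[0].get_text(strip=True),
                         tds[1].get_text(strip=True)))
    return rows


def fetch_rows(url):
    """Download a page and extract its rows (requires network access)."""
    with urlopen(url) as response:
        return extract_rows(response.read())


if __name__ == "__main__":
    # Made-up sample markup standing in for a real sunrise table.
    sample = """
    <table class="spad">
      <tr><th>Date</th><th>Sunrise</th></tr>
      <tr><td>1 Jan</td><td>07:58</td></tr>
      <tr><td>2 Jan</td><td>07:58</td></tr>
    </table>
    """
    for date, sunrise in extract_rows(sample):
        print(date, sunrise)
```

Keeping the parsing separate from the download makes it possible to try the extraction on a saved copy of the page before pointing fetch_rows at the live URL.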

Code Q&A: codewenda.com
Stack Overflow: Web scraping with Python
