A Simple Web Scraper Experiment with BeautifulSoup

xiaoxiao2021-02-28 137

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup


def getTitle(url):
    # Return the page's first <h1> tag, or None if it cannot be retrieved.
    try:
        html = urlopen(url)
    except HTTPError:
        # The server responded with an error status (for example 404).
        return None
    try:
        bsObj = BeautifulSoup(html.read(), 'lxml')
        title = bsObj.body.h1
    except AttributeError:
        # bsObj.body is None if no <body> was parsed, so .h1 raises AttributeError.
        return None
    return title


title = getTitle("http://www.pythonscraping.com/pages/page1.html")
if title is None:
    print("The title could not be found.")
else:
    print(title)

Output:

<h1>An Interesting Title</h1>
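
The script above catches only HTTPError, which covers cases where the server answers with an error status. If the server cannot be reached at all, urlopen raises URLError instead, and the script would stop with a traceback. The following is a minimal sketch of a fetch step that handles both cases. The helper name fetch_html and the 10-second timeout are assumptions for illustration and are not part of the original code.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url):
    """Return the response body as bytes, or None if the request fails.

    fetch_html is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        with urlopen(url, timeout=10) as response:
            return response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {e.code} for {url}")
    except URLError as e:
        # No usable response: DNS failure, refused connection, missing file.
        print(f"Could not reach {url}: {e.reason}")
    return None
```

The returned bytes can be passed to BeautifulSoup in the same way as html.read() in getTitle, and a None result can be treated like the existing "title could not be found" case.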

When reposting, please cite the original URL: https://www.6miu.com/read-26206.html
