Python crawler: adding request headers with different ways of fetching a page

xiaoxiao · 2021-02-28

1. Using urllib.request

```python
import urllib.request

url = 'https://weheartit.com/discover/book/articles'
# Build the request object to send to the server
req = urllib.request.Request(url)
# Add the header. add_header() is a method call, so its two arguments are
# separated by a comma -- note the contrast with the dict syntax used below.
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
data = urllib.request.urlopen(req).read()
print(data)
```

2. Using requests to parse the page
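As a small variation on the snippet above, the same header can also be passed as a dict to the `Request` constructor's `headers` parameter instead of calling `add_header()` afterwards; a minimal sketch (no request is actually sent here):

```python
import urllib.request

url = 'https://weheartit.com/discover/book/articles'
# Alternative to add_header(): supply the headers dict when building the
# Request object. urllib normalizes header names (e.g. 'User-agent').
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36'}
req = urllib.request.Request(url, headers=headers)

# The header is attached before any request is sent:
print(req.get_header('User-agent'))
```

Both forms produce the same request; the dict form is convenient when several headers are set at once.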

```python
import requests
from bs4 import BeautifulSoup

url = 'https://weheartit.com/discover/book/articles'
# A dict holds key-value pairs, so here the header uses a colon
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36'}
req = requests.get(url, headers=headers)
# req.encoding = 'utf-8'
soup = BeautifulSoup(req.content, 'lxml')
# Encoding problems can appear when printing, so encode the output first
print(soup.encode('utf-8'))
```

These two approaches are easy to understand, so they are summarized here for future reference.
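Once the page is in a `BeautifulSoup` object, the usual next step is to extract elements from it. A minimal offline sketch, parsing a made-up HTML snippet with the stdlib `html.parser` backend so no network access (and no `lxml` install) is needed; the tag names and `class` values are purely illustrative:

```python
from bs4 import BeautifulSoup

# A made-up snippet standing in for the fetched page content
html = '<div><a class="entry" href="/a">First</a><a class="entry" href="/b">Second</a></div>'
soup = BeautifulSoup(html, 'html.parser')

# find_all() returns every matching tag; get_text() extracts the text
titles = [a.get_text() for a in soup.find_all('a', class_='entry')]
print(titles)
```

With a real page, `soup` from the snippet above would be queried the same way, just with the selectors matching that site's markup.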

