Rewriting urllib code between Python 2 and Python 3

xiaoxiao 2022-06-12 19

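To port code, you need the grit to tackle the hard parts: understand how the original code works and what changed between the old and new versions, be able to formulate a question and look it up on Baidu, and finally, be able to read the error messages.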

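Chapter 5 uses Python 2's urllib2 module to show three ways of downloading a web page. I'm using Python 3, where urllib has changed: urllib2's functionality now lives mainly in urllib.request (with its exceptions in urllib.error), and cookielib has become http.cookiejar. I looked up an introduction to using urllib in Python 3, which summarizes the changes very thoroughly, and adapted the code accordingly: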
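If the same script has to run under both versions, a common pattern (my own addition, not something from the book) is to try the Python 3 imports first and fall back to the Python 2 module names. The sketch below assumes you only need the handful of names used in this article.

```python
try:
    # Python 3: urllib2 was split into urllib.request and urllib.error,
    # and cookielib was renamed http.cookiejar
    from urllib.request import (Request, urlopen, build_opener,
                                install_opener, HTTPCookieProcessor)
    from urllib.error import HTTPError, URLError
    from http.cookiejar import CookieJar
except ImportError:
    # Python 2: the same names live in urllib2 and cookielib
    from urllib2 import (Request, urlopen, build_opener,
                         install_opener, HTTPCookieProcessor,
                         HTTPError, URLError)
    from cookielib import CookieJar

# Shows which module the names were actually taken from
print(Request.__module__, CookieJar.__module__)
```

After this block, the rest of the script can call `urlopen(...)`, `Request(...)` and `CookieJar()` without caring which interpreter is running it.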

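```python
import urllib.request
import http.cookiejar

url = "https://www.baidu.com"

print("Method 1:")
# Simplest form: pass the URL straight to urlopen
response1 = urllib.request.urlopen(url)
print(response1.getcode())      # HTTP status code
print(len(response1.read()))    # length of the response body in bytes

print("Method 2")
# A Request is made up of three parts: data, HTTP headers and the URL
request = urllib.request.Request(url)
request.add_header("user-agent", "Mozilla/5.0")
# Pass the Request object (not the bare URL) so the header is actually sent
response2 = urllib.request.urlopen(request)
print(response2.getcode())
print(len(response2.read()))

print("Method 3")
# Add a handler for special situations; here, a cookie handler
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)  # later urlopen calls go through this opener
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(cj)                  # cookies set by the server
print(response3.read())    # raw response body as bytes
```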
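To see what method 2's Request and method 3's opener actually contain, both objects can be inspected without touching the network. This is a sketch of my own, not code from the book: the form field `wd` and its value are hypothetical, added only to show that supplying `data` turns the request into a POST, and using `opener.open()` directly is an alternative to `install_opener()` that keeps the cookie handling local to one opener.

```python
import http.cookiejar
import urllib.parse
import urllib.request

url = "https://www.baidu.com"  # same URL as above; nothing is fetched below

# Method 2: a Request bundles the URL, optional data and HTTP headers.
# The form field "wd" is a hypothetical example value.
data = urllib.parse.urlencode({"wd": "python"}).encode("utf-8")
request = urllib.request.Request(url, data=data)
request.add_header("user-agent", "Mozilla/5.0")  # stored under "User-agent"

print(request.full_url)                  # the target URL
print(request.get_method())              # "POST", because data is not None
print(request.get_header("User-agent"))  # the header value set above

# Method 3: build an opener with a cookie handler but do not install it
# globally; opener.open(request) would then be used instead of urlopen(...)
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
cookie_handlers = [h for h in opener.handlers
                   if isinstance(h, urllib.request.HTTPCookieProcessor)]
print(len(cookie_handlers), cookie_handlers[0].cookiejar is cj)
```

With a network connection, `opener.open(request)` would send the request through the cookie-aware opener and leave any cookies the server sets in `cj`, without changing how plain `urllib.request.urlopen` behaves elsewhere in the program.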

The output is as follows:

```
Method 1:
200
227
Method 2
200
227
Method 3
200
<CookieJar[<Cookie BIDUPSID=A6465C9438A1D1A0C40E88CC17839B39 for .baidu.com/>, <Cookie PSTM=1540306742 for .baidu.com/>, <Cookie BD_NOT_HTTPS=1 for www.baidu.com/>]>
b'<html>\r\n<head>\r\n\t<script>\r\n\t\tlocation.replace(location.href.replace("https://","http://"));\r\n\t</script>\r\n</head>\r\n<body>\r\n\t<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>\r\n</body>\r\n</html>'
[Finished in 2.6s]
```
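`print(cj)` dumps the whole jar as a single repr. Iterating over a `CookieJar` yields `Cookie` objects with `name`, `value` and `domain` attributes, which is easier to read. To keep this sketch runnable offline, it fills the jar by hand through `make_cookie`, a hypothetical helper, using two cookies copied from the output above; in the real script `HTTPCookieProcessor` fills `cj` during `urlopen`, so only the `summarize` loop would be needed.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Hypothetical helper: build a simple session cookie by hand,
    standing in for one that a server response would have set."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def summarize(jar):
    """Return (name, value, domain) for every cookie in the jar."""
    return [(c.name, c.value, c.domain) for c in jar]


cj = CookieJar()
# Values copied from the output above; set offline for illustration only
cj.set_cookie(make_cookie("PSTM", "1540306742", ".baidu.com"))
cj.set_cookie(make_cookie("BD_NOT_HTTPS", "1", "www.baidu.com"))

for name, value, domain in summarize(cj):
    print(f"{name}={value} (domain {domain})")
```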

When reposting, please credit the original: https://www.6miu.com/read-4932612.html

