Scraping images with a Python web crawler

xiaoxiao, 2021-02-28

1. I have been learning Python (Python 3) recently and tried downloading photos from the web.

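2. I started from code I found online and noticed that pages in different encodings (utf-8, gbk, and so on) sometimes could not be scraped, because decoding failed. So I used the chardet module to detect the encoding and made a few small changes. It works fairly well: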

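import re
import urllib.request

import chardet


def getHtml(url):
    # Download the raw bytes of the page.
    page = urllib.request.urlopen(url)
    html = page.read()
    # chardet guesses the encoding; the result may be None.
    encoding = chardet.detect(html)['encoding']
    if encoding and encoding.lower() == 'utf-8':
        html = html.decode()
    else:
        # Anything that is not utf-8 is treated as GBK.
        html = html.decode('GBK')
    return html


def getImg(html):
    # Capture src attributes ending in .jpg; [^"] keeps a match inside one attribute.
    reg = r'src="([^"]+?\.jpg)"'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        # Save as D:\pic\<index>.jpg; the folder must already exist.
        urllib.request.urlretrieve(imgurl, r'D:\pic\%s.jpg' % x)
        x += 1


url = "https://pixabay.com/"
html = getHtml(url)
getImg(html)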
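The decoding step can also be sketched without chardet, which may help when the module is not installed. The helper below is not the approach used in the script above: it is a simpler swapped-in heuristic that tries a list of candidate encodings in order and keeps the first one that decodes cleanly. The name `decode_html` and the default candidate list are assumptions made for this illustration.

```python
def decode_html(raw, candidates=("utf-8", "gbk")):
    """Decode raw page bytes by trying each candidate encoding in order.

    Hypothetical helper for illustration; utf-8 goes first because it is
    the stricter of the two and rejects most GBK-encoded text.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to replacement characters.
    return raw.decode(candidates[0], errors="replace")


if __name__ == "__main__":
    sample = "中文"
    print(decode_html(sample.encode("utf-8")))
    print(decode_html(sample.encode("gbk")))
```

This ordering is only a heuristic. Some GBK byte sequences happen to be valid utf-8 as well, so short pages can be misread, which is one reason a statistical detector such as chardet is the more robust choice.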

Blog post I referred to: http://blog.csdn.net/longshengguoji/article/details/9946675
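One thing the download loop does not cover: the src values captured by the regex may be relative paths (for example `/static/a.jpg`), and `urlretrieve` needs a full URL. A minimal sketch of resolving them against the page address with `urllib.parse.urljoin` follows. The function name `extract_jpg_urls` and the example.com addresses are made up for this illustration.

```python
import re
from urllib.parse import urljoin

# [^"] stops a match at the closing quote, so it cannot spill into the next tag.
JPG_SRC = re.compile(r'src="([^"]+?\.jpg)"')


def extract_jpg_urls(html, base_url):
    """Return absolute URLs for every .jpg src attribute found in html."""
    return [urljoin(base_url, src) for src in JPG_SRC.findall(html)]


if __name__ == "__main__":
    page = '<img src="/a/b.jpg"><img src="d.png"><img src="e.jpg">'
    for image_url in extract_jpg_urls(page, "https://example.com/page/"):
        print(image_url)
```

Each returned URL can then be passed to `urllib.request.urlretrieve` as in the script above.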

