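1. I have recently been learning Python (Python 3) and tried scraping photos from the web.
2. I started from code found online and discovered that pages with different encodings (utf-8, gbk, and so on) could fail to be scraped. I therefore used the chardet module to detect the encoding. After a few small changes it works quite well: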
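import re
import urllib.request

import chardet


def getHtml(url):
    # Download the page as raw bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    # Let chardet guess the encoding: decode as UTF-8 if detected, otherwise assume GBK.
    encoding = chardet.detect(html)['encoding']
    if encoding and encoding.lower() == 'utf-8':
        html = html.decode('utf-8')
    else:
        html = html.decode('GBK', errors='replace')
    return html


def getImg(html):
    # Collect every src="....jpg" attribute value in the page.
    imgre = re.compile(r'src="(.+?\.jpg)"')
    imglist = imgre.findall(html)
    # Save each image as D:\pic\<index>.jpg (the D:\pic folder must already exist).
    for x, imgurl in enumerate(imglist):
        urllib.request.urlretrieve(imgurl, r'D:\pic\%s.jpg' % x)
    # Return the number of images downloaded so the caller has something to print.
    return len(imglist)


url = "https://pixabay.com/"
html = getHtml(url)
print(getImg(html))

Reference blog post: http://blog.csdn.net/longshengguoji/article/details/9946675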
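As a possible variation on the script above, here is a minimal sketch of two helpers that use only the standard library. It does not use chardet: `decode_html` simply tries a list of candidate encodings in order, which is a different technique from detection. `extract_image_urls` resolves relative `src` values against the page URL, since `urlretrieve` needs absolute URLs. Both function names, the `encodings` parameter, and the example URLs are my own assumptions for illustration, not part of the referenced blog post.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: src attributes whose value ends in .jpg
IMG_RE = re.compile(r'src="([^"]+?\.jpg)"')


def decode_html(raw, encodings=("utf-8", "gbk")):
    """Try each candidate encoding in turn (assumed order: UTF-8, then GBK)."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def extract_image_urls(html, base_url):
    """Return absolute URLs for every .jpg found in a src attribute."""
    return [urljoin(base_url, src) for src in IMG_RE.findall(html)]
```

With these helpers, the bytes returned by `page.read()` could be passed to `decode_html`, and each URL from `extract_image_urls(html, url)` handed to `urllib.request.urlretrieve`. Trying encodings in order is cruder than chardet, but it avoids the extra dependency when only a couple of encodings are expected.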
