爬虫_初步(爬取豆瓣图片)

发布时间 2023-12-21 21:35:05作者: Eddie68

引用python库

from urllib.request import urlopen
import urllib.request,urllib.error
import re

  

 找到本机的headers

 

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0'}
url = 'https://movie.douban.com/top250' request = urllib.request.Request(url, headers=headers)#发送请求 response = urllib.request.urlopen(request) get_1 = response.read().decode('utf-8') get_1 imginfo = re.findall(r'<img width="100" alt="(.*?)" src="(.*?)" class="">',get_1) imginfo for i in range(0,20): imgurl = imginfo[i] imginfo[0] imgreq = urlopen(imgurl[1]) imgc = imgreq.read() imgf = open(r'D:\\A-lesson work\\CCUT\\img2\\'+ imgurl[0]+'.jpg','wb') imgf.write(imgc) imgf.close()

 比较重要的正则提取关键字 Python---re.findall的用法_python中re.findall用法-CSDN博客