Batch-downloading audio and video from WeChat official account articles with Python: easily download the content you want

admin

Note: this article has not been updated in more than 521 days, so check whether its content still works.

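An earlier post covered batch-scraping WeChat official accounts with Python. This time the script does not collect read counts; it batch-downloads the audio and video embedded in official account articles. Here is the code: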


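import html
import re
import time
from random import randint

import requests

# headers is not defined in the original post; a browser-like User-Agent is assumed here.
headers = {'User-Agent': 'Mozilla/5.0'}

# trimName is not shown in the original post; this stand-in assumes it strips
# characters that are illegal in file names.
def trimName(name):
    return re.sub(r'[\\/:*?"<>|\r\n]', '', name).strip()

def video(res, headers, date):
    # A video id is "wxv_" followed by 19 characters.
    vid = re.search(r'wxv_.{19}', res.text)
    # time.sleep(2)
    if vid:
        vid = vid.group(0)
        print('Video id:', vid)
        url = f'https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url&preview=0&vid={vid}'
        data = requests.get(url, headers=headers, timeout=1).json()
        # Take the first entry in url_info as the download URL.
        video_url = data['url_info'][0]['url']
        video_data = requests.get(video_url, headers=headers)
        name = trimName(data['title'])
        print('Downloading video: ' + name + '.mp4')
        with open(date + '___' + name + '.mp4', 'wb') as f:
            f.write(video_data.content)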

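def audio(res, headers, date, title):
    # Every embedded audio clip carries a "voice_id" in the page source.
    aids = re.findall(r'"voice_id":"(.*?)"', res.text)
    time.sleep(2)
    # Number the clips from 1 so several clips in one article get distinct file names.
    for index, voice_id in enumerate(aids, start=1):
        url = f'https://res.wx.qq.com/voice/getvoice?mediaid={voice_id}'
        audio_data = requests.get(url, headers=headers)
        print('Downloading audio: ' + title + '.mp3')
        with open(date + '___' + trimName(title) + '___' + str(index) + '.mp3', 'wb') as f5:
            f5.write(audio_data.content)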

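url = input('Enter the article URL: ')
response = requests.get(url, headers=headers)
# Collect the articles linked from this page, then add the page itself.
urls = re.findall(r'<a target="_blank" href="(https?://mp\.weixin\.qq\.com/s[^"]*)"', response.text)
urls.append(url)
print('Total articles:', len(urls))

for mp_url in urls:
    # The None entries bypass any configured proxy; verify=False skips TLS certificate checks.
    res = requests.get(html.unescape(mp_url), proxies={'http': None, 'https': None}, verify=False, headers=headers)
    # Make lazy-loaded images and protocol-relative resources load in the saved copy.
    content = res.text.replace('data-src', 'src')
    content = re.sub(r'(?<!:)//res\.wx\.qq\.com', 'https://res.wx.qq.com', content)
    try:
        title = re.search(r"var msg_title = '(.*?)'", content).group(1)
        ct = re.search(r'var ct = "(\d+)";', content).group(1)
        # ct is the publish time as a Unix timestamp.
        date = time.strftime('%Y-%m-%d', time.localtime(int(ct)))
        print(date, title)
        audio(res, headers, date, title)
        video(res, headers, date)
        with open(date + '_' + trimName(title) + '.html', 'w', encoding='utf-8') as f:
            f.write(content)
    except Exception as err:
        # Title or timestamp missing, or a download failed: keep the page under a random name for inspection.
        print('Failed to process', mp_url, err)
        with open(str(randint(1, 10)) + '.html', 'w', encoding='utf-8') as f:
            f.write(content)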
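To make the extraction steps easier to check without touching the network, here is a minimal sketch that runs the same regular expressions against a made-up page fragment. The `SAMPLE` text, the helper name `extract_media_info`, and the ids in it are all hypothetical; real article pages are far larger and their markup may have changed. The sketch uses `time.gmtime` so the result does not depend on the local time zone, whereas the script above uses `time.localtime`.

```python
import re
import time

# Made-up page fragment for illustration only; not copied from a real article.
SAMPLE = '''
var msg_title = 'Weekly digest';
var ct = "1609459200";
"voice_id":"MzA5voiceAAA"
"voice_id":"MzA5voiceBBB"
<iframe data-mpvid="wxv_1234567890123456789"></iframe>
'''

def extract_media_info(page):
    """Return the title, publish date (UTC), voice ids and video id found in page source."""
    title = re.search(r"var msg_title = '(.*?)'", page)
    ct = re.search(r'var ct = "(\d+)"', page)
    vid = re.search(r'wxv_.{19}', page)
    return {
        'title': title.group(1) if title else None,
        'date': time.strftime('%Y-%m-%d', time.gmtime(int(ct.group(1)))) if ct else None,
        'voice_ids': re.findall(r'"voice_id":"(.*?)"', page),
        'video_id': vid.group(0) if vid else None,
    }

info = extract_media_info(SAMPLE)
print(info)
```

Returning `None` for a missing field, instead of calling `.group()` on a failed match, lets the caller decide whether to skip the article or save it for inspection.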

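The downloaded audio and video files are saved in the current directory. The saved article HTML can then be converted to PDF with Python.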
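The post does not say which tool to use for the PDF step. One possible approach, sketched here as an assumption, is the third-party `pdfkit` package, which wraps the `wkhtmltopdf` command-line program; both must be installed separately. The helper names `pdf_targets` and `convert_all` are hypothetical.

```python
from pathlib import Path

def pdf_targets(directory):
    """Pair every saved .html file in `directory` with the .pdf path it would become."""
    return [(src, src.with_suffix('.pdf'))
            for src in sorted(Path(directory).glob('*.html'))]

def convert_all(directory='.'):
    """Convert each saved article to PDF (assumes pdfkit and wkhtmltopdf are installed)."""
    import pdfkit  # third-party; imported here so pdf_targets works without it
    for src, dst in pdf_targets(directory):
        pdfkit.from_file(str(src), str(dst))
        print('Wrote', dst)
```

Calling `convert_all()` in the download directory would write one PDF next to each HTML file. Images in the saved pages are still loaded from their remote URLs, so the conversion needs network access.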


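Copyright notice: unless otherwise stated, all articles are original works of 执刀人的工具库. When reposting or copying, please credit the source with a hyperlink.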
