首页 > Python > 文章正文

Python爬虫实战:网易云音乐爬取

更新时间:2020-11-10

前言

本文的文字及图片来源于网络,仅供学习、交流使用,不具有任何商业用途,如有问题请及时联系我们以作处理。

 

 

本次目标

爬取网易云音乐

https://music.163.com/

 

环境

  • python 3.6
  • pycharm

爬虫代码

导入工具

import requests
import re

 

请求网站、解析网站数据

def get_music_url(music_id, music_title):
    url = 'https://api.zhuolin.wang/api.php'
    headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        'Cookie': 'UM_distinctid=175aca5b31d39e-06d658eceb014a-3962420d-1fa400-175aca5b31e92e',
        'Host': 'api.zhuolin.wang',
        'Pragma': 'no-cache',
        'Referer': 'https://music.zhuolin.wang/',
        'Sec-Fetch-Dest': 'script',
        'Sec-Fetch-Mode': 'no-cors',
        'Sec-Fetch-Site': 'same-site',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
    }
    params = {
        'callback': 'jQuery111305698848623906863_1604919341715',
        'types': 'url',
        'id': '{}'.format(music_id),
        'source': 'netease',
        '_': '1604919341751',
    }
    response = requests.get(url=url, params=params, headers=headers)
    html_data = response.text
    if music_url == '':
        print('无音频下载链接')

def music_id():
    url = 'https://music.163.com/discover/toplist'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)
    lis = re.findall('<li><a href="https://www.cnblogs.com/hhh188764/p/(.*?)">https://www.cnblogs.com/hhh188764/p/(.*?)</a></li>', response.text, re.S)[0:100]
    for i in lis:
        music_id = i[0].split('id=')[-1]
        title = i[1]
        pattern = re.compile(r"[\/\\\:\*\?\"\<\>\|]")  # '/ \ : * ? " < > |'
        music_title = re.sub(pattern, "_", title)  # 替换为下划线
        get_music_url(music_id, music_title)

 

保存数据

    else:
        path = '保存地址\\' + music_title + '.mp3'
        response = requests.get(url=music_url)
        with open(path, mode='wb') as f:
            f.write(response.content)
            print(music_title, music_url)

 

运行代码,结果如下图

 

相关文章
相关标签