Scraping music from 9ku.com (9酷网) with Selenium and a headless browser

Tech Frontier • 2025-03-12 15:59 • 47 views


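Selenium is used here because it is fairly intuitive and can still scrape pages whose content is loaded dynamically; it can also interact with the site. These are Selenium's main strengths.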

Required imports

import requests
from selenium import webdriver
import time
from lxml import etree
from selenium.webdriver.chrome.options import Options


First, set up the driver and the headless browser

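chrome_options = Options()
chrome_options.add_argument('--headless')     # run Chrome without a visible window
chrome_options.add_argument('--disable-gpu')
# Location of the Chrome browser
chrome_options.binary_location = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
# Location of the driver. The options must be passed in, otherwise headless mode is never applied.
# (Selenium 4 style; Selenium 3 took the driver path as the first positional argument.)
driver = webdriver.Chrome(
    service=webdriver.ChromeService(executable_path=r'C:\Program Files\Google\Chrome\Application\chromedriver.exe'),
    options=chrome_options,
)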

Next, call get() to open the site to be scraped


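driver.get("https://www.9ku.com/music/t_m_hits.htm")  # page URL
time.sleep(3)              # give dynamically loaded content time to render before reading the source
html = driver.page_source  # page source
tree = etree.HTML(html)
# XPath for the elements we want: the href suffix of each link into a music player page
detail_url = tree.xpath('//*[@id="f1"]/ol/li/a/@href')
driver.quit()
index = 0  # counter, used later to report progress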
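The hrefs collected in detail_url are only link suffixes, so each one has to be joined to the site root before it can be opened. Plain string concatenation can produce a doubled slash when the suffix already starts with "/". A minimal sketch of a more forgiving join, using the standard library's urllib.parse.urljoin, is shown below. The helper name to_absolute and the sample hrefs are hypothetical, chosen only for illustration; the real values come from the page.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.9ku.com/"

def to_absolute(hrefs, base=BASE_URL):
    """Join relative hrefs to the site root, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        # urljoin handles suffixes with or without a leading slash
        full = urljoin(base, href.strip())
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result

# Hypothetical hrefs, standing in for the XPath result
sample = ["/play/1.htm", "play/2.htm", "/play/1.htm"]
absolute_urls = to_absolute(sample)
```

Removing duplicates up front also avoids downloading the same track twice if the page lists it more than once.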

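Then loop over the link suffixes, prepend the site root to each one, and use XPath to pull both the audio address and the title from each player page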

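for i in detail_url:
    url = "https://www.9ku.com/" + i.lstrip("/")  # prepend the site root to the link suffix
    # Start the driver again for the player page, reusing the headless options
    driver1 = webdriver.Chrome(
        service=webdriver.ChromeService(executable_path=r'C:\Program Files\Google\Chrome\Application\chromedriver.exe'),
        options=chrome_options,
    )
    driver1.get(url)  # load the player page
    html1 = driver1.page_source
    tree1 = etree.HTML(html1)
    src = tree1.xpath('//*[@id="jp_audio_0"]/@src')  # audio address
    title = tree1.xpath('//*[@id="mydiv1"]/div[2]/div[1]/div[1]/h1/text()')  # title
    print(title)
    src_req = requests.get(url=src[0]).content  # download the audio
    # Save the track into the mp3/ folder (which must already exist) and report progress after each one
    with open(f"mp3/{title[0]}.mp3", "wb") as f:
        f.write(src_req)
    index += 1
    print(f"{title[0]}: finished track {index}")
    driver1.quit()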
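The title scraped from the page is used directly as a file name, so a title containing characters such as "/" or "?" would make open() fail on Windows, and a missing mp3/ folder would fail on any system. The sketch below shows one way to guard against both. The helper names safe_filename and mp3_path are hypothetical and not part of the original script, and the replacement rule (swap illegal characters for "_") is just one reasonable choice.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names, plus control characters
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

def safe_filename(title, fallback="untitled"):
    """Replace illegal characters with '_' and trim surrounding spaces and dots."""
    name = _ILLEGAL.sub("_", title).strip(" .")
    return name or fallback

def mp3_path(title, folder="mp3"):
    """Return the target path for a track, creating the folder if it is missing."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{safe_filename(title)}.mp3"

# In the loop above, the write step could then become:
#     with open(mp3_path(title[0]), "wb") as f:
#         f.write(src_req)
```

Chinese titles pass through unchanged, since only the reserved characters are replaced.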

The music downloaded in the end:

[Image]
