
news 2025/7/2 9:23:49


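Contents

Preface
Importing the dependencies
Setting the ChromeDriver path
Creating the Chrome WebDriver object
Opening the web page
Locating the result elements
Creating an empty list to hold the data
Iterating over the result elements and extracting data
Extracting the title, author, and publication time
Checking whether the article is a target article
Extracting the target article's description, view count, like count, and comment count
Storing the extracted data as a dictionary
Appending the dictionary to the data list
Saving the data as a JSON file
Closing the WebDriver
Complete code
Results
Closing remarks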


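Preface

This article shows how to use Selenium and Chrome WebDriver to collect information about the articles in the 【腾讯云 Cloud Studio 实战训练营】 (Tencent Cloud "Cloud Studio" hands-on training camp) series. We first import the required dependencies, then set the ChromeDriver path and create a Chrome WebDriver object. Next, we use the WebDriver to open the target page and wait for it to load. We then locate the parent element of the search result list and extract each result's title, author, and publication time. Finally, we save the extracted data to a JSON file and close the WebDriver.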

Importing the dependencies

from selenium import webdriver
import json
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time

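This code imports the required dependencies: selenium (webdriver, the By locator class, and NoSuchElementException) together with the standard-library json and time modules.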

Setting the ChromeDriver path

driver_path = ''

The driver_path variable holds the path to the ChromeDriver executable. It is left empty here and must be set to match your own environment.
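Because an empty or wrong path only fails later, when the browser is launched, it can help to validate the setting up front. The following is a minimal sketch, not part of the original script: resolve_driver_path is a hypothetical helper, and the fallback assumes the executable is named "chromedriver" and may be on PATH.

```python
import shutil
from pathlib import Path


def resolve_driver_path(configured: str = "") -> str:
    """Return a usable ChromeDriver path, or raise if none can be found.

    Hypothetical helper for illustration; the executable name
    "chromedriver" is an assumption about the local installation.
    """
    if configured:
        # An explicit setting wins, but only if the file really exists.
        if Path(configured).is_file():
            return configured
        raise FileNotFoundError(f"ChromeDriver not found at {configured!r}")
    # No explicit setting: look for the executable on PATH.
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver is not on PATH; set driver_path")
    return found
```

With this in place, driver_path = resolve_driver_path(driver_path) would fail early with a clear message instead of at browser start-up.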

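Creating the Chrome WebDriver object

from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service(driver_path))

webdriver.Chrome() creates a Chrome WebDriver object, which is assigned to the variable driver. In Selenium 4 the driver path is passed through a Service object; the older positional form webdriver.Chrome(driver_path) no longer works in recent releases.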

Opening the web page

url = 'https://so.csdn.net/so/search?spm=1001.2100.3001.7499&q=%E8%85%BE%E8%AE%AF%E4%BA%91%20Cloud%20Studio%20%E5%AE%9E%E6%88%98%E8%AE%AD%E7%BB%83%E8%90%A5&t=blog&u=&utm_medium=distribute.pc_search_hot_word.none-task-hot_word-alirecmd-1-%E8%85%BE%E8%AE%AF%E4%BA%91%20Cloud%20Studio%20%E5%AE%9E%E6%88%98%E8%AE%AD%E7%BB%83%E8%90%A5-null-null.172%5Ev8%5Etag_flag&depth_1-utm_source=distribute.pc_search_hot_word.none-task-hot_word-alirecmd-1-%E8%85%BE%E8%AE%AF%E4%BA%91%20Cloud%20Studio%20%E5%AE%9E%E6%88%98%E8%AE%AD%E7%BB%83%E8%90%A5-null-null.172%5Ev8%5Etag_flag'
driver.get(url)
time.sleep(5)

driver.get() opens the given page. The URL here is a CSDN blog search for the training-camp keyword. time.sleep(5) then pauses for five seconds to give the page time to finish loading.
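A fixed sleep either wastes time or is too short on a slow connection. Selenium provides WebDriverWait for this purpose; the sketch below only illustrates the underlying polling idea in plain Python. wait_until is a hypothetical helper and is not part of the original script.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper for illustration. Returns the truthy value,
    or raises TimeoutError once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

Used with the driver from this article, a call such as wait_until(lambda: driver.find_elements(By.CLASS_NAME, "list-item")) would return as soon as at least one result is present, since find_elements() returns an empty list while nothing matches.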

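Locating the result elements

results = driver.find_element(By.CLASS_NAME, "so-result-list").find_elements(By.CLASS_NAME, "list-item")

driver.find_element() locates the parent element of the search result list, and find_elements() then collects every search result element under it. The resulting list is assigned to the variable results.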

Creating an empty list to hold the data

data = []

This creates an empty list, data, that will hold the extracted records.

Iterating over the result elements and extracting data

for result in results:
    ...

The loop walks through the results list and extracts data from each result element in turn.

Extracting the title, author, and publication time

    title = result.find_element(By.CLASS_NAME, "title").find_element(By.TAG_NAME, 'a').text
    author = result.find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, 'name-text').text
    pushTime = result.find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, 'time').text

Inside the loop, find_element() locates the title, author, and publication-time elements, and the .text attribute returns their text content.

Checking whether the article is a target article

    if "实战训练营】" in title:
        ...
    else:
        print(f'Not a target article; current title: {title}')

An article counts as a target if its title contains the keyword "实战训练营】", the closing part of the 【腾讯云 Cloud Studio 实战训练营】 tag. Target articles go on to the next extraction step; for any other article, the script prints its title and moves on.
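The check is a plain substring test, so it can be tried without a browser. The sketch below is illustrative only: is_target is a hypothetical helper, and the sample titles are made up rather than taken from the search results.

```python
TARGET_KEYWORD = "实战训练营】"  # the keyword the script looks for in each title


def is_target(title: str, keyword: str = TARGET_KEYWORD) -> bool:
    """Return True when the title contains the training-camp keyword."""
    return keyword in title


# Made-up sample titles, for illustration only.
titles = [
    "【腾讯云 Cloud Studio 实战训练营】示例标题",
    "Selenium 入门",
]
targets = [t for t in titles if is_target(t)]
```

Including the closing bracket 】 in the keyword makes the match slightly stricter than the bare phrase, because a title that merely mentions the camp in passing will not match.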

Extracting the target article's description, view count, like count, and comment count

        description = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "row2").text
        try:
            read = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, "btm-view").find_element(By.CLASS_NAME, "num").text
        except NoSuchElementException:
            read = 0
        try:
            zan = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, "btm-dig").find_element(By.CLASS_NAME, "num").text
        except NoSuchElementException:
            zan = 0
        try:
            comment = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, "btm-comment").find_element(By.CLASS_NAME, "num").text
        except NoSuchElementException:
            comment = 0

Chained find_element() calls walk down level by level to the description, view count, like count, and comment count elements, and .text returns their text content. If one of the count elements is missing, find_element() raises NoSuchElementException and the corresponding variable is set to 0 instead.
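The three try/except blocks differ only in one class name, so the pattern can be folded into a single function. This is a minimal sketch under stated assumptions: nested_text and ElementMissing are hypothetical names, and ElementMissing stands in for Selenium's NoSuchElementException so that the sketch runs without a browser.

```python
class ElementMissing(Exception):
    """Stand-in for selenium's NoSuchElementException in this sketch."""


def nested_text(root, class_names, default=0, missing=ElementMissing, by="class name"):
    """Follow a chain of class names down from root and return the final text.

    Hypothetical helper for illustration. Returns default as soon as any
    lookup in the chain raises the `missing` exception.
    """
    element = root
    try:
        for name in class_names:
            element = element.find_element(by, name)
    except missing:
        return default
    return element.text
```

With Selenium, the view count lookup would then read nested_text(result, ["item-bd__cont", "item-ft", "btm-view", "num"], missing=NoSuchElementException, by=By.CLASS_NAME), and the like and comment counts differ only in the third class name.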

Storing the extracted data as a dictionary

        item = {
            'title': title,  # title
            'description': description,  # description
            'read': read,  # view count
            'zan': zan,  # like count
            'comment': comment,  # comment count
            'author': author,  # author
            'pushTime': pushTime  # publication time
        }

The extracted title, description, view count, and other fields are collected into a dictionary named item.
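Note that the count fields end up with mixed types: a string from .text when the element exists, and the integer 0 when it does not. The sketch below shows one way to normalize them. It assumes the page renders counts as plain digits; anything else, such as an abbreviated "1.2k", would fall back to the default. to_int and normalize_counts are hypothetical helpers.

```python
def to_int(value, default=0):
    """Convert a scraped count such as "123" to an int, or return default."""
    try:
        return int(str(value).strip())
    except ValueError:
        return default


def normalize_counts(item, keys=("read", "zan", "comment")):
    """Return a copy of item with its count fields converted to integers."""
    cleaned = dict(item)  # leave the original dictionary untouched
    for key in keys:
        cleaned[key] = to_int(cleaned.get(key, 0))
    return cleaned
```

Calling normalize_counts(item) before appending would give every record integer counts, which makes later sorting or summing straightforward.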

Appending the dictionary to the data list

        data.append(item)

The item dictionary is appended to the data list.

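Saving the data as a JSON file

with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

json.dump() writes the data list to the file data.json in JSON format. ensure_ascii=False keeps the Chinese text readable instead of escaping it, and indent=4 pretty-prints the output.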
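To see what these options do, the following sketch writes one record and reads it back. The record is made up for illustration, and a temporary directory is used so that no existing data.json is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Made-up record, shaped like the items built in this article.
data = [{"title": "【腾讯云 Cloud Studio 实战训练营】示例", "read": "12"}]

path = Path(tempfile.mkdtemp()) / "data.json"
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

# The raw file contains the Chinese characters themselves, not \uXXXX escapes.
raw = path.read_text(encoding="utf-8")

# Reading the file back yields the same Python structure.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

The file must be opened with encoding='utf-8' for both writing and reading; otherwise the platform default encoding applies and may not be able to represent the Chinese text.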

Closing the WebDriver

driver.quit()

driver.quit() closes the Chrome WebDriver and ends the browser session.
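If any step before this line raises an exception, driver.quit() is never reached and the browser process stays open. One common remedy is a try/finally block. The sketch below is illustrative: run_with_cleanup is a hypothetical helper, and task stands for whatever scraping function is passed in.

```python
def run_with_cleanup(driver, task):
    """Run task(driver) and always call driver.quit() afterwards.

    Hypothetical helper for illustration. The finally clause runs whether
    task returns normally or raises, so the browser is closed either way.
    """
    try:
        return task(driver)
    finally:
        driver.quit()
```

The scraping steps in this article could be moved into a function that takes the driver and returns the data list, and then be passed in as task.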

Complete code

from selenium import webdriver
import json
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time

# Set the ChromeDriver path
driver_path = ''

# Create the Chrome WebDriver object (Selenium 4 takes the path through Service)
driver = webdriver.Chrome(service=Service(driver_path))

# Open the web page
url = 'https://so.csdn.net/so/search?spm=1001.2100.3001.7499&q=%E8%85%BE%E8%AE%AF%E4%BA%91%20Cloud%20Studio%20%E5%AE%9E%E6%88%98%E8%AE%AD%E7%BB%83%E8%90%A5&t=blog&u=&utm_medium=distribute.pc_search_hot_word.none-task-hot_word-alirecmd-1-%E8%85%BE%E8%AE%AF%E4%BA%91%20Cloud%20Studio%20%E5%AE%9E%E6%88%98%E8%AE%AD%E7%BB%83%E8%90%A5-null-null.172%5Ev8%5Etag_flag&depth_1-utm_source=distribute.pc_search_hot_word.none-task-hot_word-alirecmd-1-%E8%85%BE%E8%AE%AF%E4%BA%91%20Cloud%20Studio%20%E5%AE%9E%E6%88%98%E8%AE%AD%E7%BB%83%E8%90%A5-null-null.172%5Ev8%5Etag_flag'
driver.get(url)
time.sleep(5)

# Locate the result elements
results = driver.find_element(By.CLASS_NAME, "so-result-list").find_elements(By.CLASS_NAME, "list-item")

# Create an empty list to hold the data
data = []

# Iterate over the result elements and extract data
for result in results:
    time.sleep(5)
    title = result.find_element(By.CLASS_NAME, "title").find_element(By.TAG_NAME, 'a').text
    author = result.find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, 'name-text').text
    pushTime = result.find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, 'time').text
    if "实战训练营】" in title:
        description = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "row2").text
        # Each count falls back to 0 when its element is missing
        try:
            read = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, "btm-view").find_element(By.CLASS_NAME, "num").text
        except NoSuchElementException:
            read = 0
        try:
            zan = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, "btm-dig").find_element(By.CLASS_NAME, "num").text
        except NoSuchElementException:
            zan = 0
        try:
            comment = result.find_element(By.CLASS_NAME, "item-bd__cont").find_element(By.CLASS_NAME, "item-ft").find_element(By.CLASS_NAME, "btm-comment").find_element(By.CLASS_NAME, "num").text
        except NoSuchElementException:
            comment = 0
        idx = result.get_attribute('i')
        # Store the extracted data as a dictionary
        item = {
            'title': title,  # title
            'description': description,  # description
            'read': read,  # view count
            'zan': zan,  # like count
            'comment': comment,  # comment count
            'author': author,  # author
            'pushTime': pushTime  # publication time
        }
        print(idx)
        # Append the dictionary to the data list
        data.append(item)
    else:
        print(f'Not a target article; current title: {title}')

# Save the data as a JSON file
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

# Close the WebDriver
driver.quit()

Results

Running the script saves the extracted data to data.json.

Closing remarks

In this article we used Selenium and Chrome WebDriver to scrape data from a web page, covering how to locate elements, extract their information, and store the results. These techniques are useful for collecting data from web pages and for automating data collection and processing. We hope the article was helpful. If you want to go further with web scraping and data processing, there is plenty more to learn and explore. Good luck with your work in the data field!
