

news 2025/7/2 4:29:19


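Given a search keyword and a number of result pages, this script scrapes the text of every matching article and saves it to a txt file. The code is explained line by line in the comments.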

It uses the re, requests, and BeautifulSoup packages. The code has been tested and runs, and the Chinese text in the resulting txt file is not garbled.
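The "not garbled" result depends on two separate encoding choices: decoding the downloaded bytes with the page's own encoding, and writing the file with an explicit encoding. Here is a minimal sketch of that round trip using only the standard library. The sample string and the temporary file name are illustrative assumptions, not data from the site.

```python
import os
import tempfile

# Illustrative sample text (an assumption, not taken from the site).
sample = "关键词搜索结果"

# The bytes a GB18030-encoded page would deliver for that text.
raw = sample.encode("gb18030")

# Decoding with the wrong codec yields replacement characters (mojibake).
wrong = raw.decode("utf-8", errors="replace")
# Decoding with the page's codec recovers the text.
right = raw.decode("gb18030")

# Writing with an explicit encoding avoids depending on the OS default.
path = os.path.join(tempfile.gettempdir(), "encoding_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(right)

# Reading back with the same encoding returns the original text.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

In the script, setting `resp.encoding = 'gb18030'` plays the role of the correct `decode` call, and `open(..., encoding='utf-8')` plays the role of the explicit write.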

import re  # imported in the original post; not used below
import requests
from bs4 import BeautifulSoup

titles = []  # article titles
urls = []    # article links

keyword = input("Enter the keyword to search for: ")
pagenum = input("Enter the number of pages to search: ")
txt_name = keyword + " - first " + pagenum + " pages.txt"

# Create the txt file and write its name as the first line.
# The with block closes the file, so no explicit f.close() is needed.
with open(txt_name, 'w', encoding='utf-8') as f:
    f.write(txt_name + '\n')

# Crawl each results page separately.
for i in range(1, int(pagenum) + 1):
    # Build the search URL from the keyword and page number.
    html = "http://www.ofweek.com/newquery.action?keywords=" + keyword + "&type=1&pagenum=" + str(i)
    resp = requests.get(html)     # fetch the results page
    resp.encoding = 'gb18030'     # decode Chinese text without garbling
    content = resp.text           # HTML source of the page
    # Parse the HTML source; 'html.parser' selects Python's built-in parser.
    bs = BeautifulSoup(content, 'html.parser')
    # Each result's link and title come from the first <a> inside a div with class zx-tl.
    for news in bs.select('div.zx-tl'):
        url = news.select('a')[0]['href']      # article link
        urls.append(url)
        title = news.select('a')[0].text       # article title
        titles.append(title)

# Visit each article link and append its text to the file.
for i in range(len(urls)):
    resp = requests.get(urls[i])
    resp.encoding = 'gb18030'
    content = resp.text
    bs = BeautifulSoup(content, 'html.parser')
    # The article body is in the div with class artical-content (spelled this way on the site).
    page_content = bs.select('div.artical-content')[0].text
    with open(txt_name, 'a', encoding='utf-8') as f:
        f.write("\n" + titles[i] + page_content)

print("File saved successfully!")
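The script builds the search URL by string concatenation, so the keyword goes into the query string unescaped. One hedged alternative is to let `urllib.parse.urlencode` build the query. The helper names `build_search_url` and `build_search_urls` are hypothetical. Percent-encoding the keyword as GB18030 is an assumption based on the page encoding used in the script; the site may expect UTF-8 instead.

```python
from urllib.parse import urlencode

# Search endpoint taken from the script.
BASE_URL = "http://www.ofweek.com/newquery.action"


def build_search_url(keyword, page, encoding="gb18030"):
    """Return the search URL for one results page (hypothetical helper).

    The default encoding is an assumption, not confirmed against the site.
    """
    query = urlencode(
        {"keywords": keyword, "type": 1, "pagenum": page},
        encoding=encoding,
    )
    return BASE_URL + "?" + query


def build_search_urls(keyword, pagenum):
    """Return one URL per results page, from page 1 to pagenum inclusive."""
    return [build_search_url(keyword, page) for page in range(1, int(pagenum) + 1)]
```

Inside the crawl loop, `requests.get(build_search_url(keyword, i))` would then replace the concatenated string. Passing `encoding="utf-8"` switches the escaping if the GB18030 assumption turns out to be wrong.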
