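Environment Setup
Before you begin, make sure the following libraries are installed in your Python environment:
requests: sends HTTP requests.
BeautifulSoup: parses HTML documents.
If you have not installed these libraries yet, you can install them with the following command: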
pip install requests beautifulsoup4
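To confirm that the installation succeeded, a quick check along the following lines can be run. This is a minimal sketch using only the standard library; the helper name check_modules is hypothetical, chosen for illustration. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util

# Import names to look for. The beautifulsoup4 package is imported as "bs4".
REQUIRED_MODULES = ["requests", "bs4"]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    # check_modules is a hypothetical helper, not part of either library.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If either module is reported as missing, rerun the pip command above in the same Python environment you use to run the scraper.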
Steps for Scraping Douban Data
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://movie.douban.com/top250'
# Send a browser-like User-Agent header with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}

response = requests.get(url, headers=headers)
response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, 'html.parser')

# Locate each movie entry; adjust the selectors to match the page's actual HTML structure.
movies = soup.find_all('div', class_='item')
data = []
for movie in movies:
    title = movie.find('span', class_='title').text
    rating = movie.find('span', class_='rating_num').text
    # urljoin keeps an already absolute href unchanged and resolves a relative one against url.
    link = urljoin(url, movie.find('a')['href'])
    item = {'title': title, 'rating': rating, 'link': link}
    print(item)
    data.append(item)
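The request above fetches a single URL, so it only sees the first page of the list. A minimal sketch of how the remaining page URLs could be built is shown below. It assumes the list is split into pages of 25 entries selected by a start query parameter; both the page size and the parameter name are assumptions that should be checked against the live site. The names PAGE_SIZE, TOTAL and page_urls are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25  # assumption: number of entries shown per page
TOTAL = 250     # assumption: total number of entries in the list


def page_urls(base=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Build one URL per page, using a `start` offset (assumed parameter name)."""
    return [
        f"{base}?{urlencode({'start': offset})}"
        for offset in range(0, total, page_size)
    ]


urls = page_urls()
print(len(urls))
print(urls[0])
print(urls[-1])
```

Each URL can then be passed to requests.get with the same headers and parsed with the same loop, ideally with a short pause between requests.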
Scraping Results