
news 2025/7/1 20:46:01


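Matplotlib is a data visualization library for Python that makes it easy to create many kinds of charts and figures. It can be used in Jupyter Notebooks, interactive applications, and scripts, and it supports a variety of plot styles and output formats.

Matplotlib was originally designed for scientific computing and can draw line plots, scatter plots, bar charts, area charts, pie charts, histograms, and other chart types. Beyond these basic types, it also supports more advanced visualization such as 3D plots and animations, and map plotting through add-on toolkits such as Cartopy.

Matplotlib provides a rich API with two interfaces, a function-style interface (pyplot) and an object-oriented interface, and users can choose whichever suits their needs. With Matplotlib, users can build complex visualizations and explore patterns and relationships in the data, which helps them understand the data and carry out meaningful analysis and prediction.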
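
To make the two interfaces concrete, here is a minimal sketch that draws the same curve twice, once through pyplot and once through explicit Figure and Axes objects. The data and titles are made up for illustration, and the Agg backend is chosen only so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# Function-style (pyplot) interface: calls act on the "current" figure and axes.
plt.figure()
plt.plot(x, np.sin(x), label="sin(x)")
plt.title("pyplot interface")
plt.legend()
pyplot_axes = plt.gca()

# Object-oriented interface: hold the Figure and Axes and call their methods.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_title("object-oriented interface")
ax.legend()
```

The pyplot form is convenient for quick, single-plot scripts; the object-oriented form tends to be easier to follow once a figure has several axes, which is why the case studies later in this article wrap `plt.subplots()` in small functions.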

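Besides its API, Matplotlib has several other features, for example:

Multiple output formats: Matplotlib can write figures in many formats, including common ones such as PNG, PDF, and SVG;
Multiple styles: Matplotlib ships with a number of built-in style sheets, so users can quickly change the look of a figure by switching styles;
Interactive visualization: Matplotlib provides interactive features such as zooming, panning, and rotating (for 3D axes), so users can explore a figure interactively;
LaTeX formulas: Matplotlib supports LaTeX-style math in figures, which makes it convenient to draw charts that contain mathematical symbols and formulas;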
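
A small sketch of three of the features listed above: a built-in style sheet, a LaTeX-style label, and saving the same figure as PNG, PDF, and SVG. The output directory is a temporary folder created just for this illustration, and `"ggplot"` is only one of the built-in styles.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 4, 100)
out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []

# Apply a built-in style sheet only inside this block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    # Text between dollar signs is rendered as math
    ax.plot(x, np.sqrt(x), label=r"$y = \sqrt{x}$")
    ax.legend()
    # The file extension selects the output format
    for ext in ("png", "pdf", "svg"):
        path = os.path.join(out_dir, "sqrt." + ext)
        fig.savefig(path)
        saved.append(path)
plt.close(fig)
```

`plt.style.available` lists the style names shipped with the installed version.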

In short, Matplotlib is a powerful data visualization library with a rich API and many built-in styles. It helps users create all kinds of charts and figures and so explore and understand their data better.

Matplotlib official website


1. Structure of a figure

Use numpy to organize the data and the matplotlib API to draw it;

Python_Figure_Structure

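A data figure basically consists of the following parts:

Data: the data area, including the data points and the shapes drawn from them;
Axis: the axes, including the X and Y axes with their labels, and the ticks with their labels;
Title: the title, a description of the figure;
Legend: the legend, which distinguishes the curves or categories shown in the figure;
Text: text placed in the figure;
Annotate: annotations;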

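2. Plotting steps

Import matplotlib and the related packages;
Prepare the data and store it in numpy arrays;
Plot the raw curves;
Configure the title, axes, ticks, and legend;
Add text and annotations;
Show and save the result;

Example: a complete plot of the cos, sin, and sqrt functions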

1. Imports

# Render matplotlib figures inline in the current notebook page
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from pylab import *

2. Prepare the data

# Values from 0.0 up to (but not including) 10, in steps of 0.2
x = np.arange(0.,10, 0.2)
y1 = np.cos(x)
y2 = np.sin(x)
y3 = np.sqrt(x)

3. Plot simple curves

# color, linewidth, linestyle, and marker control the look of each curve; label is the legend text
plt.plot(x, y1, color='blue', linewidth=1.5, linestyle='-', marker='.', label=r'$y = \cos{x}$')
plt.plot(x, y2, color='green', linewidth=1.5, linestyle='-', marker='*', label=r'$y = \sin{x}$')
plt.plot(x, y3, color='m', linewidth=1.5, linestyle='-', marker='x', label=r'$y = \sqrt{x}$')

simple_curve

4. The color parameter

color

r red
g green
b blue
c cyan
m magenta
y yellow
k black
w white
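
As a quick check of the table above, the single-letter codes can be resolved to RGB values with `matplotlib.colors`. This is only a sketch for inspection; the exact values for some letters (for example `g`, `c`, `m`, `y`) depend on the Matplotlib version.

```python
from matplotlib import colors

# Map each single-letter code to its RGB tuple (components in the range 0..1)
palette = {code: colors.to_rgb(code) for code in "rgbcmykw"}

for code, rgb in palette.items():
    # to_hex gives the equivalent '#rrggbb' string accepted by color=
    print(code, rgb, colors.to_hex(rgb))
```

Besides these letters, `color=` also accepts named colors such as `'blue'` and hex strings such as `'#539caf'`, both of which are used elsewhere in this article.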

5. linestyle: line styles

line_style
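
The four basic `linestyle` strings can be compared side by side with a short sketch like the one below. The sine curves and vertical offsets are arbitrary; they only keep the lines apart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
styles = ["-", "--", "-.", ":"]  # solid, dashed, dash-dot, dotted

fig, ax = plt.subplots()
lines = []
for offset, style in enumerate(styles):
    # Shift each curve upward so the styles do not overlap
    (line,) = ax.plot(x, np.sin(x) + offset, linestyle=style, label=repr(style))
    lines.append(line)
ax.legend(loc="upper right")
```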

6. marker: point markers

marks
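
The marker codes used in this article (`.`, `*`, `x`) are a small part of what Matplotlib supports. As a sketch, the registry in `matplotlib.markers` can be used to look up the description of each code and to draw a few of them; the choice of six markers here is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle

chosen = [".", "*", "x", "o", "s", "^"]
# MarkerStyle.markers maps each marker code to a short description
descriptions = {m: MarkerStyle.markers[m] for m in chosen}

fig, ax = plt.subplots()
for row, m in enumerate(chosen):
    # linestyle='none' draws only the markers, without a connecting line
    ax.plot(range(5), [row] * 5, linestyle="none", marker=m,
            label=f"{m!r}: {descriptions[m]}")
ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5))
```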

7. Axes

# Move the axes: start from the current subplot
ax = plt.subplot(111)
# Hide the right spine
ax.spines['right'].set_color('none')
# Hide the top spine
ax.spines['top'].set_color('none')
# Move the bottom spine, which is equivalent to moving the X axis
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
# Move the left spine, which is equivalent to moving the Y axis
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

axis

8. Axis limits (lim) and tick labels (ticks)

# Set the value range of the x and y axes
plt.xlim(x.min()*1.1, x.max()*1.1)
plt.ylim(-1.5, 4.0)
# Set the tick positions and tick labels of the x and y axes
plt.xticks([2, 4, 6, 8, 10], [r'2', r'4', r'6', r'8', r'10'])
plt.yticks([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],[r'-1.0', r'0.0', r'1.0', r'2.0', r'3.0', r'4.0'])

lim_ticks

9. Set the X and Y axis labels and the title

# Set the title, the x-axis label, and the y-axis label
plt.title(r'$the \ function \ figure \ of \ cos(), \ sin() \ and \ sqrt()$', fontsize=19)
plt.xlabel(r'$the \ input \ value \ of \ x$', fontsize=18, labelpad=10.8)
plt.ylabel(r'$y = f(x)$', fontsize=18, labelpad=12.5)

title

10. Text and annotations

# Add descriptive text to the figure with text()
plt.text(1., 1.38, r'$x \in [0.0, \ 10.0]$', color='k', fontsize=15)
plt.text(1., 1.18, r'$y \in [-1.0, \ 4.0]$', color='k', fontsize=15)

text

# Annotate a point of interest
plt.scatter([8,], [np.sqrt(8),], 50, color='m')  # draw a larger scatter point to highlight it
plt.annotate(r'$2\sqrt{2}$', xy=(8, np.sqrt(8)), xytext=(8.05, 2.85), fontsize=16, color='#090909',arrowprops=dict(arrowstyle='->', connectionstyle='arc3, rad=0.1', color='#090909'))

annotate

11. Legend

legend

# Either pass label= to each plt.plot call and then call plt.legend(loc='upper right'),
# or pass the legend entries to plt.legend directly, as below (they take precedence over label=):
plt.plot(x, y1, color='blue', linewidth=1.5, linestyle='-', marker='.', label=r'$y = \cos{x}$')
plt.plot(x, y2, color='green', linewidth=1.5, linestyle='-', marker='*', label=r'$y = \sin{x}$')
plt.plot(x, y3, color='m', linewidth=1.5, linestyle='-', marker='x', label=r'$y = \sqrt{x}$')
plt.legend(['cos(x)', 'sin(x)', 'sqrt(x)'], loc='upper right')

sample-legend

12. Grid lines

plt.grid(True)

grid

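13. Show and save

# Save first: once plt.show() has displayed the figure, a later savefig may write an empty image
plt.savefig('../images/ml-03-matplotlib/plot3d_ex.png', dpi=48)
# Show
plt.show()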

3. Complete example

# coding:utf-8
import numpy as np
import matplotlib.pyplot as plt

# Define the data
x = np.arange(0., 10, 0.2)
y1 = np.cos(x)
y2 = np.sin(x)
y3 = np.sqrt(x)

# Plot the three function curves
plt.plot(x, y1, color='blue', linewidth=1.5, linestyle='-', marker='.', label=r'$y = \cos{x}$')
plt.plot(x, y2, color='green', linewidth=1.5, linestyle='-', marker='*', label=r'$y = \sin{x}$')
plt.plot(x, y3, color='m', linewidth=1.5, linestyle='-', marker='x', label=r'$y = \sqrt{x}$')

# Move the axes
ax = plt.subplot(111)
ax.spines['right'].set_color('none')     # hide the right spine
ax.spines['top'].set_color('none')       # hide the top spine

# Move the bottom spine, which is equivalent to moving the X axis
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))

# Move the left spine, which is equivalent to moving the Y axis
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

# Set the value range of the x and y axes
plt.xlim(x.min()*1.1, x.max()*1.1)
plt.ylim(-1.5, 4.0)

# Set the tick values of the x and y axes
plt.xticks([2, 4, 6, 8, 10], [r'2', r'4', r'6', r'8', r'10'])
plt.yticks([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],[r'-1.0', r'0.0', r'1.0', r'2.0', r'3.0', r'4.0'])

# Add text
plt.text(4, 1.68, r'$x \in [0.0, \ 10.0]$', color='k', fontsize=15)
plt.text(4, 1.38, r'$y \in [-1.0, \ 4.0]$', color='k', fontsize=15)

# Annotate a point of interest
plt.scatter([8,], [np.sqrt(8),], 50, color='m')  # draw a larger scatter point to highlight it
plt.annotate(r'$2\sqrt{2}$', xy=(8, np.sqrt(8)), xytext=(8.5, 2.2), fontsize=16, color='#090909',arrowprops=dict(arrowstyle='->', connectionstyle='arc3, rad=0.1', color='#090909'))

# Set the title, the x-axis label, and the y-axis label
plt.title(r'$the \ function \ figure \ of \ cos(), \ sin() \ and \ sqrt()$', fontsize=19)
plt.xlabel(r'$the \ input \ value \ of \ x$', fontsize=18, labelpad=88.8)
plt.ylabel(r'$y = f(x)$', fontsize=18, labelpad=12.5)

# Set the legend and its position
plt.legend(loc='upper right')
# plt.legend(['cos(x)', 'sin(x)', 'sqrt(x)'], loc='upper right')

# Show grid lines
plt.grid(True)

# Show the figure
plt.show()

sample_complete_drawing

4. Common plot types

Line plot, matplotlib.pyplot.plot(data);
Histogram, matplotlib.pyplot.hist(data);
Scatter plot, matplotlib.pyplot.scatter(data);
Box plot, matplotlib.pyplot.boxplot(data);

1. Line plot

x = np.arange(-5, 5, 0.1)
y = x ** 2
z = y ** 2
plt.plot(x, y)
plt.plot(x, z)

plot

2. Histogram

x = np.random.normal(size=1000)
plt.hist(x, bins=10)

hist

3. Scatter plot

x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
plt.scatter(x,y)

scatter

4. Box plot

plt.boxplot(x)

boxplot

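Upper whisker limit (Q3 + 1.5 IQR) and lower whisker limit (Q1 - 1.5 IQR), where IQR = Q3 - Q1
Upper quartile (Q3) and lower quartile (Q1)
Median
Outliers
Compared with the 3σ rule for handling outliers, the two differ in whether the statistical bounds are themselves affected by the outliers, and in how tolerant they are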
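
The difference between the two rules can be seen with a little arithmetic. The sketch below uses a made-up sample of ten values with one obvious outlier (95), computes the box plot fences from the quartiles, and compares them with the 3σ rule; the variable names are illustrative only.

```python
import numpy as np

# Made-up sample with one obvious outlier, for illustration only
values = np.array([10, 12, 13, 14, 15, 16, 17, 18, 20, 95], dtype=float)

# Box plot rule: the fences are built from the quartiles
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
box_outliers = values[(values < lower_fence) | (values > upper_fence)]

# 3-sigma rule: the bounds are built from the mean and standard deviation
mean, std = values.mean(), values.std()
sigma_outliers = values[np.abs(values - mean) > 3 * std]

print("fences:", lower_fence, upper_fence, "flagged:", box_outliers)
print("mean, std:", mean, round(std, 2), "flagged:", sigma_outliers)
```

In this sample the quartile fences flag 95, while the 3σ rule flags nothing: the outlier inflates the mean and the standard deviation, so the 3σ bounds move with it, whereas the quartiles barely change.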

5. Case study: bike rental data analysis

Correlation analysis and numeric comparison: scatter plots, line plots;
Distribution analysis: histograms, density plots;
Analysis involving categories: bar charts, box plots;

1. Load the data

import pandas as pd
import urllib.request  # urlretrieve lives in the urllib.request submodule
import tempfile  # temporary files and directories
import shutil  # file operations
import zipfile  # zip compression and extraction

# Create a temporary directory
temp_dir = tempfile.mkdtemp()
# Remote data source
data_source = 'http://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip'
zipname = temp_dir + '/Bike-Sharing-Dataset.zip'
# Download the data
urllib.request.urlretrieve(data_source, zipname)

# Create a ZipFile object to handle the archive
zip_ref = zipfile.ZipFile(zipname, 'r')
# Extract
zip_ref.extractall(temp_dir)
zip_ref.close()

daily_path = temp_dir + '/day.csv'
daily_data = pd.read_csv(daily_path)
# Convert the date strings to datetime values
daily_data['dteday'] = pd.to_datetime(daily_data['dteday'])
# Columns we are not interested in
drop_list = ['instant', 'season', 'yr', 'mnth','holiday', 'workingday', 'weathersit', 'atemp', 'hum']
# inplace=True modifies the object directly
daily_data.drop(drop_list, inplace=True, axis=1)

# Remove the temporary directory
shutil.rmtree(temp_dir)

# Inspect the data
daily_data.head(10)

      dteday  weekday      temp  windspeed  casual  registered   cnt
0 2011-01-01        6  0.344167   0.160446     331         654   985
1 2011-01-02        0  0.363478   0.248539     131         670   801
2 2011-01-03        1  0.196364   0.248309     120        1229  1349
3 2011-01-04        2  0.200000   0.160296     108        1454  1562
4 2011-01-05        3  0.226957   0.186900      82        1518  1600
5 2011-01-06        4  0.204348   0.089565      88        1518  1606
6 2011-01-07        5  0.196522   0.168726     148        1362  1510
7 2011-01-08        6  0.165000   0.266804      68         891   959
8 2011-01-09        0  0.138333   0.361950      54         768   822
9 2011-01-10        1  0.150833   0.223267      41        1280  1321

2. Configure parameters

# Use Python 3.x division and print (has no effect when already running Python 3)
from __future__ import division, print_function
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

# Show figures inside the notebook
%matplotlib inline

# Set some global resource parameters
import matplotlib

# Set the figure size to 14 x 7
# rc: resource configuration
matplotlib.rc('figure', figsize=(14, 7))
# Set the font size to 14
matplotlib.rc('font', size = 14)
# Hide the top and right spines
matplotlib.rc('axes.spines', top = False, right = False)
# Hide the grid
matplotlib.rc('axes', grid = False)
# Set the background color to white
matplotlib.rc('axes', facecolor = 'white')

3. Correlation analysis (scatter plot: relationship between variables)

# Wrap the scatter plot in a function for reuse
def scatterplot(x_data, y_data, x_label, y_label, title):
    # Create a figure and axes
    fig, ax = plt.subplots()
    # Set the data, point size, point color, and transparency
    # http://www.114la.com/other/rgb.htm
    ax.scatter(x_data, y_data, s=10, color='#539caf', alpha=0.75)
    # Add the title and axis labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    plt.show()

# Draw the scatter plot
scatterplot(x_data=daily_data['temp'], y_data=daily_data['cnt'], x_label='Normalized temperature (C)', y_label='Check outs', title='Number of Check Outs vs Temperature')

4. Correlation analysis (line plot: fitting the relationship between variables)

# Linear regression by least squares
import statsmodels.api as sm
# Used to get the summary information
from statsmodels.stats.outliers_influence import summary_table

# Add a constant term for the linear regression y = kx + b
x = sm.add_constant(daily_data['temp'])
y = daily_data['cnt']
# Ordinary least squares (OLS) model
regr = sm.OLS(y, x)
res = regr.fit()
# Get the fitted data from the model
# alpha=5% significance level (95% confidence); st: summary table, data: detailed values, ss2: column names
st, data, ss2 = summary_table(res, alpha=0.05)
fitted_values = data[:, 2]

# Wrap the line plot in a function
def lineplot(x_data, y_data, x_label, y_label, title):
    # Create the axes
    _, ax = plt.subplots()
    # Draw the fitted line; lw = line width, alpha = transparency
    ax.plot(x_data, y_data, lw=2, color='#539caf', alpha=1)
    # Add the title and axis labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)

# Call the plotting function
lineplot(x_data=daily_data['temp'], y_data=fitted_values, x_label='Normalized temperature (C)', y_label='Check outs', title='Line of Best Fit for Number of Check Outs vs Temperature')

>>> x.size
1462
>>> type(regr)
statsmodels.regression.linear_model.OLS
>>> # st.head()
>>> pd.DataFrame.from_records(st.data).head()
    0           1            2             3            4            5   \
0  Obs     Dep Var    Predicted     Std Error      Mean ci      Mean ci
1       Population        Value  Mean Predict      95% low      95% upp
2  1.0       985.0  3500.155357     72.432281  3357.954604   3642.35611
3  2.0       801.0  3628.394108     68.827331  3493.270679  3763.517537
4  3.0      1349.0  2518.638497    106.979293  2308.614241  2728.662754

            6            7            8            9         10        11
0  Predict ci   Predict ci     Residual    Std Error   Student    Cook's
1     95% low      95% upp                  Residual  Residual         D
2  533.478562  6466.832152 -2515.155357  1507.649519 -1.668263  0.003212
3  662.048124  6594.740092 -2827.394108  1507.818393 -1.875156  0.003663
4 -452.061814  5489.338809 -1169.638497  1505.592554 -0.776863  0.001524
>>> ss2
['Obs','Dep Var\nPopulation','Predicted\nValue','Std Error\nMean Predict','Mean ci\n95% low','Mean ci\n95% upp','Predict ci\n95% low','Predict ci\n95% upp','Residual','Std Error\nResidual','Student\nResidual',"Cook's\nD"]
>>> data
array([[ 1.00000000e+00,  9.85000000e+02,  3.50015536e+03, ...,
         1.50764952e+03, -1.66826263e+00,  3.21190276e-03],
       [ 2.00000000e+00,  8.01000000e+02,  3.62839411e+03, ...,
         1.50781839e+03, -1.87515560e+00,  3.66326560e-03],
       [ 3.00000000e+00,  1.34900000e+03,  2.51863850e+03, ...,
         1.50559255e+03, -7.76862568e-01,  1.52350164e-03],
       ...,
       [ 7.29000000e+02,  1.34100000e+03,  2.89695311e+03, ...,
         1.50654569e+03, -1.03279517e+00,  2.01463700e-03],
       [ 7.30000000e+02,  1.79600000e+03,  2.91355488e+03, ...,
         1.50658291e+03, -7.41781203e-01,  1.02560619e-03],
       [ 7.31000000e+02,  2.72900000e+03,  2.64792648e+03, ...,
         1.50594093e+03,  5.38357901e-02,  6.64260501e-06]])

5. Line plot with a confidence interval: evaluating the curve fit

# Get the lower and upper bounds of the 95% confidence interval of the mean (alpha = 5%)
predict_mean_ci_low, predict_mean_ci_upp = data[:, 4:6].T

# Build a DataFrame holding the lower and upper bounds of the confidence interval
CI_df = pd.DataFrame(columns=['x_data', 'low_CI', 'upper_CI'])
CI_df['x_data'] = daily_data['temp']
CI_df['low_CI'] = predict_mean_ci_low
CI_df['upper_CI'] = predict_mean_ci_upp
CI_df.sort_values('x_data', inplace=True)  # sort by x_data

# Plot the fit together with its confidence interval
def lineplotCI(x_data, y_data, sorted_x, low_CI, upper_CI, x_label, y_label, title):
    # Create the axes
    _, ax = plt.subplots()
    # Draw the fitted line
    ax.plot(x_data, y_data, lw=1, color='#539caf', alpha=1, label='Fit')
    # Draw the confidence interval; the band is filled in x order, so x must be sorted
    ax.fill_between(sorted_x, low_CI, upper_CI, color='#539caf', alpha=0.4, label='95% CI')
    # Add the title and axis labels
    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    # Show the legend (uses the label arguments); loc='best' picks the position automatically
    ax.legend(loc='best')

# Call the function to create plot
lineplotCI(x_data=daily_data['temp'], y_data=fitted_values, sorted_x=CI_df['x_data'], low_CI=CI_df['low_CI'], upper_CI=CI_df['upper_CI'], x_label='Normalized temperature (C)', y_label='Check outs', title='Line of Best Fit for Number of Check Outs vs Temperature')

>>> predict_mean_ci_low
array([3357.95460434, 3493.2706787 , 2308.61424066, 2334.61511966,2527.1743799 , 2365.69916755, 2309.74422092, 2084.09224731,1892.90173201, 1982.55120771, 2113.4006319 , 2139.44397   ,2084.09224731, 2054.49814292, 2572.66032092, 2560.77755361,2161.68700285, 2453.71655445, 2991.06693905, 2774.47282774,2173.62331635, 1323.87879839, 1592.70114322, 1598.94940723,2502.34533239, 2459.66533233, 2298.85873944, 2359.48023561,2309.74422092, 2452.68101446, 2197.48538328, 2278.64410965,2762.61518008, 2241.31702524, 2415.40839107, 2572.66032092,2946.11512008, 2845.5587389 , 2483.46378131, 1867.43225439,1936.04737195, 2256.58702583, 2495.3642587 , 3163.28624958,3850.82088667, 2805.90262872, 3175.56105833, 3993.62059464,4568.15553475, 3741.55716105, 2941.746223  , 3070.07683526,2207.42834256, 2489.93178099, 3015.70611753, 3499.35231432,2922.47209315, 3353.11577628, 3797.57171434, 2810.02576103,3293.51825779, 2322.69524907, 2774.47282774, 3637.51586294,3584.30939921, 2774.98492788, 2993.37692847, 3016.98804377,3671.72232094, 3163.28624958, 3252.45565962, 3638.77413698,3224.62303583, 3169.4205805 , 3505.42562991, 3850.82088667,4687.81236982, 4241.92285953, 3275.92466748, 3956.7321869 ,4033.39618479, 3377.54137823, 2940.20709584, 2792.25173921,2804.0968969 , 2713.10649495, 2793.53874375, 3064.18335719,3046.49142163, 2821.86760563, 3046.49142163, 3152.54017204,3596.92375538, 4902.84424832, 3845.08751497, 3683.81159355,4004.99592   , 3299.37851782, 3346.2460944 , 3930.93527485,
...3016.98804377, 2916.56152591, 3217.22108861, 3486.43250237,3701.14990982, 3822.12298042, 3275.92466748, 3258.32254957,3234.84243973, 2804.0968969 , 2661.76079882, 2558.18822718,2984.90173382, 2643.9488419 , 2721.10770942, 2715.17095217,2715.17095217, 2732.96547894, 2447.76025905])

6. Dual-axis line plot

When the fitted curve does not meet the confidence threshold, consider adding more independent variables;
Analyze the relationship between variables on different scales;

# Plotting function with two y axes
def lineplot2y(x_data, x_label, y1_data, y1_color, y1_label, y2_data, y2_color, y2_label, title):
    _, ax1 = plt.subplots()
    ax1.plot(x_data, y1_data, color=y1_color)
    # Add the title and axis labels
    ax1.set_ylabel(y1_label, color=y1_color)
    ax1.set_xlabel(x_label)
    ax1.set_title(title)
    ax2 = ax1.twinx()  # the two axes share the x axis
    ax2.plot(x_data, y2_data, color=y2_color)
    ax2.set_ylabel(y2_label, color=y2_color)
    # Make the right spine visible
    ax2.spines['right'].set_visible(True)

# Call the plotting function
lineplot2y(x_data=daily_data['dteday'], x_label='Day', y1_data=daily_data['cnt'], y1_color='#539caf', y1_label='Check outs', y2_data=daily_data['windspeed'], y2_color='#7663b0', y2_label='Normalized windspeed', title='Check Outs and Windspeed Over Time')

7. Distribution analysis (histogram: coarse bin counts)

# Function that draws a histogram
def histogram(data, x_label, y_label, title):
    _, ax = plt.subplots()
    res = ax.hist(data, color='#539caf', bins=10)  # set the number of bins
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    return res

# Call the plotting function
res = histogram(data=daily_data['registered'], x_label='Check outs', y_label='Frequency', title='Distribution of Registered Check Outs')
res[0]  # counts per bin
res[1]  # bin edges

8. Overlaid histograms: comparing two distributions

# Draw two overlaid histograms
def overlaid_histogram(data1, data1_name, data1_color, data2, data2_name, data2_color, x_label, y_label, title):
    # Use a common data range so that both histograms share the same bins
    max_nbins = 10
    data_range = [min(min(data1), min(data2)), max(max(data1), max(data2))]
    binwidth = (data_range[1] - data_range[0]) / max_nbins
    bins = np.arange(data_range[0], data_range[1] + binwidth, binwidth)  # bin edges of the histograms
    # Create the plot
    _, ax = plt.subplots()
    ax.hist(data1, bins=bins, color=data1_color, alpha=1, label=data1_name)
    ax.hist(data2, bins=bins, color=data2_color, alpha=0.75, label=data2_name)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    ax.legend(loc='best')

# Call the function to create plot
overlaid_histogram(data1=daily_data['registered'], data1_name='Registered', data1_color='#539caf', data2=daily_data['casual'], data2_name='Casual', data2_color='#7663b0', x_label='Check outs', y_label='Frequency', title='Distribution of Check Outs By Type')

registered: the distribution for registered users looks roughly normal. Why?
casual: the distribution for casual users looks like an exponential distribution. Why?

9. Density plot: a fine-grained picture of the probability distribution

KDE: kernel density estimate

$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$
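
To see what the formula computes, here is a minimal NumPy sketch that evaluates it directly with a Gaussian kernel K. The sample values and the bandwidth h = 1.0 are made up for illustration; `scipy.stats.gaussian_kde`, used in the article's code, derives its bandwidth from the data, so its numbers will generally differ from this hand-rolled version.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, samples, h):
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) at every point of x."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # u[j, i] = (x_j - x_i) / h, one row per evaluation point
    u = (x[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)

samples = np.array([1.0, 2.0, 2.5, 4.0, 7.0])  # made-up observations
grid = np.linspace(-10.0, 20.0, 3001)
density = kde(grid, samples, h=1.0)

# A density should integrate to about 1 (simple Riemann sum over the grid)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

A larger h spreads each kernel more widely and gives a smoother curve; a smaller h follows the individual observations more closely.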

# Compute the probability density
from scipy.stats import gaussian_kde

data = daily_data['registered']
# kernel density estimate: https://en.wikipedia.org/wiki/Kernel_density_estimation
density_est = gaussian_kde(data)
# Controls the amount of smoothing; the larger the value, the smoother the curve
density_est.covariance_factor = lambda: .3
density_est._compute_covariance()
x_data = np.arange(min(data), max(data), 200)

# Draw the density estimate curve
def densityplot(x_data, density_est, x_label, y_label, title):
    _, ax = plt.subplots()
    ax.plot(x_data, density_est(x_data), color='#539caf', lw=2)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)

# Call the plotting function
densityplot(x_data=x_data, density_est=density_est, x_label='Check outs', y_label='Frequency', title='Distribution of Registered Check Outs')

>>> type(density_est)
scipy.stats._kde.gaussian_kde

10. Between-group analysis (bar chart: comparing mean and spread across one level of categories)

Quantitative comparison between groups
Grouping granularity
Clustering between groups

# Summary statistics per day of the week
mean_total_co_day = daily_data[['weekday', 'cnt']].groupby('weekday').agg(['mean', 'std'])
mean_total_co_day.columns = mean_total_co_day.columns.droplevel()

# Define the function that draws the bar chart
def barplot(x_data, y_data, error_data, x_label, y_label, title):
    _, ax = plt.subplots()
    # Bar chart
    ax.bar(x_data, y_data, color='#539caf', align='center')
    # Draw the error bars (standard deviation)
    # ls='none' removes the line connecting the bars
    ax.errorbar(x_data, y_data, yerr=error_data, color='#297083', ls='none', lw=5)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)

# Call the plotting function
barplot(x_data=mean_total_co_day.index.values, y_data=mean_total_co_day['mean'], error_data=mean_total_co_day['std'], x_label='Day of week', y_label='Check outs', title='Total Check Outs By Day of Week (0 = Sunday)')

>>> mean_total_co_day.columns
Index(['mean', 'std'], dtype='object')
>>> daily_data[['weekday', 'cnt']].groupby('weekday').agg(['mean', 'std'])
                 cnt
                mean          std
weekday
0        4228.828571  1872.496629
1        4338.123810  1793.074013
2        4510.663462  1826.911642
3        4548.538462  2038.095884
4        4667.259615  1939.433317
5        4690.288462  1874.624870
6        4550.542857  2196.693009

11. Stacked bar chart: comparing relative proportions across multiple category levels

>>> mean_by_reg_co_day = daily_data[[
...     'weekday', 'registered', 'casual']].groupby('weekday').mean()
>>> mean_by_reg_co_day
          registered       casual
weekday
0        2890.533333  1338.295238
1        3663.990476   674.133333
2        3954.480769   556.182692
3        3997.394231   551.144231
4        4076.298077   590.961538
5        3938.000000   752.288462
6        3085.285714  1465.257143

# Mean registered and casual check outs per day of the week
mean_by_reg_co_day = daily_data[['weekday', 'registered', 'casual']].groupby('weekday').mean()
# Share of registered and casual check outs per day of the week
mean_by_reg_co_day['total'] = mean_by_reg_co_day['registered'] + mean_by_reg_co_day['casual']
mean_by_reg_co_day['reg_prop'] = mean_by_reg_co_day['registered'] / mean_by_reg_co_day['total']
mean_by_reg_co_day['casual_prop'] = mean_by_reg_co_day['casual'] / mean_by_reg_co_day['total']

# Draw a stacked bar chart
def stackedbarplot(x_data, y_data_list, y_data_names, colors, x_label, y_label, title):
    _, ax = plt.subplots()
    # Running total of the bars drawn so far; each new category is stacked on top of it
    bottom = np.zeros(len(x_data))
    # Draw the stacked bars in a loop
    for i in range(0, len(y_data_list)):
        # Every category after the first starts where the previous ones end;
        # the normalization above ensures that the final total is 1
        ax.bar(x_data, y_data_list[i], color=colors[i], bottom=bottom, align='center', label=y_data_names[i])
        bottom = bottom + np.asarray(y_data_list[i])
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    ax.legend(loc='upper right')  # set the legend position

# Call the plotting function
stackedbarplot(x_data=mean_by_reg_co_day.index.values, y_data_list=[mean_by_reg_co_day['reg_prop'], mean_by_reg_co_day['casual_prop']], y_data_names=['Registered', 'Casual'], colors=['#539caf', '#7663b0'], x_label='Day of week', y_label='Proportion of check outs', title='Check Outs By Registration Status and Day of Week (0 = Sunday)')


What can you see in this chart? Compare working days with weekends and holidays;
Why is there such a difference?

12. Grouped bar chart: comparing absolute values across multiple category levels

# Function that draws a grouped bar chart
def groupedbarplot(x_data, y_data_list, y_data_names, colors, x_label, y_label, title):
    _, ax = plt.subplots()
    # Total width of each group of bars
    total_width = 0.8
    # Width of each individual bar
    ind_width = total_width / len(y_data_list)
    # Offset of the center of each bar from the center of its group
    alteration = np.arange(-total_width/2+ind_width/2, total_width/2+ind_width/2, ind_width)
    # Draw the bars of each series
    for i in range(0, len(y_data_list)):
        # Spread the bars out horizontally
        ax.bar(x_data + alteration[i], y_data_list[i], color=colors[i], label=y_data_names[i], width=ind_width)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)
    ax.legend(loc='upper right')

# Call the plotting function
groupedbarplot(x_data=mean_by_reg_co_day.index.values, y_data_list=[mean_by_reg_co_day['registered'], mean_by_reg_co_day['casual']], y_data_names=['Registered', 'Casual'], colors=['#539caf', '#7663b0'], x_label='Day of week', y_label='Check outs', title='Check Outs By Registration Status and Day of Week (0 = Sunday)')

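Center of the first bar, measured from the left edge of the group: ind_width/2;
Center of the group, measured from the same edge: total_width/2;
Offset of the first bar from the group center: total_width/2 - ind_width/2 (to the left);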
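
The offset arithmetic can be checked on its own. The helper below is a hypothetical restatement of the `alteration` computation from the grouped bar chart function: it builds the offsets from an integer range instead of `np.arange` with a floating-point step, which avoids an occasional extra or missing element caused by rounding.

```python
import numpy as np

def bar_offsets(n_series, total_width=0.8):
    """Return the center offset of each bar in a group and the width of one bar."""
    ind_width = total_width / n_series
    # The first bar sits total_width/2 - ind_width/2 to the left of the group center
    first = -(total_width / 2 - ind_width / 2)
    # Each following bar is shifted by one bar width
    offsets = first + ind_width * np.arange(n_series)
    return offsets, ind_width

offsets_2, width_2 = bar_offsets(2)  # two series, as in the registered/casual chart
offsets_4, width_4 = bar_offsets(4)
print(offsets_2, width_2)
print(offsets_4, width_4)
```

With two series and a group width of 0.8, the bars are centered 0.2 to the left and 0.2 to the right of each tick, and each bar is 0.4 wide.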

13. Box plot

Comparing data distributions across multiple category levels;
Bar chart + overlaid histograms;

# Only the grouping key needs to be specified; the box plot is then drawn from the grouped data
days = np.unique(daily_data['weekday'])
bp_data = []
for day in days:
    bp_data.append(daily_data[daily_data['weekday'] == day]['cnt'].values)

# Define the plotting function
def boxplot(x_data, y_data, base_color, median_color, x_label, y_label, title):
    _, ax = plt.subplots()
    # Set the style
    ax.boxplot(y_data
               # Fill the boxes with color
               , patch_artist=True
               # Color of the median line
               , medianprops={'color': base_color}
               # Box colors; color: edge color, facecolor: fill color
               , boxprops={'color': base_color, 'facecolor': median_color}
               # Whisker color
               , whiskerprops={'color': median_color}
               # Whisker cap color
               , capprops={'color': base_color})
    # Label the boxes with the values in x_data
    ax.set_xticklabels(x_data)
    ax.set_ylabel(y_label)
    ax.set_xlabel(x_label)
    ax.set_title(title)

# Call the plotting function
boxplot(x_data=days, y_data=bp_data, base_color='b', median_color='r', x_label='Day of week', y_label='Check outs', title='Total Check Outs By Day of Week (0 = Sunday)')

>>> bp_data
[array([ 801,  822, 1204,  986, 1096, 1623, 1589, 1812, 2402,  605, 2417,2471, 1693, 3249, 2895, 3744, 4191, 3351, 4333, 4553, 4660, 4788,4906, 4460, 4744, 5305, 4649, 4881, 5302, 3606, 4302, 3785, 3820,3873, 4334, 4940, 5046, 4274, 5010, 2918, 5511, 5041, 4381, 3331,3649, 3717, 3520, 3071, 3485, 2743, 2431,  754, 2294, 3425, 2311,1977, 3243, 2947, 1529, 2689, 3389, 3423, 4911, 5892, 4996, 6041,5169, 7132, 1027, 6304, 6359, 6118, 7129, 6591, 7641, 6598, 6978,6891, 5531, 4672, 6031, 7410, 6597, 5464, 6544, 4549, 5255, 5810,8227, 7333, 7907, 6889, 3510, 6639, 6824, 4459, 5107, 6852, 4669,2424, 4649, 3228, 3786, 1787, 1796]), array([1349, 1321, 1000, 1416, 1501, 1712, 1913, 1107, 1446, 1872, 2046,2077, 2028, 3115, 3348, 3429, 4073, 4401, 4362, 3958, 4274, 4098,4548, 5020, 4010, 4708, 6043, 4086, 4458, 3840, 4266, 4326, 4338,4758, 4634, 3351, 4713, 4539, 4630, 3570, 5117, 4570, 4187, 3669,4035, 4486, 2765, 3867, 3811, 3310, 3403, 1317, 1951, 2376, 2298,2432, 3624, 3784, 3422, 3129, 4322, 3333, 5298, 6153, 5558, 5936,5585, 6370, 3214, 5572, 6273, 2843, 4359, 6043, 6998, 6664, 5099,6779, 6227, 6569, 6830, 6966, 7105, 7013, 6883, 6530, 6917, 6034,7525, 6869, 7436, 6778, 5478, 5875, 7058,   22, 5259, 6269, 5499,5087, 6234, 5170, 4585,  920, 2729]), array([1562, 1263,  683, 1985, 1360, 1530, 1815, 1450, 1851, 2133, 2056,2703, 2425, 1795, 2034, 3204, 4400, 4451, 4803, 4123, 4492, 3982,4833, 4891, 4835, 4648, 4665, 4258, 4541, 4590, 4845, 4602, 4725,5895, 5204, 2710, 4763, 3641, 4120, 4456, 4563, 4748, 4687, 4068,4205, 4195, 1607, 2914, 2594, 3523, 3750, 1162, 2236, 3598, 2935,4339, 4509, 4375, 3922, 3777, 4363, 3956, 5847, 6093, 5102, 6772,5918, 6691, 5633, 5740, 5728, 5115, 6073, 5743, 7001, 4972, 6825,
...5976, 8714, 8395, 8555, 7965, 7109, 8090, 7852, 5138, 6536, 5629,2277, 5191, 5582, 5047, 1749, 1341])]
>>> days
[0 1 2 3 4 5 6]

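6. Case study: airline passenger trends

1. Line plot: how total passenger numbers change from year to year

%matplotlib inline
import matplotlib as mpl
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd

# Workaround: disable HTTPS certificate verification so that the dataset download succeeds
# (insecure; use only when the download otherwise fails)
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

data = sns.load_dataset("flights")
data.head()
# year, month, number of passengers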

   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121

months_data = {'month': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August','September', 'October', 'November', 'December'], 'month_int': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}
months = pd.DataFrame(months_data)
months

        month  month_int
0     January          1
1    February          2
2       March          3
3       April          4
4         May          5
5        June          6
6        July          7
7      August          8
8   September          9
9     October         10
10   November         11
11   December         12

data = pd.merge(data, months, on='month')
data.head(15)

    year     month  passengers  month_int
0   1949   January         112          1
1   1950   January         115          1
2   1951   January         145          1
3   1952   January         171          1
4   1953   January         196          1
5   1954   January         204          1
6   1955   January         242          1
7   1956   January         284          1
8   1957   January         315          1
9   1958   January         340          1
10  1959   January         360          1
11  1960   January         417          1
12  1949  February         118          2
13  1950  February         126          2
14  1951  February         150          2

import numpy as np
x = np.arange(1,13)
for name, group in data.groupby('year'):
    # print(name)
    # Sort by month number so that the 12 values line up with x
    plt.plot(x, group.sort_values('month_int')['passengers'], label=name)
    # print(group[['month_int', 'passengers']])
plt.legend(loc='upper right')

passengers

2. Bar chart: distribution of passengers across the months of the year

data_month = pd.merge(data.groupby('month').sum()[['passengers']], months, on='month').sort_values(by='month_int')
plt.bar(data_month['month_int'], data_month['passengers'])
plt.plot(data_month['month_int'], data_month['passengers'])

passengers-bar

7. Case study: iris flower measurements

1. Scatter plot: relationship between sepal and petal sizes

data = sns.load_dataset('iris')
iris_colors = pd.DataFrame({'species': ['setosa', 'versicolor', 'virginica'], 'colors': ['r', 'g', 'b']})
data_colors = pd.merge(data, iris_colors, on='species')
# data_colors
plt.scatter(data_colors['sepal_length'],data_colors['sepal_width'], c=data_colors['colors'])

iris-scatter

2. Per-species scatter subplots: relationship between sepal and petal sizes for each species

data = sns.load_dataset("iris")
data.groupby('species').sum()
# sepal length, sepal width, petal length, petal width, species

            sepal_length  sepal_width  petal_length  petal_width
species
setosa             250.3        171.4          73.1         12.3
versicolor         296.8        138.5         213.0         66.3
virginica          329.4        148.7         277.6        101.3
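
The article stops at the grouped sums here. One possible way to draw the per-species scatter subplots is sketched below; `species_scatter` is a hypothetical helper, and the small `demo` frame contains made-up measurements so that the snippet runs without downloading anything. In the notebook, the `data` frame loaded above would be passed instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def species_scatter(df, x_col, y_col, group_col="species"):
    """Draw one scatter subplot per category of group_col, with shared axes."""
    groups = list(df.groupby(group_col))
    fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True,
                             figsize=(4 * len(groups), 4), squeeze=False)
    for ax, (name, group) in zip(axes[0], groups):
        ax.scatter(group[x_col], group[y_col], s=15)
        ax.set_title(str(name))
        ax.set_xlabel(x_col)
    axes[0, 0].set_ylabel(y_col)
    return fig, axes[0]

# Made-up measurements, only to exercise the function
demo = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "sepal_length": [5.0, 4.8, 6.0, 5.8, 6.6, 6.4],
    "sepal_width": [3.4, 3.1, 2.8, 2.7, 3.0, 2.9],
})
fig, axes = species_scatter(demo, "sepal_length", "sepal_width")
```

Sharing the x and y axes across the subplots keeps the species directly comparable.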

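You can also explore bar charts or box plots: how sepal and petal sizes are distributed for each species;

8. Case study: restaurant tips

Scatter plot: relationship between the tip and the total bill;
Grouped box plot: who is more generous, male or female customers;
Grouped box plot: does smoking affect the tip amount;
Grouped box plot: weekdays versus weekends, when do customers tip more generously;
Grouped box plot: lunch versus dinner, at which meal are customers more willing to tip;
Grouped box plot: does the party size affect generosity;
Grouped bar chart: effect of the combination of sex and smoking on generosity;

data = sns.load_dataset("tips")
data.head()
# total bill, tip, sex, smoker, day of the week, meal time, party size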

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
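
The grouped box plots listed above all follow the same pattern, so one small helper can cover them. The sketch below is only an assumption about how one might approach it: `grouped_boxplot` is a hypothetical function, "generosity" is measured as tip divided by total bill, and the demo frame holds just the five rows printed above, so the resulting boxes say nothing about the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def grouped_boxplot(df, by, value):
    """Draw one box per category of `by` for the column `value`."""
    labels, samples = [], []
    for name, group in df.groupby(by, observed=True):
        labels.append(str(name))
        samples.append(group[value].to_numpy())
    fig, ax = plt.subplots()
    ax.boxplot(samples)
    # Boxes are drawn at positions 1..n; label them with the category names
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_xlabel(by)
    ax.set_ylabel(value)
    return ax, labels

# The five rows printed by data.head() above
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})
demo["tip_rate"] = demo["tip"] / demo["total_bill"]  # tip as a share of the bill
ax, labels = grouped_boxplot(demo, by="sex", value="tip_rate")
```

Calling the same function with `by="smoker"`, `by="day"`, `by="time"`, or `by="size"` on the full `data` frame would cover the other questions in the list.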

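9. Case study: Titanic survival analysis

Stacked bar chart: proportion of survivors and victims in each passenger class;
Stacked bar chart: survival rate by sex;
Grouped box plot: fare distribution of survivors and victims;
Grouped box plot: age distribution of survivors and victims;
Grouped bar chart: passenger class distribution by port of embarkation;
Grouped box plot: number of siblings and spouses aboard (sibsp) for survivors and victims;
Grouped box plot: number of parents and children aboard (parch) for survivors and victims;
Stacked or grouped bar chart: is there a link between traveling alone and survival;

data = sns.load_dataset("titanic")
data.head()
# survived, passenger class, sex, age, siblings/spouses aboard, parents/children aboard, fare, port of embarkation (code), class name, person category, adult male, deck, embarkation town, alive, alone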

   survived  pclass     sex   age  sibsp  parch     fare  ...  class    who adult_male  deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500  ...  Third    man       True   NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833  ...  First  woman      False     C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250  ...  Third  woman      False   NaN  Southampton   yes   True
3         1       1  female  35.0      1      0  53.1000  ...  First  woman      False     C  Southampton   yes  False
4         0       3    male  35.0      0      0   8.0500  ...  Third    man       True   NaN  Southampton    no   True
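
For the stacked bar charts in the list above, the proportions can be computed with `pd.crosstab`. The following is a sketch under stated assumptions: `survival_share` and `stacked_share_plot` are hypothetical helpers, and the demo frame holds only the five rows printed above, so the shares are not representative of the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def survival_share(df, by):
    """Share of victims (0) and survivors (1) within each category of `by`."""
    return pd.crosstab(df[by], df["survived"], normalize="index")

def stacked_share_plot(share):
    """Stack the columns of `share` so that every bar adds up to 1."""
    fig, ax = plt.subplots()
    x = np.arange(len(share))
    bottom = np.zeros(len(share))
    for col in share.columns:
        heights = share[col].to_numpy()
        ax.bar(x, heights, bottom=bottom, label=f"survived = {col}")
        bottom = bottom + heights
    ax.set_xticks(x)
    ax.set_xticklabels([str(v) for v in share.index])
    ax.set_xlabel(share.index.name)
    ax.set_ylabel("Proportion")
    ax.legend(loc="best")
    return ax

# The five rows printed by data.head() above
demo = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0],
    "pclass": [3, 1, 3, 1, 3],
    "sex": ["male", "female", "female", "female", "male"],
})
share = survival_share(demo, "pclass")
ax = stacked_share_plot(share)
```

Passing `by="sex"` or `by="alone"` with the full `data` frame would give the other stacked charts from the list.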
