當(dāng)前位置：首頁 > news >正文

建甌網(wǎng)站制作百度網(wǎng)站怎么提升排名

news 2025/7/7 17:26:29

建甌網(wǎng)站制作,百度網(wǎng)站怎么提升排名,網(wǎng)站建設(shè)找哪個(gè),杭州營銷型網(wǎng)站目錄常見的決策樹算法 1. ID3 2. C4.5 3. CART 決策樹的優(yōu)缺點(diǎn) 優(yōu)點(diǎn)： 缺點(diǎn)： 決策樹的優(yōu)化常見的決策樹算法 1. ID3 ID3（Iterative Dichotomiser 3）算法使用信息增益作為特征選擇的標(biāo)準(zhǔn)。它是一種貪心算法，信…

常見的決策樹算法

1. ID3

2. C4.5

3. CART

決策樹的優(yōu)缺點(diǎn)

優(yōu)點(diǎn)：

缺點(diǎn)：

決策樹的優(yōu)化

常見的決策樹算法

1. ID3

ID3（Iterative Dichotomiser 3）算法使用信息增益作為特征選擇的標(biāo)準(zhǔn)。它是一種貪心算法，信息增益表示按某特征劃分?jǐn)?shù)據(jù)集前后信息熵的變化量，變化量越大，表示使用該特征劃分的效果越好。但I(xiàn)D3偏向于選擇取值較多的特征，可能導(dǎo)致過擬合。

以下是ID3算法的實(shí)現(xiàn)步驟：

1. 計(jì)算數(shù)據(jù)集的熵：熵是度量數(shù)據(jù)集無序程度的指標(biāo)。
2. 計(jì)算每個(gè)屬性的信息增益：信息增益是數(shù)據(jù)集的熵減去按照該屬性分割后的條件熵。
3. 選擇信息增益最大的屬性：作為決策節(jié)點(diǎn)。
4. 分割數(shù)據(jù)集：根據(jù)選定的屬性和它的值，將數(shù)據(jù)集分割成若干子集。
5. 遞歸構(gòu)建決策樹：對(duì)每個(gè)子集重復(fù)步驟1-4，直到所有數(shù)據(jù)都屬于同一類別，或者已達(dá)到預(yù)設(shè)的最大深度。

以下是使用Python實(shí)現(xiàn)ID3算法的一個(gè)簡單示例：

import numpy as np
import pandas as pd# 計(jì)算熵
def calc_entropy(target_col):entropy = -np.sum([len(target_col[target_col == val]) / len(target_col) * np.log2(len(target_col[target_col == val]) / len(target_col))for val in np.unique(target_col)])return entropy# 按照給定屬性分裂數(shù)據(jù)集
def split_dataset(dataset, index, value):return dataset[dataset.iloc[:, index] == value]# 選擇最好的數(shù)據(jù)集屬性
def choose_best_feature_to_split(dataset):num_features = dataset.shape[1] - 1base_entropy = calc_entropy(dataset.iloc[:, -1])best_info_gain = 0.0best_feature = -1for i in range(num_features):feat_list = dataset.iloc[:, i]unique_vals = set(feat_list)new_entropy = 0.0for value in unique_vals:sub_dataset = split_dataset(dataset, i, value)prob = len(sub_dataset) / len(dataset)new_entropy += prob * calc_entropy(sub_dataset.iloc[:, -1])info_gain = base_entropy - new_entropyif info_gain > best_info_gain:best_info_gain = info_gainbest_feature = ireturn best_feature# 構(gòu)建決策樹
def create_tree(dataset, labels):# 檢查數(shù)據(jù)集是否為空if len(dataset) == 0:return None# 檢查數(shù)據(jù)集中的所有目標(biāo)變量是否相同if len(set(dataset.iloc[:, -1])) <= 1:return dataset.iloc[0, -1]# 檢查是否已達(dá)到最大深度或者沒有更多的特征if len(dataset.columns) == 1 or len(set(dataset.iloc[:, -1])) == 1:return majority_cnt(dataset.iloc[:, -1])# 選擇最好的數(shù)據(jù)集屬性best_feat = choose_best_feature_to_split(dataset)best_feat_label = dataset.columns[best_feat]my_tree = {best_feat_label: {}}del(dataset[:, best_feat])unique_vals = set(dataset.iloc[:, -1])for value in unique_vals:sub_labels = best_feat_label + "_" + str(value)my_tree[best_feat_label][value] = create_tree(split_dataset(dataset, best_feat, value), sub_labels)return my_tree# 找到出現(xiàn)次數(shù)最多的目標(biāo)變量值
def majority_cnt(class_list):class_count = {}for vote in class_list:if vote not in class_count.keys():class_count[vote] = 1else:class_count[vote] += 1sorted_class_count = sorted(class_count.items(), key=lambda item: item[1], reverse=True)return sorted_class_count[0][0]# 加載數(shù)據(jù)集
dataset = pd.read_csv("your_dataset.csv") ?# 替換為你的數(shù)據(jù)集路徑
labels = dataset.iloc[:, -1].name
dataset = dataset.iloc[:, 0:-1] ?# 特征數(shù)據(jù)# 創(chuàng)建決策樹
my_tree = create_tree(dataset, labels)
print(my_tree)

注：這個(gè)實(shí)現(xiàn)是為了教學(xué)目的而簡化的，實(shí)際應(yīng)用中通常會(huì)使用更高級(jí)的庫和算法，如 scikit-learn?中的 DecisionTreeClassifier。

2. C4.5

C4.5是ID3的改進(jìn)版，使用信息增益比替代信息增益作為特征選擇標(biāo)準(zhǔn)，從而克服了ID3傾向于選擇多值特征的缺點(diǎn)。此外，C4.5還能處理連續(xù)型特征和缺失值。

實(shí)現(xiàn)C4.5算法可以通過多種編程語言，但這里我將提供一個(gè)簡化的Python實(shí)現(xiàn)，使用Python的基本庫來構(gòu)建決策樹。這個(gè)實(shí)現(xiàn)將包括計(jì)算信息熵、信息增益、信息增益比，并基于這些度量來構(gòu)建決策樹。

1. 計(jì)算信息熵

信息熵是度量數(shù)據(jù)集無序程度的指標(biāo)，計(jì)算公式為：

其中 pi ?是第 i ?個(gè)類別的樣本在數(shù)據(jù)集中的比例。

2. 計(jì)算信息增益

信息增益是度量在知道特征 ?A ?的條件下，數(shù)據(jù)集 ?S ?的熵減少的程度。計(jì)算公式為：

其中 Sv ?是特征 ?A ?值為 ?v ?時(shí)的子集。

3. 計(jì)算信息增益比

信息增益比是信息增益與特征 ?A ?的固有信息的比值，計(jì)算公式為：

其中，分裂信息 Split Information(S, A) ?是度量特征 ?A ?的值分布的熵：

4. 構(gòu)建決策樹

使用以上計(jì)算方法，我們可以構(gòu)建一個(gè)簡單的C4.5決策樹：

import numpy as np
import pandas as pddef entropy(target_col):elements, counts = np.unique(target_col, return_counts=True)probabilities = counts / counts.sum()return -np.sum(probabilities * np.log2(probabilities))def information_gain(data, feature, target):total_entropy = entropy(data[target])values = data[feature].unique()weighted_entropy = 0.0for value in values:sub_data = data[data[feature] == value]weighted_entropy += (len(sub_data) / len(data)) * entropy(sub_data[target])return total_entropy - weighted_entropydef gain_ratio(data, feature, target):ig = information_gain(data, feature, target)split_info = entropy(data[feature])return ig / split_info if split_info != 0 else 0def choose_best_feature_to_split(data, features, target):best_feature = Nonemax_gain_ratio = -1for feature in features:gain_ratio_value = gain_ratio(data, feature, target)if gain_ratio_value > max_gain_ratio:max_gain_ratio = gain_ratio_valuebest_feature = featurereturn best_featuredef c45(data, features, target, tree=None, depth=0):if len(data[target].unique()) == 1:return data[target].mode()[0]if depth == 0:depth = 0elif depth > 10: ?# Limit the depth to avoid overfittingreturn data[target].mode()[0]best_feat = choose_best_feature_to_split(data, features, target)if best_feat is None:return data[target].mode()[0]if len(data[best_feat].unique()) == 1:return data[target].mode()[0]tree = tree if tree else {best_feat: {}}unique_vals = data[best_feat].unique()for value in unique_vals:subtree = c45(data[data[best_feat] == value], features, target, tree=tree, depth=depth+1)tree[best_feat][value] = subtreereturn tree# Example usage
data = pd.DataFrame({'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
})features = ['Outlook', 'Temperature', 'Humidity', 'Wind']
target = 'PlayTennis'tree = c45(data, features, target)
print(tree)

注：這個(gè)實(shí)現(xiàn)是一個(gè)簡化版，沒有包括剪枝和處理連續(xù)變量的步驟。在實(shí)際應(yīng)用中，你可能需要使用更復(fù)雜的數(shù)據(jù)結(jié)構(gòu)和算法來優(yōu)化性能和處理更復(fù)雜的情況。

3. CART

CART（分類與回歸樹）是一種既能用于分類也能用于回歸的決策樹算法。對(duì)于分類問題，CART使用基尼不純度作為特征選擇標(biāo)準(zhǔn)；對(duì)于回歸問題，則使用方差作為分裂標(biāo)準(zhǔn)。CART構(gòu)建的是二叉樹，每個(gè)內(nèi)部節(jié)點(diǎn)都是二元分裂。

以下是CART算法的基本步驟：

1. 選擇最佳分割特征和分割點(diǎn)：使用基尼不純度（Gini impurity）或均方誤差（Mean Squared Error, MSE）作為分割標(biāo)準(zhǔn)，選擇能夠最大程度降低不純度的特征和分割點(diǎn)。

2. 分割數(shù)據(jù)集：根據(jù)選定的特征和分割點(diǎn)，將數(shù)據(jù)集分割成兩個(gè)子集。

3. 遞歸構(gòu)建：對(duì)每個(gè)子集重復(fù)步驟1和2，直到滿足停止條件（如達(dá)到最大深度、節(jié)點(diǎn)中的樣本數(shù)量低于閾值或無法進(jìn)一步降低不純度）。

4. 剪枝：通過剪掉樹的某些部分來簡化樹，以防止過擬合。

以下是一個(gè)簡化的Python實(shí)現(xiàn)CART算法，使用基尼不純度作為分割標(biāo)準(zhǔn)：

import numpy as np
import pandas as pddef gini_impurity(y):class_probabilities = y.mean(axis=0)return 1 - np.sum(class_probabilities ** 2)def best_split(data, target_column, features):best_feature = Nonebest_threshold = Nonebest_gini = float('inf')for feature in features:for idx in np.unique(data[feature]):threshold = (data[feature] < idx).mean()split_data = data[data[feature] < idx]gini = (len(data) - len(split_data)) / len(data) * gini_impurity(split_data[target_column]) + \len(split_data) / len(data) * gini_impurity(data[(data[target_column] == target_column.mode())[data[target_column] >= idx]][target_column])if gini < best_gini:best_gini = ginibest_feature = featurebest_threshold = thresholdreturn best_feature, best_thresholddef build_tree(data, target_column, features, depth=0, max_depth=None):if max_depth is None:max_depth = np.infif len(data[target_column].unique()) == 1 or len(data) == 1 or depth >= max_depth:return data[target_column].mode()[0]best_feature, best_threshold = best_split(data, target_column, features)tree = {best_feature: {}}features = [f for f in features if f != best_feature]split_function = lambda x: x[best_feature] < best_thresholdleft_indices = np.array([bool(split_function(row)) for row in data.itertuples()])right_indices = np.array([not bool(split_function(row)) for row in data.itertuples()])left_data = data[left_indices]right_data = data[right_indices]if not left_data.empty:tree[best_feature][0] = build_tree(left_data, target_column, features, depth+1, max_depth)if not right_data.empty:tree[best_feature][1] = build_tree(right_data, target_column, features, depth+1, max_depth)return tree# Example usage
data = pd.DataFrame({'Feature1': [1, 2, 4, 6, 8, 10],'Feature2': [2, 4, 6, 8, 10, 12],'Target': ['Yes', 'No', 'Yes', 'No', 'Yes', 'No']
})features = ['Feature1', 'Feature2']
target_column = 'Target'tree = build_tree(data, target_column, features)
print(tree)

注：這個(gè)實(shí)現(xiàn)是一個(gè)簡化的版本，沒有包括剪枝步驟。在實(shí)際應(yīng)用中，你可能需要使用更復(fù)雜的數(shù)據(jù)結(jié)構(gòu)和算法來優(yōu)化性能和處理更復(fù)雜的情況。此外，對(duì)于回歸問題，需要使用均方誤差（MSE）而不是基尼不純度作為分割標(biāo)準(zhǔn)。

在實(shí)踐中，通常會(huì)使用像scikit-learn這樣的機(jī)器學(xué)習(xí)庫來構(gòu)建CART樹，因?yàn)樗鼈兲峁┝烁咝Ш透煽康膶?shí)現(xiàn)。例如，scikit-learn中的DecisionTreeClassifier和DecisionTreeRegressor類實(shí)現(xiàn)了CART算法。

決策樹的優(yōu)缺點(diǎn)

優(yōu)點(diǎn)：

- 易于理解和解釋。
- 可以處理數(shù)值和類別數(shù)據(jù)。
- 不需要數(shù)據(jù)標(biāo)準(zhǔn)化。
- 可以可視化。

缺點(diǎn)：

- 容易過擬合。
- 對(duì)于某些數(shù)據(jù)集，構(gòu)建的樹可能非常大。
- 對(duì)于缺失數(shù)據(jù)敏感。

決策樹的優(yōu)化

- 剪枝：通過減少樹的大小來減少過擬合。
- 集成方法：如隨機(jī)森林和梯度提升樹，可以提高模型的泛化能力。

執(zhí)筆至此，感觸彼多，全文將至，落筆為終，感謝各位讀者的支持，如果對(duì)你有所幫助，還請(qǐng)一鍵三連支持我，我會(huì)持續(xù)更新創(chuàng)作。

查看全文

http://aloenet.com.cn/news/46044.html

国产亚洲精品福利在线无卡一,国产精久久一区二区三区,亚洲精品无码国模,精品久久久久久无码专区不卡

建甌網(wǎng)站制作百度網(wǎng)站怎么提升排名

常見的決策樹算法

1. ID3

2. C4.5

3. CART

決策樹的優(yōu)缺點(diǎn)

優(yōu)點(diǎn)：

缺點(diǎn)：

決策樹的優(yōu)化

相關(guān)文章：