先看总的图:
本质上就是在传统gbdt的决策树基础上加入了正则化防止过拟合,以及为了让损失函数求解更方便,加入了泰勒展开,这样计算损失函数更方便了(除了决策树代码有差别,其他都是gbdt一样,本文仅实现xgboost的决策树)。如下:

再解释各个步骤:




。。。
让gpt来汇总下:


好了,我们直接写下实现代码:
import numpy as np
class XGBoostDecisionTree:
def __init__(self, max_depth=3):
self.max_depth = max_depth
def fit(self, X, y):
# X是特征数据,y的第一列是伪残差,第二列是hessians
self.tree = self._grow_tree(X, y, depth=0)
def _gain(self, gradients, hessians):
return np.square(gradients.sum()) / (hessians.sum() + 1e-9)
def _split(self, X, y):
best_gain = 0
best_split = None
best_left_y, best_right_y = None, None
gradients, hessians = y[:, 0], y[:, 1]
total_gain = self._gain(gradients, hessians)
for i in range(X.shape[1]):
thresholds = np.unique(X[:, i])
for thresh in thresholds:
left_mask = X[:, i] < thresh
right_mask = ~left_mask
left_gradients = gradients[left_mask]
right_gradients = gradients[right_mask]
left_hessians = hessians[left_mask]
right_hessians = hessians[right_mask]
left_gain = self._gain(left_gradients, left_hessians)
right_gain = self._gain(right_gradients, right_hessians)
gain = left_gain + right_gain - total_gain
if gain > best_gain:
best_gain = gain
best_split = (i, thresh)
best_left_y = y[left_mask]
best_right_y = y[right_mask]
return best_gain, best_split, best_left_y, best_right_y
def _grow_tree(self, X, y, depth):
gradients, hessians = y[:, 0], y[:, 1]
predicted_value = -gradients.sum() / (hessians.sum() + 1e-9)
node = {"value": predicted_value}
if depth < self.max_depth:
gain, split, left_y, right_y = self._split(X, y)
if gain > 0: # 只有当增益大于0时我们才真正地进行分裂
feature_idx, threshold = split
left_mask = X[:, feature_idx] < threshold
node["feature_idx"] = feature_idx
node["threshold"] = threshold
node["left"] = self._grow_tree(X[left_mask], left_y, depth+1)
node["right"] = self._grow_tree(X[~left_mask], right_y, depth+1)
return node
def predict(self, X):
return np.array([self._predict_single(x, self.tree) for x in X])
def _predict_single(self, x, node):
if "feature_idx" in node:
if x[node["feature_idx"]] < node["threshold"]:
return self._predict_single(x, node["left"])
else:
return self._predict_single(x, node["right"])
return node["value"]
import matplotlib.pyplot as plt
X = np.linspace(0, 10, 100)[:, np.newaxis]
y_true = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
base_pred = np.ones_like(y_true) * y_true.mean()
pseudo_residuals = -2 * (y_true - base_pred)
y = np.c_[pseudo_residuals, np.ones_like(y_true)]
model = XGBoostDecisionTree(max_depth=8)
model.fit(X, y)
predictions = model.predict(X) + base_pred # 加上基学习器的预测
plt.scatter(X, y_true, label='True values', color='b')
plt.plot(X, predictions, label='XGBoost Decision Tree', color='r')
plt.legend()
plt.title('XGBoost Decision Tree Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.show()
效果图:

代码中有几个细节值得注意:
hession矩阵就是二阶导数,就是1,因为损失函数二阶求导(泰勒展开后的)就是1!


预测值为什么是这个呢?predicted_value = -gradients.sum() / (hessians.sum() + 1e-9)

此外,代码里对于增益的变化计算(从后面回答看,严格说还有除2):




所以说,实际上这里是Loss函数变化的值来模拟的gain! 变化更大了,loss就更小了!说明分裂效果越好!

好了,至于其他代码,和决策树算法步骤一样,可以参考之前的文章,就不再说了。