Decision Tree (DecisionTree)

Published 2023-06-11 18:14:19  Author: 找回那所有、

Model highlights

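  1. The data is cleaned in a sensible way.
  2. Both the model and the dataset are very small, so there is little to tune; treat this as an exercise~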

-----------------------------------------Model implementation below-----------------------------------------

Step 1. Read the data

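import pandas as pd
# The file has no header row (header=None); its first column becomes the index (index_col=0)
df = pd.read_csv('bankdebt.csv', index_col=0, header=None)
df.columns = ['house', 'marital', 'income', 'repayment']
df.head()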
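Since `bankdebt.csv` itself is not included in the post, here is a minimal sketch of how `header=None` and `index_col=0` treat a headerless file. The three rows are hypothetical: they only assume the column order used above (an ID column, then house, marital, income, repayment) with income stored as a plain number.

```python
import io

import pandas as pd

# Hypothetical stand-in for bankdebt.csv: no header row, first column is a record ID.
csv_text = """1,Yes,Single,125,No
2,No,Married,100,No
3,No,Single,70,Yes
"""

# header=None: the first line is treated as data, not as column names.
# index_col=0: the ID column becomes the row index instead of a feature.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=None)

# The four remaining columns are auto-numbered 1..4, so they can be renamed in one go.
df.columns = ['house', 'marital', 'income', 'repayment']
print(df)
```

Without `index_col=0`, the ID would be kept as an ordinary column and later fed to the tree as a meaningless feature.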

Step 2. Clean the data

# 1. Owns a house: Yes->1, No->0
df.loc[df['house'] == 'Yes', 'house'] = 1
df.loc[df['house'] == 'No', 'house'] = 0
# 2. Can repay the loan: Yes->1, No->0
df.loc[df['repayment'] == 'Yes', 'repayment'] = 1
df.loc[df['repayment'] == 'No', 'repayment'] = 0
# 3. Marital status: Single->1, Married->2, Divorced->3
df.loc[df['marital'] == 'Single', 'marital'] = 1
df.loc[df['marital'] == 'Married', 'marital'] = 2
df.loc[df['marital'] == 'Divorced', 'marital'] = 3
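The same encoding can also be written with `Series.map` and lookup dictionaries. This is an alternative style, not what the post uses; the rows below are hypothetical and only mimic the raw Yes/No and marital-status labels.

```python
import pandas as pd

# Hypothetical raw rows in the same format the cleaning step expects.
df = pd.DataFrame({
    'house': ['Yes', 'No', 'No'],
    'marital': ['Single', 'Married', 'Divorced'],
    'income': [125.0, 100.0, 70.0],
    'repayment': ['No', 'Yes', 'No'],
})

# Same codes as the .loc assignments above, expressed as lookup tables.
yes_no = {'Yes': 1, 'No': 0}
marital_codes = {'Single': 1, 'Married': 2, 'Divorced': 3}

df['house'] = df['house'].map(yes_no)
df['repayment'] = df['repayment'].map(yes_no)
df['marital'] = df['marital'].map(marital_codes)

# map() turns any unexpected label (e.g. a typo like 'yes') into NaN,
# so a quick check catches values the dictionaries do not cover.
assert not df.isna().any().any()
print(df.dtypes)
```

One practical difference: the `.loc` version leaves the columns with `object` dtype (which is why Step 3 calls `astype(float)`), while `map` yields integer columns directly. Either way, coding marital status as 1/2/3 imposes an order that the tree may split on; one-hot encoding is the usual alternative if that order is undesirable.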

Step 3. Split into training and test sets

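from sklearn.model_selection import train_test_split
# Features are every column except the label; cast the encoded object columns to float
x = df.drop('repayment', axis=1).astype(float)
y = df['repayment'].astype(float)
# Hold out 30% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)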
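To see what `test_size=0.3` and `random_state=1` actually do, here is a small sketch on ten hypothetical samples (not the bank data): scikit-learn rounds the test share up, so 30% of 10 rows gives 3 test rows, and a fixed `random_state` reproduces the same shuffle.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical samples with two features each, and alternating labels.
x = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5, dtype=float)

# test_size=0.3 -> ceil(0.3 * 10) = 3 test rows; the other 7 go to training.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
print(x_train.shape, x_test.shape)  # (7, 2) (3, 2)

# Calling again with the same random_state yields the identical split.
x_train2, x_test2, _, _ = train_test_split(x, y, test_size=0.3, random_state=1)
assert (x_test == x_test2).all()
```

On a dataset this small, a 3-row test set can easily miss one class entirely; passing `stratify=y` keeps the class proportions similar in both parts.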

Step 4. Train the decision tree

from sklearn.tree import DecisionTreeClassifier
def tree(x_train, y_train):
    # Default settings: criterion='gini', no depth limit
    model = DecisionTreeClassifier()
    model.fit(x_train, y_train)
    return model
model = tree(x_train, y_train)
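`DecisionTreeClassifier()` with default settings chooses splits by Gini impurity, $G = 1 - \sum_k p_k^2$, where $p_k$ is the share of class $k$ in a node. A minimal sketch, using hypothetical labels, that computes it by hand and compares it with the root impurity the fitted tree stores in `tree_.impurity`:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Hypothetical node: three samples of class 1, two of class 0.
y = np.array([1, 1, 1, 0, 0])
g = gini(y)
print(g)  # 1 - (0.6**2 + 0.4**2), approximately 0.48

# Fit a tree on a toy feature; node 0 is the root, which holds all samples.
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
model = DecisionTreeClassifier(random_state=0).fit(x, y)
print(model.tree_.impurity[0])
```

At each node the tree picks the feature and threshold whose children have the lowest weighted impurity; `criterion='entropy'` swaps in information gain instead.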

Step 5. Evaluate the model

print("Training set score:", model.score(x_train, y_train))
def score(model, x_test, y_test):
    # score() returns mean accuracy; round it to 5 decimal places
    print("Test set score:", round(model.score(x_test, y_test), 5))
score(model, x_test, y_test)
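For a classifier, `model.score` is the mean accuracy of `model.predict`, i.e. the same number `accuracy_score` returns. A short sketch on hypothetical data also shows why a training score of 1.0 is expected from an unpruned tree:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: one feature with distinct values, binary label.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 0, 1])

model = DecisionTreeClassifier(random_state=0).fit(x, y)

# score() for classifiers is accuracy of predict() on the given data.
acc = accuracy_score(y, model.predict(x))
assert model.score(x, y) == acc

# With distinct feature values and no depth limit, the tree keeps splitting
# until every leaf is pure, so it memorizes the training set.
print(round(acc, 5))
```

This is why the test set score, not the training set score, is the number to look at; parameters such as `max_depth` or `min_samples_leaf` limit that memorization.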

Step 6. Visualize the decision tree structure

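from sklearn.tree import export_text  # the old sklearn.tree.export path was removed in newer scikit-learn releases
branch_name = ['house', 'marital', 'income']
# export_text returns a string; print it so the line breaks render as a tree
print(export_text(model, feature_names=branch_name))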
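A self-contained sketch of the same call on a few hypothetical encoded rows (house, marital, income), so the text rendering can be tried without `bankdebt.csv`. The exact tree depends on the data, so no specific output is claimed here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded rows: house (0/1), marital (1/2/3), income.
x = np.array([
    [1.0, 1.0, 125.0],
    [0.0, 2.0, 100.0],
    [0.0, 1.0, 70.0],
    [1.0, 2.0, 120.0],
    [0.0, 3.0, 95.0],
    [0.0, 2.0, 60.0],
])
y = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0])

model = DecisionTreeClassifier(random_state=0).fit(x, y)

# feature_names replaces the default feature_0, feature_1, ... labels.
report = export_text(model, feature_names=['house', 'marital', 'income'])
print(report)
```

Each line shows one split (`feature <= threshold`) and leaves end in `class: ...`. For a graphical view, `sklearn.tree.plot_tree` draws the same structure with matplotlib.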

Step 7. Save the model

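import joblib  # sklearn.externals.joblib was removed in newer scikit-learn releases
joblib.dump(model, r'd:\DecisionTree.pkl')  # raw string keeps the backslash in the Windows path literal
new_model = joblib.load(r'd:\DecisionTree.pkl')
print("Predictions on the test set:\n", new_model.predict(x_test))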
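A portable sketch of the same save/load round trip, assuming a toy model and a temporary directory in place of the hard-coded `d:\` path, and checking that the reloaded model predicts exactly like the original:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy model standing in for the trained bank-debt tree.
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = DecisionTreeClassifier(random_state=0).fit(x, y)

# A temporary directory avoids depending on a drive letter, so this runs on any OS.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'DecisionTree.pkl')
    joblib.dump(model, path)
    new_model = joblib.load(path)

# The reloaded model should reproduce the original predictions exactly.
assert (new_model.predict(x) == model.predict(x)).all()
print(new_model.predict(x))
```

joblib files are pickles: load them only from trusted sources, and preferably with the same scikit-learn version that saved them, since scikit-learn warns when the versions differ.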

-END