Cross-validation
Setup
In [ ]
pip install ydf -U
In [2]
import ydf
import pandas as pd
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
dataset = pd.read_csv(f"{ds_path}/adult.csv")
# Print the first 5 examples
dataset.head(5)
Out [2]
 | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
In [9]
learner = ydf.RandomForestLearner(label="income")
evaluation = learner.cross_validation(dataset, folds=10)
evaluation
[INFO 23-11-01 14:14:30.9654 CET dataset.cc:440] max_vocab_count = -1 for column income, the dictionary will not be pruned by size.
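To make the `folds=10` argument concrete, here is a minimal, library-free sketch of how k-fold cross-validation partitions a dataset. It is a conceptual illustration only, not YDF's implementation: `k_fold_indices` is a hypothetical helper, and the shuffling and fold-assignment scheme are assumptions chosen for simplicity.

```python
import random


def k_fold_indices(num_examples, folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold.

    Hypothetical helper for illustration; not part of YDF.
    """
    indices = list(range(num_examples))
    # Shuffle once so that folds do not depend on the original row order.
    random.Random(seed).shuffle(indices)
    for fold in range(folds):
        # Every `folds`-th shuffled index, starting at `fold`, is held out.
        test = indices[fold::folds]
        held_out = set(test)
        # All remaining examples are used for training in this fold.
        train = [i for i in indices if i not in held_out]
        yield train, test


# Small toy size so the split is easy to inspect.
splits = list(k_fold_indices(num_examples=25, folds=10))
for fold, (train, test) in enumerate(splits):
    print(f"fold {fold}: {len(train)} training examples, {len(test)} test examples")
```

Each example is held out in exactly one fold, so every row of the dataset contributes to the final evaluation once while never being scored by a model that was trained on it.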