In C++¶
Setup¶
In [ ]
pip install ydf -U
Serving in C++¶
YDF models can be served directly in C++ through the C++ library. Since the Python API and the C++ API share the same serving code, models are fully cross-compatible.
Advantages of serving in C++
- Optimized inference speed: the C++ API gives full control over the serving code and can be used to squeeze every last nanosecond out of YDF.
- Optimized binary size: since the C++ serving code does not depend on the training code, only a small part of YDF needs to be linked.
When not to use the C++ API
- The C++ API is not as easy to use as the Python API.
- Any preprocessing has to be reproduced in C++ (see the sketch after this list).
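For example, if the Python pipeline engineered a feature before training (say, a petal area computed as Petal.Length × Petal.Width), the exact same transformation has to be re-implemented on the C++ side before the values are fed to the engine. The sketch below is purely illustrative: the function name PredictWithPreprocessing and the feature id feature_Petal_Area are hypothetical, and the engine calls it uses (AllocateExamples, FillMissing, SetNumerical, Predict) are the ones that appear in the generated serving code shown later in this tutorial.

#include <vector>

#include "external/ydf_cc/yggdrasil_decision_forests/api/serving.h"

namespace ydf = yggdrasil_decision_forests;

// Hypothetical sketch: a Python-side feature transform replicated in C++.
// "feature_Petal_Area" would only exist if the model had actually been
// trained on such a feature.
std::vector<float> PredictWithPreprocessing(
    ydf::serving_api::FastEngine* engine,
    const ydf::serving_api::FeaturesDefinition& features,
    ydf::serving_api::NumericalFeatureId feature_Petal_Area,
    float petal_length, float petal_width) {
  auto examples = engine->AllocateExamples(/*num_examples=*/1);
  examples->FillMissing(features);

  // Reproduce the exact formula applied by the Python preprocessing
  // before training.
  const float petal_area = petal_length * petal_width;
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Area, petal_area,
                         features);

  std::vector<float> predictions;
  engine->Predict(*examples, /*num_examples=*/1, &predictions);
  return predictions;
}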
Train a small model¶
The next cell creates a very small YDF model.
In [2]
# Load libraries
import ydf # Yggdrasil Decision Forests
import pandas as pd # We use Pandas to load small datasets
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/iris.csv")
label = "class"
model = ydf.RandomForestLearner(label=label, num_trees=10).train(train_ds)
model.describe()
Train model on 150 examples
Model trained in 0:00:00.003721
Out[2]
Name : RANDOM_FOREST
Task : CLASSIFICATION
Label : class
Features (4) : Sepal.Length Sepal.Width Petal.Length Petal.Width
Weights : None
Trained with tuner : No
Model size : 29 kB
Number of records: 150
Number of columns: 5

Number of columns by type:
    NUMERICAL: 4 (80%)
    CATEGORICAL: 1 (20%)

Columns:

NUMERICAL: 4 (80%)
    1: "Sepal.Length" NUMERICAL mean:5.84333 min:4.3 max:7.9 sd:0.825301
    2: "Sepal.Width" NUMERICAL mean:3.05733 min:2 max:4.4 sd:0.434411
    3: "Petal.Length" NUMERICAL mean:3.758 min:1 max:6.9 sd:1.7594
    4: "Petal.Width" NUMERICAL mean:1.19933 min:0.1 max:2.5 sd:0.759693

CATEGORICAL: 1 (20%)
    0: "class" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:"setosa" 50 (33.3333%)

Terminology:
    nas: Number of non-available (i.e. missing) values.
    ood: Out of dictionary.
    manually-defined: Attribute whose type is manually defined by the user, i.e., the type was not automatically inferred.
    tokenized: The attribute value is obtained through tokenization.
    has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
    vocab-size: Number of unique values.
The following evaluation is computed on the validation or out-of-bag dataset.
Number of predictions (without weights): 149
Number of predictions (with weights): 149
Task: CLASSIFICATION
Label: class

Accuracy: 0.919463  CI95[W][0.872779 0.952873]
LogLoss: : 0.798053
ErrorRate: : 0.0805369

Default Accuracy: : 0.33557
Default LogLoss: : 1.09857
Default ErrorRate: : 0.66443

Confusion Table:
truth\prediction  setosa  versicolor  virginica
          setosa      50           0          0
      versicolor       0          47          3
       virginica       0           9         40
Total: 149
Variable importances measure how important an input feature is for the model.
1. "Petal.Length" 0.595238 ################ 2. "Petal.Width" 0.578035 ############### 3. "Sepal.Width" 0.280786 4. "Sepal.Length" 0.279107
1. "Petal.Length" 5.000000 2. "Petal.Width" 5.000000
1. "Petal.Length" 18.000000 ################ 2. "Petal.Width" 15.000000 ############ 3. "Sepal.Width" 5.000000 ## 4. "Sepal.Length" 3.000000
1. "Petal.Length" 870.339292 ################ 2. "Petal.Width" 676.225185 ############ 3. "Sepal.Width" 12.636705 4. "Sepal.Length" 12.459391
These variable importances are computed during training. More, and possibly more informative, variable importances are available when analyzing the model on a test dataset.
Number of trees : 10
Only printing the first tree.
Tree #0:
    "Petal.Length">=2.6 [s:0.673012 n:150 np:90 miss:1] ; val:"setosa" prob:[0.4, 0.266667, 0.333333]
        ├─(pos)─ "Petal.Width">=1.75 [s:0.512546 n:90 np:45 miss:0] ; val:"virginica" prob:[0, 0.444444, 0.555556]
        |        ├─(pos)─ val:"virginica" prob:[0, 0, 1]
        |        └─(neg)─ "Petal.Length">=4.95 [s:0.139839 n:45 np:7 miss:0] ; val:"versicolor" prob:[0, 0.888889, 0.111111]
        |                 ├─(pos)─ val:"virginica" prob:[0, 0.428571, 0.571429]
        |                 └─(neg)─ "Sepal.Length">=5.55 [s:0.0505512 n:38 np:32 miss:1] ; val:"versicolor" prob:[0, 0.973684, 0.0263158]
        |                          ├─(pos)─ val:"versicolor" prob:[0, 1, 0]
        |                          └─(neg)─ val:"versicolor" prob:[0, 0.833333, 0.166667]
        └─(neg)─ val:"setosa" prob:[1, 0, 0]
Generate the C++ code¶
With model.to_cpp(), YDF creates a ready-to-use C++ header file that can be imported into an existing C++ project. The namespace of the C++ code is controlled by the key= argument.
In [3]
# Save the model code to ydf_tutorial_model.h and display it
with open("ydf_tutorial_model.h", "w") as f:
    f.write(model.to_cpp(key="ydf_tutorial"))
!cat ydf_tutorial_model.h
// Automatically generated code running an Yggdrasil Decision Forests model in
// C++. This code was generated with "model.to_cpp()".
//
// Date of generation: 2023-11-01 13:06:59.075973
// YDF Version: 0.0.3
//
// How to use this code:
//
// 1. Copy this code in a new .h file.
// 2. If you use Bazel/Blaze, use the following dependencies:
//      //third_party/absl/status:statusor
//      //third_party/absl/strings
//      //external/ydf_cc/yggdrasil_decision_forests/api:serving
// 3. In your existing code, include the .h file and do:
//      // Load the model (to do only once).
//      namespace ydf = yggdrasil_decision_forests;
//      const auto model = ydf::exported_model_123::Load(<path to model>);
//      // Run the model
//      predictions = model.Predict();
// 4. By default, the "Predict" function takes no inputs and creates fake
//    examples. In practice, you want to add your input data as arguments to
//    "Predict" and call "examples->Set..." functions accordingly.
// 4. (Bonus)
//    Allocate one "examples" and "predictions" per thread and reuse them to
//    speed-up the inference.
//
#ifndef YGGDRASIL_DECISION_FORESTS_GENERATED_MODEL_ydf_tutorial
#define YGGDRASIL_DECISION_FORESTS_GENERATED_MODEL_ydf_tutorial

#include <memory>
#include <vector>

#include "third_party/absl/status/statusor.h"
#include "third_party/absl/strings/string_view.h"
#include "external/ydf_cc/yggdrasil_decision_forests/api/serving.h"

namespace yggdrasil_decision_forests {
namespace exported_model_ydf_tutorial {

struct ServingModel {
  std::vector<float> Predict() const;

  // Compiled model.
  std::unique_ptr<serving_api::FastEngine> engine;

  // Index of the input features of the model.
  //
  // Non-owning pointer. The data is owned by the engine.
  const serving_api::FeaturesDefinition* features;

  // Number of output predictions for each example.
  // Equal to 1 for regression, ranking and binary classification with compact
  // format. Equal to the number of classes for classification.
  int NumPredictionDimension() const {
    return engine->NumPredictionDimension();
  }

  // Indexes of the input features.
  serving_api::NumericalFeatureId feature_Sepal_Length;
  serving_api::NumericalFeatureId feature_Sepal_Width;
  serving_api::NumericalFeatureId feature_Petal_Length;
  serving_api::NumericalFeatureId feature_Petal_Width;
};

// TODO: Pass input feature values to "Predict".
inline std::vector<float> ServingModel::Predict() const {
  // Allocate memory for 2 examples. Alternatively, for speed-sensitive code,
  // an "examples" object can be allocated for each thread and reused. It is
  // okay to allocate more examples than needed.
  const int num_examples = 2;
  auto examples = engine->AllocateExamples(num_examples);

  // Set all the values to be missing. The values may then be overridden by the
  // "Set*" methods. If all the values are set with "Set*" methods,
  // "FillMissing" can be skipped.
  examples->FillMissing(*features);

  // Example #0
  examples->SetNumerical(/*example_idx=*/0, feature_Sepal_Length, 1.f, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Sepal_Width, 1.f, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Length, 1.f, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Width, 1.f, *features);

  // Example #1
  examples->SetNumerical(/*example_idx=*/1, feature_Sepal_Length, 2.f, *features);
  examples->SetNumerical(/*example_idx=*/1, feature_Sepal_Width, 2.f, *features);
  examples->SetNumerical(/*example_idx=*/1, feature_Petal_Length, 2.f, *features);
  examples->SetNumerical(/*example_idx=*/1, feature_Petal_Width, 2.f, *features);

  // Run the model on the two examples.
  //
  // For speed-sensitive code, reuse the same predictions.
  std::vector<float> predictions;
  engine->Predict(*examples, num_examples, &predictions);
  return predictions;
}

inline absl::StatusOr<ServingModel> Load(absl::string_view path) {
  ServingModel m;

  // Load the model
  ASSIGN_OR_RETURN(auto model, serving_api::LoadModel(path));

  // Compile the model into an inference engine.
  ASSIGN_OR_RETURN(m.engine, model->BuildFastEngine());

  // Index the input features of the model.
  m.features = &m.engine->features();

  // Index the input features.
  ASSIGN_OR_RETURN(m.feature_Sepal_Length,
                   m.features->GetNumericalFeatureId("Sepal.Length"));
  ASSIGN_OR_RETURN(m.feature_Sepal_Width,
                   m.features->GetNumericalFeatureId("Sepal.Width"));
  ASSIGN_OR_RETURN(m.feature_Petal_Length,
                   m.features->GetNumericalFeatureId("Petal.Length"));
  ASSIGN_OR_RETURN(m.feature_Petal_Width,
                   m.features->GetNumericalFeatureId("Petal.Width"));

  return m;
}

}  // namespace exported_model_ydf_tutorial
}  // namespace yggdrasil_decision_forests

#endif  // YGGDRASIL_DECISION_FORESTS_GENERATED_MODEL_ydf_tutorial
Use the C++ code¶
To use the C++ code in your project, follow these steps:
- If you use Bazel/Blaze, create a rule with the following dependencies:
  //third_party/absl/status:statusor
  //third_party/absl/strings
  //third_party/yggdrasil_decision_forests/api:serving
- In your C++ code, include the .h file and call the model as follows:
// Load the model (to do only once).
namespace ydf = yggdrasil_decision_forests;
const auto model = ydf::exported_model_ydf_tutorial::Load(<path to model>);
// Run the model
predictions = model.Predict();
- The generated "Predict" function does not take any inputs; instead, it fills the input features with placeholder values. You therefore need to pass your input data as arguments to "Predict" and use it to fill in the "examples->Set..." calls accordingly (see the sketch below).
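As a rough sketch of that step, and assuming the Predict declaration inside ServingModel is updated to match, the generated function could be adapted along these lines (the four-argument signature is illustrative, not part of the generated file):

// Illustrative adaptation of the generated Predict(): the caller passes the
// four iris features of a single example instead of relying on placeholder
// values. The declaration of Predict inside "ServingModel" must be changed
// to the same signature.
inline std::vector<float> ServingModel::Predict(float sepal_length,
                                                float sepal_width,
                                                float petal_length,
                                                float petal_width) const {
  // One example per call. For speed-sensitive code, allocate and reuse the
  // buffers instead (see "Further improvements" below).
  const int num_examples = 1;
  auto examples = engine->AllocateExamples(num_examples);
  examples->FillMissing(*features);

  examples->SetNumerical(/*example_idx=*/0, feature_Sepal_Length, sepal_length, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Sepal_Width, sepal_width, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Length, petal_length, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Width, petal_width, *features);

  std::vector<float> predictions;
  engine->Predict(*examples, num_examples, &predictions);
  return predictions;
}

A caller would then load the model once with Load(<path to model>) and invoke model->Predict(...) with the feature values of each example; since Load returns an absl::StatusOr, check the status before using the model.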
Further improvements¶
Inference speed can be further improved by pre-allocating and reusing the examples and predictions objects in each thread that runs the model, as sketched below.
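A minimal sketch of that idea, assuming the generated header above was saved as ydf_tutorial_model.h and the model was already loaded into a ServingModel; the IrisExample struct and the PredictBatch function are illustrative names, not part of the generated code:

#include <vector>

#include "ydf_tutorial_model.h"  // The generated header shown above.

namespace ydf = yggdrasil_decision_forests;

// Illustrative input record; not part of the generated code.
struct IrisExample {
  float sepal_length;
  float sepal_width;
  float petal_length;
  float petal_width;
};

// One such function would run per worker thread: the "examples" buffer and
// the "predictions" vector are allocated once and reused for every call,
// instead of being re-allocated inside Predict().
void PredictBatch(const ydf::exported_model_ydf_tutorial::ServingModel& model,
                  const std::vector<IrisExample>& batch,
                  std::vector<float>* all_predictions) {
  auto examples = model.engine->AllocateExamples(/*num_examples=*/1);
  std::vector<float> predictions;
  for (const IrisExample& x : batch) {
    examples->FillMissing(*model.features);
    examples->SetNumerical(/*example_idx=*/0, model.feature_Sepal_Length, x.sepal_length, *model.features);
    examples->SetNumerical(/*example_idx=*/0, model.feature_Sepal_Width, x.sepal_width, *model.features);
    examples->SetNumerical(/*example_idx=*/0, model.feature_Petal_Length, x.petal_length, *model.features);
    examples->SetNumerical(/*example_idx=*/0, model.feature_Petal_Width, x.petal_width, *model.features);
    model.engine->Predict(*examples, /*num_examples=*/1, &predictions);
    // For this 3-class model, "predictions" holds one probability per class.
    all_predictions->insert(all_predictions->end(), predictions.begin(), predictions.end());
  }
}

Each worker thread owns its own examples buffer and predictions vector, while the compiled engine is shared, which matches the per-thread reuse suggested in the comments of the generated code.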