NumPy arrays

Setup

In [ ]:

pip install ydf -U

In [1]:

import ydf
import numpy as np

NumPy

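NumPy arrays work well for training and using YDF models. YDF does not consume NumPy arrays directly; instead, it consumes dictionaries of NumPy arrays. A dictionary is a convenient way to organize features by name.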

Let's define a dataset.

In [2]:

number_of_examples = 10
dataset = {
    "f1": np.random.uniform(size=number_of_examples),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}

dataset

Out[2]:

{'f1': array([0.8408175 , 0.23268677, 0.97215838, 0.06059025, 0.43041995,
        0.2838354 , 0.54476241, 0.68916471, 0.15604299, 0.38484593]),
 'f2': array([0.53119829, 0.07066887, 0.367039  , 0.88090998, 0.76215773,
        0.11381487, 0.84171988, 0.34631154, 0.04948825, 0.56829104]),
 'l': array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])}

Then, let's train a model and generate predictions.

In [3]:

model = ydf.RandomForestLearner(label="l").train(dataset)

Train model on 10 examples
Model trained in 0:00:00.006883

In [4]:

model.predict(dataset)

Out[4]:

array([0.37999973, 0.8599993 , 0.4866663 , 0.6733328 , 0.48999962,
       0.836666  , 0.3866664 , 0.48999962, 0.8633326 , 0.5699996 ],
      dtype=float32)
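
The predictions above are floating-point scores rather than class labels. As a minimal sketch, assuming each value is the model's probability for the positive class (label 1), the scores can be turned into hard class predictions with plain NumPy. The `threshold` value of 0.5 is an illustrative choice made here, not something the model prescribes.

```python
import numpy as np

# Predictions copied from the output above.
predictions = np.array(
    [0.37999973, 0.8599993, 0.4866663, 0.6733328, 0.48999962,
     0.836666, 0.3866664, 0.48999962, 0.8633326, 0.5699996],
    dtype=np.float32,
)

# Assumption: each value is the probability of the positive class (label 1).
threshold = 0.5  # Hypothetical cut-off; adjust it for your use case.

# Boolean comparison, then cast to integers to get 0/1 class labels.
predicted_classes = (predictions >= threshold).astype(np.int64)
print(predicted_classes)
```

Raising or lowering the threshold trades false positives against false negatives.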

If your input data is a single NumPy array, simply wrap it in a dictionary.
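
Wrapping an array could look like the following sketch. It assumes the data arrives as one feature matrix with one row per example, plus a separate label array. The key names `"features"` and `"l"` are illustrative choices, not names required by YDF.

```python
import numpy as np

number_of_examples = 10
rng = np.random.default_rng(seed=0)

# A single array holding all the feature values: one row per example.
features = rng.uniform(size=(number_of_examples, 2))
# The labels live in their own array with one entry per example.
labels = rng.integers(0, 2, size=number_of_examples)

# Wrap the arrays in a dictionary. The keys are illustrative names chosen here.
dataset = {"features": features, "l": labels}
```

The resulting dictionary has the same structure as the datasets passed to `train` and `predict` elsewhere on this page, with `"l"` as the label column.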

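Training examples can be one- or two-dimensional NumPy arrays. In a two-dimensional array, the second dimension indexes distinct features, which is similar to feeding each column as a separate feature.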

In [5]:

number_of_examples = 10

# "f1" is an array of size [num_examples, 3]. YDF sees it as a feature with 3 dimensions.
# "f2" is still a single dimensional feature.
dataset = {
    "f1": np.random.uniform(size=(number_of_examples, 3)),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}
dataset

Out[5]:

{'f1': array([[0.77831876, 0.44491803, 0.06950368],
        [0.51402546, 0.35996753, 0.75910236],
        [0.35404616, 0.30025651, 0.50369477],
        [0.83403873, 0.61047313, 0.07814819],
        [0.38385037, 0.40671211, 0.47912743],
        [0.99550808, 0.93747089, 0.74900908],
        [0.13106712, 0.48648687, 0.77925262],
        [0.25118286, 0.34226331, 0.03312203],
        [0.5772139 , 0.03045939, 0.81802417],
        [0.27276707, 0.24643098, 0.62696742]]),
 'f2': array([0.65184742, 0.14970149, 0.16338311, 0.01975033, 0.43429271,
        0.1691804 , 0.14664926, 0.90239627, 0.35412598, 0.31156112]),
 'l': array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])}

In [6]:

model = ydf.RandomForestLearner(label="l").train(dataset)
model.predict(dataset)

Train model on 10 examples
Model trained in 0:00:00.003045

Out[6]:

array([0.27333316, 0.59999955, 0.5633329 , 0.25333318, 0.46999964,
       0.31333312, 0.34999976, 0.38999972, 0.6199995 , 0.47999963],
      dtype=float32)
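
To make "feeding each dimension separately" concrete, here is a minimal NumPy-only sketch that splits a two-dimensional feature into one one-dimensional array per column. The helper `split_columns` and the `"<name>_<index>"` key scheme are hypothetical and introduced only for illustration. This is not a claim about how YDF names or stores the columns internally.

```python
import numpy as np


def split_columns(name, array):
    """Splits a 2-D array into a dict of 1-D arrays, one per column.

    The "<name>_<index>" keys are a hypothetical naming scheme used for
    illustration only.
    """
    array = np.asarray(array)
    if array.ndim != 2:
        raise ValueError(f"Expected a 2-D array, got {array.ndim} dimension(s)")
    return {f"{name}_{i}": array[:, i] for i in range(array.shape[1])}


# A feature of size [num_examples, 3], like "f1" above.
f1 = np.arange(30, dtype=np.float64).reshape(10, 3)
columns = split_columns("f1", f1)

for key, values in columns.items():
    print(key, values.shape)
```

Each resulting array has one value per example, so the dictionary could be merged with the other single-dimensional features and the label before training.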