NumPy arrays
Setup
In [ ]:
pip install ydf -U
In [1]:
import ydf
import numpy as np
In [2]:
number_of_examples = 10
dataset = {
    "f1": np.random.uniform(size=number_of_examples),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}
dataset
Out[2]:
{'f1': array([0.8408175 , 0.23268677, 0.97215838, 0.06059025, 0.43041995, 0.2838354 , 0.54476241, 0.68916471, 0.15604299, 0.38484593]), 'f2': array([0.53119829, 0.07066887, 0.367039 , 0.88090998, 0.76215773, 0.11381487, 0.84171988, 0.34631154, 0.04948825, 0.56829104]), 'l': array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])}
Next, let's train a model and generate predictions.
In [3]:
model = ydf.RandomForestLearner(label="l").train(dataset)
Train model on 10 examples
Model trained in 0:00:00.006883
In [4]:
model.predict(dataset)
Out[4]:
array([0.37999973, 0.8599993 , 0.4866663 , 0.6733328 , 0.48999962, 0.836666 , 0.3866664 , 0.48999962, 0.8633326 , 0.5699996 ], dtype=float32)
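The values above look like per-example scores between 0 and 1. As a minimal sketch, assuming they are probabilities of the positive class (label 1), they can be turned into hard class predictions with plain NumPy. The 0.5 threshold and the variable names below are illustrative choices, not part of YDF.

```python
import numpy as np

# Values copied from the prediction output and the "l" column shown above.
predictions = np.array(
    [0.37999973, 0.8599993, 0.4866663, 0.6733328, 0.48999962,
     0.836666, 0.3866664, 0.48999962, 0.8633326, 0.5699996],
    dtype=np.float32,
)
labels = np.array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])

# Assumption: each value is the probability of class 1.
threshold = 0.5  # illustrative cut-off, not a YDF default
predicted_classes = (predictions >= threshold).astype(int)

# Fraction of examples where the thresholded prediction matches the label.
accuracy = float((predicted_classes == labels).mean())
print(predicted_classes, accuracy)
```

Because the model is scored on the same ten random examples it was trained on, the resulting accuracy says nothing about generalization.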
If your input data is a single NumPy array, simply wrap it in a dictionary :).
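The wrapping step might look like the sketch below. It assumes a hypothetical layout in which one array holds the features in its first columns and the label in its last column; the key name `"features"` and the variable `raw` are made up for illustration.

```python
import numpy as np

number_of_examples = 10

# Hypothetical input: a single array whose last column is the label.
raw = np.random.uniform(size=(number_of_examples, 3))
raw[:, -1] = np.random.randint(0, 2, size=number_of_examples)

# Wrap the array in a dictionary: one key for the features, one for the label.
dataset = {
    "features": raw[:, :-1],      # shape [number_of_examples, 2]
    "l": raw[:, -1].astype(int),  # shape [number_of_examples]
}
```

A dictionary built this way has the same form as the `dataset` passed to `ydf.RandomForestLearner(label="l").train(dataset)` in the cells above.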
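The arrays holding the training examples can be one- or two-dimensional NumPy arrays. In a two-dimensional array, the second dimension defines separate features, which is similar to feeding each column individually.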
In [5]:
number_of_examples = 10
# "f1" is an array of size [num_examples, 3]. YDF sees it as a feature with 3 dimensions.
# "f2" is still a single dimensional feature.
dataset = {
    "f1": np.random.uniform(size=(number_of_examples, 3)),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}
dataset
Out[5]:
{'f1': array([[0.77831876, 0.44491803, 0.06950368], [0.51402546, 0.35996753, 0.75910236], [0.35404616, 0.30025651, 0.50369477], [0.83403873, 0.61047313, 0.07814819], [0.38385037, 0.40671211, 0.47912743], [0.99550808, 0.93747089, 0.74900908], [0.13106712, 0.48648687, 0.77925262], [0.25118286, 0.34226331, 0.03312203], [0.5772139 , 0.03045939, 0.81802417], [0.27276707, 0.24643098, 0.62696742]]), 'f2': array([0.65184742, 0.14970149, 0.16338311, 0.01975033, 0.43429271, 0.1691804 , 0.14664926, 0.90239627, 0.35412598, 0.31156112]), 'l': array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])}
In [6]:
model = ydf.RandomForestLearner(label="l").train(dataset)
model.predict(dataset)
Train model on 10 examples
Model trained in 0:00:00.003045
Out[6]:
array([0.27333316, 0.59999955, 0.5633329 , 0.25333318, 0.46999964, 0.31333312, 0.34999976, 0.38999972, 0.6199995 , 0.47999963], dtype=float32)
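To make the "feeding each dimension separately" comparison concrete, here is a minimal sketch that splits a two-dimensional feature into one one-dimensional array per column. The names `f1_0`, `f1_1`, and `f1_2` are hypothetical labels chosen for this example and are not necessarily the names YDF assigns internally.

```python
import numpy as np

number_of_examples = 10
f1 = np.random.uniform(size=(number_of_examples, 3))

# Split the [number_of_examples, 3] array into three 1-D features.
# The "f1_<i>" names are illustrative only.
separate_features = {f"f1_{i}": f1[:, i] for i in range(f1.shape[1])}

for name, values in separate_features.items():
    print(name, values.shape)
```

Each entry of `separate_features` has shape `(number_of_examples,)`, the same form as the single-dimensional `"f2"` feature used above.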