Prediction / Inference (Final Part)
- Introduction
- Full code:
Introduction
While we often spend most of our time training and testing models, the whole reason we do any of this is to end up with a model that can take new inputs and produce the desired outputs. This will typically involve many attempts to train the optimal model, saving that model, and loading the saved model to perform inference, or prediction.
In the case of the Fashion MNIST classification, we'd like to load a trained model, show it images it has never seen, and have it predict the correct classification. To do this, we'll add a new predict method to the Model class:
python"> # Predicts on the samples
def predict(self, X, *, batch_size=None):
Note that we're predicting on X using a possible batch_size. This means that all predictions, including a prediction on just a single sample, will still be passed in as a list of samples in the form of a NumPy array, whose first dimension is the list of samples and whose second dimension is the sample data. For example, if we want to predict on a single image, we still need to create a NumPy array that mimics a list containing one sample, with a shape of (1, 784), where 1 is that single sample and 784 is the number of features per sample (the pixels per image).
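As a quick illustration (a minimal sketch using a placeholder array, not actual dataset values), reshaping one flattened 28x28 image into such a one-sample batch looks like this:
python">import numpy as np

# A single flattened 28x28 image (784 pixel values) - zeros used here as a stand-in
single_image = np.zeros(784, dtype=np.float32)
print(single_image.shape)   # (784,)

# Wrap it as a "list" of one sample so the model sees shape (1, 784)
single_batch = single_image.reshape(1, -1)
print(single_batch.shape)   # (1, 784)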
Similar to the evaluate method, we'll calculate the number of steps we plan to perform:
 # Default value">
python"># Default value if batch size is not being set
prediction_steps = 1

# Calculate number of steps
if batch_size is not None:
    prediction_steps = len(X) // batch_size

    # Dividing rounds down. If there are some remaining
    # data, but not a full batch, this won't include it
    # Add `1` to include this not full batch
    if prediction_steps * batch_size < len(X):
        prediction_steps += 1
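The floor division plus the conditional `+ 1` above is simply a ceiling division written out explicitly. As a quick check (a standalone sketch with hypothetical numbers, not part of the Model class), the same count can be computed in one expression:
python">import math

X_len = 10      # hypothetical number of samples
batch_size = 4  # hypothetical batch size

# Explicit form used in the predict method
prediction_steps = X_len // batch_size
if prediction_steps * batch_size < X_len:
    prediction_steps += 1

# Equivalent one-liner - both give 3 here
assert prediction_steps == math.ceil(X_len / batch_size)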
Then we'll create a list that we'll fill with the predictions:
python"> # Model outputs
output = []
We'll iterate over the batches, pass the samples through the network to make predictions, and populate the output with those predictions:
 # Iterate over steps">
python"># Iterate over steps
for step in range(prediction_steps):

    # If batch size is not set -
    # train using one step and full dataset
    if batch_size is None:
        batch_X = X

    # Otherwise slice a batch
    else:
        batch_X = X[step*batch_size:(step+1)*batch_size]

    # Perform the forward pass
    batch_output = self.forward(batch_X, training=False)

    # Append batch prediction to the list of predictions
    output.append(batch_output)
After running this method, the output is a list of batch predictions. Each of them is a NumPy array, a partial result of predicting on one batch of samples from the input data array. Any application or program consuming our model's inference output will simply want to pass in a list of samples and get back a list of predictions (both in the NumPy array form described before).
Since we're not training, batches are used during prediction only to make sure the model fits in memory, yet what we get back is still in the form of batched predictions. Here's a simplified example:
python">import numpy as np
output = []
b = np.array([[1, 2], [3, 4]])
output.append(b)
b = np.array([[5, 6], [7, 8]])
output.append(b)
b = np.array([[9, 10], [11, 12]])
output.append(b)
print(output)
python">>>>
[array([[1, 2],
[3, 4]]), array([[5, 6],
[7, 8]]), array([[ 9, 10],
[11, 12]])]
In this example, the output uses a batch size of 2 with 6 samples in total: a list of arrays, each containing one batch of predictions. What we'd rather have is a single list of all the predictions, not separated into batches. To get that, we'll use NumPy's vstack method:
python">import numpy as np
output = []
b = np.array([[1, 2], [3, 4]])
output.append(b)
b = np.array([[5, 6], [7, 8]])
output.append(b)
b = np.array([[9, 10], [11, 12]])
output.append(b)
output = np.vstack(output)
print(output)
python">>>>
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]
[11 12]]
It takes a list of objects and stacks them, if possible, creating a homogeneous array. This is the preferable form for the predict method to return when we pass in a list of samples. With plain Python, we might simply have added the results to a list at each step:
python">output = []
b = [[1, 2], [3, 4]]
output += b
b = [[5, 6], [7, 8]]
output += b
b = [[9, 10], [11, 12]]
output += b
print(output)
python">>>>
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
We add the results to a list and stack them once at the end, rather than appending to a NumPy array at every batch, to avoid a performance penalty. Unlike plain Python, NumPy is written in C and creates its data objects in memory differently. That means there is no easy way of adding data to an existing NumPy array other than merging two arrays and saving the result as a new one. Doing this on every batch becomes more and more expensive, since the resulting array grows larger with each prediction.
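To get an intuition for this cost, here is a small timing sketch (an assumed illustration only; exact timings depend on your machine) comparing repeated np.vstack calls against appending to a list and stacking once:
python">import timeit
import numpy as np

batches = [np.random.rand(128, 10) for _ in range(200)]

def stack_every_step():
    # Re-allocates and copies a growing array on every iteration
    result = np.empty((0, 10))
    for b in batches:
        result = np.vstack((result, b))
    return result

def stack_once():
    # Appending to a list is cheap; stack a single time at the end
    result = []
    for b in batches:
        result.append(b)
    return np.vstack(result)

print(timeit.timeit(stack_every_step, number=10))
print(timeit.timeit(stack_once, number=10))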
The fastest and most optimal approach is to append the NumPy arrays to a list and, once we've collected all of the partial results, stack them vertically in a single call. We'll add np.vstack to the output that the method returns:
python"> # Stack and return results
return np.vstack(output)
The full predict method:
 # Predicts on the samples">
python"># Predicts on the samples
def predict(self, X, *, batch_size=None):

    # Default value if batch size is not being set
    prediction_steps = 1

    # Calculate number of steps
    if batch_size is not None:
        prediction_steps = len(X) // batch_size

        # Dividing rounds down. If there are some remaining
        # data, but not a full batch, this won't include it
        # Add `1` to include this not full batch
        if prediction_steps * batch_size < len(X):
            prediction_steps += 1

    # Model outputs
    output = []

    # Iterate over steps
    for step in range(prediction_steps):

        # If batch size is not set -
        # train using one step and full dataset
        if batch_size is None:
            batch_X = X

        # Otherwise slice a batch
        else:
            batch_X = X[step*batch_size:(step+1)*batch_size]

        # Perform the forward pass
        batch_output = self.forward(batch_X, training=False)

        # Append batch prediction to the list of predictions
        output.append(batch_output)

    # Stack and return results
    return np.vstack(output)
Now we can load the model and test its prediction functionality:
python"># Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Scale and reshape samples
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the first 5 samples from validation dataset
# and print the result
confidences = model.predict(X_test[:5])
print(confidences)
python">>>>
[[9.47225571e-01 2.52310792e-06 4.26566275e-03 1.14208065e-04
5.60502713e-07 1.03709858e-07 4.83857058e-02 9.79777681e-09
5.65434220e-06 6.24423624e-09]
[7.76644230e-01 5.56645566e-04 1.82817469e-03 2.07056459e-02
4.91867031e-05 1.62446213e-07 2.00205415e-01 2.78994799e-10
1.05146655e-05 1.24752910e-08]
[9.96223211e-01 3.88239574e-09 4.89091559e-04 1.81238247e-05
1.49976700e-06 7.25310034e-10 3.26809939e-03 1.81895521e-09
1.49130344e-08 6.69718003e-10]
[9.98704791e-01 1.77900521e-08 5.24727257e-05 4.83505391e-06
1.02738902e-07 5.13466492e-10 1.23780814e-03 9.31118738e-09
2.84552026e-09 6.17795770e-09]
[8.52106988e-01 5.32999422e-07 4.70034749e-04 1.28197280e-04
9.89067530e-07 9.23007946e-08 1.47292748e-01 8.85645761e-08
1.79957738e-07 2.20160018e-07]]
It appears to be working! After spending considerable time training and finding the best hyperparameters, a common question people have is how to actually use the model. As a reminder, each of the subarrays in this output is a confidence vector containing one confidence metric per class.
The first thing we need to do in this case is take the argmax of those confidence vectors. Recall that we're using a softmax classifier, so this neural network attempts to fit to one-hot vectors, where the correct class is represented by a 1 and all other classes by 0. When doing inference, it's unlikely to achieve such a perfect result, so we determine the model's predicted class from the index of the highest value in the output; that's what we use argmax for.
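For instance (a quick illustrative sketch built from the first confidence vector printed above, with the values rounded for readability), argmax simply picks the index of the largest confidence:
python">import numpy as np

# First confidence vector from the output above (values rounded)
confidences = np.array([9.47e-01, 2.52e-06, 4.27e-03, 1.14e-04, 5.61e-07,
                        1.04e-07, 4.84e-02, 9.80e-09, 5.65e-06, 6.24e-09])

# Index of the highest confidence - the predicted class
print(np.argmax(confidences))  # 0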
While we could write this code ourselves, we've actually already added a predictions method to all of the activation function classes that does exactly this:
# Softmax activation">
python"># Softmax activation
class Activation_Softmax:
    ...

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)
We also set an attribute on the model that stores the output layer's activation function, which means we can obtain the predictions generically like this:
python"># Load the model
model = Model.load('fashion_mnist.model')
# Predict on the first 5 samples from validation dataset
# and print the result
confidences = model.predict(X_test[:5])
predictions = model.output_layer_activation.predictions(confidences)
print(predictions)
# Print first 5 labels
print(y_test[:5])
python">>>>
[0 0 0 0 0]
[0 0 0 0 0]
In this case, our model predicted "class 0" for every sample, and our test labels are all class 0 as well. Shuffling the test data was never necessary, so we never shuffled it, and it remains in its original order, where the first samples all belong to class 0. That explains why all of these predictions are 0.
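To see a more varied set of classes, we could just as well predict on a handful of randomly chosen test samples instead of the first five (a minimal sketch; the index choice is arbitrary and it assumes the model, X_test, and y_test from the snippet above):
python">import numpy as np

# Pick 5 random test samples instead of the first 5
indices = np.random.choice(len(X_test), size=5, replace=False)
confidences = model.predict(X_test[indices])
predictions = model.output_layer_activation.predictions(confidences)

print(predictions)      # predicted class indices
print(y_test[indices])  # corresponding true labels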
In practice, though, we don't care about the class number itself; we want to know what the item actually is. In this case, class numbers map directly to names, so we add the following dictionary to our code:
python">fashion_mnist_labels = {
0: 'T-shirt/top',
1: 'Trouser',
2: 'Pullover',
3: 'Dress',
4: 'Coat',
5: 'Sandal',
6: 'Shirt',
7: 'Sneaker',
8: 'Bag',
9: 'Ankle boot'
}
Then we can get the string classifications by running:
for prediction">
python">for prediction in predictions:
    print(fashion_mnist_labels[prediction])
python">>>>
T-shirt/top
T-shirt/top
T-shirt/top
T-shirt/top
T-shirt/top
That's great, but we still need to actually predict on something new rather than on the training data. When deep learning is discussed, the training steps tend to get all the attention; we want to see those accuracy and loss metrics look good! Focusing on training works well for tutorials that aim to show people how to use a framework, but one of the bigger pain points we've found is how to actually put a model into production, or simply predict on new data acquired from elsewhere (especially since outside data is rarely formatted to exactly match your training data).
Right now, we have a model trained on clothing items, so we need some genuinely new samples. Luckily, you're most likely a person who owns clothes; if so, you can start by taking photos of them. If not, you can use the following example photos:
You can also try hand-drawing samples like these. Once you have the new images/samples you'd like to use in production, you need to preprocess them in the same way the training samples were preprocessed. Some of these changes are hard to miss, such as the image resolution or the number of color channels; we'd get an error if we didn't handle them.
Let's start preprocessing by loading the image. We'll use the cv2 package to read it (remember to save the tshirt.png and pants.png images locally):
python">import cv2
image_data = cv2.imread('tshirt.png', cv2.IMREAD_UNCHANGED)
We can view the image:
python">import matplotlib.pyplot as plt
plt.imshow(cv2.cvtColor(image_data, cv2.COLOR_BGR2RGB))
plt.show()
Note that we use cv2.cvtColor because OpenCV reads images in BGR (blue, green, red pixel values) order by default, while Matplotlib expects RGB (red, green, blue), so we convert the color order to display the image.
First, we'll read this image in grayscale rather than RGB. This differs from the Fashion MNIST images, which were already grayscale and which we read with cv2.imread() using the cv2.IMREAD_UNCHANGED argument to tell OpenCV that we intended to read the images grayscale and unchanged. Here, however, we have a color image, and cv2.IMREAD_UNCHANGED won't help, since "unchanged" means keeping all of the colors as well; instead, we'll use cv2.IMREAD_GRAYSCALE to force grayscale conversion when reading the image:
python">import cv2
image_data = cv2.imread('tshirt.png', cv2.IMREAD_GRAYSCALE)
Then we can display it:
python">import matplotlib.pyplot as plt
plt.imshow(image_data, cmap='gray')
plt.show()
Note that with plt.imshow() we pass 'gray' to the cmap parameter to use a grayscale color map. The result is a grayscale image:
Next, we'll resize the image to the same 28x28 resolution as our training data:
python">image_data = cv2.resize(image_data, (28, 28))
Then we display the resized image:
python">plt.imshow(image_data, cmap='gray')
plt.show()
Next, we'll flatten and scale the image. The scaling is the same as for the training data, but the flattening is slightly different; we don't have a list of images here, but a single image, and, as explained earlier, a single image must be passed in as a list containing that one image. We flatten by applying .reshape(1, -1) to the image, where 1 is the number of samples and -1 flattens the image into a vector of length 784. This produces a 1x784 array containing our one sample and its 784 features (the 28x28 pixels):
python">image_data = (image_data.reshape(1, -1).astype(np.float32) - 127.5) / 127.5
Now we can load the model and predict on this image data:
python"># Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
At this point, here's our code so far for loading, preprocessing, and predicting:
python">fashion_mnist_labels = {
0: 'T-shirt/top',
1: 'Trouser',
2: 'Pullover',
3: 'Dress',
4: 'Coat',
5: 'Sandal',
6: 'Shirt',
7: 'Sneaker',
8: 'Bag',
9: 'Ankle boot'
}
# Read an image
image_data = cv2.imread('tshirt.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Reshape and scale pixel data
image_data = (image_data.reshape(1, -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
Note that we use predictions[0] because we passed in a single image in the form of a list, so the model returns a list containing a single prediction.
There's just one problem...
python">>>>
Ankle boot
What's wrong? Let's compare our currently preprocessed image to the training data:
python">mnist_image = cv2.imread('fashion_mnist_images/train/0/0000.png', cv2.IMREAD_UNCHANGED)
plt.imshow(mnist_image, cmap='gray')
plt.show()
Now let's compare the original image and an example training image to our preprocessed image:
The training data we used is color-inverted (i.e., the background is black instead of white, and so on). To invert our image before scaling, we can use plain pixel math rather than OpenCV: we subtract every pixel value from the maximum pixel value, 255. For example, a pixel of value 0 becomes 255 - 0 = 255, and a pixel of value 255 becomes 255 - 255 = 0.
python">image_data = 255 - image_data
With this small change in place, our prediction code becomes:
python"># Read an image
image_data = cv2.imread('tshirt.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Invert image colors
image_data = 255 - image_data
# Reshape and scale pixel data
image_data = (image_data.reshape(1, -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
python">>>>
T-shirt/top
Now it works! The reason it works now, when it didn't before, lies in how Dense layers work: they learn feature (in this case, pixel) values and the correlations between them. Contrast this with convolutional layers, which are trained to find and understand features in images (not features as in data input nodes, but actual characteristics such as lines and curves).
Because the pixel values were so different, the model was wrong in its "guess" in this case. A convolutional layer might have predicted correctly here, since it works on image features directly.
Let's try the pants:
python">image_data = cv2.imread('pants.png', cv2.IMREAD_UNCHANGED)
plt.imshow(cv2.cvtColor(image_data, cv2.COLOR_BGR2RGB))
plt.show()
Now we preprocess:
python"># Read an image
image_data = cv2.imread('pants.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Invert image colors
image_data = 255 - image_data
And check what we have:
python">plt.imshow(image_data, cmap='gray')
plt.show()
And put the full code together:
python"># Label index to label name relation
fashion_mnist_labels = {
0: 'T-shirt/top',
1: 'Trouser',
2: 'Pullover',
3: 'Dress',
4: 'Coat',
5: 'Sandal',
6: 'Shirt',
7: 'Sneaker',
8: 'Bag',
9: 'Ankle boot'
}
# Read an image
image_data = cv2.imread('pants.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Invert image colors
image_data = 255 - image_data
# Reshape and scale pixel data
image_data = (image_data.reshape(1, -1).astype(np.float32) -
127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
python">>>>
Trouser
Success once again! We have now coded the final feature of our model, which brings the topics covered in this book full circle.
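Since the t-shirt and pants examples repeat the same preprocessing steps, it can be convenient to wrap them in a small helper. Below is a minimal sketch of such a function; the name predict_image and its structure are our own convention, not part of the book's Model class, and it assumes cv2, np, Model, and fashion_mnist_labels are available as defined above:
python">def predict_image(path, model, labels):
    # Read the image in grayscale, matching the Fashion MNIST format
    image_data = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Resize to the same size as Fashion MNIST images
    image_data = cv2.resize(image_data, (28, 28))
    # Invert image colors (Fashion MNIST uses a black background)
    image_data = 255 - image_data
    # Reshape to a batch of one sample and scale pixels to the -1..1 range
    image_data = (image_data.reshape(1, -1).astype(np.float32) - 127.5) / 127.5
    # Predict and map the class index to its label name
    confidences = model.predict(image_data)
    predictions = model.output_layer_activation.predictions(confidences)
    return labels[predictions[0]]

# Example usage, assuming the model file and images from above
model = Model.load('fashion_mnist.model')
print(predict_image('tshirt.png', model, fashion_mnist_labels))
print(predict_image('pants.png', model, fashion_mnist_labels))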
Full code:
python">import numpy as np
import cv2
import os
import pickle
import copy
import matplotlib.pyplot as plt
# Loads a MNIST dataset
def load_mnist_dataset(dataset, path):
# Scan all the directories and create a list of labels
labels = os.listdir(os.path.join(path, dataset))
# Create lists for samples and labels
X = []
y = []
# For each label folder
for label in labels:
# And for each image in given folder
for file in os.listdir(os.path.join(path, dataset, label)):
# Read the image
image = cv2.imread(os.path.join(path, dataset, label, file), cv2.IMREAD_UNCHANGED)
# And append it and a label to the lists
X.append(image)
y.append(label)
# Convert the data to proper numpy arrays and return
return np.array(X), np.array(y).astype('uint8')
# MNIST dataset (train + test)
def create_data_mnist(path):
# Load both sets separately
X, y = load_mnist_dataset('train', path)
X_test, y_test = load_mnist_dataset('test', path)
# And return all the data
return X, y, X_test, y_test
import numpy as np
import nnfs
from nnfs.datasets import sine_data, spiral_data
import sys
nnfs.init()
# Dense layer
class Layer_Dense:
# Layer initialization
def __init__(self, n_inputs, n_neurons,
weight_regularizer_l1=0, weight_regularizer_l2=0,
bias_regularizer_l1=0, bias_regularizer_l2=0):
# Initialize weights and biases
# self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
self.biases = np.zeros((1, n_neurons))
# Set regularization strength
self.weight_regularizer_l1 = weight_regularizer_l1
self.weight_regularizer_l2 = weight_regularizer_l2
self.bias_regularizer_l1 = bias_regularizer_l1
self.bias_regularizer_l2 = bias_regularizer_l2
# Forward pass
def forward(self, inputs, training):
# Remember input values
self.inputs = inputs
# Calculate output values from inputs, weights and biases
self.output = np.dot(inputs, self.weights) + self.biases
# Backward pass
def backward(self, dvalues):
# Gradients on parameters
self.dweights = np.dot(self.inputs.T, dvalues)
self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
# Gradients on regularization
# L1 on weights
if self.weight_regularizer_l1 > 0:
dL1 = np.ones_like(self.weights)
dL1[self.weights < 0] = -1
self.dweights += self.weight_regularizer_l1 * dL1
# L2 on weights
if self.weight_regularizer_l2 > 0:
self.dweights += 2 * self.weight_regularizer_l2 * self.weights
# L1 on biases
if self.bias_regularizer_l1 > 0:
dL1 = np.ones_like(self.biases)
dL1[self.biases < 0] = -1
self.dbiases += self.bias_regularizer_l1 * dL1
# L2 on biases
if self.bias_regularizer_l2 > 0:
self.dbiases += 2 * self.bias_regularizer_l2 * self.biases
# Gradient on values
self.dinputs = np.dot(dvalues, self.weights.T)
# Retrieve layer parameters
def get_parameters(self):
return self.weights, self.biases
# Set weights and biases in a layer instance
def set_parameters(self, weights, biases):
self.weights = weights
self.biases = biases
# Dropout
class Layer_Dropout:
# Init
def __init__(self, rate):
# Store rate, we invert it as for example for dropout
# of 0.1 we need success rate of 0.9
self.rate = 1 - rate
# Forward pass
def forward(self, inputs, training):
# Save input values
self.inputs = inputs
# If not in the training mode - return values
if not training:
self.output = inputs.copy()
return
# Generate and save scaled mask
self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
# Apply mask to output values
self.output = inputs * self.binary_mask
# Backward pass
def backward(self, dvalues):
# Gradient on values
self.dinputs = dvalues * self.binary_mask
# Input "layer"
class Layer_Input:
# Forward pass
def forward(self, inputs, training):
self.output = inputs
# ReLU activation
class Activation_ReLU:
# Forward pass
def forward(self, inputs, training):
# Remember input values
self.inputs = inputs
# Calculate output values from inputs
self.output = np.maximum(0, inputs)
# Backward pass
def backward(self, dvalues):
# Since we need to modify original variable,
# let's make a copy of values first
self.dinputs = dvalues.copy()
# Zero gradient where input values were negative
self.dinputs[self.inputs <= 0] = 0
# Calculate predictions for outputs
def predictions(self, outputs):
return outputs
# Softmax activation
class Activation_Softmax:
# Forward pass
def forward(self, inputs, training):
# Remember input values
self.inputs = inputs
# Get unnormalized probabilities
exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
# Normalize them for each sample
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
self.output = probabilities
# Backward pass
def backward(self, dvalues):
# Create uninitialized array
self.dinputs = np.empty_like(dvalues)
# Enumerate outputs and gradients
for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
# Flatten output array
single_output = single_output.reshape(-1, 1)
# Calculate Jacobian matrix of the output and
jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
# Calculate sample-wise gradient
# and add it to the array of sample gradients
self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)
# Calculate predictions for outputs
def predictions(self, outputs):
return np.argmax(outputs, axis=1)
# Sigmoid activation
class Activation_Sigmoid:
# Forward pass
def forward(self, inputs, training):
# Save input and calculate/save output
# of the sigmoid function
self.inputs = inputs
self.output = 1 / (1 + np.exp(-inputs))
# Backward pass
def backward(self, dvalues):
# Derivative - calculates from output of the sigmoid function
self.dinputs = dvalues * (1 - self.output) * self.output
# Calculate predictions for outputs
def predictions(self, outputs):
return (outputs > 0.5) * 1
# Linear activation
class Activation_Linear:
# Forward pass
def forward(self, inputs, training):
# Just remember values
self.inputs = inputs
self.output = inputs
# Backward pass
def backward(self, dvalues):
# derivative is 1, 1 * dvalues = dvalues - the chain rule
self.dinputs = dvalues.copy()
# Calculate predictions for outputs
def predictions(self, outputs):
return outputs
# SGD optimizer
class Optimizer_SGD:
# Initialize optimizer - set settings,
# learning rate of 1. is default for this optimizer
def __init__(self, learning_rate=1., decay=0., momentum=0.):
self.learning_rate = learning_rate
self.current_learning_rate = learning_rate
self.decay = decay
self.iterations = 0
self.momentum = momentum
# Call once before any parameter updates
def pre_update_params(self):
if self.decay:
self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
# Update parameters
def update_params(self, layer):
# If we use momentum
if self.momentum:
# If layer does not contain momentum arrays, create them
# filled with zeros
if not hasattr(layer, 'weight_momentums'):
layer.weight_momentums = np.zeros_like(layer.weights)
# If there is no momentum array for weights
# The array doesn't exist for biases yet either.
layer.bias_momentums = np.zeros_like(layer.biases)
# Build weight updates with momentum - take previous
# updates multiplied by retain factor and update with
# current gradients
weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
layer.weight_momentums = weight_updates
# Build bias updates
bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dbiases
layer.bias_momentums = bias_updates
# Vanilla SGD updates (as before momentum update)
else:
weight_updates = -self.current_learning_rate * layer.dweights
bias_updates = -self.current_learning_rate * layer.dbiases
# Update weights and biases using either
# vanilla or momentum updates
layer.weights += weight_updates
layer.biases += bias_updates
# Call once after any parameter updates
def post_update_params(self):
self.iterations += 1
# Adagrad optimizer
class Optimizer_Adagrad:
# Initialize optimizer - set settings
def __init__(self, learning_rate=1., decay=0., epsilon=1e-7):
self.learning_rate = learning_rate
self.current_learning_rate = learning_rate
self.decay = decay
self.iterations = 0
self.epsilon = epsilon
# Call once before any parameter updates
def pre_update_params(self):
if self.decay:
self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
# Update parameters
def update_params(self, layer):
# If layer does not contain cache arrays,
# create them filled with zeros
if not hasattr(layer, 'weight_cache'):
layer.weight_cache = np.zeros_like(layer.weights)
layer.bias_cache = np.zeros_like(layer.biases)
# Update cache with squared current gradients
layer.weight_cache += layer.dweights**2
layer.bias_cache += layer.dbiases**2
# Vanilla SGD parameter update + normalization
# with square rooted cache
layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)
# Call once after any parameter updates
def post_update_params(self):
self.iterations += 1
# RMSprop optimizer
class Optimizer_RMSprop:
# Initialize optimizer - set settings
def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, rho=0.9):
self.learning_rate = learning_rate
self.current_learning_rate = learning_rate
self.decay = decay
self.iterations = 0
self.epsilon = epsilon
self.rho = rho
# Call once before any parameter updates
def pre_update_params(self):
if self.decay:
self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
# Update parameters
def update_params(self, layer):
# If layer does not contain cache arrays,
# create them filled with zeros
if not hasattr(layer, 'weight_cache'):
layer.weight_cache = np.zeros_like(layer.weights)
layer.bias_cache = np.zeros_like(layer.biases)
# Update cache with squared current gradients
layer.weight_cache = self.rho * layer.weight_cache + (1 - self.rho) * layer.dweights**2
layer.bias_cache = self.rho * layer.bias_cache + (1 - self.rho) * layer.dbiases**2
# Vanilla SGD parameter update + normalization
# with square rooted cache
layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)
# Call once after any parameter updates
def post_update_params(self):
self.iterations += 1
# Adam optimizer
class Optimizer_Adam:
# Initialize optimizer - set settings
def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, beta_1=0.9, beta_2=0.999):
self.learning_rate = learning_rate
self.current_learning_rate = learning_rate
self.decay = decay
self.iterations = 0
self.epsilon = epsilon
self.beta_1 = beta_1
self.beta_2 = beta_2
# Call once before any parameter updates
def pre_update_params(self):
if self.decay:
self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))
# Update parameters
def update_params(self, layer):
# If layer does not contain cache arrays,
# create them filled with zeros
if not hasattr(layer, 'weight_cache'):
layer.weight_momentums = np.zeros_like(layer.weights)
layer.weight_cache = np.zeros_like(layer.weights)
layer.bias_momentums = np.zeros_like(layer.biases)
layer.bias_cache = np.zeros_like(layer.biases)
# Update momentum with current gradients
layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases
# Get corrected momentum
# self.iteration is 0 at first pass
# and we need to start with 1 here
weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))
# Update cache with squared current gradients
layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2
# Get corrected cache
weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))
# Vanilla SGD parameter update + normalization
# with square rooted cache
layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)
# Call once after any parameter updates
def post_update_params(self):
self.iterations += 1
# Common loss class
class Loss:
# Regularization loss calculation
def regularization_loss(self):
# 0 by default
regularization_loss = 0
# Calculate regularization loss
# iterate all trainable layers
for layer in self.trainable_layers:
# L1 regularization - weights
# calculate only when factor greater than 0
if layer.weight_regularizer_l1 > 0:
regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
# L2 regularization - weights
if layer.weight_regularizer_l2 > 0:
regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
# L1 regularization - biases
# calculate only when factor greater than 0
if layer.bias_regularizer_l1 > 0:
regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
# L2 regularization - biases
if layer.bias_regularizer_l2 > 0:
regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
return regularization_loss
# Set/remember trainable layers
def remember_trainable_layers(self, trainable_layers):
self.trainable_layers = trainable_layers
# Calculates the data and regularization losses
# given model output and ground truth values
def calculate(self, output, y, *, include_regularization=False):
# Calculate sample losses
sample_losses = self.forward(output, y)
# Calculate mean loss
data_loss = np.mean(sample_losses)
# Add accumulated sum of losses and sample count
self.accumulated_sum += np.sum(sample_losses)
self.accumulated_count += len(sample_losses)
# If just data loss - return it
if not include_regularization:
return data_loss
# Return the data and regularization losses
return data_loss, self.regularization_loss()
# Calculates accumulated loss
def calculate_accumulated(self, *, include_regularization=False):
# Calculate mean loss
data_loss = self.accumulated_sum / self.accumulated_count
# If just data loss - return it
if not include_regularization:
return data_loss
# Return the data and regularization losses
return data_loss, self.regularization_loss()
# Reset variables for accumulated loss
def new_pass(self):
self.accumulated_sum = 0
self.accumulated_count = 0
# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
# Forward pass
def forward(self, y_pred, y_true):
# Number of samples in a batch
samples = len(y_pred)
# Clip data to prevent division by 0
# Clip both sides to not drag mean towards any value
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
# Probabilities for target values -
# only if categorical labels
if len(y_true.shape) == 1:
correct_confidences = y_pred_clipped[
range(samples),
y_true
]
# Mask values - only for one-hot encoded labels
elif len(y_true.shape) == 2:
correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)
# Losses
negative_log_likelihoods = -np.log(correct_confidences)
return negative_log_likelihoods
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# Number of labels in every sample
# We'll use the first sample to count them
labels = len(dvalues[0])
# If labels are sparse, turn them into one-hot vector
if len(y_true.shape) == 1:
y_true = np.eye(labels)[y_true]
# Calculate gradient
self.dinputs = -y_true / dvalues
# Normalize gradient
self.dinputs = self.dinputs / samples
# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():
# # Creates activation and loss function objects
# def __init__(self):
# self.activation = Activation_Softmax()
# self.loss = Loss_CategoricalCrossentropy()
# # Forward pass
# def forward(self, inputs, y_true):
# # Output layer's activation function
# self.activation.forward(inputs)
# # Set the output
# self.output = self.activation.output
# # Calculate and return loss value
# return self.loss.calculate(self.output, y_true)
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# If labels are one-hot encoded,
# turn them into discrete values
if len(y_true.shape) == 2:
y_true = np.argmax(y_true, axis=1)
# Copy so we can safely modify
self.dinputs = dvalues.copy()
# Calculate gradient
self.dinputs[range(samples), y_true] -= 1
# Normalize gradient
self.dinputs = self.dinputs / samples
# Binary cross-entropy loss
class Loss_BinaryCrossentropy(Loss):
# Forward pass
def forward(self, y_pred, y_true):
# Clip data to prevent division by 0
# Clip both sides to not drag mean towards any value
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
# Calculate sample-wise loss
sample_losses = -(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped))
sample_losses = np.mean(sample_losses, axis=-1)
# Return losses
return sample_losses
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# Number of outputs in every sample
# We'll use the first sample to count them
outputs = len(dvalues[0])
# Clip data to prevent division by 0
# Clip both sides to not drag mean towards any value
clipped_dvalues = np.clip(dvalues, 1e-7, 1 - 1e-7)
# Calculate gradient
self.dinputs = -(y_true / clipped_dvalues - (1 - y_true) / (1 - clipped_dvalues)) / outputs
# Normalize gradient
self.dinputs = self.dinputs / samples
# Mean Squared Error loss
class Loss_MeanSquaredError(Loss): # L2 loss
# Forward pass
def forward(self, y_pred, y_true):
# Calculate loss
sample_losses = np.mean((y_true - y_pred)**2, axis=-1)
# Return losses
return sample_losses
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# Number of outputs in every sample
# We'll use the first sample to count them
outputs = len(dvalues[0])
# Gradient on values
self.dinputs = -2 * (y_true - dvalues) / outputs
# Normalize gradient
self.dinputs = self.dinputs / samples
# Mean Absolute Error loss
class Loss_MeanAbsoluteError(Loss): # L1 loss
def forward(self, y_pred, y_true):
# Calculate loss
sample_losses = np.mean(np.abs(y_true - y_pred), axis=-1)
# Return losses
return sample_losses
# Backward pass
def backward(self, dvalues, y_true):
# Number of samples
samples = len(dvalues)
# Number of outputs in every sample
# We'll use the first sample to count them
outputs = len(dvalues[0])
# Calculate gradient
self.dinputs = np.sign(y_true - dvalues) / outputs
# Normalize gradient
self.dinputs = self.dinputs / samples
# Common accuracy class
class Accuracy:
# Calculates an accuracy
# given predictions and ground truth values
def calculate(self, predictions, y):
# Get comparison results
comparisons = self.compare(predictions, y)
# Calculate an accuracy
accuracy = np.mean(comparisons)
# Add accumulated sum of matching values and sample count
self.accumulated_sum += np.sum(comparisons)
self.accumulated_count += len(comparisons)
# Return accuracy
return accuracy
# Calculates accumulated accuracy
def calculate_accumulated(self):
# Calculate an accuracy
accuracy = self.accumulated_sum / self.accumulated_count
# Return the data and regularization losses
return accuracy
# Reset variables for accumulated accuracy
def new_pass(self):
self.accumulated_sum = 0
self.accumulated_count = 0
# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):
# No initialization is needed
def init(self, y):
pass
# Compares predictions to the ground truth values
def compare(self, predictions, y):
if len(y.shape) == 2:
y = np.argmax(y, axis=1)
return predictions == y
# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):
def __init__(self):
# Create precision property
self.precision = None
# Calculates precision value
# based on passed in ground truth
def init(self, y, reinit=False):
if self.precision is None or reinit:
self.precision = np.std(y) / 250
# Compares predictions to the ground truth values
def compare(self, predictions, y):
return np.absolute(predictions - y) < self.precision
# Model class
class Model:
def __init__(self):
# Create a list of network objects
self.layers = []
# Softmax classifier's output object
self.softmax_classifier_output = None
# Add objects to the model
def add(self, layer):
self.layers.append(layer)
# Set loss, optimizer and accuracy
def set(self, *, loss=None, optimizer=None, accuracy=None):
if loss is not None:
self.loss = loss
if optimizer is not None:
self.optimizer = optimizer
if accuracy is not None:
self.accuracy = accuracy
# Finalize the model
def finalize(self):
# Create and set the input layer
self.input_layer = Layer_Input()
# Count all the objects
layer_count = len(self.layers)
# Initialize a list containing trainable layers:
self.trainable_layers = []
# Iterate the objects
for i in range(layer_count):
# If it's the first layer,
# the previous layer object is the input layer
if i == 0:
self.layers[i].prev = self.input_layer
self.layers[i].next = self.layers[i+1]
# All layers except for the first and the last
elif i < layer_count - 1:
self.layers[i].prev = self.layers[i-1]
self.layers[i].next = self.layers[i+1]
# The last layer - the next object is the loss
# Also let's save aside the reference to the last object
# whose output is the model's output
else:
self.layers[i].prev = self.layers[i-1]
self.layers[i].next = self.loss
self.output_layer_activation = self.layers[i]
# If layer contains an attribute called "weights",
# it's a trainable layer -
# add it to the list of trainable layers
# We don't need to check for biases -
# checking for weights is enough
if hasattr(self.layers[i], 'weights'):
self.trainable_layers.append(self.layers[i])
# Update loss object with trainable layers
# self.loss.remember_trainable_layers(self.trainable_layers)
if self.loss is not None:
self.loss.remember_trainable_layers(self.trainable_layers)
# If output activation is Softmax and
# loss function is Categorical Cross-Entropy
# create an object of combined activation
# and loss function containing
# faster gradient calculation
if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
# Create an object of combined activation
# and loss functions
self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()
# Train the model
# def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
def train(self, X, y, *, epochs=1, batch_size=None, print_every=1, validation_data=None):
# Initialize accuracy object
self.accuracy.init(y)
# Default value if batch size is not being set
train_steps = 1
# If there is validation data passed,
# set default number of steps for validation as well
if validation_data is not None:
validation_steps = 1
# For better readability
X_val, y_val = validation_data
# Calculate number of steps
if batch_size is not None:
train_steps = len(X) // batch_size
# Dividing rounds down. If there are some remaining
# data, but not a full batch, this won't include it
# Add `1` to include this not full batch
if train_steps * batch_size < len(X):
train_steps += 1
if validation_data is not None:
validation_steps = len(X_val) // batch_size
# Dividing rounds down. If there are some remaining
# data, but not a full batch, this won't include it
# Add `1` to include this not full batch
if validation_steps * batch_size < len(X_val):
validation_steps += 1
# Main training loop
for epoch in range(1, epochs+1):
# Print epoch number
print(f'epoch: {epoch}')
# Reset accumulated values in loss and accuracy objects
self.loss.new_pass()
self.accuracy.new_pass()
# Iterate over steps
for step in range(train_steps):
# If batch size is not set -
# train using one step and full dataset
if batch_size is None:
batch_X = X
batch_y = y
# Otherwise slice a batch
else:
batch_X = X[step*batch_size:(step+1)*batch_size]
batch_y = y[step*batch_size:(step+1)*batch_size]
# Perform the forward pass
output = self.forward(batch_X, training=True)
# Calculate loss
data_loss, regularization_loss = self.loss.calculate(output, batch_y, include_regularization=True)
loss = data_loss + regularization_loss
# Get predictions and calculate an accuracy
predictions = self.output_layer_activation.predictions(output)
accuracy = self.accuracy.calculate(predictions, batch_y)
# Perform backward pass
self.backward(output, batch_y)
# Optimize (update parameters)
self.optimizer.pre_update_params()
for layer in self.trainable_layers:
self.optimizer.update_params(layer)
self.optimizer.post_update_params()
# Print a summary
if not step % print_every or step == train_steps - 1:
print(f'step: {step}, ' +
f'acc: {accuracy:.3f}, ' +
f'loss: {loss:.3f} (' +
f'data_loss: {data_loss:.3f}, ' +
f'reg_loss: {regularization_loss:.3f}), ' +
f'lr: {self.optimizer.current_learning_rate}')
# Get and print epoch loss and accuracy
epoch_data_loss, epoch_regularization_loss = self.loss.calculate_accumulated(include_regularization=True)
epoch_loss = epoch_data_loss + epoch_regularization_loss
epoch_accuracy = self.accuracy.calculate_accumulated()
print(f'training, ' +
f'acc: {epoch_accuracy:.3f}, ' +
f'loss: {epoch_loss:.3f} (' +
f'data_loss: {epoch_data_loss:.3f}, ' +
f'reg_loss: {epoch_regularization_loss:.3f}), ' +
f'lr: {self.optimizer.current_learning_rate}')
# If there is the validation data
if validation_data is not None:
# Evaluate the model:
self.evaluate(*validation_data, batch_size=batch_size)
# Performs forward pass
def forward(self, X, training):
# Call forward method on the input layer
# this will set the output property that
# the first layer in "prev" object is expecting
self.input_layer.forward(X, training)
# Call forward method of every object in a chain
# Pass output of the previous object as a parameter
for layer in self.layers:
layer.forward(layer.prev.output, training)
# "layer" is now the last object from the list,
# return its output
return layer.output
# Performs backward pass
def backward(self, output, y):
# If softmax classifier
if self.softmax_classifier_output is not None:
# First call backward method
# on the combined activation/loss
# this will set dinputs property
self.softmax_classifier_output.backward(output, y)
# Since we'll not call backward method of the last layer
# which is Softmax activation
# as we used combined activation/loss
# object, let's set dinputs in this object
self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
# Call backward method going through
# all the objects but last
# in reversed order passing dinputs as a parameter
for layer in reversed(self.layers[:-1]):
layer.backward(layer.next.dinputs)
return
# First call backward method on the loss
# this will set dinputs property that the last
# layer will try to access shortly
self.loss.backward(output, y)
# Call backward method going through all the objects
# in reversed order passing dinputs as a parameter
for layer in reversed(self.layers):
layer.backward(layer.next.dinputs)
# Evaluates the model using passed in dataset
def evaluate(self, X_val, y_val, *, batch_size=None):
# Default value if batch size is not being set
validation_steps = 1
# Calculate number of steps
if batch_size is not None:
validation_steps = len(X_val) // batch_size
# Dividing rounds down. If there are some remaining
# data, but not a full batch, this won't include it
# Add `1` to include this not full batch
if validation_steps * batch_size < len(X_val):
validation_steps += 1
# Reset accumulated values in loss
# and accuracy objects
self.loss.new_pass()
self.accuracy.new_pass()
# Iterate over steps
for step in range(validation_steps):
# If batch size is not set -
# train using one step and full dataset
if batch_size is None:
batch_X = X_val
batch_y = y_val
# Otherwise slice a batch
else:
batch_X = X_val[step*batch_size:(step+1)*batch_size]
batch_y = y_val[step*batch_size:(step+1)*batch_size]
# Perform the forward pass
output = self.forward(batch_X, training=False)
# Calculate the loss
self.loss.calculate(output, batch_y)
# Get predictions and calculate an accuracy
predictions = self.output_layer_activation.predictions(output)
self.accuracy.calculate(predictions, batch_y)
# Get and print validation loss and accuracy
validation_loss = self.loss.calculate_accumulated()
validation_accuracy = self.accuracy.calculate_accumulated()
# Print a summary
print(f'validation, ' +
f'acc: {validation_accuracy:.3f}, ' +
f'loss: {validation_loss:.3f}')
# Retrieves and returns parameters of trainable layers
def get_parameters(self):
# Create a list for parameters
parameters = []
# Iterate trainable layers and get their parameters
for layer in self.trainable_layers:
parameters.append(layer.get_parameters())
# Return a list
return parameters
# Updates the model with new parameters
def set_parameters(self, parameters):
# Iterate over the parameters and layers
# and update each layers with each set of the parameters
for parameter_set, layer in zip(parameters, self.trainable_layers):
layer.set_parameters(*parameter_set)
# Saves the parameters to a file
def save_parameters(self, path):
# Open a file in the binary-write mode
# and save parameters to it
with open(path, 'wb') as f:
pickle.dump(self.get_parameters(), f)
# Loads the weights and updates a model instance with them
def load_parameters(self, path):
# Open file in the binary-read mode,
# load weights and update trainable layers
with open(path, 'rb') as f:
self.set_parameters(pickle.load(f))
# Saves the model
def save(self, path):
# Make a deep copy of current model instance
model = copy.deepcopy(self)
# Reset accumulated values in loss and accuracy objects
model.loss.new_pass()
model.accuracy.new_pass()
# Remove data from input layer
# and gradients from the loss object
model.input_layer.__dict__.pop('output', None)
model.loss.__dict__.pop('dinputs', None)
# For each layer remove inputs, output and dinputs properties
for layer in model.layers:
for property in ['inputs', 'output', 'dinputs', 'dweights', 'dbiases']:
layer.__dict__.pop(property, None)
# Open a file in the binary-write mode and save the model
with open(path, 'wb') as f:
pickle.dump(model, f)
# Loads and returns a model
@staticmethod
def load(path):
# Open file in the binary-read mode, load a model
with open(path, 'rb') as f:
model = pickle.load(f)
# Return a model
return model
# Predicts on the samples
def predict(self, X, *, batch_size=None):
# Default value if batch size is not being set
prediction_steps = 1
# Calculate number of steps
if batch_size is not None:
prediction_steps = len(X) // batch_size
# Dividing rounds down. If there are some remaining
# data, but not a full batch, this won't include it
# Add `1` to include this not full batch
if prediction_steps * batch_size < len(X):
prediction_steps += 1
# Model outputs
output = []
# Iterate over steps
for step in range(prediction_steps):
# If batch size is not set -
# train using one step and full dataset
if batch_size is None:
batch_X = X
# Otherwise slice a batch
else:
batch_X = X[step*batch_size:(step+1)*batch_size]
# Perform the forward pass
batch_output = self.forward(batch_X, training=False)
# Append batch prediction to the list of predictions
output.append(batch_output)
# Stack and return results
return np.vstack(output)
# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Scale and reshape samples
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the first 5 samples from validation dataset
# and print the result
confidences = model.predict(X_test[:5])
predictions = model.output_layer_activation.predictions(confidences)
print(predictions)
# Print first 5 labels
print(y_test[:5])
#############################################
## tshirt
fashion_mnist_labels = {
0: 'T-shirt/top',
1: 'Trouser',
2: 'Pullover',
3: 'Dress',
4: 'Coat',
5: 'Sandal',
6: 'Shirt',
7: 'Sneaker',
8: 'Bag',
9: 'Ankle boot'
}
for prediction in predictions:
print(fashion_mnist_labels[prediction])
# Read an image
image_data = cv2.imread('tshirt.png', cv2.IMREAD_UNCHANGED)
plt.imshow(cv2.cvtColor(image_data, cv2.COLOR_BGR2RGB))
plt.show()
# Read an image
image_data = cv2.imread('tshirt.png', cv2.IMREAD_GRAYSCALE)
plt.imshow(image_data, cmap='gray')
plt.show()
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
plt.imshow(image_data, cmap='gray')
plt.show()
# Reshape and scale pixel data
image_data = (image_data.reshape(1, -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
mnist_image = cv2.imread('fashion_mnist_images/train/0/0000.png', cv2.IMREAD_UNCHANGED)
plt.imshow(mnist_image, cmap='gray')
plt.show()
# Read an image
image_data = cv2.imread('tshirt.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Invert image colors
image_data = 255 - image_data
# Reshape and scale pixel data
image_data = (image_data.reshape(1, -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
#############################################
## pants
image_data = cv2.imread('pants.png', cv2.IMREAD_UNCHANGED)
plt.imshow(cv2.cvtColor(image_data, cv2.COLOR_BGR2RGB))
plt.show()
# Read an image
image_data = cv2.imread('pants.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Invert image colors
image_data = 255 - image_data
plt.imshow(image_data, cmap='gray')
plt.show()
# Label index to label name relation
fashion_mnist_labels = {
0: 'T-shirt/top',
1: 'Trouser',
2: 'Pullover',
3: 'Dress',
4: 'Coat',
5: 'Sandal',
6: 'Shirt',
7: 'Sneaker',
8: 'Bag',
9: 'Ankle boot'
}
# Read an image
image_data = cv2.imread('pants.png', cv2.IMREAD_GRAYSCALE)
# Resize to the same size as Fashion MNIST images
image_data = cv2.resize(image_data, (28, 28))
# Invert image colors
image_data = 255 - image_data
# Reshape and scale pixel data
image_data = (image_data.reshape(1, -1).astype(np.float32) -
127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Predict on the image
confidences = model.predict(image_data)
# Get prediction instead of confidence levels
predictions = model.output_layer_activation.predictions(confidences)
# Get label name from label index
prediction = fashion_mnist_labels[predictions[0]]
print(prediction)
Chapter code, further resources, and errata for this chapter: https://nnfs.io/ch22
And with that, the entire series is complete. Thanks for reading!