Note: TensorFlow Lite is now part of Google AI Edge. The latest documentation has moved to ai.google.dev/edge/lite.

Post-training float16 quantization

Overview

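TensorFlow Lite now supports converting weights to 16-bit floating-point values during model conversion from TensorFlow to TensorFlow Lite's FlatBuffer format. This roughly halves the model size. Some hardware, such as GPUs, can compute natively in this reduced-precision arithmetic, realizing a speedup over traditional floating-point execution. The TensorFlow Lite GPU delegate can be configured to run in this way. However, a model converted to float16 weights can still run on the CPU without additional modification: the float16 weights are converted to float32 prior to the first inference. This permits a significant reduction in model size in exchange for a minimal impact on latency and accuracy.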
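To make the size and precision trade-off concrete, the NumPy sketch below stores a float32 tensor as float16 and converts it back to float32, mirroring the CPU path described above. The tensor is a hypothetical stand-in filled with random values, shaped like the kernel of the Conv2D layer trained later in this tutorial; the code illustrates the arithmetic only and is not TensorFlow Lite's internal implementation.

```python
import numpy as np

# Hypothetical stand-in for one trained kernel: shape (3, 3, 1, 12), like the
# Conv2D layer in this tutorial, filled with random values instead of real weights.
rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.1, size=(3, 3, 1, 12)).astype(np.float32)

# Store the weights as float16: every value now takes 2 bytes instead of 4.
weights_fp16 = weights_fp32.astype(np.float16)

# Convert back to float32 before computing with them (the CPU path described above).
weights_restored = weights_fp16.astype(np.float32)

abs_error = np.abs(weights_restored - weights_fp32)
print("bytes:", weights_fp32.nbytes, "->", weights_fp16.nbytes)  # bytes: 432 -> 216
print("max abs error:", abs_error.max())
```

float16 keeps 10 fraction bits, so rounding to the nearest float16 value changes each weight in float16's normal range by at most a relative 2**-11 (about 0.05%), which is why accuracy typically changes very little.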

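In this tutorial, you train an MNIST model from scratch, check its accuracy in TensorFlow, and then convert the model into a TensorFlow Lite FlatBuffer with float16 quantization. Finally, you check the accuracy of the converted model and compare it to the original float32 model.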

Build an MNIST model

Setup

import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import pathlib

Train and export the model

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(
  train_images,
  train_labels,
  epochs=1,
  validation_data=(test_images, test_labels)
)

For this example, you trained the model for just a single epoch, so it only reaches about 96% accuracy.

Convert to a TensorFlow Lite model

Using the TensorFlow Lite Converter, you can now convert the trained model into a TensorFlow Lite model.

Now load the model using the TFLiteConverter:

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

Write it out to a .tflite file:

tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)

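To instead quantize the model to float16 on export, first set the optimizations flag to use the default optimizations. Then specify that float16 is a supported type on the target platform: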

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

Finally, convert the model as usual. Note that, by default, the converted model still uses float inputs and outputs for invocation convenience.

tflite_fp16_model = converter.convert()
tflite_model_fp16_file = tflite_models_dir/"mnist_model_quant_f16.tflite"
tflite_model_fp16_file.write_bytes(tflite_fp16_model)
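Both claims above (float16 storage roughly halves the file, while inputs and outputs stay float32) can be checked on a model small enough to convert in seconds. This is a sketch under assumptions: the single-matrix TinyDense module and its 256x256 random weight are hypothetical stand-ins for a trained model, and the sketch uses tf.lite.TFLiteConverter.from_concrete_functions instead of from_keras_model so that it does not depend on Keras.

```python
import numpy as np
import tensorflow as tf


class TinyDense(tf.Module):
    """Hypothetical stand-in model: one matrix multiply with a 256x256 weight."""

    def __init__(self):
        super().__init__()
        weights = np.random.default_rng(0).normal(size=(256, 256))
        self.w = tf.Variable(weights.astype(np.float32))

    @tf.function(input_signature=[tf.TensorSpec([1, 256], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w)


def convert(module: tf.Module, float16: bool) -> bytes:
    """Convert the module to a TFLite flatbuffer, optionally with float16 weights."""
    converter = tf.lite.TFLiteConverter.from_concrete_functions(
        [module.__call__.get_concrete_function()], module)
    if float16:
        # The same two settings used in this tutorial.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.target_spec.supported_types = [tf.float16]
    return converter.convert()


module = TinyDense()
fp32_model = convert(module, float16=False)
fp16_model = convert(module, float16=True)
size_ratio = len(fp16_model) / len(fp32_model)

# Inspect the float16 model's input and output tensor types.
interpreter = tf.lite.Interpreter(model_content=fp16_model)
input_dtype = interpreter.get_input_details()[0]["dtype"]
output_dtype = interpreter.get_output_details()[0]["dtype"]
print(f"size ratio: {size_ratio:.2f}, input: {input_dtype}, output: {output_dtype}")
```

Passing model_content= loads the flatbuffer from memory, which avoids writing temporary files while experimenting.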

Note that the resulting file is about half the size.

!ls -lh {tflite_models_dir}
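Outside a notebook, where the ! shell escape is unavailable, a small pure-Python helper can report the same comparison. report_sizes is a hypothetical name; it only reads file sizes with pathlib.

```python
import pathlib


def report_sizes(baseline: pathlib.Path, *others: pathlib.Path) -> dict:
    """Print each file's size in KiB and return its size ratio to the baseline file."""
    baseline_bytes = baseline.stat().st_size
    ratios = {}
    for path in (baseline, *others):
        size = path.stat().st_size
        ratios[path.name] = size / baseline_bytes
        print(f"{path.name}: {size / 1024:.1f} KiB ({ratios[path.name]:.2f}x)")
    return ratios
```

For example, report_sizes(tflite_model_file, tflite_model_fp16_file) should show a ratio close to 0.5 for mnist_model_quant_f16.tflite.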

Run the TensorFlow Lite models

Run the TensorFlow Lite models using the Python TensorFlow Lite Interpreter.

Load the models into the interpreters

interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

interpreter_fp16 = tf.lite.Interpreter(model_path=str(tflite_model_fp16_file))
interpreter_fp16.allocate_tensors()

Test the models on one image

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, test_image)
interpreter.invoke()
predictions = interpreter.get_tensor(output_index)

import matplotlib.pylab as plt

plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true= str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)

test_image = np.expand_dims(test_images[0], axis=0).astype(np.float32)

input_index = interpreter_fp16.get_input_details()[0]["index"]
output_index = interpreter_fp16.get_output_details()[0]["index"]

interpreter_fp16.set_tensor(input_index, test_image)
interpreter_fp16.invoke()
predictions = interpreter_fp16.get_tensor(output_index)

plt.imshow(test_images[0])
template = "True:{true}, predicted:{predict}"
_ = plt.title(template.format(true= str(test_labels[0]),
                              predict=str(np.argmax(predictions[0]))))
plt.grid(False)
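The two code blocks above differ only in which interpreter they use. A hypothetical helper such as predict_digit below removes that duplication; it relies only on the interpreter calls already used in this tutorial (get_input_details, get_output_details, set_tensor, invoke, get_tensor).

```python
import numpy as np


def predict_digit(interpreter, image: np.ndarray) -> int:
    """Run one 28x28 image through a TFLite interpreter and return the top class.

    Assumes allocate_tensors() was already called and that the model has a single
    float32 input of shape (1, 28, 28) and a single logits output, as in this tutorial.
    """
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    # Add the batch dimension and match the model's float32 input type.
    batch = np.expand_dims(image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, batch)
    interpreter.invoke()

    # The output has shape (1, 10); drop the batch axis and pick the largest logit.
    logits = interpreter.get_tensor(output_index)[0]
    return int(np.argmax(logits))
```

For example, predict_digit(interpreter, test_images[0]) and predict_digit(interpreter_fp16, test_images[0]) return each model's predicted class for the first test image.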

Evaluate the models

# A helper function to evaluate the TF Lite model using "test" dataset.
def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  # Compare prediction results with ground truth labels to calculate accuracy.
  accurate_count = 0
  for index in range(len(prediction_digits)):
    if prediction_digits[index] == test_labels[index]:
      accurate_count += 1
  accuracy = accurate_count * 1.0 / len(prediction_digits)

  return accuracy

print(evaluate_model(interpreter))
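The counting loop at the end of evaluate_model can also be written as one vectorized NumPy comparison. The sketch below, under the hypothetical name accuracy_from_digits, computes the same fraction of correct predictions.

```python
import numpy as np


def accuracy_from_digits(prediction_digits, labels) -> float:
    """Fraction of predicted digits that equal the ground-truth labels."""
    predictions = np.asarray(prediction_digits)
    labels = np.asarray(labels)
    if predictions.shape != labels.shape:
        raise ValueError("predictions and labels must have the same shape")
    # Element-wise comparison gives booleans; their mean is the accuracy.
    return float(np.mean(predictions == labels))
```

Inside evaluate_model, return accuracy_from_digits(prediction_digits, test_labels) would replace the loop.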

Repeat the evaluation on the float16-quantized model to obtain its accuracy:

# NOTE: Colab runs on server CPUs. At the time of writing this, TensorFlow Lite
# doesn't have super optimized server CPU kernels. For this reason this may be
# slower than the above float interpreter. But for mobile CPUs, considerable
# speedup can be observed.
print(evaluate_model(interpreter_fp16))
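Because speed depends on the CPU, it can help to measure it directly on the machine you care about. The hypothetical mean_invoke_seconds helper below times repeated invoke() calls with time.perf_counter; it assumes the input tensor has already been set, for example by running evaluate_model first.

```python
import time


def mean_invoke_seconds(interpreter, runs: int = 100, warmup: int = 5) -> float:
    """Average wall-clock time of interpreter.invoke() over `runs` calls.

    Assumes allocate_tensors() and set_tensor() were already called, so only
    inference itself is timed. Warm-up calls are excluded from the average.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    for _ in range(warmup):
        interpreter.invoke()
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs
```

Running it for both interpreter and interpreter_fp16 in the same session gives a like-for-like comparison on that machine; excluding a few warm-up calls keeps one-time setup costs out of the average.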

In this example, you quantized the model to float16 with no difference in accuracy.
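Equal accuracy does not by itself mean the two models make identical predictions image by image. A hypothetical prediction_agreement helper can measure that directly from two lists of predicted digits, for example collected with a variant of evaluate_model that returns prediction_digits instead of the accuracy.

```python
import numpy as np


def prediction_agreement(digits_a, digits_b):
    """Return (fraction of matching predictions, indices where they differ)."""
    a = np.asarray(digits_a)
    b = np.asarray(digits_b)
    if a.shape != b.shape:
        raise ValueError("both prediction lists must have the same length")
    mismatches = np.flatnonzero(a != b)
    return 1.0 - mismatches.size / a.size, mismatches
```

An agreement of 1.0 would mean the float16 conversion changed no prediction on the test set; any returned indices point at images worth inspecting.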

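It is also possible to evaluate the fp16-quantized model on the GPU. To perform all arithmetic with the reduced-precision values, be sure to create a TfLiteGpuDelegateOptions struct in your app and set precision_loss_allowed to 1, like this: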

// Prepare the GPU delegate.
const TfLiteGpuDelegateOptions options = {
  .metadata = NULL,
  .compile_options = {
    .precision_loss_allowed = 1,  // FP16
    .preferred_gl_object_type = TFLITE_GL_OBJECT_TYPE_FASTEST,
    .dynamic_batch_enabled = 0,   // Not fully functional yet
  },
};

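For detailed documentation on the TFLite GPU delegate and how to use it in your application, see the TensorFlow Lite GPU delegate documentation.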