Quantization aware training comprehensive guide

Welcome to the comprehensive guide for Keras quantization aware training.

This page documents various use cases and shows how to use the API for each one. Once you know which APIs you need, find the parameters and the low-level details in the API docs.

The following use cases are covered:

  • Deploy a model with 8-bit quantization with these steps.
    • Define a quantization aware model.
    • For Keras HDF5 models only, use special checkpointing and deserialization logic; training is otherwise the same as standard training.
    • Create a quantized model from the quantization aware model.
  • Experiment with quantization.
    • Anything used for experimentation has no supported path to deployment.
    • Custom Keras layers fall under experimentation.

Setup

You can run this section without reading it; it only defines setup code used by the examples that follow and by the API discussion.

! pip install -q tensorflow
! pip install -q tensorflow-model-optimization

import tensorflow as tf
import numpy as np
import tensorflow_model_optimization as tfmot
import tf_keras as keras

import tempfile

input_shape = [20]
x_train = np.random.randn(1, 20).astype(np.float32)
y_train = keras.utils.to_categorical(np.random.randn(1), num_classes=20)

def setup_model():
  model = keras.Sequential([
      keras.layers.Dense(20, input_shape=input_shape),
      keras.layers.Flatten()
  ])
  return model

def setup_pretrained_weights():
  model = setup_model()

  model.compile(
      loss=keras.losses.categorical_crossentropy,
      optimizer='adam',
      metrics=['accuracy']
  )

  model.fit(x_train, y_train)

  _, pretrained_weights = tempfile.mkstemp('.tf')

  model.save_weights(pretrained_weights)

  return pretrained_weights

def setup_pretrained_model():
  model = setup_model()
  pretrained_weights = setup_pretrained_weights()
  model.load_weights(pretrained_weights)
  return model

setup_model()
pretrained_weights = setup_pretrained_weights()
2024-03-09 12:29:37.526315: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

Define quantization aware model

Defining your model in the following ways gives you the paths to deployment on the backends listed in the overview page. By default, 8-bit quantization is used.

Quantize whole model

Your use case:

  • Subclassed models are not supported.

Tips for better model accuracy:

  • Try "Quantize some layers" to skip quantizing the layers that reduce accuracy the most.
  • It's generally better to fine-tune with quantization aware training than to train from scratch; a minimal fine-tuning sketch follows the example below.

To make the whole model quantization aware, apply tfmot.quantization.keras.quantize_model to the model.

base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy

quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)
quant_aware_model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer (QuantizeLa  (None, 20)                3         
 yer)                                                            
                                                                 
 quant_dense_2 (QuantizeWra  (None, 20)                425       
 pperV2)                                                         
                                                                 
 quant_flatten_2 (QuantizeW  (None, 20)                1         
 rapperV2)                                                       
                                                                 
=================================================================
Total params: 429 (1.68 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 9 (36.00 Byte)
_________________________________________________________________
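
Training the quantization aware model then follows the standard Keras workflow. A minimal sketch using the toy data from the Setup section (the optimizer, loss, and single implicit epoch are illustrative, not a recommendation):

quant_aware_model.compile(
    loss=keras.losses.categorical_crossentropy,
    optimizer='adam',
    metrics=['accuracy']
)

# Fine-tune the quantization aware model; real workloads would use
# representative training data and more epochs.
quant_aware_model.fit(x_train, y_train)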

Quantize some layers

Quantizing a model can have a negative effect on accuracy. You can selectively quantize layers of the model to explore the trade-off between accuracy, speed, and model size.

Your use case:

  • To deploy to a backend that only works well with fully quantized models (e.g. EdgeTPU v1, most DSPs), try "Quantize whole model".

Tips for better model accuracy:

  • It's generally better to fine-tune with quantization aware training than to train from scratch.
  • Try quantizing the later layers instead of the first layers.
  • Avoid quantizing critical layers (e.g. the attention mechanism).

In the following example, only the Dense layers are quantized.

# Create a base model
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy

# Helper function uses `quantize_annotate_layer` to annotate that only the 
# Dense layers should be quantized.
def apply_quantization_to_dense(layer):
  if isinstance(layer, keras.layers.Dense):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

# Use `keras.models.clone_model` to apply `apply_quantization_to_dense` 
# to the layers of the model.
annotated_model = keras.models.clone_model(
    base_model,
    clone_function=apply_quantization_to_dense,
)

# Now that the Dense layers are annotated,
# `quantize_apply` actually makes the model quantization aware.
quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
quant_aware_model.summary()
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restore for details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._iterations
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._learning_rate
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_1 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_3 (QuantizeWra  (None, 20)                425       
 pperV2)                                                         
                                                                 
 flatten_3 (Flatten)         (None, 20)                0         
                                                                 
=================================================================
Total params: 428 (1.67 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 8 (32.00 Byte)
_________________________________________________________________

While this example used the type of the layer to decide what to quantize, the easiest way to quantize a particular layer is to set its name property and look for that name in the clone_function.

print(base_model.layers[0].name)
dense_3
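
For example, the following minimal sketch annotates only the layer whose name was printed above ('dense_3' is specific to this run, and the helper apply_quantization_by_name is a hypothetical name, not part of the API):

# Annotate a single layer by matching its `name` attribute instead of its type.
def apply_quantization_by_name(layer):
  if layer.name == 'dense_3':
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

annotated_model = keras.models.clone_model(
    base_model,
    clone_function=apply_quantization_by_name,
)
quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)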

More readable but potentially lower model accuracy

This is not compatible with fine-tuning with quantization aware training, which is why it may be less accurate than the examples above.

Functional example

# Use `quantize_annotate_layer` to annotate that the `Dense` layer
# should be quantized.
i = keras.Input(shape=(20,))
x = tfmot.quantization.keras.quantize_annotate_layer(keras.layers.Dense(10))(i)
o = keras.layers.Flatten()(x)
annotated_model = keras.Model(inputs=i, outputs=o)

# Use `quantize_apply` to actually make the model quantization aware.
quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)

# For deployment purposes, the tool adds `QuantizeLayer` after `InputLayer` so that the
# quantized model can take in float inputs instead of only uint8.
quant_aware_model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 20)]              0         
                                                                 
 quantize_layer_2 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_4 (QuantizeWra  (None, 10)                215       
 pperV2)                                                         
                                                                 
 flatten_4 (Flatten)         (None, 10)                0         
                                                                 
=================================================================
Total params: 218 (872.00 Byte)
Trainable params: 210 (840.00 Byte)
Non-trainable params: 8 (32.00 Byte)
_________________________________________________________________

Sequential example

# Use `quantize_annotate_layer` to annotate that the `Dense` layer
# should be quantized.
annotated_model = keras.Sequential([
  tfmot.quantization.keras.quantize_annotate_layer(keras.layers.Dense(20, input_shape=input_shape)),
  keras.layers.Flatten()
])

# Use `quantize_apply` to actually make the model quantization aware.
quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)

quant_aware_model.summary()
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_3 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_5 (QuantizeWra  (None, 20)                425       
 pperV2)                                                         
                                                                 
 flatten_5 (Flatten)         (None, 20)                0         
                                                                 
=================================================================
Total params: 428 (1.67 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 8 (32.00 Byte)
_________________________________________________________________

Checkpoint and deserialize

Your use case: this code is only needed for the HDF5 model format (not for HDF5 weights or other formats).

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)

# Save or checkpoint the model.
_, keras_model_file = tempfile.mkstemp('.h5')
quant_aware_model.save(keras_model_file)

# `quantize_scope` is needed for deserializing HDF5 models.
with tfmot.quantization.keras.quantize_scope():
  loaded_model = keras.models.load_model(keras_model_file)

loaded_model.summary()
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restore for details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._iterations
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._learning_rate
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tf_keras/src/engine/training.py:3098: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native TF-Keras format, e.g. `model.save('my_model.keras')`.
  saving_api.save_model(
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_4 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_6 (QuantizeWra  (None, 20)                425       
 pperV2)                                                         
                                                                 
 quant_flatten_6 (QuantizeW  (None, 20)                1         
 rapperV2)                                                       
                                                                 
=================================================================
Total params: 429 (1.68 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 9 (36.00 Byte)
_________________________________________________________________

Create and deploy quantized model

In general, refer to the documentation for the deployment backend you will use.

This is an example for the TFLite backend.

base_model = setup_pretrained_model()
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)

# Typically you train the model here.

converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_tflite_model = converter.convert()
1/1 [==============================] - 1s 684ms/step - loss: 16.1181 - accuracy: 0.0000e+00
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restore for details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._iterations
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._learning_rate
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.3
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer._variables.4
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmpyo_u4d_8/assets
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/lite/python/convert.py:964: UserWarning: Statistics for quantized inputs were expected, but not specified; continuing anyway.
  warnings.warn(
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1709987395.907073   23976 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format.
W0000 00:00:1709987395.907116   23976 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency.
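
To sanity-check the conversion, you can write the flatbuffer to disk and run a single inference with the TFLite interpreter. A minimal sketch (the tempfile handling and reuse of the toy input from Setup are illustrative):

# Persist the flatbuffer.
_, tflite_file = tempfile.mkstemp('.tflite')
with open(tflite_file, 'wb') as f:
  f.write(quantized_tflite_model)

# Run one inference on the toy input.
interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']
interpreter.set_tensor(input_index, x_train)
interpreter.invoke()
print(interpreter.get_tensor(output_index))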

Experiment with quantization

Your use case: using the following APIs means there is no supported path to deployment. For instance, TFLite conversion and the kernel implementations only support 8-bit quantization. These features are also experimental and not subject to backward-compatibility guarantees.

Setup: DefaultDenseQuantizeConfig

Experimenting requires using tfmot.quantization.keras.QuantizeConfig, which describes how to quantize a layer's weights, activations, and outputs.

Below is an example that defines the same QuantizeConfig used for the Dense layer in the API defaults.

During the forward pass in this example, the LastValueQuantizer returned in get_weights_and_quantizers is called with layer.kernel as its input, producing an output. That output replaces layer.kernel in the Dense layer's original forward pass, via the logic defined in set_quantize_weights. The same idea applies to the activations and outputs.

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
      return [(layer.kernel, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
      return [(layer.activation, MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
      # Add this line for each item returned in `get_weights_and_quantizers`,
      # in the same order.
      layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
      # Add this line for each item returned in `get_activations_and_quantizers`,
      # in the same order.
      layer.activation = quantize_activations[0]

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
      return []

    def get_config(self):
      return {}

Quantize custom Keras layer

This example uses the DefaultDenseQuantizeConfig to quantize the CustomLayer.

Applying the configuration is the same across the "Experiment with quantization" use cases.

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class CustomLayer(keras.layers.Dense):
  pass

model = quantize_annotate_model(keras.Sequential([
   quantize_annotate_layer(CustomLayer(20, input_shape=(20,)), DefaultDenseQuantizeConfig()),
   keras.layers.Flatten()
]))

# `quantize_apply` requires mentioning `DefaultDenseQuantizeConfig` with `quantize_scope`
# as well as the custom Keras layer.
with quantize_scope(
  {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig,
   'CustomLayer': CustomLayer}):
  # Use `quantize_apply` to actually make the model quantization aware.
  quant_aware_model = tfmot.quantization.keras.quantize_apply(model)

quant_aware_model.summary()
Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_6 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_custom_layer (Quanti  (None, 20)                425       
 zeWrapperV2)                                                    
                                                                 
 quant_flatten_9 (QuantizeW  (None, 20)                1         
 rapperV2)                                                       
                                                                 
=================================================================
Total params: 429 (1.68 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 9 (36.00 Byte)
_________________________________________________________________

Modify quantization parameters

Common mistake: quantizing the bias to fewer than 32 bits usually harms model accuracy too much.

This example modifies the Dense layer to use 4 bits for its weights instead of the default 8. The rest of the model continues to use API defaults.

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class ModifiedDenseQuantizeConfig(DefaultDenseQuantizeConfig):
    # Configure weights to quantize with 4-bit instead of 8-bits.
    def get_weights_and_quantizers(self, layer):
      return [(layer.kernel, LastValueQuantizer(num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

Applying the configuration is the same across the "Experiment with quantization" use cases.

model = quantize_annotate_model(keras.Sequential([
   # Pass in modified `QuantizeConfig` to modify this Dense layer.
   quantize_annotate_layer(keras.layers.Dense(20, input_shape=(20,)), ModifiedDenseQuantizeConfig()),
   keras.layers.Flatten()
]))

# `quantize_apply` requires mentioning `ModifiedDenseQuantizeConfig` with `quantize_scope`:
with quantize_scope(
  {'ModifiedDenseQuantizeConfig': ModifiedDenseQuantizeConfig}):
  # Use `quantize_apply` to actually make the model quantization aware.
  quant_aware_model = tfmot.quantization.keras.quantize_apply(model)

quant_aware_model.summary()
Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_7 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_9 (QuantizeWra  (None, 20)                425       
 pperV2)                                                         
                                                                 
 quant_flatten_10 (Quantize  (None, 20)                1         
 WrapperV2)                                                      
                                                                 
=================================================================
Total params: 429 (1.68 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 9 (36.00 Byte)
_________________________________________________________________

Modify parts of layer to quantize

This example modifies the Dense layer to skip quantizing the activation. The rest of the model continues to use API defaults.

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class ModifiedDenseQuantizeConfig(DefaultDenseQuantizeConfig):
    def get_activations_and_quantizers(self, layer):
      # Skip quantizing activations.
      return []

    def set_quantize_activations(self, layer, quantize_activations):
      # Empty since `get_activations_and_quantizers` returns
      # an empty list.
      return

Applying the configuration is the same across the "Experiment with quantization" use cases.

model = quantize_annotate_model(keras.Sequential([
   # Pass in modified `QuantizeConfig` to modify this Dense layer.
   quantize_annotate_layer(keras.layers.Dense(20, input_shape=(20,)), ModifiedDenseQuantizeConfig()),
   keras.layers.Flatten()
]))

# `quantize_apply` requires mentioning `ModifiedDenseQuantizeConfig` with `quantize_scope`:
with quantize_scope(
  {'ModifiedDenseQuantizeConfig': ModifiedDenseQuantizeConfig}):
  # Use `quantize_apply` to actually make the model quantization aware.
  quant_aware_model = tfmot.quantization.keras.quantize_apply(model)

quant_aware_model.summary()
Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_8 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_10 (QuantizeWr  (None, 20)                423       
 apperV2)                                                        
                                                                 
 quant_flatten_11 (Quantize  (None, 20)                1         
 WrapperV2)                                                      
                                                                 
=================================================================
Total params: 427 (1.67 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 7 (28.00 Byte)
_________________________________________________________________

Use custom quantization algorithm

The tfmot.quantization.keras.quantizers.Quantizer class is a callable that can apply any algorithm to its inputs.

In this example, the inputs are the weights, and we apply the math in the FixedRangeQuantizer's __call__ function to the weights. Instead of the original weight values, the FixedRangeQuantizer's output is now passed to whatever would have used the weights.

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class FixedRangeQuantizer(tfmot.quantization.keras.quantizers.Quantizer):
  """Quantizer which forces outputs to be between -1 and 1."""

  def build(self, tensor_shape, name, layer):
    # Not needed. No new TensorFlow variables needed.
    return {}

  def __call__(self, inputs, training, weights, **kwargs):
    return keras.backend.clip(inputs, -1.0, 1.0)

  def get_config(self):
    # Not needed. No __init__ parameters to serialize.
    return {}


class ModifiedDenseQuantizeConfig(DefaultDenseQuantizeConfig):
    # Configure weights to quantize with the custom quantizer.
    def get_weights_and_quantizers(self, layer):
      # Use custom algorithm defined in `FixedRangeQuantizer` instead of default Quantizer.
      return [(layer.kernel, FixedRangeQuantizer())]

Applying the configuration is the same across the "Experiment with quantization" use cases.

model = quantize_annotate_model(keras.Sequential([
   # Pass in modified `QuantizeConfig` to modify this `Dense` layer.
   quantize_annotate_layer(keras.layers.Dense(20, input_shape=(20,)), ModifiedDenseQuantizeConfig()),
   keras.layers.Flatten()
]))

# `quantize_apply` requires mentioning `ModifiedDenseQuantizeConfig` with `quantize_scope`:
with quantize_scope(
  {'ModifiedDenseQuantizeConfig': ModifiedDenseQuantizeConfig}):
  # Use `quantize_apply` to actually make the model quantization aware.
  quant_aware_model = tfmot.quantization.keras.quantize_apply(model)

quant_aware_model.summary()
Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 quantize_layer_9 (Quantize  (None, 20)                3         
 Layer)                                                          
                                                                 
 quant_dense_11 (QuantizeWr  (None, 20)                423       
 apperV2)                                                        
                                                                 
 quant_flatten_12 (Quantize  (None, 20)                1         
 WrapperV2)                                                      
                                                                 
=================================================================
Total params: 427 (1.67 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 7 (28.00 Byte)
_________________________________________________________________