Note: TensorFlow Lite is now part of Google AI Edge. The latest documentation is at ai.google.dev/edge/lite.

Inspecting Quantization Errors with Quantization Debugger


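Although full-integer quantization improves model size and latency, the quantized model won't always work as expected. Model quality (e.g. accuracy, mAP, WER) is usually expected to be slightly lower than that of the original float model. In some cases, however, the quality can fall well below your expectations, or the model can produce completely wrong results.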

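When this happens, finding the root cause of the quantization error is tricky and painful, and fixing it is even harder. To assist this model inspection process, the quantization debugger can be used to identify problematic layers, and selective quantization can leave those layers in float so that model accuracy is recovered at the cost of a reduced benefit from quantization.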

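Quantization Debugger

The quantization debugger makes it possible to analyze quantization quality metrics on an existing model. It automates the process of running the model with a debug dataset and collecting quantization quality metrics for each tensor.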

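Prerequisites

If you already have a pipeline to quantize a model, you have all the pieces needed to run the quantization debugger:

Model to quantize
Representative dataset

In addition to the model and data, you will need a data processing framework (e.g. pandas, Google Sheets) to analyze the exported results.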

Setup

This section prepares the libraries, the MobileNet v3 model, and a test dataset of 100 images.

# Quantization debugger is available from TensorFlow 2.7.0
pip uninstall -y tensorflow
pip install tf-nightly
pip install tensorflow_datasets --upgrade  # imagenet_v2 needs latest checksum

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub

Boilerplate and helpers

MODEL_URI = 'https://tfhub.dev/google/imagenet/mobilenet_v3_small_100_224/classification/5'


def process_image(data):
  data['image'] = tf.image.resize(data['image'], (224, 224)) / 255.0
  return data


# Representative dataset
def representative_dataset(dataset):

  def _data_gen():
    for data in dataset.batch(1):
      yield [data['image']]

  return _data_gen


def eval_tflite(tflite_model, dataset):
  """Evaluates tensorflow lite classification model with the given dataset."""
  interpreter = tf.lite.Interpreter(model_content=tflite_model)
  interpreter.allocate_tensors()

  input_idx = interpreter.get_input_details()[0]['index']
  output_idx = interpreter.get_output_details()[0]['index']

  results = []

  for data in representative_dataset(dataset)():
    interpreter.set_tensor(input_idx, data[0])
    interpreter.invoke()
    results.append(interpreter.get_tensor(output_idx).flatten())

  results = np.array(results)
  gt_labels = np.array(list(dataset.map(lambda data: data['label'] + 1)))
  accuracy = (
      np.sum(np.argsort(results, axis=1)[:, -5:] == gt_labels.reshape(-1, 1)) /
      gt_labels.size)
  print(f'Top-5 accuracy (quantized): {accuracy * 100:.2f}%')


model = tf.keras.Sequential([
  tf.keras.layers.Input(shape=(224, 224, 3), batch_size=1),
  hub.KerasLayer(MODEL_URI)
])
model.compile(
    loss='sparse_categorical_crossentropy',
    metrics='sparse_top_k_categorical_accuracy')
model.build([1, 224, 224, 3])

# Prepare dataset with 100 examples
ds = tfds.load('imagenet_v2', split='test[:1%]')
ds = ds.map(process_image)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.representative_dataset = representative_dataset(ds)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

test_ds = ds.map(lambda data: (data['image'], data['label'] + 1)).batch(16)
loss, acc = model.evaluate(test_ds)
print(f'Top-5 accuracy (float): {acc * 100:.2f}%')

eval_tflite(quantized_model, ds)

We can see that the original model has a much higher top-5 accuracy on our small dataset, while the quantized model has a significant accuracy loss.

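Step 1. Debugger preparation

The easiest way to use the quantization debugger is to provide the tf.lite.TFLiteConverter that you have been using to quantize the model.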

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset(ds)

# debug_dataset should have the same format as converter.representative_dataset
debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter, debug_dataset=representative_dataset(ds))

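Step 2. Running the debugger and getting the results

When you call QuantizationDebugger.run(), the debugger logs the differences between the float tensors and the quantized tensors at the same op location, and processes them with the given metrics.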

debugger.run()
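To make the recorded quantities more concrete, here is a minimal sketch of the kind of per-tensor statistics that can be computed from a float/dequantized pair and then averaged over the debug examples. Plain Python lists stand in for tensors. The helper names `diff_stats` and `average_stats` are hypothetical and not part of the TensorFlow API, and both the statistic names and the sign convention (quantized minus float) are assumptions made for illustration.

```python
import statistics


def diff_stats(float_values, dequantized_values):
  """Summarizes the element-wise difference for one tensor of one example."""
  # Assumed sign convention: dequantized value minus float value.
  diffs = [q - f for f, q in zip(float_values, dequantized_values)]
  return {
      'num_elements': len(diffs),
      'stddev': statistics.pstdev(diffs),
      'mean_error': statistics.fmean(diffs),
      'max_abs_error': max(abs(d) for d in diffs),
      'mean_squared_error': statistics.fmean(d * d for d in diffs),
  }


def average_stats(per_example_stats):
  """Averages each statistic over all debug examples."""
  keys = per_example_stats[0].keys()
  return {
      k: statistics.fmean(s[k] for s in per_example_stats) for k in keys
  }


# Two toy "examples" for a single tensor (illustrative values only).
example_a = diff_stats([0.10, 0.20, 0.30], [0.11, 0.19, 0.30])
example_b = diff_stats([1.00, 2.00, 3.00], [1.02, 2.00, 2.98])
layer_summary = average_stats([example_a, example_b])
print(layer_summary)
```

Each statistic collapses a whole tensor into a single number per example, which is what allows the results to be averaged across the dataset and reported as one row per layer.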

The processed metrics can be accessed through QuantizationDebugger.layer_statistics, or dumped to a text file in CSV format with QuantizationDebugger.layer_statistics_dump().

RESULTS_FILE = '/tmp/debugger_results.csv'
with open(RESULTS_FILE, 'w') as f:
  debugger.layer_statistics_dump(f)

head /tmp/debugger_results.csv

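Each row in the dump lists the op name and index first, followed by the quantization parameters and the error metrics (including user-defined error metrics, if any). The resulting CSV file can be used to pick out problematic layers with large quantization error metrics.

With pandas or another data processing library, we can inspect the detailed per-layer error metrics.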

layer_stats = pd.read_csv(RESULTS_FILE)
layer_stats.head()

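Step 3. Data analysis

There are various ways to analyze the results. First, let's add some useful metrics derived from the debugger's outputs. (scale is the quantization scale factor for each tensor.)

Range (255 * scale)
RMSE / scale (sqrt(mean_squared_error) / scale)

When the quantized distribution is similar to the original float distribution, RMSE / scale is close to 1 / sqrt(12) (~0.289), the RMS of a rounding error spread uniformly over one quantization step, which indicates a good quantized model. The larger the value, the more likely it is that the layer is not quantized well.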

layer_stats['range'] = 255.0 * layer_stats['scale']
layer_stats['rmse/scale'] = layer_stats.apply(
    lambda row: np.sqrt(row['mean_squared_error']) / row['scale'], axis=1)
layer_stats[['op_name', 'range', 'rmse/scale']].head()
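The 1 / sqrt(12) reference value quoted above can be checked with a small simulation. The sketch below is independent of TensorFlow and assumes a simple int8 scheme with a zero point of 0. The helper `rmse_over_scale` is a hypothetical name, and the input values are synthetic. It also shows how the ratio grows when the same scale no longer covers the data and values saturate.

```python
import math
import random


def rmse_over_scale(values, scale, qmin=-128, qmax=127):
  """Quantizes values to int8 with a zero point of 0 and returns RMSE / scale."""
  squared_errors = []
  for x in values:
    q = max(qmin, min(qmax, round(x / scale)))  # quantize and clamp
    squared_errors.append((x - q * scale) ** 2)  # error after dequantizing
  return math.sqrt(sum(squared_errors) / len(squared_errors)) / scale


rng = random.Random(0)
scale = 2.0 / 255.0  # 255 steps cover the range [-1, 1]
well_covered = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
# Same scale, but the values now exceed the representable range and saturate.
clipped = [4.0 * x for x in well_covered]

ratio_ok = rmse_over_scale(well_covered, scale)
ratio_clipped = rmse_over_scale(clipped, scale)
print(f'well covered: {ratio_ok:.3f}, reference: {1 / math.sqrt(12):.3f}')
print(f'clipped:      {ratio_clipped:.3f}')
```

A ratio far above the reference therefore suggests that the error is dominated by something other than ordinary rounding, for example clipping or a poorly chosen scale.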

plt.figure(figsize=(15, 5))
ax1 = plt.subplot(121)
ax1.bar(np.arange(len(layer_stats)), layer_stats['range'])
ax1.set_ylabel('range')
ax2 = plt.subplot(122)
ax2.bar(np.arange(len(layer_stats)), layer_stats['rmse/scale'])
ax2.set_ylabel('rmse/scale')
plt.show()

There are many layers with wide ranges, and some layers with high RMSE/scale values. Let's get the layers with high error metrics.

layer_stats[layer_stats['rmse/scale'] > 0.7][[
    'op_name', 'range', 'rmse/scale', 'tensor_name'
]]

With these layers, you can try selective quantization to see whether leaving them unquantized improves model quality.

suspected_layers = list(
    layer_stats[layer_stats['rmse/scale'] > 0.7]['tensor_name'])

In addition to these, skipping quantization for the first few layers also helps improve the quantized model's quality.

suspected_layers.extend(list(layer_stats[:5]['tensor_name']))

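Selective Quantization

Selective quantization skips quantization for some nodes so that their computation happens in the original floating-point domain. When the right layers are skipped, we can expect some recovery of model quality at the cost of increased latency and model size.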

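However, if you plan to run quantized models on integer-only accelerators (e.g. Hexagon DSP, EdgeTPU), selective quantization fragments the model and results in slower inference, mainly because of the data transfer cost between the CPU and those accelerators. To prevent this, consider quantization-aware training, which keeps all layers in integer while preserving model accuracy.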

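The quantization debugger's options accept denylisted_nodes and denylisted_ops, which skip quantization for specific layers or for all instances of specific ops. Using the suspected_layers prepared in the previous step, we can use the quantization debugger to get a selectively quantized model.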

debug_options = tf.lite.experimental.QuantizationDebugOptions(
    denylisted_nodes=suspected_layers)
debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter,
    debug_dataset=representative_dataset(ds),
    debug_options=debug_options)

selective_quantized_model = debugger.get_nondebug_quantized_model()
eval_tflite(selective_quantized_model, ds)

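The accuracy is still lower than that of the original float model, but skipping quantization for about 10 of the 111 layers gives a notable improvement over the fully quantized model.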
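Since the recovered accuracy is paid for with model size, it can help to put a number on that cost. The following sketch compares the serialized sizes of two models, relying on the fact that the converted models are bytes objects. The helpers `size_report` and `relative_growth` are hypothetical names, and the placeholder byte strings only exist so the snippet runs on its own.

```python
def size_report(models):
  """Returns {name: size in KiB} for serialized TFLite models (bytes)."""
  return {name: len(content) / 1024 for name, content in models.items()}


def relative_growth(baseline_kib, other_kib):
  """Returns the size increase of `other` over `baseline` as a fraction."""
  return (other_kib - baseline_kib) / baseline_kib


# Placeholder byte strings so the sketch runs on its own; in the notebook
# these would be quantized_model and selective_quantized_model.
placeholder_models = {
    'fully quantized': bytes(2048),
    'selectively quantized': bytes(3072),
}
sizes = size_report(placeholder_models)
growth = relative_growth(sizes['fully quantized'],
                         sizes['selectively quantized'])
print(sizes, f'growth: {growth:.0%}')
```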

You can also try leaving all ops of the same class unquantized. For example, to skip quantization for all mean ops, pass MEAN to denylisted_ops.

debug_options = tf.lite.experimental.QuantizationDebugOptions(
    denylisted_ops=['MEAN'])
debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter,
    debug_dataset=representative_dataset(ds),
    debug_options=debug_options)

selective_quantized_model = debugger.get_nondebug_quantized_model()
eval_tflite(selective_quantized_model, ds)
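One way to decide which op types are worth passing to denylisted_ops is to aggregate the per-layer error by op type. The sketch below does this in plain Python over rows shaped like the CSV dump, using only the op_name, scale, and mean_squared_error columns. The helper `mean_rmse_over_scale_by_op` is a hypothetical name, and the toy rows are made-up values rather than real debugger output.

```python
import collections
import math


def mean_rmse_over_scale_by_op(rows):
  """Groups per-layer rows by op_name and averages RMSE / scale."""
  grouped = collections.defaultdict(list)
  for row in rows:
    scale = float(row['scale'])
    if scale == 0.0:
      continue  # skip rows without a usable scale
    rmse = math.sqrt(float(row['mean_squared_error']))
    grouped[row['op_name']].append(rmse / scale)
  averages = {op: sum(v) / len(v) for op, v in grouped.items()}
  # Worst op types first.
  return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy rows shaped like csv.DictReader output (all values are strings).
toy_rows = [
    {'op_name': 'CONV_2D', 'scale': '0.5', 'mean_squared_error': '0.0225'},
    {'op_name': 'MEAN', 'scale': '0.1', 'mean_squared_error': '0.01'},
    {'op_name': 'CONV_2D', 'scale': '0.2', 'mean_squared_error': '0.0036'},
]
ranking = mean_rmse_over_scale_by_op(toy_rows)
print(ranking)
```

With the real dump, the rows could come from csv.DictReader opened on RESULTS_FILE. An op type that ranks at the top is only a candidate; its effect on accuracy still has to be measured.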

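With these techniques, we are able to improve the accuracy of the quantized MobileNet V3 model. Next, we'll explore advanced techniques to improve the model accuracy even further.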

Advanced usages

With the following features, you can further customize your debugging pipeline.

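Custom metrics

By default, the quantization debugger emits five metrics for each float-quant difference: tensor size, standard deviation, mean error, max absolute error, and mean squared error. You can add more custom metrics by passing them to the options. For each metric, the result should be a single float value, and the reported metric is the average of that value over all examples.

layer_debug_metrics: calculates a metric from the difference between the float and quantized outputs of each op.
layer_direct_compare_metrics: rather than taking only the difference, calculates a metric from the raw float and quantized tensors together with the quantization parameters (scale, zero point).
model_debug_metrics: only used when float_model_(path|content) is passed to the debugger. In addition to the op-level metrics, the final layer output is compared with the reference output from the original float model.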
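A direct-compare metric receives the raw quantized values, so it typically has to map them back to real values first using the scale and zero point. The sketch below illustrates that arithmetic with plain Python lists, assuming the usual affine int8 scheme where a real value is approximated by (q - zero_point) * scale. The helpers `quantize`, `dequantize`, and `max_abs_error` are hypothetical names for illustration. The debugger itself passes NumPy arrays, to which the same formula applies element-wise.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
  """Maps real values to int8 codes: q = round(x / scale) + zero_point."""
  return [
      max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values
  ]


def dequantize(codes, scale, zero_point):
  """Maps int8 codes back to real values: x ~= (q - zero_point) * scale."""
  return [(q - zero_point) * scale for q in codes]


def max_abs_error(float_values, codes, scale, zero_point):
  """A direct-compare style metric that returns a single float."""
  restored = dequantize(codes, scale, zero_point)
  return max(abs(f - r) for f, r in zip(float_values, restored))


float_values = [0.0, 0.26, 0.51, 0.74, 1.0]
scale, zero_point = 1.0 / 255.0, -128  # covers roughly [0, 1]
codes = quantize(float_values, scale, zero_point)
error = max_abs_error(float_values, codes, scale, zero_point)
print(codes, error)
```

As long as no value saturates, the error of each element stays within half a quantization step, which is the baseline any custom metric can be compared against.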

debug_options = tf.lite.experimental.QuantizationDebugOptions(
    layer_debug_metrics={
        'mean_abs_error': (lambda diff: np.mean(np.abs(diff)))
    },
    layer_direct_compare_metrics={
        # Dequantize with (q - zero_point) * scale before comparing.
        'correlation':
            lambda f, q, s, zp: (np.corrcoef(f.flatten(),
                                             (q.flatten() - zp) * s)[0, 1])
    },
    model_debug_metrics={
        'argmax_accuracy': (lambda f, q: np.mean(np.argmax(f) == np.argmax(q)))
    })

debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter,
    debug_dataset=representative_dataset(ds),
    debug_options=debug_options)

debugger.run()

CUSTOM_RESULTS_FILE = '/tmp/debugger_results.csv'
with open(CUSTOM_RESULTS_FILE, 'w') as f:
  debugger.layer_statistics_dump(f)

custom_layer_stats = pd.read_csv(CUSTOM_RESULTS_FILE)
custom_layer_stats[['op_name', 'mean_abs_error', 'correlation']].tail()

The result of model_debug_metrics can be viewed separately through debugger.model_statistics.

debugger.model_statistics

Using the (internal) mlir_quantize API to access in-depth features

from tensorflow.lite.python import convert

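Whole model verify mode

The default behavior for debug model generation is per-layer verify. In this mode, the inputs to each float and quantized op pair come from the same source (the previous quantized op). The other mode is whole-model verify, in which the float and quantized models are kept separate. This mode is useful for observing how errors propagate down the model. To enable it, pass enable_whole_model_verify=True to convert.mlir_quantize while generating the debug model manually.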

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.representative_dataset = representative_dataset(ds)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter._experimental_calibrate_only = True
calibrated_model = converter.convert()

# Note that enable_numeric_verify and enable_whole_model_verify are set.
quantized_model = convert.mlir_quantize(
    calibrated_model,
    enable_numeric_verify=True,
    enable_whole_model_verify=True)
debugger = tf.lite.experimental.QuantizationDebugger(
    quant_debug_model_content=quantized_model,
    debug_dataset=representative_dataset(ds))
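The difference between the two verify modes can be illustrated with a toy chain of scalar "layers". This is only a conceptual sketch under simplifying assumptions (each layer multiplies by a constant, and quantization is plain rounding to a multiple of scale). The names `fake_quantize`, `per_layer_errors`, and `whole_model_errors` are hypothetical and unrelated to the TensorFlow implementation.

```python
def fake_quantize(x, scale):
  """Rounds x to the nearest multiple of scale (no clamping, for brevity)."""
  return round(x / scale) * scale


def per_layer_errors(x, weights, scale):
  """Float and quantized ops both read the previous quantized output."""
  errors = []
  for w in weights:
    float_out = w * x
    quant_out = fake_quantize(float_out, scale)
    errors.append(abs(float_out - quant_out))
    x = quant_out  # the next pair starts from the quantized value
  return errors


def whole_model_errors(x, weights, scale):
  """Float and quantized paths run separately from the same model input."""
  errors = []
  x_float, x_quant = x, x
  for w in weights:
    x_float = w * x_float
    x_quant = fake_quantize(w * x_quant, scale)
    errors.append(abs(x_float - x_quant))
  return errors


weights = [1.7, 1.7, 1.7, 1.7]  # toy "layers": multiply by a constant
scale = 0.1
local = per_layer_errors(0.4, weights, scale)
accumulated = whole_model_errors(0.4, weights, scale)
print('per-layer:  ', [round(e, 4) for e in local])
print('whole-model:', [round(e, 4) for e in accumulated])
```

In the per-layer variant every error is bounded by half a quantization step because each pair starts from the same input. In the whole-model variant the two paths drift apart, so later layers can show errors larger than a single rounding step.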

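Selective quantization from an already calibrated model

You can call convert.mlir_quantize directly to get a selectively quantized model from an already calibrated model. This is particularly useful when you want to calibrate the model once and experiment with various denylist combinations.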

selective_quantized_model = convert.mlir_quantize(
    calibrated_model, denylisted_nodes=suspected_layers)
eval_tflite(selective_quantized_model, ds)