![]() |
![]() |
![]() |
![]() |
總覽
歡迎來到以量值為基礎的權重剪枝端對端範例。
其他頁面
如需剪枝簡介和判斷是否應使用剪枝 (包括支援哪些項目),請參閱總覽頁面。
若要快速找到您的使用案例 (除了以 80% 稀疏性完整剪枝模型之外) 所需的 API,請參閱綜合指南。
摘要
在本教學課程中,您將
- 從頭開始訓練用於 MNIST 的
keras
模型。 - 透過套用剪枝 API 來微調模型並查看準確度。
- 從剪枝建立小 3 倍的 TF 和 TFLite 模型。
- 從結合剪枝和訓練後量化建立小 10 倍的 TFLite 模型。
- 查看從 TF 到 TFLite 的準確度持久性。
設定
pip install -q tensorflow-model-optimization
import tempfile
import os
import tensorflow as tf
import numpy as np
from tensorflow_model_optimization.python.core.keras.compat import keras
%load_ext tensorboard
訓練用於 MNIST 的模型,不進行剪枝
# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
# Normalize the input image so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0
# Define the model architecture.
model = keras.Sequential([
keras.layers.InputLayer(input_shape=(28, 28)),
keras.layers.Reshape(target_shape=(28, 28, 1)),
keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
keras.layers.MaxPooling2D(pool_size=(2, 2)),
keras.layers.Flatten(),
keras.layers.Dense(10)
])
# Train the digit classification model
model.compile(optimizer='adam',
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(
train_images,
train_labels,
epochs=4,
validation_split=0.1,
)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz 11490434/11490434 [==============================] - 0s 0us/step 2024-03-09 12:16:05.087631: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected Epoch 1/4 1688/1688 [==============================] - 21s 4ms/step - loss: 0.3272 - accuracy: 0.9066 - val_loss: 0.1451 - val_accuracy: 0.9602 Epoch 2/4 1688/1688 [==============================] - 7s 4ms/step - loss: 0.1396 - accuracy: 0.9602 - val_loss: 0.1017 - val_accuracy: 0.9698 Epoch 3/4 1688/1688 [==============================] - 7s 4ms/step - loss: 0.0925 - accuracy: 0.9730 - val_loss: 0.0776 - val_accuracy: 0.9780 Epoch 4/4 1688/1688 [==============================] - 7s 4ms/step - loss: 0.0726 - accuracy: 0.9787 - val_loss: 0.0653 - val_accuracy: 0.9822 <tf_keras.src.callbacks.History at 0x7f630863b790>
評估基準測試準確度並儲存模型以供日後使用。
_, baseline_model_accuracy = model.evaluate(
test_images, test_labels, verbose=0)
print('Baseline test accuracy:', baseline_model_accuracy)
_, keras_file = tempfile.mkstemp('.h5')
keras.models.save_model(model, keras_file, include_optimizer=False)
print('Saved baseline model to:', keras_file)
Baseline test accuracy: 0.9790999889373779 Saved baseline model to: /tmpfs/tmp/tmpq067am7o.h5 /tmpfs/tmp/ipykernel_10506/3790298460.py:7: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native TF-Keras format, e.g. `model.save('my_model.keras')`. keras.models.save_model(model, keras_file, include_optimizer=False)
使用剪枝微調預先訓練的模型
定義模型
您將對整個模型套用剪枝,並在模型摘要中看到此情況。
在本範例中,您從具有 50% 稀疏性 (權重中 50% 為零) 的模型開始,並以 80% 稀疏性結束。
在綜合指南中,您可以查看如何剪枝某些層以改善模型準確度。
import tensorflow_model_optimization as tfmot
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
# Compute end step to finish pruning after 2 epochs.
batch_size = 128
epochs = 2
validation_split = 0.1 # 10% of training set will be used for validation set.
num_images = train_images.shape[0] * (1 - validation_split)
end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs
# Define model for pruning.
pruning_params = {
'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
final_sparsity=0.80,
begin_step=0,
end_step=end_step)
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)
# `prune_low_magnitude` requires a recompile.
model_for_pruning.compile(optimizer='adam',
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model_for_pruning.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= prune_low_magnitude_reshap (None, 28, 28, 1) 1 e (PruneLowMagnitude) prune_low_magnitude_conv2d (None, 26, 26, 12) 230 (PruneLowMagnitude) prune_low_magnitude_max_po (None, 13, 13, 12) 1 oling2d (PruneLowMagnitude ) prune_low_magnitude_flatte (None, 2028) 1 n (PruneLowMagnitude) prune_low_magnitude_dense (None, 10) 40572 (PruneLowMagnitude) ================================================================= Total params: 40805 (159.41 KB) Trainable params: 20410 (79.73 KB) Non-trainable params: 20395 (79.69 KB) _________________________________________________________________
訓練模型並根據基準評估模型
使用剪枝微調兩個週期。
tfmot.sparsity.keras.UpdatePruningStep
在訓練期間是必要的,而tfmot.sparsity.keras.PruningSummaries
提供記錄以追蹤進度和偵錯。
logdir = tempfile.mkdtemp()
callbacks = [
tfmot.sparsity.keras.UpdatePruningStep(),
tfmot.sparsity.keras.PruningSummaries(log_dir=logdir),
]
model_for_pruning.fit(train_images, train_labels,
batch_size=batch_size, epochs=epochs, validation_split=validation_split,
callbacks=callbacks)
Epoch 1/2 1/422 [..............................] - ETA: 19:24 - loss: 0.0619 - accuracy: 0.9766WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0047s vs `on_train_batch_end` time: 0.0067s). Check your callbacks. 422/422 [==============================] - 5s 6ms/step - loss: 0.1062 - accuracy: 0.9710 - val_loss: 0.1166 - val_accuracy: 0.9718 Epoch 2/2 422/422 [==============================] - 2s 5ms/step - loss: 0.1099 - accuracy: 0.9697 - val_loss: 0.0930 - val_accuracy: 0.9747 <tf_keras.src.callbacks.History at 0x7f6278426cd0>
對於此範例,與基準相比,剪枝後測試準確度的損失極小。
_, model_for_pruning_accuracy = model_for_pruning.evaluate(
test_images, test_labels, verbose=0)
print('Baseline test accuracy:', baseline_model_accuracy)
print('Pruned test accuracy:', model_for_pruning_accuracy)
Baseline test accuracy: 0.9790999889373779 Pruned test accuracy: 0.9686999917030334
記錄顯示了每個層稀疏性的進展。
#docs_infra: no_execute
%tensorboard --logdir={logdir}
對於非 Colab 使用者,您可以在先前的執行結果中,在 TensorBoard.dev 上查看此程式碼區塊。
從剪枝建立小 3 倍的模型
需要同時使用tfmot.sparsity.keras.strip_pruning
和套用標準壓縮演算法 (例如透過 gzip) 才能看到剪枝的壓縮優勢。
strip_pruning
是必要的,因為它會移除剪枝僅在訓練期間需要的每個 tf.Variable,否則這些變數會在推論期間增加模型大小- 套用標準壓縮演算法是必要的,因為序列化的權重矩陣與剪枝前的大小相同。但是,剪枝使大多數權重變為零,這是演算法可用於進一步壓縮模型的額外冗餘。
首先,為 TensorFlow 建立可壓縮模型。
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
_, pruned_keras_file = tempfile.mkstemp('.h5')
keras.models.save_model(model_for_export, pruned_keras_file, include_optimizer=False)
print('Saved pruned Keras model to:', pruned_keras_file)
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model. Saved pruned Keras model to: /tmpfs/tmp/tmphtdbakhm.h5 /tmpfs/tmp/ipykernel_10506/3267383138.py:4: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native TF-Keras format, e.g. `model.save('my_model.keras')`. keras.models.save_model(model_for_export, pruned_keras_file, include_optimizer=False)
然後,為 TFLite 建立可壓縮模型。
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
pruned_tflite_model = converter.convert()
_, pruned_tflite_file = tempfile.mkstemp('.tflite')
with open(pruned_tflite_file, 'wb') as f:
f.write(pruned_tflite_model)
print('Saved pruned TFLite model to:', pruned_tflite_file)
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp0i4b2dha/assets INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp0i4b2dha/assets WARNING: All log messages before absl::InitializeLog() is called are written to STDERR W0000 00:00:1709986617.944296 10506 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format. W0000 00:00:1709986617.944339 10506 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency. Saved pruned TFLite model to: /tmpfs/tmp/tmp1i3mzxcp.tflite
定義輔助函式,以透過 gzip 實際壓縮模型並測量壓縮後的大小。
def get_gzipped_model_size(file):
# Returns size of gzipped model, in bytes.
import os
import zipfile
_, zipped_file = tempfile.mkstemp('.zip')
with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
f.write(file)
return os.path.getsize(zipped_file)
比較並查看模型因剪枝而小 3 倍。
print("Size of gzipped baseline Keras model: %.2f bytes" % (get_gzipped_model_size(keras_file)))
print("Size of gzipped pruned Keras model: %.2f bytes" % (get_gzipped_model_size(pruned_keras_file)))
print("Size of gzipped pruned TFlite model: %.2f bytes" % (get_gzipped_model_size(pruned_tflite_file)))
Size of gzipped baseline Keras model: 78239.00 bytes Size of gzipped pruned Keras model: 25908.00 bytes Size of gzipped pruned TFlite model: 24848.00 bytes
從結合剪枝和量化建立小 10 倍的模型
您可以對剪枝模型套用訓練後量化,以獲得額外的好處。
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_and_pruned_tflite_model = converter.convert()
_, quantized_and_pruned_tflite_file = tempfile.mkstemp('.tflite')
with open(quantized_and_pruned_tflite_file, 'wb') as f:
f.write(quantized_and_pruned_tflite_model)
print('Saved quantized and pruned TFLite model to:', quantized_and_pruned_tflite_file)
print("Size of gzipped baseline Keras model: %.2f bytes" % (get_gzipped_model_size(keras_file)))
print("Size of gzipped pruned and quantized TFlite model: %.2f bytes" % (get_gzipped_model_size(quantized_and_pruned_tflite_file)))
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp35zwmyql/assets INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp35zwmyql/assets W0000 00:00:1709986618.942100 10506 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format. W0000 00:00:1709986618.942130 10506 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency. Saved quantized and pruned TFLite model to: /tmpfs/tmp/tmp3v6lm0h4.tflite Size of gzipped baseline Keras model: 78239.00 bytes Size of gzipped pruned and quantized TFlite model: 8064.00 bytes
查看從 TF 到 TFLite 的準確度持久性
定義輔助函式,以在測試資料集上評估 TF Lite 模型。
import numpy as np
def evaluate_model(interpreter):
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
# Run predictions on ever y image in the "test" dataset.
prediction_digits = []
for i, test_image in enumerate(test_images):
if i % 1000 == 0:
print('Evaluated on {n} results so far.'.format(n=i))
# Pre-processing: add batch dimension and convert to float32 to match with
# the model's input data format.
test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
interpreter.set_tensor(input_index, test_image)
# Run inference.
interpreter.invoke()
# Post-processing: remove batch dimension and find the digit with highest
# probability.
output = interpreter.tensor(output_index)
digit = np.argmax(output()[0])
prediction_digits.append(digit)
print('\n')
# Compare prediction results with ground truth labels to calculate accuracy.
prediction_digits = np.array(prediction_digits)
accuracy = (prediction_digits == test_labels).mean()
return accuracy
您評估剪枝和量化模型,並查看從 TensorFlow 持續到 TFLite 後端的準確度。
interpreter = tf.lite.Interpreter(model_content=quantized_and_pruned_tflite_model)
interpreter.allocate_tensors()
test_accuracy = evaluate_model(interpreter)
print('Pruned and quantized TFLite test_accuracy:', test_accuracy)
print('Pruned TF test accuracy:', model_for_pruning_accuracy)
INFO: Created TensorFlow Lite XNNPACK delegate for CPU. WARNING: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#13 is a dynamic-sized tensor). Evaluated on 0 results so far. Evaluated on 1000 results so far. Evaluated on 2000 results so far. Evaluated on 3000 results so far. Evaluated on 4000 results so far. Evaluated on 5000 results so far. Evaluated on 6000 results so far. Evaluated on 7000 results so far. Evaluated on 8000 results so far. Evaluated on 9000 results so far. Pruned and quantized TFLite test_accuracy: 0.9691 Pruned TF test accuracy: 0.9686999917030334
結論
在本教學課程中,您看到了如何使用 TensorFlow 模型最佳化工具組 API 為 TensorFlow 和 TFLite 建立稀疏模型。然後,您將剪枝與訓練後量化結合,以獲得額外的好處。
您為 MNIST 建立了一個小 10 倍的模型,準確度差異極小。
我們鼓勵您嘗試這項新功能,這對於在資源受限的環境中部署可能尤其重要。