Standalone Model Card Toolkit Demo

This "standalone" notebook demonstrates how to use the Model Card Toolkit without a TFX/MLMD environment.


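Objective

This notebook demonstrates how to generate a Model Card with the Model Card Toolkit in a Jupyter/Colab environment. You can learn more about model cards at https://modelcards.withgoogle.com/about.

We use a Keras model in this demo, but the logic below also applies, broadly, to other ML frameworks.

Setup

We first need to a) install and import the necessary packages, and b) download the data.

Upgrade pip and install the Model Card Toolkit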

pip install --upgrade pip
pip install 'model-card-toolkit>=1.0.0'
pip install 'tensorflow>=2.3.1'
pip install 'tensorflow-datasets>=4.8.2'

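Did you restart the runtime?

If you are using Google Colab, you must restart the runtime (Runtime > Restart runtime...) after running the cell above for the first time. This is necessary because of the way Colab loads packages.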
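After restarting, it can help to confirm that the runtime actually sees the freshly installed packages. The following is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper introduced here for illustration and is not part of the Model Card Toolkit.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above.
for name in ('model-card-toolkit', 'tensorflow', 'tensorflow-datasets'):
    print(name, installed_version(name))
```

A value of `None` for any of the three names suggests that the install step did not complete in the current environment.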

Imports

import tensorflow as tf
import numpy as np
import model_card_toolkit as mct
from model_card_toolkit.documentation.examples import cats_vs_dogs
from model_card_toolkit.utils.graphics import figure_to_base64str
import tempfile
import matplotlib.pyplot as plt
from IPython import display
import requests
import os
import zipfile

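Model

We will use a pretrained model whose architecture is based on MobileNetV2, a popular 16-layer image classification model. Our model has been trained to distinguish between cats and dogs using the Cats vs Dogs dataset. The model training was based on the TensorFlow transfer learning tutorial.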

URL = 'https://storage.googleapis.com/cats_vs_dogs_model/cats_vs_dogs_model.zip'
BASE_PATH = tempfile.mkdtemp()
ZIP_PATH = os.path.join(BASE_PATH, 'cats_vs_dogs_model.zip')
MODEL_PATH = os.path.join(BASE_PATH,'cats_vs_dogs_model')

r = requests.get(URL, allow_redirects=True)
r.raise_for_status()  # Fail early if the download did not succeed
with open(ZIP_PATH, 'wb') as f:
    f.write(r.content)

with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
    zip_ref.extractall(BASE_PATH)

model = tf.keras.models.load_model(MODEL_PATH)

WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2023-10-03 09:12:20.736066: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://tensorflow.dev.org.tw/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

Dataset

In the Cats vs Dogs dataset, label=0 corresponds to cats and label=1 corresponds to dogs.
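This label convention is what allows the validation set to be split into per-class slices. The sketch below shows one way such a split could be done. The dictionary layout mirrors how this notebook indexes its data (`examples['cat']['examples']`, `data['labels']`), but `split_by_label` is a hypothetical helper, and the actual implementation of `cats_vs_dogs.get_data()` may differ.

```python
CAT_LABEL, DOG_LABEL = 0, 1


def split_by_label(examples, labels):
    """Group examples into 'combined', 'cat', and 'dog' slices by label.

    Assumes every label is either 0 (cat) or 1 (dog).
    """
    slices = {
        'combined': {'examples': list(examples), 'labels': list(labels)},
        'cat': {'examples': [], 'labels': []},
        'dog': {'examples': [], 'labels': []},
    }
    for example, label in zip(examples, labels):
        name = 'cat' if label == CAT_LABEL else 'dog'
        slices[name]['examples'].append(example)
        slices[name]['labels'].append(label)
    return slices


# Toy stand-ins for images; real examples would be image arrays.
toy = split_by_label(['img_a', 'img_b', 'img_c'], [0, 1, 1])
print(len(toy['cat']['examples']), len(toy['dog']['examples']))  # prints: 1 2
```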

def compute_accuracy(data):
  x = np.stack(data['examples'])
  y = np.asarray(data['labels'])
  _, metric = model.evaluate(x, y)
  return metric

examples = cats_vs_dogs.get_data()
print('num validation examples:', len(examples['combined']['examples']))
print('num cat examples:', len(examples['cat']['examples']))
print('num dog examples:', len(examples['dog']['examples']))

num validation examples: 320
num cat examples: 149
num dog examples: 171
2023-10-03 09:12:30.081069: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

accuracy = compute_accuracy(examples['combined'])
cat_accuracy = compute_accuracy(examples['cat'])
dog_accuracy = compute_accuracy(examples['dog'])

10/10 [==============================] - 2s 77ms/step - loss: 0.0794 - binary_accuracy: 0.9812
5/5 [==============================] - 1s 74ms/step - loss: 0.0608 - binary_accuracy: 0.9933
6/6 [==============================] - 0s 65ms/step - loss: 0.0956 - binary_accuracy: 0.9708
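As a sanity check, the overall accuracy should equal the size-weighted average of the per-slice accuracies, because the cat and dog slices partition the validation set. The sketch below redoes that arithmetic with the slice sizes and accuracies printed above; `weighted_accuracy` is a hypothetical helper, not a toolkit function.

```python
# Slice sizes and accuracies copied from the outputs above.
slice_sizes = {'cat': 149, 'dog': 171}
slice_accuracy = {'cat': 0.9932885766029358, 'dog': 0.9707602262496948}


def weighted_accuracy(sizes, accuracies):
    """Combine per-slice accuracies into one accuracy, weighted by slice size."""
    total = sum(sizes.values())
    return sum(sizes[name] * accuracies[name] for name in sizes) / total


overall = weighted_accuracy(slice_sizes, slice_accuracy)

# Number of correctly classified images implied by each slice accuracy.
correct_counts = {
    name: round(slice_sizes[name] * slice_accuracy[name]) for name in slice_sizes
}
print(overall, correct_counts)
```

The result is approximately 0.98125, which agrees with the reported overall `binary_accuracy` of 0.9812. The implied counts are 148 of 149 cats and 166 of 171 dogs classified correctly.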

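Use the Model Card Toolkit

Initialize the Model Card Toolkit

The first step is to initialize a ModelCardToolkit object, which maintains assets including a model card JSON file and a model card document. Call ModelCardToolkit.scaffold_assets() to generate these assets and return a ModelCard object.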

# https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/model_card_toolkit.py
model_card_dir = tempfile.mkdtemp()
toolkit = mct.ModelCardToolkit(model_card_dir)

# https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/model_card.py
model_card = toolkit.scaffold_assets()
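To see which files were written to the output directory, you can walk it with the standard library. This is a minimal sketch: `list_assets` is a hypothetical helper, the demo runs on a throwaway directory with a placeholder file, and the files the toolkit actually creates depend on its version.

```python
import os
import tempfile


def list_assets(root_dir):
    """Return the sorted relative paths of all files under root_dir."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full_path, root_dir))
    return sorted(paths)


# Demo on a throwaway directory; in the notebook, pass model_card_dir instead.
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, 'template', 'md'))
with open(os.path.join(demo_dir, 'template', 'md', 'example.jinja'), 'w') as f:
    f.write('placeholder')
demo_assets = list_assets(demo_dir)
print(demo_assets)
```

In the notebook, `list_assets(model_card_dir)` would list the scaffolded assets, such as the templates used later for export.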

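Annotate the Model Card

The ModelCard object returned by scaffold_assets() has many fields that can be modified directly. These fields are rendered in the final generated Model Card document. For a comprehensive list, see model_card.py. For more details, see the documentation.

Text Fields

Model Details

model_card.model_details contains many basic metadata fields, such as name, owners, and version. You can provide a description of your model in the overview field.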

model_card.model_details.name = 'Fine-tuned MobileNetV2 Model for Cats vs. Dogs'
model_card.model_details.overview = (
    'This model distinguishes cat and dog images. It uses the MobileNetV2 '
    'architecture (https://arxiv.org/abs/1801.04381) and is trained on the '
    'Cats vs Dogs dataset '
    '(https://tensorflow.dev.org.tw/datasets/catalog/cats_vs_dogs). This model '
    'performed with high accuracy on both Cat and Dog images.'
)
model_card.model_details.owners = [
  mct.Owner(name='Model Cards Team', contact='model-cards@google.com')
]
model_card.model_details.version = mct.Version(name='v1.0', date='08/28/2020')
model_card.model_details.references = [
    mct.Reference(reference='https://tensorflow.dev.org.tw/guide/keras/transfer_learning'),
    mct.Reference(reference='https://arxiv.org/abs/1801.04381'),
]
model_card.model_details.licenses = [mct.License(identifier='Apache-2.0')]
model_card.model_details.citations = [mct.Citation(citation='https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/documentation/examples/Standalone_Model_Card_Toolkit_Demo.ipynb')]

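Quantitative Analysis

model_card.quantitative_analysis contains information about a model's performance metrics.

Below, we record the accuracy values computed earlier for the full validation set and for the cat and dog slices. Metric values are stored as strings.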

model_card.quantitative_analysis.performance_metrics = [
  mct.PerformanceMetric(type='accuracy', value=str(accuracy)),
  mct.PerformanceMetric(type='accuracy', value=str(cat_accuracy), slice='cat'),
  mct.PerformanceMetric(type='accuracy', value=str(dog_accuracy), slice='Dog'),
]
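When there are many slices, the metric list can be built from a dictionary instead of being written out by hand. The sketch below prepares plain keyword-argument dictionaries using the same `type`, `value`, and `slice` arguments as the cell above; `metric_kwargs` is a hypothetical helper, and the numbers are placeholders rather than results from this notebook.

```python
def metric_kwargs(metric_type, values_by_slice):
    """Build keyword arguments for one metric per slice.

    A slice name of None stands for the overall (unsliced) metric.
    """
    rows = []
    for slice_name, value in values_by_slice.items():
        kwargs = {'type': metric_type, 'value': str(value)}
        if slice_name is not None:
            kwargs['slice'] = slice_name
        rows.append(kwargs)
    return rows


# Placeholder numbers for illustration only, not results from this notebook.
rows = metric_kwargs('accuracy', {None: 0.9, 'cat': 0.95, 'dog': 0.85})
print(rows)
```

In the notebook, each dictionary could then be expanded into a metric object, for example `[mct.PerformanceMetric(**kw) for kw in rows]`.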

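Considerations

model_card.considerations contains qualifying information about your model: what are the appropriate use cases, what limitations should users keep in mind, what are the ethical considerations of application, and so on.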

model_card.considerations.use_cases = [
    mct.UseCase(description='This model classifies images of cats and dogs.')
]
model_card.considerations.limitations = [
    mct.Limitation(description='This model is not able to classify images of other classes.')
]
model_card.considerations.ethical_considerations = [mct.Risk(
    name=
        'While distinguishing between cats and dogs is generally agreed to be '
        'a benign application of machine learning, harmful results can occur '
        'when the model attempts to classify images that don’t contain cats or '
        'dogs.',
    mitigation_strategy=
        'Avoid application on non-dog and non-cat images.'
)]

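Graph Fields

It is often best practice for a report to provide information on a model's training data and on its performance across evaluation data. The Model Card Toolkit lets users encode this information in visualizations that are rendered in the Model Card.

model_card has graphics sections in two places. Each dataset in model_card.model_parameters.data (a list of mct.Dataset objects) has a graphics field for dataset statistics, and model_card.quantitative_analysis.graphics holds quantitative analysis of model performance.

Graphs are stored as base64 strings. If you have a matplotlib figure, you can convert it to a base64 string with model_card_toolkit.utils.graphics.figure_to_base64str().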

# Validation Set Size Bar Chart
fig, ax = plt.subplots()
width = 0.75
rects0 = ax.bar(0, len(examples['combined']['examples']), width, label='Overall')
rects1 = ax.bar(1, len(examples['cat']['examples']), width, label='Cat')
rects2 = ax.bar(2, len(examples['dog']['examples']), width, label='Dog')
ax.set_xticks(np.arange(3))
ax.set_xticklabels(['Overall', 'Cat', 'Dog'])
ax.set_ylabel('Validation Set Size')
ax.set_xlabel('Slices')
ax.set_title('Validation Set Size for Slices')
validation_set_size_barchart = figure_to_base64str(fig)

[Figure: bar chart of validation set size for the Overall, Cat, and Dog slices]

# Accuracy Bar Chart
fig, ax = plt.subplots()
width = 0.75
rects0 = ax.bar(0, accuracy, width, label='Overall')
rects1 = ax.bar(1, cat_accuracy, width, label='Cat')
rects2 = ax.bar(2, dog_accuracy, width, label='Dog')
ax.set_xticks(np.arange(3))
ax.set_xticklabels(['Overall', 'Cat', 'Dog'])
ax.set_ylabel('Accuracy')
ax.set_xlabel('Slices')
ax.set_title('Accuracy on Slices')
accuracy_barchart = figure_to_base64str(fig)

[Figure: bar chart of accuracy for the Overall, Cat, and Dog slices]
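For images that do not come from matplotlib, such as a PNG file already saved on disk, the standard library can produce the same kind of string. This is a minimal sketch, assuming the toolkit expects the plain base64 text of the encoded image bytes, as the description above suggests; `png_bytes_to_base64str` is a hypothetical helper, not a toolkit function.

```python
import base64


def png_bytes_to_base64str(png_bytes):
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(png_bytes).decode('ascii')


# The 8-byte PNG file signature stands in for a full image here.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'
encoded = png_bytes_to_base64str(PNG_SIGNATURE)
decoded = base64.b64decode(encoded)
print(encoded)  # prints: iVBORw0KGgo=
```

For a real file, the bytes would come from reading it in binary mode, for example `open(path, 'rb').read()` inside a `with` block.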

Now we can add them to our ModelCard.

model_card.model_parameters.data.append(mct.Dataset())
model_card.model_parameters.data[0].graphics.collection = [
  mct.Graphic(name='Validation Set Size', image=validation_set_size_barchart),
]
model_card.quantitative_analysis.graphics.collection = [
  mct.Graphic(name='Accuracy', image=accuracy_barchart),
]

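Generate the Model Card

Let's generate the Model Card document. Available formats are stored at model_card_toolkit/template. Here, we will demonstrate the HTML and Markdown formats.

First, we need to update the ModelCardToolkit with the latest ModelCard.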

toolkit.update_model_card(model_card)

Now, the ModelCardToolkit can generate a Model Card document with ModelCardToolkit.export_format().

# Generate a model card document in HTML (default)
html_doc = toolkit.export_format()

# Display the model card document in HTML
display.display(display.HTML(html_doc))
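Because export_format() returns the rendered document as a string, which is what the display call above relies on, the document can also be written to any location you choose. The sketch below uses a stand-in string in place of `html_doc`; `save_document` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def save_document(doc, path):
    """Write a rendered document string to path, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(doc, encoding='utf-8')
    return path


# Stand-in for the html_doc string returned by export_format().
demo_html = '<html><body>Model Card</body></html>'
saved = save_document(demo_html, Path(tempfile.mkdtemp()) / 'out' / 'model_card.html')
print(saved)
```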

You can also output a Model Card in other formats, such as Markdown.

# Generate a model card document in Markdown
md_path = os.path.join(model_card_dir, 'template/md/default_template.md.jinja')
md_doc = toolkit.export_format(template_path=md_path, output_file='model_card.md')

# Display the model card document in Markdown
display.display(display.Markdown(md_doc))

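Model Card for Fine-tuned MobileNetV2 Model for Cats vs. Dogs

Model Details

Overview

This model distinguishes cat and dog images. It uses the MobileNetV2 architecture (https://arxiv.org/abs/1801.04381) and is trained on the Cats vs Dogs dataset (https://tensorflow.dev.org.tw/datasets/catalog/cats_vs_dogs). This model performed with high accuracy on both Cat and Dog images.

Version

Name: v1.0

Date: 08/28/2020

Owners

Model Cards Team, model-cards@google.com

Licenses

Apache-2.0

References

Citations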

https://github.com/tensorflow/model-card-toolkit/blob/master/model_card_toolkit/documentation/examples/Standalone_Model_Card_Toolkit_Demo.ipynb

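Considerations

Use Cases

This model classifies images of cats and dogs.

Limitations

This model is not able to classify images of other classes.

Ethical Considerations

Risk: While distinguishing between cats and dogs is generally agreed to be a benign application of machine learning, harmful results can occur when the model attempts to classify images that don’t contain cats or dogs.
- Mitigation Strategy: Avoid application on non-dog and non-cat images.

Graphics

Validation Set Size

Accuracy

Metrics

Name	Value
accuracy	0.981249988079071
accuracy, cat	0.9932885766029358
accuracy, Dog	0.9707602262496948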