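This notebook shows how to use the CropNet cassava disease classifier model from TensorFlow Hub. The model classifies images of cassava leaves into one of 6 classes: bacterial blight, brown streak disease, green mite, mosaic disease, healthy, or unknown.
This colab demonstrates how to:
- Load the https://tfhub.dev/google/cropnet/classifier/cassava_disease_V1/2 model from TensorFlow Hub
- Load the cassava dataset from TensorFlow Datasets (TFDS)
- Classify images of cassava leaves into 4 distinct cassava disease classes, or as healthy or unknown.
- Evaluate the accuracy of the classifier and look at how robust the model is when applied to out-of-domain images.
Imports and setup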
pip install matplotlib==3.2.2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_hub as hub
Helper function for displaying examples
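The body of the helper cell does not appear in this text, yet later cells call plot(examples) and plot(examples, predictions). Below is a minimal sketch of what such a helper might look like, not the notebook's own implementation. The names grid_shape and caption are hypothetical, and plot assumes that class_names and name_map (defined in the dataset section) exist at module level before it is called.

```python
import math


def grid_shape(batch_size):
  """Return (rows, cols) of a near-square grid that fits batch_size images."""
  cols = math.ceil(math.sqrt(batch_size))
  rows = math.ceil(batch_size / cols)
  return rows, cols


def caption(label, prediction, class_names, name_map):
  """Build the text shown under one image."""
  text = 'Label: ' + name_map[class_names[label]]
  if prediction is not None:
    text = 'Prediction: ' + name_map[class_names[prediction]] + '\n' + text
  return text


def plot(examples, predictions=None):
  """Show a batch of images with labels and, optionally, predictions.

  Assumption: `class_names` and `name_map` are module-level variables
  that have been defined before plot() is called.
  """
  # Imported here so the two helpers above can be used without matplotlib.
  import matplotlib.pyplot as plt

  images, labels = examples['image'], examples['label']
  if predictions is None:
    predictions = [None] * len(images)
  rows, cols = grid_shape(len(images))
  fig = plt.figure(figsize=(cols * 3, rows * 3.5))
  for i, (image, label, prediction) in enumerate(zip(images, labels, predictions)):
    ax = fig.add_subplot(rows, cols, i + 1)
    ax.imshow(image)
    ax.set_xticks([])
    ax.set_yticks([])
    label = int(label)
    prediction = None if prediction is None else int(prediction)
    ax.set_xlabel(caption(label, prediction, class_names, name_map))
    if prediction is not None:
      # Green caption for a correct prediction, red otherwise.
      ax.xaxis.label.set_color('green' if label == prediction else 'red')
  plt.show()
```

With a batch of 25 images this lays the examples out on a 5 x 5 grid.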
Dataset
Let's load the cassava dataset from TFDS
dataset, info = tfds.load('cassava', with_info=True)
Let's look at the dataset info to learn more about it, such as the description and citation, and how many examples are available
info
tfds.core.DatasetInfo(
    name='cassava',
    full_name='cassava/0.1.0',
    description="""
    Cassava consists of leaf images for the cassava plant depicting healthy and
    four (4) disease conditions; Cassava Mosaic Disease (CMD), Cassava Bacterial
    Blight (CBB), Cassava Greem Mite (CGM) and Cassava Brown Streak Disease (CBSD).
    Dataset consists of a total of 9430 labelled images.
    The 9430 labelled images are split into a training set (5656), a test set(1885)
    and a validation set (1889). The number of images per class are unbalanced with
    the two disease classes CMD and CBSD having 72% of the images.
    """,
    homepage='https://www.kaggle.com/c/cassava-disease/overview',
    data_dir='gs://tensorflow-datasets/datasets/cassava/0.1.0',
    file_format=tfrecord,
    download_size=1.26 GiB,
    dataset_size=Unknown size,
    features=FeaturesDict({
        'image': Image(shape=(None, None, 3), dtype=uint8),
        'image/filename': Text(shape=(), dtype=string),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=5),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=1885, num_shards=4>,
        'train': <SplitInfo num_examples=5656, num_shards=8>,
        'validation': <SplitInfo num_examples=1889, num_shards=4>,
    },
    citation="""@misc{mwebaze2019icassava,
        title={iCassava 2019Fine-Grained Visual Categorization Challenge},
        author={Ernest Mwebaze and Timnit Gebru and Andrea Frome and Solomon Nsumba and Jeremy Tusubira},
        year={2019},
        eprint={1908.02900},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }""",
)
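The cassava dataset contains images of cassava leaves with 4 distinct diseases as well as healthy cassava leaves. The model can predict all of these classes, plus a sixth class, "unknown", for when it is not confident in its prediction.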
# Extend the cassava dataset classes with 'unknown'
class_names = info.features['label'].names + ['unknown']
# Map the class names to human readable names
name_map = dict(
    cmd='Mosaic Disease',
    cbb='Bacterial Blight',
    cgm='Green Mite',
    cbsd='Brown Streak Disease',
    healthy='Healthy',
    unknown='Unknown')
print(len(class_names), 'classes:')
print(class_names)
print([name_map[name] for name in class_names])
6 classes:
['cbb', 'cbsd', 'cgm', 'cmd', 'healthy', 'unknown']
['Bacterial Blight', 'Brown Streak Disease', 'Green Mite', 'Mosaic Disease', 'Healthy', 'Unknown']
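Before we can feed the data to the model, we need to do some preprocessing. The model expects 224 x 224 images with RGB channel values in the range [0, 1]. Let's normalize and resize the images.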
def preprocess_fn(data):
  image = data['image']
  # Normalize [0, 255] to [0, 1]
  image = tf.cast(image, tf.float32)
  image = image / 255.
  # Resize the images to 224 x 224
  image = tf.image.resize(image, (224, 224))
  data['image'] = image
  return data
Let's look at a few examples from the dataset
batch = dataset['validation'].map(preprocess_fn).batch(25).as_numpy_iterator()
examples = next(batch)
plot(examples)
Model
Let's load the classifier from TF Hub and see what the model predicts for a few examples
classifier = hub.KerasLayer('https://tfhub.dev/google/cropnet/classifier/cassava_disease_V1/2')
probabilities = classifier(examples['image'])
predictions = tf.argmax(probabilities, axis=-1)
plot(examples, predictions)
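The argmax call above reduces each output row to a single class index. As a rough illustration of how one row could be read in more detail, the sketch below ranks the classes for a single image. It assumes that each row has 6 scores ordered like class_names, which is consistent with label 5 being used for "unknown" in the evaluation code. The top_k helper and the numbers in row are hypothetical and are not real model output.

```python
# Class order as printed by the notebook, with 'unknown' appended last.
CLASS_NAMES = ['cbb', 'cbsd', 'cgm', 'cmd', 'healthy', 'unknown']


def top_k(probabilities, k=2):
  """Return the k most probable (class_name, probability) pairs, best first."""
  ranked = sorted(zip(CLASS_NAMES, probabilities),
                  key=lambda pair: pair[1], reverse=True)
  return ranked[:k]


# Made-up scores for one image, for illustration only.
row = [0.02, 0.05, 0.03, 0.80, 0.06, 0.04]
print(top_k(row))
```

Looking at the runner-up class can help when reviewing misclassified leaves, since a narrow margin between the top two scores suggests an ambiguous image.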
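Evaluation and robustness
Let's measure the accuracy of our classifier on a split of the dataset. We can also look at the model's robustness by evaluating its performance on a non-cassava dataset. For images from other plant datasets (such as iNaturalist or beans), the model should almost always return unknown.
Parameters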
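The evaluation code that follows reads DATASET, DATASET_SPLIT, BATCH_SIZE and MAX_EXAMPLES, but the cell that sets them is not shown in this text. The values below are placeholders chosen for illustration, not the notebook's confirmed settings. The only value the text supports is DATASET = 'cassava', since the printed result reports accuracy on cassava.

```python
# Hypothetical parameter values; the notebook's own parameter cell is not shown here.
DATASET = 'cassava'      # TFDS dataset name; any other name has its labels mapped to "unknown"
DATASET_SPLIT = 'test'   # assumed split; the cassava dataset info lists train, test and validation
BATCH_SIZE = 32          # assumed number of images per model call
MAX_EXAMPLES = 1000      # assumed cap on the number of evaluated examples
```

Setting DATASET to a non-cassava TFDS dataset turns the same evaluation into a robustness check.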
def label_to_unknown_fn(data):
  data['label'] = 5  # Override label to unknown.
  return data
# Preprocess the examples and map the image label to unknown for non-cassava datasets.
ds = tfds.load(DATASET, split=DATASET_SPLIT).map(preprocess_fn).take(MAX_EXAMPLES)
dataset_description = DATASET
if DATASET != 'cassava':
  ds = ds.map(label_to_unknown_fn)
  dataset_description += ' (labels mapped to unknown)'
ds = ds.batch(BATCH_SIZE)
# Calculate the accuracy of the model
metric = tf.keras.metrics.Accuracy()
for examples in ds:
  probabilities = classifier(examples['image'])
  predictions = tf.math.argmax(probabilities, axis=-1)
  labels = examples['label']
  metric.update_state(labels, predictions)
print('Accuracy on %s: %.2f' % (dataset_description, metric.result().numpy()))
Accuracy on cassava: 0.88
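To make the metric concrete, here is a small pure-Python sketch of accuracy accumulated over batches: the number of matching label and prediction pairs divided by the number of pairs seen. RunningAccuracy is a hypothetical stand-in written for illustration, not the Keras implementation, and the labels and predictions are made up. When every label has been mapped to 5 (unknown), this accuracy is simply the fraction of images the model flags as unknown.

```python
class RunningAccuracy:
  """Accumulates matches over batches and reports correct / total."""

  def __init__(self):
    self.correct = 0
    self.total = 0

  def update(self, labels, predictions):
    for label, prediction in zip(labels, predictions):
      self.correct += int(label == prediction)
      self.total += 1

  def result(self):
    # Report 0.0 before any example has been seen (a choice made for this sketch).
    return self.correct / self.total if self.total else 0.0


UNKNOWN = 5  # index of the 'unknown' class

# Two made-up batches from an out-of-domain dataset: every label is UNKNOWN.
metric = RunningAccuracy()
metric.update([UNKNOWN] * 4, [5, 5, 3, 5])
metric.update([UNKNOWN] * 2, [5, 4])
print('Fraction predicted unknown: %.2f' % metric.result())
```

Here 4 of the 6 made-up predictions equal 5, so the sketch reports about 0.67.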
Learn more
- Learn more about the model on TensorFlow Hub: https://tfhub.dev/google/cropnet/classifier/cassava_disease_V1/2
- Learn how to build a custom image classifier that runs on a mobile phone with ML Kit and the TensorFlow Lite version of this model.