The TFDS CLI is a command-line tool that provides various commands to easily work with TensorFlow Datasets.
Disable TF logging on import:
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1 # Disable logs on TF import
Installation
The CLI tool is installed with tensorflow-datasets (or tfds-nightly).
pip install -q tfds-nightly apache-beam
tfds --version
For the list of all CLI commands:
tfds --help
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.
tfds new: Implementing a new dataset
This command helps you get started writing a new Python dataset by creating a <dataset_name>/ directory containing the default implementation files.
Usage:
tfds new my_dataset
Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://tensorflow.dev.org.tw/datasets/add_dataset for additional details.
tfds new my_dataset will create:
ls -1 my_dataset/
CITATIONS.bib
README.md
TAGS.txt
__init__.py
checksums.tsv
dummy_data/
my_dataset_dataset_builder.py
my_dataset_dataset_builder_test.py
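my_dataset_dataset_builder.py holds the dataset implementation. As a rough sketch only (the actual generated template contains TODO(my_dataset) markers and may differ in detail), a tfds.core.GeneratorBasedBuilder subclass looks like this:

import tensorflow_datasets as tfds

class Builder(tfds.core.GeneratorBasedBuilder):
  """DatasetBuilder for my_dataset."""

  VERSION = tfds.core.Version('1.0.0')

  def _info(self) -> tfds.core.DatasetInfo:
    # Declare the dataset metadata and feature structure.
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict({
            'image': tfds.features.Image(),
            'label': tfds.features.ClassLabel(names=['no', 'yes']),
        }),
    )

  def _split_generators(self, dl_manager: tfds.download.DownloadManager):
    # Download the raw data (placeholder URL) and define the splits.
    path = dl_manager.download_and_extract('https://example.org/data.zip')
    return {'train': self._generate_examples(path / 'train')}

  def _generate_examples(self, path):
    # Yield (key, example) pairs matching the features declared in `_info`.
    for img_path in path.glob('*.jpeg'):
      yield img_path.name, {'image': img_path, 'label': 'yes'}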
An optional flag --data_format can be used to generate format-specific dataset builders (e.g., conll). If no data format is given, a template for a standard tfds.core.GeneratorBasedBuilder is generated. Refer to the documentation for details on the available format-specific dataset builders.
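For example, to generate a CoNLL-specific builder template (my_conll_dataset is a placeholder name):

tfds new my_conll_dataset --data_format conll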
See our dataset writing guide for more information.
Available options:
tfds new --help
usage: tfds new [-h] [--helpfull] [--data_format {standard,conll,conllu}] [--dir DIR]
                dataset_name

positional arguments:
  dataset_name          Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --data_format {standard,conll,conllu}
                        Optional format of the input data, which is used to generate a
                        format-specific template.
  --dir DIR             Path where the dataset directory will be created. Defaults to
                        current directory.
tfds build: Download and prepare a dataset
Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

A path to a dataset/ folder or dataset.py file (empty for the current directory):
tfds build datasets/my_dataset/
cd datasets/my_dataset/ && tfds build
cd datasets/my_dataset/ && tfds build my_dataset
cd datasets/my_dataset/ && tfds build my_dataset.py
A registered dataset:
tfds build mnist
tfds build my_dataset --imports my_project.datasets
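Once built, the prepared dataset lives under ~/tensorflow_datasets/ (or the directory given by --data_dir) and can be loaded as usual. A minimal sketch, assuming the tfds build mnist invocation above has completed:

import tensorflow_datasets as tfds

# Load the dataset prepared by `tfds build mnist` from the default data dir.
ds = tfds.load('mnist', split='train')
print(ds.element_spec)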
Available options:
tfds build --help
usage: tfds build [-h] [--helpfull] [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite] [--fail_if_exists]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--download_only] [--config CONFIG]
                  [--config_idx CONFIG_IDX] [--update_metadata_only]
                  [--download_config DOWNLOAD_CONFIG] [--imports IMPORTS]
                  [--register_checksums] [--force_checksums_validation]
                  [--noforce_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--max_shard_size_mb MAX_SHARD_SIZE_MB]
                  [--num-processes NUM_PROCESSES] [--publish_dir PUBLISH_DIR]
                  [--skip_if_published] [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets ...]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current dir. See
                        https://tensorflow.dev.org.tw/datasets/cli for accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb                 Enter post-mortem debugging mode if an exception is raised.
  --overwrite           Delete pre-existing dataset if it exists.
  --fail_if_exists      Fails the program if there is a pre-existing dataset.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default to 1),
                        rather than the full dataset. If set to 0, only execute the
                        `_split_generators` (which download the original data), but
                        skip `_generator_examples`

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to `~/tensorflow_datasets/` or
                        `TFDS_DATA_DIR` environement variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some datasets).
                        Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir` (e.g.
                        `<download_dir>/manual/<dataset_name>/`. Useful to avoid
                        collisions if many datasets are generated.

Generation:
  --download_only       If True, download all files but do not prepare the dataset.
                        Uses the checksum.tsv to find out what to download. Therefore,
                        this does not work in combination with --register_checksums.
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set. Can also be
                        a json of the kwargs forwarded to the config `__init__` (for
                        custom configs).
  --config_idx CONFIG_IDX
                        Config id to build (`builder_cls.BUILDER_CONFIGS[config_idx]`).
                        Mutually exclusive with `--config`.
  --update_metadata_only
                        If True, existing dataset_info.json is updated with metadata
                        defined in Builder class(es). Datasets must already have been
                        prepared.
  --download_config DOWNLOAD_CONFIG
                        A json of the kwargs forwarded to the config `__init__` (for
                        custom DownloadConfigs).
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not found.
  --noforce_checksums_validation
                        If specified, bypass the checks on the checksums.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to `PipelineOptions`
                        when preparing with Apache Beam.
                        (see: https://tensorflow.dev.org.tw/datasets/beam_datasets).
                        Example: `--beam_pipeline_options=job_name=my-job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples. Available values:
                        ['tfrecord', 'riegeli', 'array_record'] (see `tfds.core.FileFormat`).
  --max_shard_size_mb MAX_SHARD_SIZE_MB
                        The max shard size in megabytes.
  --num-processes NUM_PROCESSES
                        Number of parallel build processes.

Publishing:
  Options for publishing successfully created datasets.

  --publish_dir PUBLISH_DIR
                        Where to optionally publish the dataset after it has been
                        generated successfully. Should be the root data dir under which
                        datasets are stored. If unspecified, dataset will not be
                        published
  --skip_if_published   If the dataset with the same version and config is already
                        published, then it will not be regenerated.

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined here.
                        Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available rather than
                        default version.
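As an illustration only (the dataset name and paths are placeholders), several of these flags can be combined in a single invocation, e.g. to build a small sample of a dataset into a custom directory using the array_record file format:

tfds build my_dataset \
  --data_dir /tmp/tfds_data \
  --file_format array_record \
  --overwrite \
  --max_examples_per_split 10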