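TFX Estimator Component Tutorial

A Component-by-Component Introduction to TensorFlow Extended (TFX)

This Colab-based tutorial interactively walks through each built-in component of TensorFlow Extended (TFX).

It covers every step in an end-to-end machine learning pipeline, from data ingestion to pushing a model to serving.

When you're done, the contents of this notebook can be automatically exported as TFX pipeline source code, which you can orchestrate with Apache Airflow and Apache Beam.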

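Background

This notebook demonstrates how to use TFX in a Jupyter/Colab environment. Here, we walk through the Chicago Taxi example in an interactive notebook.

Working in an interactive notebook is a useful way to become familiar with the structure of a TFX pipeline. It is also useful as a lightweight development environment when you develop your own pipelines, but be aware that interactive notebooks differ in how they are orchestrated and in how they access metadata artifacts.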

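Orchestration

In a production deployment of TFX, you will use an orchestrator such as Apache Airflow, Kubeflow Pipelines, or Apache Beam to orchestrate a pre-defined pipeline graph of TFX components. In an interactive notebook, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells.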
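
To make the idea of orchestrating a component graph concrete, here is a minimal sketch in plain Python. It is not TFX's orchestration code: the component names mirror this tutorial, while `PIPELINE_GRAPH` and `run_pipeline` are hypothetical names introduced only for illustration. The sketch orders components so that each one runs after the components whose outputs it consumes, which is roughly the bookkeeping an orchestrator takes over from you.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each component maps to the set of
# components whose outputs it consumes.
PIPELINE_GRAPH = {
    "CsvExampleGen": set(),
    "StatisticsGen": {"CsvExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"CsvExampleGen", "SchemaGen"},
}


def run_pipeline(graph):
    """Return component names in an order that respects dependencies."""
    executed = []
    for name in TopologicalSorter(graph).static_order():
        # A real orchestrator would launch the component here.
        executed.append(name)
    return executed


order = run_pipeline(PIPELINE_GRAPH)
print(order)
```

In the notebook you play this role yourself by running the cells top to bottom.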

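Metadata

In a production deployment of TFX, you will access metadata through the ML Metadata (MLMD) API. MLMD stores metadata properties in a database such as MySQL or SQLite, and stores the metadata payloads in a persistent store such as your filesystem. In an interactive notebook, both properties and payloads are stored in an ephemeral SQLite database in the /tmp directory on the Jupyter notebook or Colab server.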
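
The split between "properties in a database" and "payloads in storage" can be illustrated with a toy example built on Python's standard `sqlite3` module. This is not the MLMD API or its schema: the table layout, the helper names `record_artifact` and `find_artifacts`, and the paths are all assumptions made for the sketch.

```python
import sqlite3

# In-memory SQLite database standing in for the metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT, uri TEXT)")


def record_artifact(artifact_type, uri):
    """Store an artifact's properties; the payload itself lives at `uri`."""
    cursor = conn.execute(
        "INSERT INTO artifact (type, uri) VALUES (?, ?)", (artifact_type, uri))
    conn.commit()
    return cursor.lastrowid


def find_artifacts(artifact_type):
    """Return the URIs of all recorded artifacts of the given type."""
    rows = conn.execute(
        "SELECT uri FROM artifact WHERE type = ? ORDER BY id", (artifact_type,))
    return [uri for (uri,) in rows]


# Hypothetical paths, used only to populate the toy store.
record_artifact("Examples", "/tmp/pipeline/CsvExampleGen/examples/1")
record_artifact("Schema", "/tmp/pipeline/SchemaGen/schema/3")
print(find_artifacts("Examples"))
```

The database row only records where an artifact lives and what kind it is; the data itself stays on the filesystem.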

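Setup

First, we install and import the necessary packages, set up paths, and download data.

Upgrade Pip

To avoid upgrading Pip in a system when running locally, check to make sure that we're running in Colab. Local systems can of course be upgraded separately.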

try:
  import colab
  !pip install --upgrade pip
except:
  pass

Install TFX

!pip install tfx

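Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime (Runtime > Restart runtime ...). This is because of the way that Colab loads packages.

Import packages

We import the necessary packages, including the standard TFX component classes.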

import os
import pprint
import tempfile
import urllib

import absl
import tensorflow as tf
import tensorflow_model_analysis as tfma
tf.get_logger().propagate = False
pp = pprint.PrettyPrinter()

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip
2024-05-08 09:28:45.211472: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-08 09:28:45.211527: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-08 09:28:45.213202: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Let's check the library versions.

print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))
TensorFlow version: 2.15.1
TFX version: 1.15.0

Set up pipeline paths

# This is the root directory for your TFX pip package installation.
_tfx_root = tfx.__path__[0]

# This is the directory containing the TFX Chicago Taxi Pipeline example.
_taxi_root = os.path.join(_tfx_root, 'examples/chicago_taxi_pipeline')

# This is the path where your model will be pushed for serving.
_serving_model_dir = os.path.join(
    tempfile.mkdtemp(), 'serving_model/taxi_simple')

# Set up logging.
absl.logging.set_verbosity(absl.logging.INFO)

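Download example data

We download the example dataset for use in our TFX pipeline.

The dataset we're using is the Taxi Trips dataset released by the City of Chicago. The columns in this dataset are: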

pickup_community_area   fare                    trip_start_month
trip_start_hour         trip_start_day          trip_start_timestamp
pickup_latitude         pickup_longitude        dropoff_latitude
dropoff_longitude       trip_miles              pickup_census_tract
dropoff_census_tract    payment_type            company
trip_seconds            dropoff_community_area  tips

With this dataset, we will build a model that predicts the tips of a trip.

_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv'
_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)
('/tmpfs/tmp/tfx-datafzazc6h4/data.csv',
 <http.client.HTTPMessage at 0x7f393fe02ee0>)

Take a quick look at the CSV file.

!head {_data_filepath}
pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips
,12.45,5,19,6,1400269500,,,,,0.0,,,Credit Card,Chicago Elite Cab Corp. (Chicago Carriag,0,,0.0
,0,3,19,5,1362683700,,,,,0,,,Unknown,Chicago Elite Cab Corp.,300,,0
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0
16,32.05,12,1,1,1417916700,41.953582125,-87.72345239,,,15.4,,,Cash,,1200,,0.0
30,38.45,10,10,5,1444301100,41.839086906,-87.714003807,,,14.6,,,Cash,,2580,,0.0
11,14.65,1,1,3,1358213400,41.978829526,-87.771166703,,,5.81,,,Cash,,1080,,0.0

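Disclaimer: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one's own risk.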

Create the InteractiveContext

Last, we create an InteractiveContext, which allows us to run TFX components interactively in this notebook.

# Here, we create an InteractiveContext using default parameters. This will
# use a temporary directory with an ephemeral ML Metadata database instance.
# To use your own pipeline root or database, the optional properties
# `pipeline_root` and `metadata_connection_config` may be passed to
# InteractiveContext. Calls to InteractiveContext are no-ops outside of the
# notebook.
context = InteractiveContext()
WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/metadata.sqlite.

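Run TFX components interactively

In the cells that follow, we create TFX components one by one, run each of them, and visualize their output artifacts.

ExampleGen

The ExampleGen component is usually at the start of a TFX pipeline. It will:

  1. Split the data into training and evaluation sets (by default, 2/3 training + 1/3 evaluation)
  2. Convert the data into the tf.Example format (see the tf.Example documentation for details)
  3. Copy the data into the pipeline root directory (here, the temporary directory created by InteractiveContext) for other components to access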
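
The 2/3 + 1/3 split in step 1 can be sketched with a deterministic hash. The snippet below is an illustration only: it assumes a hash-bucket scheme with two training buckets and one evaluation bucket, and `assign_split`, `TRAIN_BUCKETS`, and `TOTAL_BUCKETS` are hypothetical names, not part of the TFX API. ExampleGen's actual split logic is not shown in this tutorial.

```python
import hashlib

# Assumed ratio matching the default described above: 2 buckets for
# training and 1 bucket for evaluation.
TRAIN_BUCKETS = 2
TOTAL_BUCKETS = 3


def assign_split(record: str) -> str:
    """Deterministically map a raw record to 'train' or 'eval'."""
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % TOTAL_BUCKETS
    return "train" if bucket < TRAIN_BUCKETS else "eval"


records = [f"row-{i}" for i in range(3000)]
counts = {"train": 0, "eval": 0}
for record in records:
    counts[assign_split(record)] += 1

print(counts)  # roughly a 2:1 ratio between train and eval
```

Because the assignment depends only on the record's content, rerunning the split on the same data places every record in the same split.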

ExampleGen takes as input the path to your data source. In our case, this is the _data_root path that contains the downloaded CSV.

example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
context.run(example_gen)
INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen
INFO:absl:Generating examples.
WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.
INFO:absl:Processing input csv data /tmpfs/tmp/tfx-datafzazc6h4/* to TFExample.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized

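Let's examine the output artifacts of ExampleGen. This component produces an examples artifact containing two splits, training examples and evaluation examples: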

artifact = example_gen.outputs['examples'].get()[0]
print(artifact.split_names, artifact.uri)
["train", "eval"] /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/CsvExampleGen/examples/1

We can also take a look at the first three training examples:

# Get the URI of the output artifact representing the training examples, which is a directory
train_uri = os.path.join(example_gen.outputs['examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)
features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Chicago Elite Cab Corp. (Chicago Carriag"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 12.449999809265137
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Credit Card"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1400269500
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
        value: "Taxi Affiliation Services"
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 27.049999237060547
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.836151123046875
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.64878845214844
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 12.600000381469727
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 1380
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1380593700
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      bytes_list {
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      float_list {
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 16.450000762939453
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      bytes_list {
        value: "Cash"
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      float_list {
        value: 41.98363494873047
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      float_list {
        value: -87.72357940673828
      }
    }
  }
  feature {
    key: "tips"
    value {
      float_list {
        value: 0.0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 6.900000095367432
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      int64_list {
        value: 780
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
  feature {
    key: "trip_start_timestamp"
    value {
      int64_list {
        value: 1446554700
      }
    }
  }
}
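
The output above shows that a missing CSV cell becomes a feature with an empty value list rather than a placeholder value. The following plain-Python sketch mimics that mapping for a few columns, using values taken from two rows of the CSV preview shown earlier. `COLUMN_TYPES` and `row_to_features` are hypothetical names for this illustration; the real component infers types from the data and stores strings as bytes.

```python
import csv
import io

# A few columns and two rows taken from the CSV preview shown earlier.
SAMPLE_CSV = (
    "pickup_community_area,fare,payment_type,trip_seconds\n"
    ",12.45,Credit Card,0\n"
    "60,27.05,Cash,1380\n"
)

# Hypothetical column types, hard-coded here for simplicity.
COLUMN_TYPES = {
    "pickup_community_area": int,
    "fare": float,
    "payment_type": str,
    "trip_seconds": int,
}


def row_to_features(row):
    """Map each column to a list of values; missing cells become []."""
    features = {}
    for name, raw in row.items():
        features[name] = [COLUMN_TYPES[name](raw)] if raw != "" else []
    return features


examples = [row_to_features(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for example in examples:
    print(example)
```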

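Now that ExampleGen has finished ingesting the data, the next step is data analysis.

StatisticsGen

The StatisticsGen component computes statistics over your dataset for data analysis, as well as for use in downstream components. It uses the TensorFlow Data Validation library.

StatisticsGen takes as input the dataset we just ingested using ExampleGen.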

statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for StatisticsGen
INFO:absl:Generating statistics for split train.
INFO:absl:Statistics for split train written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/StatisticsGen/statistics/2/Split-train.
INFO:absl:Generating statistics for split eval.
INFO:absl:Statistics for split eval written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/StatisticsGen/statistics/2/Split-eval.
INFO:absl:Running publisher for StatisticsGen
INFO:absl:MetadataStore with DB connection initialized

After StatisticsGen finishes running, we can visualize the output statistics. Try playing with the different plots!

context.show(statistics_gen.outputs['statistics'])
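
For a rough sense of what per-feature statistics look like, here is a small sketch in plain Python. It is not how TensorFlow Data Validation computes or stores statistics; the `summarize` helper and the toy rows are assumptions made for illustration, and the real output includes many more measures.

```python
from statistics import mean

# Toy rows in "list of values per feature" form; [] marks a missing value.
rows = [
    {"fare": [12.45], "trip_miles": [0.0], "payment_type": ["Credit Card"]},
    {"fare": [27.05], "trip_miles": [12.6], "payment_type": ["Cash"]},
    {"fare": [16.45], "trip_miles": [6.9], "payment_type": ["Cash"]},
    {"fare": [], "trip_miles": [0.0], "payment_type": ["Cash"]},
]


def summarize(rows, feature):
    """Compute simple summary statistics for one feature."""
    values = [v for row in rows for v in row[feature]]
    stats = {
        "num_examples": len(rows),
        "num_missing": sum(1 for row in rows if not row[feature]),
    }
    if values and isinstance(values[0], (int, float)):
        stats.update(min=min(values), max=max(values), mean=mean(values))
    else:
        stats["unique"] = len(set(values))
    return stats


fare_stats = summarize(rows, "fare")
payment_stats = summarize(rows, "payment_type")
print(fare_stats)
print(payment_stats)
```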

SchemaGen

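The SchemaGen component generates a schema based on your data statistics. (A schema defines the expected bounds, types, and properties of the features in your dataset.) It also uses the TensorFlow Data Validation library.

SchemaGen takes as input the statistics that we generated with StatisticsGen. Because exclude_splits is not set in this run, it processes the statistics for both the train and eval splits, as its log output shows.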

schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(schema_gen)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for SchemaGen
INFO:absl:Processing schema from statistics for split train.
INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/SchemaGen/schema/3/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized

After SchemaGen finishes running, we can visualize the generated schema as a table.

context.show(schema_gen.outputs['schema'])

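Each feature in your dataset shows up as a row in the schema table, alongside its properties. The schema also captures all the values that a categorical feature takes on, denoted as its domain.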
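
As a simplified picture of what "type plus domain" means, the sketch below infers a toy schema from a handful of rows. It works directly on rows rather than on statistics, and `infer_schema` and the type labels are hypothetical; the schema produced by TensorFlow Data Validation is a richer protocol buffer.

```python
def infer_schema(rows):
    """Infer a type for each feature and a domain for string features."""
    schema = {}
    for row in rows:
        for name, values in row.items():
            entry = schema.setdefault(name, {"type": None, "domain": set()})
            for value in values:
                if isinstance(value, str):
                    entry["type"] = "BYTES"
                    entry["domain"].add(value)
                elif isinstance(value, float):
                    entry["type"] = "FLOAT"
                elif isinstance(value, int):
                    entry["type"] = "INT"
    # Only string (categorical) features keep a domain.
    for entry in schema.values():
        if entry["type"] != "BYTES":
            entry["domain"] = None
    return schema


rows = [
    {"fare": [12.45], "trip_seconds": [0], "payment_type": ["Credit Card"]},
    {"fare": [27.05], "trip_seconds": [1380], "payment_type": ["Cash"]},
    {"fare": [0.0], "trip_seconds": [300], "payment_type": ["Unknown"]},
]
schema = infer_schema(rows)
for name, entry in schema.items():
    print(name, entry)
```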

To learn more about schemas, see the SchemaGen documentation.

ExampleValidator

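The ExampleValidator component detects anomalies in your data, based on the expectations defined by the schema. It also uses the TensorFlow Data Validation library.

ExampleValidator takes as input the statistics from StatisticsGen and the schema from SchemaGen.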

example_validator = tfx.components.ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Running driver for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for ExampleValidator
INFO:absl:Validating schema against the computed statistics for split train.
INFO:absl:Anomalies alerts created for split train.
INFO:absl:Validation complete for split train. Anomalies written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/ExampleValidator/anomalies/4/Split-train.
INFO:absl:Validating schema against the computed statistics for split eval.
INFO:absl:Anomalies alerts created for split eval.
INFO:absl:Validation complete for split eval. Anomalies written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/ExampleValidator/anomalies/4/Split-eval.
INFO:absl:Running publisher for ExampleValidator
INFO:absl:MetadataStore with DB connection initialized

After ExampleValidator finishes running, we can visualize the anomalies as a table.

context.show(example_validator.outputs['anomalies'])

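In the anomalies table, we can see that there are no anomalies. This is what we'd expect, since this is the first dataset that we've analyzed and the schema is tailored to it. You should review this schema; anything unexpected in it indicates an anomaly in the data. Once reviewed, the schema can be used to guard future data, and the anomalies produced here can be used to debug model performance, understand how your data evolves over time, and identify data errors.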
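
The idea of using a reviewed schema to guard future data can be sketched as follows. This is a toy check in plain Python, not the TensorFlow Data Validation anomaly detector (which validates statistics, not individual rows); `SCHEMA`, `find_anomalies`, and the sample rows are hypothetical.

```python
# Hypothetical schema: expected type per feature plus an optional domain.
SCHEMA = {
    "fare": {"type": float, "domain": None},
    "payment_type": {"type": str, "domain": {"Cash", "Credit Card", "Unknown"}},
}


def find_anomalies(rows, schema):
    """Return a sorted list of human-readable anomaly descriptions."""
    anomalies = set()
    for row in rows:
        for name, values in row.items():
            if name not in schema:
                anomalies.add(f"{name}: feature not in schema")
                continue
            expected = schema[name]
            for value in values:
                if not isinstance(value, expected["type"]):
                    kind = type(value).__name__
                    anomalies.add(f"{name}: unexpected type {kind}")
                elif expected["domain"] and value not in expected["domain"]:
                    anomalies.add(f"{name}: unexpected value {value!r}")
    return sorted(anomalies)


original_rows = [{"fare": [12.45], "payment_type": ["Cash"]}]
future_rows = [{"fare": [9.5], "payment_type": ["Mobile"], "tolls": [1.0]}]

print(find_anomalies(original_rows, SCHEMA))  # data the schema was built from
print(find_anomalies(future_rows, SCHEMA))    # later data with new values
```

Data that matches the schema yields an empty list, while the later batch is flagged for a value outside the domain and for a feature the schema does not know.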

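Transform

The Transform component performs feature engineering for both training and serving. It uses the TensorFlow Transform library.

Transform takes as input the data from ExampleGen, the schema from SchemaGen, and a module that contains user-defined Transform code.

Let's see an example of user-defined Transform code below (for an introduction to the TensorFlow Transform APIs, see the TensorFlow Transform tutorial). First, we define a few constants for feature engineering: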

_taxi_constants_module_file = 'taxi_constants.py'
%%writefile {_taxi_constants_module_file}

# Categorical features are assumed to each have a maximum value in the dataset.
MAX_CATEGORICAL_FEATURE_VALUES = [24, 31, 12]

CATEGORICAL_FEATURE_KEYS = [
    'trip_start_hour', 'trip_start_day', 'trip_start_month',
    'pickup_census_tract', 'dropoff_census_tract', 'pickup_community_area',
    'dropoff_community_area'
]

DENSE_FLOAT_FEATURE_KEYS = ['trip_miles', 'fare', 'trip_seconds']

# Number of buckets used by tf.transform for encoding each feature.
FEATURE_BUCKET_COUNT = 10

BUCKET_FEATURE_KEYS = [
    'pickup_latitude', 'pickup_longitude', 'dropoff_latitude',
    'dropoff_longitude'
]

# Number of vocabulary terms used for encoding VOCAB_FEATURES by tf.transform
VOCAB_SIZE = 1000

# Count of out-of-vocab buckets in which unrecognized VOCAB_FEATURES are hashed.
OOV_SIZE = 10

VOCAB_FEATURE_KEYS = [
    'payment_type',
    'company',
]

# Keys
LABEL_KEY = 'tips'
FARE_KEY = 'fare'
Writing taxi_constants.py

Next, we write a preprocessing_fn that takes raw data as input and returns transformed features that our model can train on:

_taxi_transform_module_file = 'taxi_transform.py'
%%writefile {_taxi_transform_module_file}

import tensorflow as tf
import tensorflow_transform as tft

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_FARE_KEY = taxi_constants.FARE_KEY
_LABEL_KEY = taxi_constants.LABEL_KEY


def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.
  Args:
    inputs: map from feature keys to raw not-yet-transformed features.
  Returns:
    Map from string feature key to transformed feature operations.
  """
  outputs = {}
  for key in _DENSE_FLOAT_FEATURE_KEYS:
    # If sparse make it dense, setting nan's to 0 or '', and apply zscore.
    outputs[key] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  for key in _VOCAB_FEATURE_KEYS:
    # Build a vocabulary for this feature.
    outputs[key] = tft.compute_and_apply_vocabulary(
        _fill_in_missing(inputs[key]),
        top_k=_VOCAB_SIZE,
        num_oov_buckets=_OOV_SIZE)

  for key in _BUCKET_FEATURE_KEYS:
    outputs[key] = tft.bucketize(
        _fill_in_missing(inputs[key]), _FEATURE_BUCKET_COUNT)

  for key in _CATEGORICAL_FEATURE_KEYS:
    outputs[key] = _fill_in_missing(inputs[key])

  # Was this passenger a big tipper?
  taxi_fare = _fill_in_missing(inputs[_FARE_KEY])
  tips = _fill_in_missing(inputs[_LABEL_KEY])
  outputs[_LABEL_KEY] = tf.where(
      tf.math.is_nan(taxi_fare),
      tf.cast(tf.zeros_like(taxi_fare), tf.int64),
      # Test if the tip was > 20% of the fare.
      tf.cast(
          tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))

  return outputs


def _fill_in_missing(x):
  """Replace missing values in a SparseTensor.
  Fills in missing values of `x` with '' or 0, and converts to a dense tensor.
  Args:
    x: A `SparseTensor` of rank 2.  Its dense shape should have size at most 1
      in the second dimension.
  Returns:
    A rank 1 tensor where missing values of `x` have been filled in.
  """
  if not isinstance(x, tf.sparse.SparseTensor):
    return x

  default_value = '' if x.dtype == tf.string else 0
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)
Writing taxi_transform.py
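
To see what the main steps of preprocessing_fn do numerically, here is a plain-Python sketch of z-score scaling, vocabulary lookup with out-of-vocabulary (OOV) buckets, and the "big tipper" label. These helpers are illustrative stand-ins, not the TensorFlow Transform implementations: the use of the population standard deviation, the CRC32 hash for OOV tokens, and all function names are assumptions of the sketch.

```python
import zlib
from collections import Counter
from statistics import mean, pstdev


def z_score(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def build_vocabulary(tokens, top_k):
    """Map the top_k most frequent tokens to ids 0..top_k-1."""
    most_common = Counter(tokens).most_common(top_k)
    return {token: index for index, (token, _) in enumerate(most_common)}


def apply_vocabulary(token, vocab, num_oov_buckets):
    """Look up a token; unknown tokens hash into one of the OOV buckets."""
    if token in vocab:
        return vocab[token]
    return len(vocab) + zlib.crc32(token.encode("utf-8")) % num_oov_buckets


def is_big_tipper(tips, fare):
    """Label is 1 when the tip exceeds 20% of the fare, else 0."""
    return int(tips > fare * 0.2)


fares = [12.45, 27.05, 16.45]
scaled = z_score(fares)
vocab = build_vocabulary(["Cash", "Cash", "Credit Card"], top_k=2)
oov_id = apply_vocabulary("Unknown", vocab, num_oov_buckets=10)
print(scaled, vocab, oov_id, is_big_tipper(3.0, 10.0))
```

The key difference in the real pipeline is that the mean, standard deviation, and vocabulary are computed once over the training data and then baked into the transform graph, so serving applies exactly the same constants.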

Now, we pass this feature engineering code to the Transform component and run it to transform your data.

transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_taxi_transform_module_file))
context.run(transform)
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_transform.py' (including modules: ['taxi_transform', 'taxi_constants']).
INFO:absl:User module package has hash fingerprint version f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmpagxnedeh/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmpmro2o5jq', '--dist-dir', '/tmpfs/tmp/tmpqxoax40b']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'; target user module is 'taxi_transform'.
INFO:absl:Full user module path is 'taxi_transform@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_transform.py -> build/lib
copying taxi_constants.py -> build/lib
installing to /tmpfs/tmp/tmpmro2o5jq
running install
running install_lib
copying build/lib/taxi_transform.py -> /tmpfs/tmp/tmpmro2o5jq
copying build/lib/taxi_constants.py -> /tmpfs/tmp/tmpmro2o5jq
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmpfs/tmp/tmpmro2o5jq/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmpmro2o5jq/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/WHEEL
creating '/tmpfs/tmp/tmpqxoax40b/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' and adding '/tmpfs/tmp/tmpmro2o5jq' to it
adding 'taxi_constants.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/METADATA'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/WHEEL'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/top_level.txt'
adding 'tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424.dist-info/RECORD'
removing /tmpfs/tmp/tmpmro2o5jq
INFO:absl:Running executor for Transform
INFO:absl:Analyze the 'train' split and transform all splits when splits_config is not set.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp5nkxki37', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'taxi_transform@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl', 'stats_options_updater_fn': None} 'stats_options_updater_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp5jgr4gdg', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmppif0r6r3', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl']
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424-py3-none-any.whl'.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
Installing collected packages: tfx-user-code-Transform
Successfully installed tfx-user-code-Transform-0.0+f78e5f6b4988b5d5289aab277eceaff03bd38343154c2f602e06d95c6acd5424
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:If the number of unique tokens is smaller than the provided top_k or approximation error is acceptable, consider using tft.experimental.approximate_vocabulary for a potentially more efficient implementation.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary/apply_vocab/text_file_init/InitializeTableFromTextFileV2
WARNING:absl:Tables initialized inside a tf.function  will be re-initialized on every invocation of the function. This  re-initialization can have significant impact on performance. Consider lifting  them out of the graph context using  `tf.init_scope`.: compute_and_apply_vocabulary_1/apply_vocab/text_file_init/InitializeTableFromTextFileV2
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/27474b1a835b4a51961d9ff1455e2a16/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/27474b1a835b4a51961d9ff1455e2a16/fingerprint.pb
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/ff049abe3d52450a806a6fe8069cd573/assets
INFO:absl:Writing fingerprint to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Transform/transform_graph/5/.temp_path/tftransform_tmp/ff049abe3d52450a806a6fe8069cd573/fingerprint.pb
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized

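Let's examine the output artifacts of Transform. This component produces two main types of output:

  • transform_graph is the graph that can perform the preprocessing operations (this graph will be included in the serving and evaluation models).
  • transformed_examples represents the preprocessed training and evaluation data.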
transform.outputs
{'transform_graph': OutputChannel(artifact_type=TransformGraph, producer_component_id=Transform, output_key=transform_graph, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'transformed_examples': OutputChannel(artifact_type=Examples, producer_component_id=Transform, output_key=transformed_examples, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'updated_analyzer_cache': OutputChannel(artifact_type=TransformCache, producer_component_id=Transform, output_key=updated_analyzer_cache, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'pre_transform_schema': OutputChannel(artifact_type=Schema, producer_component_id=Transform, output_key=pre_transform_schema, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'pre_transform_stats': OutputChannel(artifact_type=ExampleStatistics, producer_component_id=Transform, output_key=pre_transform_stats, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'post_transform_schema': OutputChannel(artifact_type=Schema, producer_component_id=Transform, output_key=post_transform_schema, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'post_transform_stats': OutputChannel(artifact_type=ExampleStatistics, producer_component_id=Transform, output_key=post_transform_stats, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'post_transform_anomalies': OutputChannel(artifact_type=ExampleAnomalies, producer_component_id=Transform, output_key=post_transform_anomalies, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)}

Take a quick look at the transform_graph artifact. It points to a directory containing three subdirectories.

# URI of the directory that holds the transform graph artifact
transform_graph_uri = transform.outputs['transform_graph'].get()[0].uri
os.listdir(transform_graph_uri)
['transform_fn', 'metadata', 'transformed_metadata']

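The transformed_metadata subdirectory contains the schema of the preprocessed data. The transform_fn subdirectory contains the actual preprocessing graph. The metadata subdirectory contains the schema of the original data.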
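
As a small illustration of the layout described above, the following sketch maps each subdirectory to its role and reports any that are missing. It uses only the Python standard library and runs against a stand-in temporary directory; the helper names `describe_transform_graph` and `missing_subdirs` are hypothetical and are not part of TFX.

```python
import os
import tempfile

# Roles of the three subdirectories, as described in the text above.
TRANSFORM_GRAPH_SUBDIRS = {
    "transform_fn": "the actual preprocessing graph",
    "metadata": "schema of the original data",
    "transformed_metadata": "schema of the preprocessed data",
}


def describe_transform_graph(uri):
    """Map every entry found under `uri` to its role (hypothetical helper)."""
    return {
        name: TRANSFORM_GRAPH_SUBDIRS.get(name, "unknown")
        for name in sorted(os.listdir(uri))
    }


def missing_subdirs(uri):
    """Return the expected subdirectories that are absent under `uri`."""
    return sorted(set(TRANSFORM_GRAPH_SUBDIRS) - set(os.listdir(uri)))


# Stand-in directory so the sketch runs without a real Transform artifact.
demo_uri = tempfile.mkdtemp()
for name in TRANSFORM_GRAPH_SUBDIRS:
    os.makedirs(os.path.join(demo_uri, name))

layout = describe_transform_graph(demo_uri)
```

In the notebook, the same helpers could be pointed at the URI returned by `transform.outputs['transform_graph'].get()[0].uri` instead of `demo_uri`.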

We can also take a look at the first three transformed examples:

# Get the URI of the output artifact representing the transformed examples, which is a directory
train_uri = os.path.join(transform.outputs['transformed_examples'].get()[0].uri, 'Split-train')

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 3 records and decode them.
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = tf.train.Example()
  example.ParseFromString(serialized_example)
  pp.pprint(example)
features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 8
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 0.061060599982738495
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 1
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: -0.15886740386486053
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: -0.7118487358093262
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 6
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 19
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 5
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 1.2521240711212158
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 60
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.532160758972168
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: 0.5509493350982666
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 2
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 10
      }
    }
  }
}

features {
  feature {
    key: "company"
    value {
      int64_list {
        value: 48
      }
    }
  }
  feature {
    key: "dropoff_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_community_area"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_latitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "dropoff_longitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "fare"
    value {
      float_list {
        value: 0.3873794376850128
      }
    }
  }
  feature {
    key: "payment_type"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_census_tract"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "pickup_community_area"
    value {
      int64_list {
        value: 13
      }
    }
  }
  feature {
    key: "pickup_latitude"
    value {
      int64_list {
        value: 9
      }
    }
  }
  feature {
    key: "pickup_longitude"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "tips"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "trip_miles"
    value {
      float_list {
        value: 0.21955278515815735
      }
    }
  }
  feature {
    key: "trip_seconds"
    value {
      float_list {
        value: 0.0019067146349698305
      }
    }
  }
  feature {
    key: "trip_start_day"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "trip_start_hour"
    value {
      int64_list {
        value: 12
      }
    }
  }
  feature {
    key: "trip_start_month"
    value {
      int64_list {
        value: 11
      }
    }
  }
}
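
The records above mix small floats (for example `fare`, `trip_miles`) with small integers (for example `pickup_latitude`). A plausible reading, assuming the preprocessing function defined earlier in this tutorial z-score scales the dense float features and bucketizes the coordinate features, is sketched below in plain Python. The functions `z_score` and `bucketize` are hypothetical stand-ins, not the tf.Transform API, and the use of the population standard deviation and left-inclusive buckets is an assumption.

```python
import bisect
import statistics


def z_score(values):
    """Center values on their mean and divide by the population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def bucketize(values, boundaries):
    """Replace each value with the index of the bucket it falls into."""
    return [bisect.bisect_right(boundaries, v) for v in values]


# Made-up inputs, only to show the shape of the transformation.
fares = [5.0, 10.0, 15.0]
scaled_fares = z_score(fares)

latitudes = [41.85, 41.90, 41.99]
latitude_buckets = bucketize(latitudes, boundaries=[41.88, 41.95])
```

After scaling, a value equal to the mean maps to 0, which is why transformed floats cluster around zero and can be negative. Bucketized features become integer indices that a model can treat as categorical ids.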

After the Transform component has transformed your data into features, the next step is to train a model.

Trainer

The Trainer component trains a model that you define in TensorFlow (using either the Estimator API or the Keras API with model_to_estimator).

Trainer takes as input the schema from SchemaGen, the transformed data and graph from Transform, training parameters, and a module that contains user-defined model code.
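
The model code used in this tutorial builds a `tf.estimator.DNNLinearCombinedClassifier`, passing the categorical columns to the linear part and the dense float columns to the DNN part. As a rough mental model only, the forward pass of such a combined model can be sketched in plain Python: the two parts each produce a logit, the logits are added, and a sigmoid turns the sum into a probability. All function names, weights, and layer sizes below are made up for illustration and do not reflect the estimator's internals or trained values.

```python
import math


def linear_logit(categorical_ids, weights, bias=0.0):
    """Linear part: add one learned weight per (feature, integer id) pair."""
    return bias + sum(weights[key][idx] for key, idx in categorical_ids.items())


def dnn_logit(dense_values, layers):
    """DNN part: fully connected layers, ReLU on all but the last layer."""
    activations = list(dense_values)
    for i, (matrix, biases) in enumerate(layers):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(matrix, biases)
        ]
        is_last = i == len(layers) - 1
        activations = outputs if is_last else [max(0.0, o) for o in outputs]
    return activations[0]


def predict_probability(categorical_ids, dense_values, weights, layers):
    """Sum both logits and squash the result with a sigmoid."""
    logit = linear_logit(categorical_ids, weights) + dnn_logit(dense_values, layers)
    return 1.0 / (1.0 + math.exp(-logit))


# Toy parameters (assumptions, not trained values).
toy_weights = {"payment_type": [0.5, -0.5]}
toy_layers = [
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),  # hidden layer, 2 units
    ([[0.5, 0.5]], [0.0]),                    # output layer, 1 logit
]
probability = predict_probability(
    {"payment_type": 1}, [1.0, 2.0], toy_weights, toy_layers)
```

The integer ids fed to the linear part correspond to the vocabulary, bucket, and categorical indices produced by Transform, which is why those features are wrapped in identity categorical columns in the module code.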

Let's look at an example of user-defined model code below (for an introduction to the TensorFlow Estimator API, see the tutorial):

_taxi_trainer_module_file = 'taxi_trainer.py'
%%writefile {_taxi_trainer_module_file}

import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils
from tfx_bsl.tfxio import dataset_options

import taxi_constants

_DENSE_FLOAT_FEATURE_KEYS = taxi_constants.DENSE_FLOAT_FEATURE_KEYS
_VOCAB_FEATURE_KEYS = taxi_constants.VOCAB_FEATURE_KEYS
_VOCAB_SIZE = taxi_constants.VOCAB_SIZE
_OOV_SIZE = taxi_constants.OOV_SIZE
_FEATURE_BUCKET_COUNT = taxi_constants.FEATURE_BUCKET_COUNT
_BUCKET_FEATURE_KEYS = taxi_constants.BUCKET_FEATURE_KEYS
_CATEGORICAL_FEATURE_KEYS = taxi_constants.CATEGORICAL_FEATURE_KEYS
_MAX_CATEGORICAL_FEATURE_VALUES = taxi_constants.MAX_CATEGORICAL_FEATURE_VALUES
_LABEL_KEY = taxi_constants.LABEL_KEY


# Tf.Transform considers these features as "raw"
def _get_raw_feature_spec(schema):
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _build_estimator(config, hidden_units=None, warm_start_from=None):
  """Build an estimator for predicting the tipping behavior of taxi riders.
  Args:
    config: tf.estimator.RunConfig defining the runtime environment for the
      estimator (including model_dir).
    hidden_units: [int], the layer sizes of the DNN (input layer first)
    warm_start_from: Optional directory to warm start from.
  Returns:
    A tf.estimator.DNNLinearCombinedClassifier that will be used for training
      and eval.
  """
  real_valued_columns = [
      tf.feature_column.numeric_column(key, shape=())
      for key in _DENSE_FLOAT_FEATURE_KEYS
  ]
  categorical_columns = [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_VOCAB_SIZE + _OOV_SIZE, default_value=0)
      for key in _VOCAB_FEATURE_KEYS
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=_FEATURE_BUCKET_COUNT, default_value=0)
      for key in _BUCKET_FEATURE_KEYS
  ]
  categorical_columns += [
      tf.feature_column.categorical_column_with_identity(  # pylint: disable=g-complex-comprehension
          key,
          num_buckets=num_buckets,
          default_value=0) for key, num_buckets in zip(
              _CATEGORICAL_FEATURE_KEYS,
              _MAX_CATEGORICAL_FEATURE_VALUES)
  ]
  return tf.estimator.DNNLinearCombinedClassifier(
      config=config,
      linear_feature_columns=categorical_columns,
      dnn_feature_columns=real_valued_columns,
      dnn_hidden_units=hidden_units or [100, 70, 50, 25],
      warm_start_from=warm_start_from)


def _example_serving_receiver_fn(tf_transform_graph, schema):
  """Build the serving inputs.
  Args:
    tf_transform_graph: A TFTransformOutput.
    schema: the schema of the input data.
  Returns:
    Tensorflow graph which parses examples, applying tf-transform to them.
  """
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(_LABEL_KEY)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_graph.transform_raw_features(
      serving_input_receiver.features)

  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)


def _eval_input_receiver_fn(tf_transform_graph, schema):
  """Build everything needed for the tf-model-analysis to run the model.
  Args:
    tf_transform_graph: A TFTransformOutput.
    schema: the schema of the input data.
  Returns:
    EvalInputReceiver function, which contains:
      - Tensorflow graph which parses raw untransformed features, applies the
        tf-transform preprocessing operators.
      - Set of raw, untransformed features.
      - Label against which predictions will be compared.
  """
  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_tensor')

  # Add a parse_example operator to the tensorflow graph, which will parse
  # raw, untransformed, tf examples.
  features = tf.io.parse_example(serialized_tf_example, raw_feature_spec)

  # Now that we have our raw examples, process them through the tf-transform
  # function computed during the preprocessing step.
  transformed_features = tf_transform_graph.transform_raw_features(
      features)

  # The key name MUST be 'examples'.
  receiver_tensors = {'examples': serialized_tf_example}

  # NOTE: Model is driven by transformed features (since training works on the
  # materialized output of TFT), but slicing will happen on raw features.
  features.update(transformed_features)

  return tfma.export.EvalInputReceiver(
      features=features,
      receiver_tensors=receiver_tensors,
      labels=transformed_features[_LABEL_KEY])


def _input_fn(file_pattern, data_accessor, tf_transform_output, batch_size=200):
  """Generates features and label for tuning/training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      dataset_options.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY),
      tf_transform_output.transformed_metadata.schema)


# TFX will call this function
def trainer_fn(trainer_fn_args, schema):
  """Build the estimator using the high level API.
  Args:
    trainer_fn_args: Holds args used to train the model as name/value pairs.
    schema: Holds the schema of the training examples.
  Returns:
    A dict of the following:
      - estimator: The estimator that will be used for training and eval.
      - train_spec: Spec for training.
      - eval_spec: Spec for eval.
      - eval_input_receiver_fn: Input function for eval.
  """
  # Number of nodes in the first layer of the DNN
  first_dnn_layer_size = 100
  num_dnn_layers = 4
  dnn_decay_factor = 0.7

  train_batch_size = 40
  eval_batch_size = 40

  tf_transform_graph = tft.TFTransformOutput(trainer_fn_args.transform_output)

  train_input_fn = lambda: _input_fn(  # pylint: disable=g-long-lambda
      trainer_fn_args.train_files,
      trainer_fn_args.data_accessor,
      tf_transform_graph,
      batch_size=train_batch_size)

  eval_input_fn = lambda: _input_fn(  # pylint: disable=g-long-lambda
      trainer_fn_args.eval_files,
      trainer_fn_args.data_accessor,
      tf_transform_graph,
      batch_size=eval_batch_size)

  train_spec = tf.estimator.TrainSpec(  # pylint: disable=g-long-lambda
      train_input_fn,
      max_steps=trainer_fn_args.train_steps)

  serving_receiver_fn = lambda: _example_serving_receiver_fn(  # pylint: disable=g-long-lambda
      tf_transform_graph, schema)

  exporter = tf.estimator.FinalExporter('chicago-taxi', serving_receiver_fn)
  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=trainer_fn_args.eval_steps,
      exporters=[exporter],
      name='chicago-taxi-eval')

  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=999, keep_checkpoint_max=1)

  run_config = run_config.replace(model_dir=trainer_fn_args.serving_model_dir)

  estimator = _build_estimator(
      # Construct layer sizes with exponential decay
      hidden_units=[
          max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
          for i in range(num_dnn_layers)
      ],
      config=run_config,
      warm_start_from=trainer_fn_args.base_model)

  # Create an input receiver for TFMA processing
  receiver_fn = lambda: _eval_input_receiver_fn(  # pylint: disable=g-long-lambda
      tf_transform_graph, schema)

  return {
      'estimator': estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }
Writing taxi_trainer.py
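The `trainer_fn` above derives the DNN hidden layer sizes from three values: `first_dnn_layer_size`, `num_dnn_layers`, and `dnn_decay_factor`. The sketch below isolates that computation in plain Python so it can be inspected without TensorFlow. The function name `dnn_layer_sizes` and the `min_size` parameter are assumptions made for this illustration; the module itself inlines the list comprehension with a floor of 2.

```python
def dnn_layer_sizes(first_layer_size, num_layers, decay_factor, min_size=2):
    """Return layer widths that shrink geometrically, never below min_size.

    Mirrors the list comprehension in trainer_fn; the function name and the
    min_size parameter are hypothetical and used only for illustration.
    """
    return [
        max(min_size, int(first_layer_size * decay_factor**i))
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    # Same settings as trainer_fn: 4 layers, first layer 100 wide, decay 0.7.
    print(dnn_layer_sizes(100, 4, 0.7))
```

Because `int()` truncates toward zero and the products are computed in floating point, a width can come out one unit below the exact mathematical value. If exact widths matter, rounding with `round()` or listing the sizes explicitly may be the safer choice.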

Now, we pass this model code to the Trainer component and run it to train the model.

from tfx.components.trainer.executor import Executor
from tfx.dsl.components.base import executor_spec

trainer = tfx.components.Trainer(
    module_file=os.path.abspath(_taxi_trainer_module_file),
    custom_executor_spec=executor_spec.ExecutorClassSpec(Executor),
    examples=transform.outputs['transformed_examples'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=tfx.proto.TrainArgs(num_steps=10000),
    eval_args=tfx.proto.EvalArgs(num_steps=5000))
context.run(trainer)
WARNING:absl:`custom_executor_spec` is deprecated. Please customize component directly.
INFO:absl:Generating ephemeral wheel package for '/tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py' (including modules: ['taxi_trainer', 'taxi_transform', 'taxi_constants']).
INFO:absl:User module package has hash fingerprint version e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '/tmpfs/tmp/tmpnta9yff_/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmpfs/tmp/tmp5wdb8vq6', '--dist-dir', '/tmpfs/tmp/tmp8v8hc7ie']
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
INFO:absl:Successfully built user code wheel distribution at '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'; target user module is 'taxi_trainer'.
INFO:absl:Full user module path is 'taxi_trainer@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'
INFO:absl:Running driver for Trainer
INFO:absl:MetadataStore with DB connection initialized
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying taxi_trainer.py -> build/lib
copying taxi_transform.py -> build/lib
copying taxi_constants.py -> build/lib
installing to /tmpfs/tmp/tmp5wdb8vq6
running install
running install_lib
copying build/lib/taxi_trainer.py -> /tmpfs/tmp/tmp5wdb8vq6
copying build/lib/taxi_transform.py -> /tmpfs/tmp/tmp5wdb8vq6
copying build/lib/taxi_constants.py -> /tmpfs/tmp/tmp5wdb8vq6
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmpfs/tmp/tmp5wdb8vq6/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3.9.egg-info
running install_scripts
creating /tmpfs/tmp/tmp5wdb8vq6/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/WHEEL
creating '/tmpfs/tmp/tmp8v8hc7ie/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl' and adding '/tmpfs/tmp/tmp5wdb8vq6' to it
adding 'taxi_constants.py'
adding 'taxi_trainer.py'
adding 'taxi_transform.py'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618.dist-info/RECORD'
removing /tmpfs/tmp/tmp5wdb8vq6
INFO:absl:Running executor for Trainer
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
INFO:absl:udf_utils.get_fn {'train_args': '{\n  "num_steps": 10000\n}', 'eval_args': '{\n  "num_steps": 5000\n}', 'module_file': None, 'run_fn': None, 'trainer_fn': None, 'custom_config': 'null', 'module_path': 'taxi_trainer@/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'} 'trainer_fn'
INFO:absl:Installing '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl' to a temporary directory.
INFO:absl:Executing: ['/tmpfs/src/tf_docs_env/bin/python', '-m', 'pip', 'install', '--target', '/tmpfs/tmp/tmp86irddos', '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl']
Processing /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl
INFO:absl:Successfully installed '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/_wheels/tfx_user_code_Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618-py3-none-any.whl'.
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+e337a512821685b6d91445dbd0628b47de0e4c751e9e54edf78bcf0866309618
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:188: TrainSpec.__new__ (from tensorflow_estimator.python.estimator.training) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:195: FinalExporter.__init__ (from tensorflow_estimator.python.estimator.exporter) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:196: EvalSpec.__new__ (from tensorflow_estimator.python.estimator.training) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:202: RunConfig.__init__ (from tensorflow_estimator.python.estimator.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:41: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:45: categorical_column_with_identity (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:62: DNNLinearCombinedClassifierV2.__init__ (from tensorflow_estimator.python.estimator.canned.dnn_linear_combined) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/head_utils.py:54: BinaryClassHead.__init__ (from tensorflow_estimator.python.estimator.head.binary_class_head) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/dnn_linear_combined.py:586: Estimator.__init__ (from tensorflow_estimator.python.estimator.estimator) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 999, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:absl:Training model.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx/components/trainer/executor.py:270: train_and_evaluate (from tensorflow_estimator.python.estimator.training) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 999 or save_checkpoints_secs None.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:385: StopAtStepHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tfx_bsl/tfxio/tf_example_record.py:343: parse_example_dataset (from tensorflow.python.data.experimental.ops.parsing_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(tf.io.parse_example(...))` instead.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/keras/src/optimizers/legacy/adagrad.py:93: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/model_fn.py:250: EstimatorSpec.__new__ (from tensorflow_estimator.python.estimator.model_fn) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1416: NanTensorHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1419: LoggingTensorHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py:232: SecondOrStepTimer.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1456: CheckpointSaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Create CheckpointSaverHook.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:579: StepCounterHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:586: SummarySaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2024-05-08 09:29:41.217001: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT64
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}

    for Tuple type infernce function 0
    while inferring type of node 'dnn/zero_fraction/cond/output/_18'
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1455: SessionRunArgs.__new__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1454: SessionRunContext.__init__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1474: SessionRunValues.__new__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:loss = 0.7004041, step = 0
INFO:tensorflow:global_step/sec: 94.9559
INFO:tensorflow:loss = 0.58275115, step = 100 (1.054 sec)
INFO:tensorflow:global_step/sec: 126.379
INFO:tensorflow:loss = 0.5188283, step = 200 (0.791 sec)
INFO:tensorflow:global_step/sec: 127.047
INFO:tensorflow:loss = 0.5207444, step = 300 (0.787 sec)
INFO:tensorflow:global_step/sec: 126.868
INFO:tensorflow:loss = 0.5565426, step = 400 (0.788 sec)
INFO:tensorflow:global_step/sec: 128.906
INFO:tensorflow:loss = 0.44116384, step = 500 (0.776 sec)
INFO:tensorflow:global_step/sec: 127.421
INFO:tensorflow:loss = 0.4577001, step = 600 (0.785 sec)
INFO:tensorflow:global_step/sec: 127.514
INFO:tensorflow:loss = 0.4446908, step = 700 (0.784 sec)
INFO:tensorflow:global_step/sec: 125.617
INFO:tensorflow:loss = 0.4720246, step = 800 (0.796 sec)
INFO:tensorflow:global_step/sec: 131.021
INFO:tensorflow:loss = 0.4437019, step = 900 (0.763 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 999...
INFO:tensorflow:Saving checkpoints for 999 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/saver.py:1067: remove_checkpoint (from tensorflow.python.checkpoint.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 999...
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2024-05-08T09:29:55
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/evaluation.py:260: FinalOpsHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-999
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 31.08471s
INFO:tensorflow:Finished evaluation at 2024-05-08-09:30:26
INFO:tensorflow:Saving dict for global step 999: accuracy = 0.771235, accuracy_baseline = 0.771235, auc = 0.9186248, auc_precision_recall = 0.6492264, average_loss = 0.4624323, global_step = 999, label/mean = 0.228765, loss = 0.46243173, precision = 0.0, prediction/mean = 0.24984238, recall = 0.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 999: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-999
INFO:tensorflow:global_step/sec: 2.90748
INFO:tensorflow:loss = 0.42686376, step = 1000 (34.394 sec)
INFO:tensorflow:global_step/sec: 126.964
INFO:tensorflow:loss = 0.44425964, step = 1100 (0.788 sec)
INFO:tensorflow:global_step/sec: 127.535
INFO:tensorflow:loss = 0.47197676, step = 1200 (0.784 sec)
INFO:tensorflow:global_step/sec: 125.471
INFO:tensorflow:loss = 0.506038, step = 1300 (0.797 sec)
INFO:tensorflow:global_step/sec: 127.967
INFO:tensorflow:loss = 0.4099118, step = 1400 (0.781 sec)
INFO:tensorflow:global_step/sec: 126.812
INFO:tensorflow:loss = 0.5171037, step = 1500 (0.789 sec)
INFO:tensorflow:global_step/sec: 131.046
INFO:tensorflow:loss = 0.4371317, step = 1600 (0.763 sec)
INFO:tensorflow:global_step/sec: 128.146
INFO:tensorflow:loss = 0.45776543, step = 1700 (0.780 sec)
INFO:tensorflow:global_step/sec: 128.92
INFO:tensorflow:loss = 0.50659925, step = 1800 (0.776 sec)
INFO:tensorflow:global_step/sec: 129.06
INFO:tensorflow:loss = 0.43417373, step = 1900 (0.775 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 1998...
INFO:tensorflow:Saving checkpoints for 1998 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 1998...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 103.772
INFO:tensorflow:loss = 0.45863923, step = 2000 (0.963 sec)
INFO:tensorflow:global_step/sec: 125.899
INFO:tensorflow:loss = 0.3514557, step = 2100 (0.795 sec)
INFO:tensorflow:global_step/sec: 129.752
INFO:tensorflow:loss = 0.43468484, step = 2200 (0.771 sec)
INFO:tensorflow:global_step/sec: 131.014
INFO:tensorflow:loss = 0.48132992, step = 2300 (0.763 sec)
INFO:tensorflow:global_step/sec: 130.271
INFO:tensorflow:loss = 0.44048753, step = 2400 (0.768 sec)
INFO:tensorflow:global_step/sec: 129.449
INFO:tensorflow:loss = 0.3523005, step = 2500 (0.773 sec)
INFO:tensorflow:global_step/sec: 130.936
INFO:tensorflow:loss = 0.3773502, step = 2600 (0.764 sec)
INFO:tensorflow:global_step/sec: 129.258
INFO:tensorflow:loss = 0.43350023, step = 2700 (0.774 sec)
INFO:tensorflow:global_step/sec: 133.75
INFO:tensorflow:loss = 0.37304792, step = 2800 (0.748 sec)
INFO:tensorflow:global_step/sec: 130.275
INFO:tensorflow:loss = 0.3801176, step = 2900 (0.768 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 2997...
INFO:tensorflow:Saving checkpoints for 2997 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 2997...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 107.092
INFO:tensorflow:loss = 0.3836586, step = 3000 (0.933 sec)
INFO:tensorflow:global_step/sec: 133.249
INFO:tensorflow:loss = 0.43525982, step = 3100 (0.751 sec)
INFO:tensorflow:global_step/sec: 124.615
INFO:tensorflow:loss = 0.42075485, step = 3200 (0.803 sec)
INFO:tensorflow:global_step/sec: 122.737
INFO:tensorflow:loss = 0.3901537, step = 3300 (0.815 sec)
INFO:tensorflow:global_step/sec: 122.54
INFO:tensorflow:loss = 0.35952353, step = 3400 (0.816 sec)
INFO:tensorflow:global_step/sec: 124.721
INFO:tensorflow:loss = 0.3873772, step = 3500 (0.802 sec)
INFO:tensorflow:global_step/sec: 128.574
INFO:tensorflow:loss = 0.36566123, step = 3600 (0.778 sec)
INFO:tensorflow:global_step/sec: 126.009
INFO:tensorflow:loss = 0.40229043, step = 3700 (0.794 sec)
INFO:tensorflow:global_step/sec: 122.1
INFO:tensorflow:loss = 0.4070228, step = 3800 (0.819 sec)
INFO:tensorflow:global_step/sec: 123.903
INFO:tensorflow:loss = 0.4688112, step = 3900 (0.807 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3996...
INFO:tensorflow:Saving checkpoints for 3996 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3996...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 100.776
INFO:tensorflow:loss = 0.49602365, step = 4000 (0.992 sec)
INFO:tensorflow:global_step/sec: 123.848
INFO:tensorflow:loss = 0.2742646, step = 4100 (0.808 sec)
INFO:tensorflow:global_step/sec: 123.423
INFO:tensorflow:loss = 0.44800407, step = 4200 (0.810 sec)
INFO:tensorflow:global_step/sec: 123.175
INFO:tensorflow:loss = 0.43835735, step = 4300 (0.812 sec)
INFO:tensorflow:global_step/sec: 123.891
INFO:tensorflow:loss = 0.32207388, step = 4400 (0.807 sec)
INFO:tensorflow:global_step/sec: 125.024
INFO:tensorflow:loss = 0.38216084, step = 4500 (0.800 sec)
INFO:tensorflow:global_step/sec: 124.891
INFO:tensorflow:loss = 0.45092455, step = 4600 (0.801 sec)
INFO:tensorflow:global_step/sec: 123.988
INFO:tensorflow:loss = 0.3750969, step = 4700 (0.807 sec)
INFO:tensorflow:global_step/sec: 125.77
INFO:tensorflow:loss = 0.3925597, step = 4800 (0.795 sec)
INFO:tensorflow:global_step/sec: 125.459
INFO:tensorflow:loss = 0.43026057, step = 4900 (0.797 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4995...
INFO:tensorflow:Saving checkpoints for 4995 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4995...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 103.654
INFO:tensorflow:loss = 0.41449785, step = 5000 (0.964 sec)
INFO:tensorflow:global_step/sec: 125.195
INFO:tensorflow:loss = 0.3261056, step = 5100 (0.799 sec)
INFO:tensorflow:global_step/sec: 127.462
INFO:tensorflow:loss = 0.35417694, step = 5200 (0.785 sec)
INFO:tensorflow:global_step/sec: 127.844
INFO:tensorflow:loss = 0.4030676, step = 5300 (0.782 sec)
INFO:tensorflow:global_step/sec: 130.923
INFO:tensorflow:loss = 0.4126954, step = 5400 (0.764 sec)
INFO:tensorflow:global_step/sec: 130.98
INFO:tensorflow:loss = 0.32259554, step = 5500 (0.764 sec)
INFO:tensorflow:global_step/sec: 128.589
INFO:tensorflow:loss = 0.3811575, step = 5600 (0.777 sec)
INFO:tensorflow:global_step/sec: 125.918
INFO:tensorflow:loss = 0.40286702, step = 5700 (0.794 sec)
INFO:tensorflow:global_step/sec: 129.137
INFO:tensorflow:loss = 0.32921094, step = 5800 (0.775 sec)
INFO:tensorflow:global_step/sec: 130.104
INFO:tensorflow:loss = 0.4013093, step = 5900 (0.768 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5994...
INFO:tensorflow:Saving checkpoints for 5994 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5994...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 106.58
INFO:tensorflow:loss = 0.31678432, step = 6000 (0.938 sec)
INFO:tensorflow:global_step/sec: 126.52
INFO:tensorflow:loss = 0.3150363, step = 6100 (0.791 sec)
INFO:tensorflow:global_step/sec: 128.723
INFO:tensorflow:loss = 0.39068645, step = 6200 (0.777 sec)
INFO:tensorflow:global_step/sec: 128.852
INFO:tensorflow:loss = 0.2832807, step = 6300 (0.776 sec)
INFO:tensorflow:global_step/sec: 125.866
INFO:tensorflow:loss = 0.32048258, step = 6400 (0.794 sec)
INFO:tensorflow:global_step/sec: 125.299
INFO:tensorflow:loss = 0.38626963, step = 6500 (0.798 sec)
INFO:tensorflow:global_step/sec: 125.342
INFO:tensorflow:loss = 0.39416704, step = 6600 (0.798 sec)
INFO:tensorflow:global_step/sec: 127.235
INFO:tensorflow:loss = 0.30232263, step = 6700 (0.786 sec)
INFO:tensorflow:global_step/sec: 126.055
INFO:tensorflow:loss = 0.41977397, step = 6800 (0.793 sec)
INFO:tensorflow:global_step/sec: 127.477
INFO:tensorflow:loss = 0.47491065, step = 6900 (0.784 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 6993...
INFO:tensorflow:Saving checkpoints for 6993 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 6993...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 104.669
INFO:tensorflow:loss = 0.35919297, step = 7000 (0.955 sec)
INFO:tensorflow:global_step/sec: 126.032
INFO:tensorflow:loss = 0.42433387, step = 7100 (0.794 sec)
INFO:tensorflow:global_step/sec: 126.079
INFO:tensorflow:loss = 0.3359905, step = 7200 (0.793 sec)
INFO:tensorflow:global_step/sec: 124.834
INFO:tensorflow:loss = 0.4118205, step = 7300 (0.801 sec)
INFO:tensorflow:global_step/sec: 126.048
INFO:tensorflow:loss = 0.3594822, step = 7400 (0.793 sec)
INFO:tensorflow:global_step/sec: 127.316
INFO:tensorflow:loss = 0.3544901, step = 7500 (0.786 sec)
INFO:tensorflow:global_step/sec: 125.922
INFO:tensorflow:loss = 0.3517708, step = 7600 (0.794 sec)
INFO:tensorflow:global_step/sec: 127.473
INFO:tensorflow:loss = 0.32316074, step = 7700 (0.784 sec)
INFO:tensorflow:global_step/sec: 127.602
INFO:tensorflow:loss = 0.28583208, step = 7800 (0.784 sec)
INFO:tensorflow:global_step/sec: 128.23
INFO:tensorflow:loss = 0.379911, step = 7900 (0.780 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 7992...
INFO:tensorflow:Saving checkpoints for 7992 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 7992...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 105.424
INFO:tensorflow:loss = 0.3968008, step = 8000 (0.948 sec)
INFO:tensorflow:global_step/sec: 128.249
INFO:tensorflow:loss = 0.43308416, step = 8100 (0.780 sec)
INFO:tensorflow:global_step/sec: 128.472
INFO:tensorflow:loss = 0.42253828, step = 8200 (0.778 sec)
INFO:tensorflow:global_step/sec: 125.642
INFO:tensorflow:loss = 0.39132017, step = 8300 (0.796 sec)
INFO:tensorflow:global_step/sec: 128.607
INFO:tensorflow:loss = 0.30107036, step = 8400 (0.777 sec)
INFO:tensorflow:global_step/sec: 126.434
INFO:tensorflow:loss = 0.30194753, step = 8500 (0.791 sec)
INFO:tensorflow:global_step/sec: 127.391
INFO:tensorflow:loss = 0.30165237, step = 8600 (0.785 sec)
INFO:tensorflow:global_step/sec: 127.042
INFO:tensorflow:loss = 0.44196972, step = 8700 (0.787 sec)
INFO:tensorflow:global_step/sec: 126.923
INFO:tensorflow:loss = 0.42164555, step = 8800 (0.788 sec)
INFO:tensorflow:global_step/sec: 127.39
INFO:tensorflow:loss = 0.3490799, step = 8900 (0.785 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 8991...
INFO:tensorflow:Saving checkpoints for 8991 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 8991...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:global_step/sec: 105.59
INFO:tensorflow:loss = 0.31310123, step = 9000 (0.947 sec)
INFO:tensorflow:global_step/sec: 124.933
INFO:tensorflow:loss = 0.4325568, step = 9100 (0.801 sec)
INFO:tensorflow:global_step/sec: 127.59
INFO:tensorflow:loss = 0.30360752, step = 9200 (0.784 sec)
INFO:tensorflow:global_step/sec: 130.138
INFO:tensorflow:loss = 0.29442087, step = 9300 (0.768 sec)
INFO:tensorflow:global_step/sec: 130.544
INFO:tensorflow:loss = 0.31136292, step = 9400 (0.766 sec)
INFO:tensorflow:global_step/sec: 130.849
INFO:tensorflow:loss = 0.34016177, step = 9500 (0.764 sec)
INFO:tensorflow:global_step/sec: 129.551
INFO:tensorflow:loss = 0.39522016, step = 9600 (0.772 sec)
INFO:tensorflow:global_step/sec: 130.262
INFO:tensorflow:loss = 0.34697112, step = 9700 (0.768 sec)
INFO:tensorflow:global_step/sec: 129.093
INFO:tensorflow:loss = 0.38748953, step = 9800 (0.775 sec)
INFO:tensorflow:global_step/sec: 131.427
INFO:tensorflow:loss = 0.29107836, step = 9900 (0.761 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 9990...
INFO:tensorflow:Saving checkpoints for 9990 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 9990...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 10000...
INFO:tensorflow:Saving checkpoints for 10000 into /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 10000...
INFO:tensorflow:Skip the current checkpoint eval due to throttle secs (600 secs).
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:absl:Feature company has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature dropoff_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature fare has a shape . Setting to DenseTensor.
INFO:absl:Feature payment_type has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_census_tract has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_community_area has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_latitude has a shape . Setting to DenseTensor.
INFO:absl:Feature pickup_longitude has a shape . Setting to DenseTensor.
INFO:absl:Feature tips has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_miles has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_seconds has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_day has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_hour has a shape . Setting to DenseTensor.
INFO:absl:Feature trip_start_month has a shape . Setting to DenseTensor.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2024-05-08T09:31:40
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [500/5000]
INFO:tensorflow:Evaluation [1000/5000]
INFO:tensorflow:Evaluation [1500/5000]
INFO:tensorflow:Evaluation [2000/5000]
INFO:tensorflow:Evaluation [2500/5000]
INFO:tensorflow:Evaluation [3000/5000]
INFO:tensorflow:Evaluation [3500/5000]
INFO:tensorflow:Evaluation [4000/5000]
INFO:tensorflow:Evaluation [4500/5000]
INFO:tensorflow:Evaluation [5000/5000]
INFO:tensorflow:Inference Time : 30.60751s
INFO:tensorflow:Finished evaluation at 2024-05-08-09:32:11
INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.78671, accuracy_baseline = 0.771205, auc = 0.93178755, auc_precision_recall = 0.69907445, average_loss = 0.34567952, global_step = 10000, label/mean = 0.228795, loss = 0.34567845, precision = 0.7014421, prediction/mean = 0.23119248, recall = 0.117987715
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Performing the final export in the end of training.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
WARNING:tensorflow:From /tmpfs/src/temp/docs/tutorials/tfx/taxi_trainer.py:81: build_parsing_serving_input_receiver_fn (from tensorflow_estimator.python.estimator.export.export) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/export/export.py:312: ServingInputReceiver.__new__ (from tensorflow_estimator.python.estimator.export.export) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/base_head.py:786: ClassificationOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/binary_class_head.py:561: RegressionOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/head/binary_class_head.py:563: PredictOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:168: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://tensorflow.dev.org.tw/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/saved_model/model_utils/export_utils.py:83: get_tensor_from_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://tensorflow.dev.org.tw/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/export/chicago-taxi/temp-1715160731/assets
INFO:tensorflow:SavedModel written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/export/chicago-taxi/temp-1715160731/saved_model.pb
INFO:tensorflow:Loss for final step: 0.33868045.
INFO:absl:Training complete. Model written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving. ModelRun written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6
INFO:absl:Exporting eval_savedmodel for TFMA.
INFO:absl:Feature company has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature dropoff_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature fare has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature payment_type has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_census_tract has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_community_area has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_latitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature pickup_longitude has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature tips has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_miles has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_seconds has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_day has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_hour has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_month has no shape. Setting to varlen_sparse_tensor.
INFO:absl:Feature trip_start_timestamp has no shape. Setting to varlen_sparse_tensor.
WARNING:tensorflow:Loading a TF2 SavedModel but eager mode seems disabled.
INFO:tensorflow:struct2tensor is not available.
INFO:tensorflow:tensorflow_decision_forests is not available.
INFO:tensorflow:tensorflow_text is not available.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/saved_model/model_utils/export_utils.py:345: _SupervisedOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.keras instead.
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: None
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Signatures INCLUDED in export for Eval: ['eval']
WARNING:tensorflow:Export includes no default signature!
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-Serving/model.ckpt-10000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-TFMA/temp-1715160733/assets
INFO:tensorflow:SavedModel written to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-TFMA/temp-1715160733/saved_model.pb
INFO:absl:Exported eval_savedmodel to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model_run/6/Format-TFMA.
WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/serving_model_dir/saved_model.pb"
INFO:absl:Serving model copied to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-Serving.
WARNING:absl:Support for estimator-based executor and model export will be deprecated soon. Please use export structure <ModelExportPath>/eval_model_dir/saved_model.pb"
INFO:absl:Eval model copied to: /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-TFMA.
INFO:absl:Running publisher for Trainer
INFO:absl:MetadataStore with DB connection initialized

Analyze Training with TensorBoard

Optionally, you can connect TensorBoard to the Trainer to analyze your model's training curves.

# Get the URI of the output artifact representing the training logs, which is a directory
model_run_dir = trainer.outputs['model_run'].get()[0].uri

%load_ext tensorboard
%tensorboard --logdir {model_run_dir}
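
If TensorBoard is not available, a rough view of the same training curve can be recovered from the Trainer log itself. The following is a minimal sketch, assuming the log lines keep the `INFO:tensorflow:loss = ..., step = ...` shape seen in the Trainer output; the helper name `parse_loss_curve` is hypothetical and not part of TFX or TensorFlow.

```python
import re

# Matches Estimator training log lines such as:
#   INFO:tensorflow:loss = 0.42686376, step = 1000 (34.394 sec)
# Evaluation summaries ("Saving dict for global step ...") do not match.
LOSS_LINE = re.compile(r"^INFO:tensorflow:loss = ([0-9.eE+-]+), step = (\d+)")


def parse_loss_curve(log_text):
    """Return a list of (step, loss) pairs found in Estimator log output."""
    curve = []
    for line in log_text.splitlines():
        match = LOSS_LINE.match(line.strip())
        if match:
            curve.append((int(match.group(2)), float(match.group(1))))
    return curve


# A few lines copied from the Trainer log, used here as sample input.
sample_log = """\
INFO:tensorflow:global_step/sec: 2.90748
INFO:tensorflow:loss = 0.42686376, step = 1000 (34.394 sec)
INFO:tensorflow:global_step/sec: 126.964
INFO:tensorflow:loss = 0.44425964, step = 1100 (0.788 sec)
INFO:tensorflow:loss = 0.47197676, step = 1200 (0.784 sec)
"""

curve = parse_loss_curve(sample_log)
for step, loss in curve:
    print(f"step {step}: loss {loss:.4f}")
```

The per-step loss is noisy because each value is computed on a single batch, so the overall trend matters more than any individual point.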

Evaluator

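The Evaluator component computes model performance metrics over the evaluation set. It uses the TensorFlow Model Analysis library. The Evaluator can also optionally validate that a newly trained model is better than the previous model. This is useful in a production pipeline setting, where you may automatically train and validate a model every day. In this notebook, we train only one model, so the Evaluator automatically labels the model as "good".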
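
As a rough mental model of that validation step, the decision can be pictured as two checks: the candidate's metric must clear an absolute lower bound, and, when a previous model exists, it must not be worse than that model by more than an allowed margin. The sketch below is a simplification under stated assumptions, not TFMA's implementation: the function `is_blessed`, its parameters, and the inclusive comparisons are hypothetical, and the numbers are illustrative.

```python
def is_blessed(candidate_accuracy, lower_bound,
               baseline_accuracy=None, min_change=0.0):
    """Simplified blessing decision for a higher-is-better metric.

    All names here are hypothetical; this only mirrors the idea of a
    value threshold combined with an optional change threshold.
    """
    # Value check: the metric must reach the absolute lower bound.
    if candidate_accuracy < lower_bound:
        return False
    # No previous model to compare against (for example, the first run):
    # the change check is skipped.
    if baseline_accuracy is None:
        return True
    # Change check: the candidate may not fall below the baseline by
    # more than the allowed margin.
    return (candidate_accuracy - baseline_accuracy) >= min_change


# First run: only the value check applies.
print(is_blessed(0.78671, lower_bound=0.5))
# Later run: the candidate is worse than the baseline, so it is rejected.
print(is_blessed(0.78, lower_bound=0.5, baseline_accuracy=0.80,
                 min_change=-1e-10))
```

With only one model trained, as in this notebook, there is nothing to compare against, which is why the single candidate ends up marked as good once it clears the lower bound.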

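Evaluator takes as input the data from ExampleGen, the trained model from Trainer, and a slicing configuration. The slicing configuration lets you slice your metrics by feature value (for example, how does your model perform on taxi trips that start at 8 AM versus 8 PM?). See an example of this configuration below: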

eval_config = tfma.EvalConfig(
    model_specs=[
        # Using signature 'eval' implies the use of an EvalSavedModel. To use
        # a serving model instead, remove the signature (it then defaults to
        # 'serving_default') and add a label_key.
        tfma.ModelSpec(signature_name='eval')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            # The metrics added here are in addition to those saved with the
            # model (assuming either a keras model or EvalSavedModel is used).
            # Any metrics added into the saved model (for example using
            # model.compile(..., metrics=[...]), etc) will be computed
            # automatically.
            metrics=[
                tfma.MetricConfig(class_name='ExampleCount')
            ],
            # To add validation thresholds for metrics saved with the model,
            # add them keyed by metric name to the thresholds map.
            thresholds = {
                'accuracy': tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        lower_bound={'value': 0.5}),
                    # Change threshold will be ignored if there is no
                    # baseline model resolved from MLMD (first run).
                    change_threshold=tfma.GenericChangeThreshold(
                       direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                       absolute={'value': -1e-10}))
            }
        )
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        # Data can be sliced along a feature column. In this case, data is
        # sliced along feature column trip_start_hour.
        tfma.SlicingSpec(feature_keys=['trip_start_hour'])
    ])
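
To make the idea of slicing concrete, the sketch below computes one metric (accuracy) for the whole dataset and again for each value of a feature, which is conceptually what an empty slicing spec plus a `trip_start_hour` slicing spec ask for. It is plain Python over made-up records, not a use of the TFMA API; the helper `accuracy_by_slice` and the `label` and `prediction` field names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(examples, feature_key):
    """Return accuracy for the overall dataset and per value of feature_key."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for example in examples:
        hit = int(example["label"] == example["prediction"])
        # Every example counts toward the overall slice and toward the
        # slice named by its own feature value.
        for slice_key in ("overall", example[feature_key]):
            totals[slice_key] += 1
            correct[slice_key] += hit
    return {key: correct[key] / totals[key] for key in totals}


# Made-up records for illustration only.
examples = [
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 1},
    {"trip_start_hour": 20, "label": 0, "prediction": 0},
    {"trip_start_hour": 20, "label": 1, "prediction": 1},
]

result = accuracy_by_slice(examples, "trip_start_hour")
print(result)
```

A model can look acceptable on the overall slice while doing noticeably worse on one hour of the day, which is the kind of gap that per-feature slices are meant to expose.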

Next, we give this configuration to Evaluator and run it.

# Use TFMA to compute evaluation statistics over features of a model and
# validate them against a baseline.

# The model resolver is only required if performing model validation in addition
# to evaluation. In this case we validate against the latest blessed model. If
# no model has been blessed before (as in this case) the evaluator will make our
# candidate the first blessed model.
model_resolver = tfx.dsl.Resolver(
      strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
      model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
      model_blessing=tfx.dsl.Channel(
          type=tfx.types.standard_artifacts.ModelBlessing)).with_id(
              'latest_blessed_model_resolver')
context.run(model_resolver)

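# Note: the resolver's output is not passed to the Evaluator here. To validate
# against the latest blessed model, also pass
# baseline_model=model_resolver.outputs['model'].
evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config)
context.run(evaluator)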
INFO:absl:Running driver for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running publisher for latest_blessed_model_resolver
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running driver for Evaluator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Evaluator
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        }\n      ],\n      "thresholds": {\n        "accuracy": {\n          "change_threshold": {\n            "absolute": -1e-10,\n            "direction": "HIGHER_IS_BETTER"\n          },\n          "value_threshold": {\n            "lower_bound": 0.5\n          }\n        }\n      }\n    }\n  ],\n  "model_specs": [\n    {\n      "signature_name": "eval"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': 'null', 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_eval_shared_model'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Using /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-TFMA as  model.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
INFO:absl:The 'example_splits' parameter is not set, using 'eval' split.
INFO:absl:Evaluating model.
INFO:absl:udf_utils.get_fn {'eval_config': '{\n  "metrics_specs": [\n    {\n      "metrics": [\n        {\n          "class_name": "ExampleCount"\n        }\n      ],\n      "thresholds": {\n        "accuracy": {\n          "change_threshold": {\n            "absolute": -1e-10,\n            "direction": "HIGHER_IS_BETTER"\n          },\n          "value_threshold": {\n            "lower_bound": 0.5\n          }\n        }\n      }\n    }\n  ],\n  "model_specs": [\n    {\n      "signature_name": "eval"\n    }\n  ],\n  "slicing_specs": [\n    {},\n    {\n      "feature_keys": [\n        "trip_start_hour"\n      ]\n    }\n  ]\n}', 'feature_slicing_spec': None, 'fairness_indicator_thresholds': 'null', 'example_splits': 'null', 'module_file': None, 'module_path': None} 'custom_extractors'
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}

INFO:absl:eval_shared_models have model_types: {'tfma_eval'}
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  signature_name: "eval"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "trip_start_hour"
}
metrics_specs {
  metrics {
    class_name: "ExampleCount"
  }
  model_names: ""
  thresholds {
    key: "accuracy"
    value {
      value_threshold {
        lower_bound {
          value: 0.5
        }
      }
    }
  }
}
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_model_analysis/eval_saved_model/load.py:163: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Trainer/model/6/Format-TFMA/variables/variables
2024-05-08 09:32:18.153481: W tensorflow/c/c_api.cc:305] Operation '{name:'head/metrics/count_3/Assign' id:1368 op device:{requested: '', assigned: ''} def:{ { {node head/metrics/count_3/Assign} } = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](head/metrics/count_3, head/metrics/count_3/Initializer/zeros)} }' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2024-05-08 09:32:18.323322: W tensorflow/c/c_api.cc:305] Operation '{name:'head/metrics/count_3/Assign' id:1368 op device:{requested: '', assigned: ''} def:{ { {node head/metrics/count_3/Assign} } = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](head/metrics/count_3, head/metrics/count_3/Initializer/zeros)} }' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
INFO:absl:Evaluation complete. Results written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Evaluator/evaluation/8.
INFO:absl:Checking validation results.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_model_analysis/writers/metrics_plots_and_validations_writer.py:112: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
INFO:absl:Blessing result True written to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Evaluator/blessing/8.
INFO:absl:Running publisher for Evaluator
INFO:absl:MetadataStore with DB connection initialized

Now let's examine the output artifacts of Evaluator.

evaluator.outputs
{'evaluation': OutputChannel(artifact_type=ModelEvaluation, producer_component_id=Evaluator, output_key=evaluation, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False),
 'blessing': OutputChannel(artifact_type=ModelBlessing, producer_component_id=Evaluator, output_key=blessing, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)}

Using the evaluation output, we can show the default visualization of the global metrics computed over the entire evaluation set.

context.show(evaluator.outputs['evaluation'])
SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'Overall', 'metrics':…

To see the visualization for sliced evaluation metrics, we can call the TensorFlow Model Analysis library directly.

import tensorflow_model_analysis as tfma

# Get the TFMA output result path and load the result.
PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
tfma_result = tfma.load_eval_result(PATH_TO_RESULT)

# Show data sliced along feature column trip_start_hour.
tfma.view.render_slicing_metrics(
    tfma_result, slicing_column='trip_start_hour')
SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'trip_start_hour:19',…

This visualization shows the same metrics, but computed for each feature value of trip_start_hour rather than over the entire evaluation set.
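As a rough illustration of what slicing means, the sketch below computes one metric over a whole toy dataset and then again per trip_start_hour value. The records, the accuracy helper, and the sliced_accuracy helper are hypothetical and written in plain Python; this is not how TensorFlow Model Analysis computes its metrics, only the same idea at a small scale.

```python
from collections import defaultdict

# Toy evaluation records (hypothetical values, not taken from the taxi dataset).
examples = [
    {"trip_start_hour": 19, "label": 1, "prediction": 1},
    {"trip_start_hour": 19, "label": 0, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 0},
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 1, "prediction": 0},
]


def accuracy(rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(r["prediction"] == r["label"] for r in rows) / len(rows)


def sliced_accuracy(rows, slicing_column):
    """Group rows by one feature value and compute accuracy per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[slicing_column]].append(row)
    return {value: accuracy(group) for value, group in groups.items()}


overall = accuracy(examples)
per_hour = sliced_accuracy(examples, "trip_start_hour")

print("Overall:", overall)
for hour, acc in sorted(per_hour.items()):
    print(f"trip_start_hour:{hour}", acc)
```

A slice with few examples can swing a long way from the overall value, so it helps to read the per-slice example count alongside each sliced metric.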

TensorFlow Model Analysis supports many other visualizations, such as Fairness Indicators and time series plots of model performance. To learn more, see the tutorial.

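Because we added thresholds to the evaluation config, validation output is also available. A BLESSED file in the blessing artifact's directory indicates that the model passed validation. Because this is the first validation run, the candidate model is blessed automatically.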

blessing_uri = evaluator.outputs['blessing'].get()[0].uri
!ls -l {blessing_uri}
total 0
-rw-rw-r-- 1 kbuilder kbuilder 0 May  8 09:32 BLESSED
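The same check can be done from Python instead of the shell. The sketch below assumes only what the listing above shows: a blessed model's blessing directory contains an empty file named BLESSED. The is_blessed helper is hypothetical, and the demo runs against a throwaway directory rather than the real blessing_uri.

```python
import tempfile
from pathlib import Path


def is_blessed(blessing_dir):
    """Return True if the directory contains a marker file named BLESSED."""
    return (Path(blessing_dir) / "BLESSED").is_file()


# Demo on a throwaway directory that stands in for blessing_uri.
with tempfile.TemporaryDirectory() as demo_dir:
    before = is_blessed(demo_dir)          # no marker file yet
    (Path(demo_dir) / "BLESSED").touch()   # simulate a passed validation
    after = is_blessed(demo_dir)

print(before, after)
```

In the notebook, calling is_blessed(blessing_uri) would be the equivalent of inspecting the ls output by eye.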

We can also verify the success by loading the validation result record:

PATH_TO_RESULT = evaluator.outputs['evaluation'].get()[0].uri
print(tfma.load_validation_result(PATH_TO_RESULT))
validation_ok: true
validation_details {
  slicing_details {
    slicing_spec {
    }
    num_matching_slices: 25
  }
}
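To make the two kinds of threshold in the eval config more concrete, here is a simplified sketch of the logic they describe: a value threshold bounds the candidate's metric itself, while a change threshold bounds the difference from a baseline model. The function names and metric values are hypothetical, the inclusive comparisons are an assumption, and this is not TensorFlow Model Analysis's implementation. The defaults mirror the config in the log above (lower_bound 0.5, absolute -1e-10, HIGHER_IS_BETTER).

```python
def passes_value_threshold(metric, lower_bound=None, upper_bound=None):
    """Check an absolute bound on the candidate's metric value."""
    if lower_bound is not None and metric < lower_bound:
        return False
    if upper_bound is not None and metric > upper_bound:
        return False
    return True


def passes_change_threshold(metric, baseline, absolute, higher_is_better=True):
    """Check the candidate-minus-baseline difference against a threshold."""
    diff = metric - baseline
    return diff >= absolute if higher_is_better else diff <= absolute


def validate(metric, baseline=None, lower_bound=0.5, absolute=-1e-10):
    """Apply the value threshold, plus the change threshold if a baseline exists."""
    ok = passes_value_threshold(metric, lower_bound=lower_bound)
    if baseline is not None:
        ok = ok and passes_change_threshold(metric, baseline, absolute)
    return ok


# Hypothetical metric values, chosen only to exercise each branch.
first_run = validate(0.77)                 # no baseline: value threshold only
regressed = validate(0.77, baseline=0.80)  # worse than the baseline
too_low = validate(0.40)                   # below the lower bound

print(first_run, regressed, too_low)
```

The first call corresponds to the situation in this notebook: with no baseline model, the log reports that change thresholds are ignored, so only the value threshold can apply.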

Pusher

The Pusher component is usually the last step of a TFX pipeline. It checks whether a model has passed validation and, if so, exports the model to _serving_model_dir.

pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))
context.run(pusher)
INFO:absl:Running driver for Pusher
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Pusher
INFO:absl:Model version: 1715160747
INFO:absl:Model written to serving path /tmpfs/tmp/tmp15vk_44e/serving_model/taxi_simple/1715160747.
INFO:absl:Model pushed to /tmpfs/tmp/tfx-interactive-2024-05-08T09_28_51.324450-cz1zlfzs/Pusher/pushed_model/9.
INFO:absl:Running publisher for Pusher
INFO:absl:MetadataStore with DB connection initialized
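The behavior visible in this log can be sketched in plain Python: push only when the model is blessed, and write it to a numbered subdirectory of the serving directory. The version number in the log appears to be a Unix timestamp, so the sketch uses one by default. Both helpers, push_if_blessed and latest_version, are hypothetical, and this is not Pusher's actual implementation.

```python
import shutil
import tempfile
import time
from pathlib import Path


def push_if_blessed(model_dir, blessing_dir, base_directory, version=None):
    """Copy model_dir to base_directory/<version> only if BLESSED exists.

    Returns the destination path, or None when the model is not blessed.
    """
    if not (Path(blessing_dir) / "BLESSED").is_file():
        return None
    version = int(time.time()) if version is None else version
    destination = Path(base_directory) / str(version)
    shutil.copytree(model_dir, destination)
    return destination


def latest_version(base_directory):
    """Return the highest numeric subdirectory name, or None if there is none."""
    versions = [int(p.name) for p in Path(base_directory).iterdir()
                if p.is_dir() and p.name.isdigit()]
    return max(versions, default=None)


# Demo with throwaway directories standing in for the real artifacts.
with tempfile.TemporaryDirectory() as root:
    model_dir = Path(root) / "model"
    blessing_dir = Path(root) / "blessing"
    serving_dir = Path(root) / "serving_model"
    for d in (model_dir, blessing_dir, serving_dir):
        d.mkdir()
    (model_dir / "saved_model.pb").touch()

    # Without a BLESSED marker nothing is copied.
    skipped = push_if_blessed(model_dir, blessing_dir, serving_dir, version=1)

    # After blessing, the model lands in serving_model/2.
    (blessing_dir / "BLESSED").touch()
    pushed = push_if_blessed(model_dir, blessing_dir, serving_dir, version=2)
    pushed_name = pushed.name
    pushed_has_model = (pushed / "saved_model.pb").is_file()
    newest = latest_version(serving_dir)

print(skipped, pushed_name, pushed_has_model, newest)
```

Giving each push its own numbered directory keeps earlier versions available, so a consumer can pick the highest number or roll back to an older one.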

Let's examine the output artifacts of Pusher.

pusher.outputs
{'pushed_model': OutputChannel(artifact_type=PushedModel, producer_component_id=Pusher, output_key=pushed_model, additional_properties={}, additional_custom_properties={}, _input_trigger=None, _is_async=False)}

In particular, Pusher exports your model in the SavedModel format. We can load it and list its signatures:

push_uri = pusher.outputs['pushed_model'].get()[0].uri
model = tf.saved_model.load(push_uri)

for item in model.signatures.items():
  pp.pprint(item)
INFO:absl:Fingerprint not found. Saved model loading will continue.
INFO:absl:path_and_singleprint metric could not be logged. Saved model loading will continue.
('regression',
 <ConcreteFunction () -> Dict[['outputs', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)]] at 0x7F38C020BDF0>)
('predict',
 <ConcreteFunction () -> Dict[['class_ids', TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)], ['classes', TensorSpec(shape=(None, 1), dtype=tf.string, name=None)], ['probabilities', TensorSpec(shape=(None, 2), dtype=tf.float32, name=None)], ['logits', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)], ['all_class_ids', TensorSpec(shape=(None, 2), dtype=tf.int32, name=None)], ['logistic', TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)], ['all_classes', TensorSpec(shape=(None, 2), dtype=tf.string, name=None)]] at 0x7F39100E0A90>)
('serving_default',
 <ConcreteFunction () -> Dict[['scores', TensorSpec(shape=(None, 2), dtype=tf.float32, name=None)], ['classes', TensorSpec(shape=(None, 2), dtype=tf.string, name=None)]] at 0x7F38C0424700>)
('classification',
 <ConcreteFunction () -> Dict[['classes', TensorSpec(shape=(None, 2), dtype=tf.string, name=None)], ['scores', TensorSpec(shape=(None, 2), dtype=tf.float32, name=None)]] at 0x7F38C0620D60>)

This completes our tour of the built-in TFX components!