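The Evaluator TFX Pipeline Component

The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how a model performs on subsets of your data. The Evaluator also helps you validate your exported models, ensuring that they are "good enough" to be pushed to production.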

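When validation is enabled, the Evaluator compares the new model against a baseline (such as the currently serving model) to determine whether it is "good enough" relative to that baseline. It does so by evaluating both models on an evaluation dataset and computing their performance on metrics (for example AUC or loss). If the new model's metrics meet developer-specified criteria relative to the baseline (for example, AUC is not lower), the model is "blessed" (marked as good), which signals to the Pusher that the model may be pushed to production.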
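
The comparison described above can be sketched in plain Python. This is only an illustration of the idea, not the Evaluator's implementation: the helper names `accuracy` and `is_blessed` and the toy data are hypothetical, and accuracy stands in for whatever metric you configure.

```python
from typing import Optional, Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def is_blessed(candidate_metric: float, baseline_metric: Optional[float]) -> bool:
    """Hypothetical rule: bless if the candidate is not worse than the baseline.

    With no baseline (for example on a first run) there is nothing to compare
    against, so this sketch blesses the candidate.
    """
    if baseline_metric is None:
        return True
    return candidate_metric >= baseline_metric


# Both models are scored on the same evaluation split.
eval_labels = [1, 0, 1, 1, 0, 1]
candidate_preds = [1, 0, 1, 0, 0, 1]  # 5 of 6 correct
baseline_preds = [1, 1, 1, 0, 0, 1]   # 4 of 6 correct

candidate_acc = accuracy(eval_labels, candidate_preds)
baseline_acc = accuracy(eval_labels, baseline_preds)
blessed = is_blessed(candidate_acc, baseline_acc)
```

Here the candidate scores higher than the baseline on the shared evaluation split, so the sketch marks it as blessed.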

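Inputs:
- An eval split from Examples
- A trained model from Trainer
- A previously blessed model (if validation is performed)

Outputs:
- Analysis results to ML Metadata
- Validation results to ML Metadata (if validation is performed)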

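Evaluator and TensorFlow Model Analysis

The Evaluator uses the TensorFlow Model Analysis (TFMA) library to perform the analysis, which in turn uses Apache Beam for scalable processing.

Using the Evaluator Component

An Evaluator pipeline component is typically very easy to deploy and requires little customization, since most of the work is done by the Evaluator TFX component.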

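To set up the Evaluator, the following information is needed:

- Metrics to configure (only required if you are adding metrics beyond those saved with the model). See TensorFlow Model Analysis Metrics for more information.
- Slices to configure (if no slices are given, an "overall" slice is added by default). See TensorFlow Model Analysis Setup for more information.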
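
To make the notion of a slice concrete, here is a rough plain-Python sketch that computes one metric for the overall slice and for each value of a single feature. It does not use TFMA; the function `accuracy_by_slice`, the record layout, and the toy data are assumptions made for illustration (the feature name `trip_start_hour` is borrowed from the example configuration in this document).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record is (features, label, prediction); this layout is hypothetical.
Record = Tuple[Dict[str, int], int, int]


def accuracy_by_slice(records: List[Record], feature_key: str) -> Dict[str, float]:
    """Return accuracy for the overall slice and for each value of feature_key."""
    counts = defaultdict(lambda: [0, 0])  # slice name -> [correct, total]
    for features, label, prediction in records:
        # Every record belongs to the overall slice and, if the feature is
        # present, to the slice for that feature value.
        slice_names = ["Overall"]
        if feature_key in features:
            slice_names.append(f"{feature_key}:{features[feature_key]}")
        for name in slice_names:
            counts[name][0] += int(label == prediction)
            counts[name][1] += 1
    return {name: correct / total for name, (correct, total) in counts.items()}


records = [
    ({"trip_start_hour": 8}, 1, 1),
    ({"trip_start_hour": 8}, 0, 1),
    ({"trip_start_hour": 17}, 1, 1),
    ({"trip_start_hour": 17}, 0, 0),
]
slice_accuracy = accuracy_by_slice(records, "trip_start_hour")
```

A metric that looks acceptable on the overall slice can hide a weak slice, which is the reason for configuring slices at all.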

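If validation is to be included, the following additional information is needed:

- Which model to compare against (the latest blessed model, etc.).
- Model validations (thresholds) to verify. See TensorFlow Model Analysis Model Validations for more information.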

When enabled, validation is performed against all of the metrics and slices that were defined.
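
How a value threshold and a change threshold might combine across slices can be sketched as follows. This is a simplified model of the semantics suggested by the configuration example in this document (a lower bound on the metric value, plus a minimum allowed change versus the baseline when higher is better). The function name and data layout are hypothetical, the strict comparisons are an assumption, and TFMA's actual validation logic is not reproduced here.

```python
from typing import Dict, List, Optional

# Assumed layout: slice name -> metric value (higher is better).
SliceMetrics = Dict[str, float]


def failing_slices(
    candidate: SliceMetrics,
    baseline: Optional[SliceMetrics],
    lower_bound: float,
    min_absolute_change: float,
) -> List[str]:
    """Return the slices on which the candidate misses either threshold."""
    failures = []
    for slice_name, value in candidate.items():
        # Value threshold: the metric itself must exceed the lower bound.
        if not value > lower_bound:
            failures.append(slice_name)
            continue
        # Change threshold: skipped when there is no baseline (first run).
        if baseline is None or slice_name not in baseline:
            continue
        if not (value - baseline[slice_name]) > min_absolute_change:
            failures.append(slice_name)
    return failures


candidate = {"Overall": 0.81, "trip_start_hour:8": 0.78, "trip_start_hour:17": 0.45}
baseline = {"Overall": 0.81, "trip_start_hour:8": 0.80, "trip_start_hour:17": 0.40}

failures = failing_slices(
    candidate, baseline, lower_bound=0.5, min_absolute_change=-1e-10)
first_run_failures = failing_slices(
    candidate, None, lower_bound=0.5, min_absolute_change=-1e-10)
validation_ok = not failures
```

Under the strict-comparison assumption, a slightly negative minimum change such as -1e-10 lets a candidate that exactly matches the baseline pass, while any real regression on any slice fails validation.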

Typical code looks like this:

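import tensorflow_model_analysis as tfma
# The import paths below assume the TFX 1.x public API (tfx.v1); adjust them
# to match the TFX version you use.
from tfx.v1 import dsl
from tfx.v1.components import Evaluator
from tfx.v1.dsl import Channel, Resolver
from tfx.v1.types.standard_artifacts import Model, ModelBlessing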
...

# For TFMA evaluation

eval_config = tfma.EvalConfig(
    model_specs=[
        # This assumes a serving model with signature 'serving_default'. If
        # using estimator based EvalSavedModel, add signature_name='eval' and
        # remove the label_key. Note, if using a TFLite model, then you must set
        # model_type='tf_lite'.
        tfma.ModelSpec(label_key='<label_key>')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            # The metrics added here are in addition to those saved with the
            # model (assuming either a keras model or EvalSavedModel is used).
            # Any metrics added into the saved model (for example using
            # model.compile(..., metrics=[...]), etc) will be computed
            # automatically.
            metrics=[
                tfma.MetricConfig(class_name='ExampleCount'),
                tfma.MetricConfig(
                    class_name='BinaryAccuracy',
                    threshold=tfma.MetricThreshold(
                        value_threshold=tfma.GenericValueThreshold(
                            lower_bound={'value': 0.5}),
                        change_threshold=tfma.GenericChangeThreshold(
                            direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                            absolute={'value': -1e-10})))
            ]
        )
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        # Data can be sliced along a feature column. In this case, data is
        # sliced along feature column trip_start_hour.
        tfma.SlicingSpec(feature_keys=['trip_start_hour'])
    ])

# The following component is experimental and may change in the future. It is
# needed so that the latest blessed model is used as the baseline.
model_resolver = Resolver(
    strategy_class=dsl.experimental.LatestBlessedModelStrategy,
    model=Channel(type=Model),
    model_blessing=Channel(type=ModelBlessing)
).with_id('latest_blessed_model_resolver')

model_analyzer = Evaluator(
    examples=examples_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    # Change threshold will be ignored if there is no baseline (first run).
    eval_config=eval_config)

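The Evaluator produces an EvalResult (and optionally a ValidationResult if validation was used) that can be loaded using TFMA. The following is an example of how to load the results into a Jupyter notebook: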

import tensorflow_model_analysis as tfma

# `evaluator` is the Evaluator component instance (named `model_analyzer` in
# the snippet above).
output_path = evaluator.outputs['evaluation'].get()[0].uri

# Load the evaluation results.
eval_result = tfma.load_eval_result(output_path)

# Visualize the metrics and plots using tfma.view.render_slicing_metrics,
# tfma.view.render_plot, etc.
tfma.view.render_slicing_metrics(eval_result)
...

# Load the validation results
validation_result = tfma.load_validation_result(output_path)
if not validation_result.validation_ok:
  ...
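
Outside a notebook it can be handy to turn the loaded metrics into a flat table. The sketch below assumes that `eval_result.slicing_metrics` is a list of `(slice_key, metrics)` pairs, where `slice_key` is a tuple of `(feature, value)` pairs and `metrics` is nested as output name, then sub key, then metric name, then a dict holding `doubleValue`. That layout is an assumption that may differ between TFMA versions, and `flatten_slicing_metrics` is a hypothetical helper, so it is exercised here on a hand-built stand-in rather than on real results.

```python
from typing import Any, Dict, List, Tuple

SliceKey = Tuple[Tuple[str, Any], ...]


def flatten_slicing_metrics(
    slicing_metrics: List[Tuple[SliceKey, Dict[str, Any]]],
    output_name: str = "",
    sub_key: str = "",
) -> Dict[str, Dict[str, float]]:
    """Map a readable slice name to {metric name: scalar value}."""
    rows = {}
    for slice_key, metrics_by_output in slicing_metrics:
        # An empty slice key is treated as the overall slice.
        name = ", ".join(f"{k}={v}" for k, v in slice_key) or "Overall"
        metrics = metrics_by_output.get(output_name, {}).get(sub_key, {})
        rows[name] = {
            metric: value["doubleValue"]
            for metric, value in metrics.items()
            if isinstance(value, dict) and "doubleValue" in value
        }
    return rows


# Hand-built stand-in for eval_result.slicing_metrics (assumed layout).
fake_slicing_metrics = [
    ((), {"": {"": {"binary_accuracy": {"doubleValue": 0.81}}}}),
    ((("trip_start_hour", 8),),
     {"": {"": {"binary_accuracy": {"doubleValue": 0.78}}}}),
]
table = flatten_slicing_metrics(fake_slicing_metrics)
```

With real results you would pass `eval_result.slicing_metrics` in place of the stand-in list, after checking that its structure matches the assumption above.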

More details are available in the Evaluator API reference.