tf.feature_column
s to Keras preprocessing layers" />
![]() |
![]() |
![]() |
![]() |
訓練模型通常會進行一定程度的特徵預處理,尤其是在處理結構化資料時。在 TensorFlow 1 中訓練 tf.estimator.Estimator
時,您通常會使用 tf.feature_column
API 執行特徵預處理。在 TensorFlow 2 中,您可以直接使用 Keras 預處理層執行此操作。
本遷移指南示範了使用 Feature Columns 和預處理層的常見特徵轉換,接著將說明如何使用這兩種 API 訓練完整的模型。
首先,從幾個必要的匯入作業開始
import tensorflow as tf
import tensorflow.compat.v1 as tf1
import math
2024-01-17 02:27:02.309609: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-01-17 02:27:02.309651: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-01-17 02:27:02.311142: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
現在,新增一個公用程式函式來呼叫 Feature Column 以進行示範
def call_feature_columns(feature_columns, inputs):
# This is a convenient way to call a `feature_column` outside of an estimator
# to display its output.
feature_layer = tf1.keras.layers.DenseFeatures(feature_columns)
return feature_layer(inputs)
輸入處理
若要搭配估算器使用 Feature Columns,模型輸入一律應為張量字典
input_dict = {
'foo': tf.constant([1]),
'bar': tf.constant([0]),
'baz': tf.constant([-1])
}
每個 Feature Column 都需要使用鍵來建立索引,以便索引至來源資料。所有 Feature Column 的輸出會串連在一起,並由估算器模型使用。
columns = [
tf1.feature_column.numeric_column('foo'),
tf1.feature_column.numeric_column('bar'),
tf1.feature_column.numeric_column('baz'),
]
call_feature_columns(columns, input_dict)
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/3124623333.py:2: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[ 0., -1., 1.]], dtype=float32)>
在 Keras 中,模型輸入更加彈性。tf.keras.Model
可以處理單一張量輸入、張量特徵清單或張量特徵字典。您可以藉由在模型建立時傳遞 tf.keras.Input
字典來處理字典輸入。輸入不會自動串連,因此可以更彈性地使用。您可以使用 tf.keras.layers.Concatenate
將它們串連在一起。
inputs = {
'foo': tf.keras.Input(shape=()),
'bar': tf.keras.Input(shape=()),
'baz': tf.keras.Input(shape=()),
}
# Inputs are typically transformed by preprocessing layers before concatenation.
outputs = tf.keras.layers.Concatenate()(inputs.values())
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model(input_dict)
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([ 1., 0., -1.], dtype=float32)>
One-Hot 編碼整數 ID
常見的特徵轉換是針對已知範圍的整數輸入進行 One-Hot 編碼。以下是使用 Feature Columns 的範例
categorical_col = tf1.feature_column.categorical_column_with_identity(
'type', num_buckets=3)
indicator_col = tf1.feature_column.indicator_column(categorical_col)
call_feature_columns(indicator_col, {'type': [0, 1, 2]})
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/1369923821.py:1: categorical_column_with_identity (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/1369923821.py:3: indicator_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. <tf.Tensor: shape=(3, 3), dtype=float32, numpy= array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], dtype=float32)>
使用 Keras 預處理層,這些欄可以由單一 tf.keras.layers.CategoryEncoding
層取代,且 output_mode
設定為 'one_hot'
one_hot_layer = tf.keras.layers.CategoryEncoding(
num_tokens=3, output_mode='one_hot')
one_hot_layer([0, 1, 2])
<tf.Tensor: shape=(3, 3), dtype=float32, numpy= array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], dtype=float32)>
正規化數值特徵
處理具有 Feature Columns 的連續浮點特徵時,您需要使用 tf.feature_column.numeric_column
。如果輸入已正規化,則轉換為 Keras 非常簡單。您可以直接在模型中使用 tf.keras.Input
,如上所示。
numeric_column
也可用於正規化輸入
def normalize(x):
mean, variance = (2.0, 1.0)
return (x - mean) / math.sqrt(variance)
numeric_col = tf1.feature_column.numeric_column('col', normalizer_fn=normalize)
call_feature_columns(numeric_col, {'col': tf.constant([[0.], [1.], [2.]])})
<tf.Tensor: shape=(3, 1), dtype=float32, numpy= array([[-2.], [-1.], [ 0.]], dtype=float32)>
相較之下,使用 Keras 時,可以使用 tf.keras.layers.Normalization
完成此正規化。
normalization_layer = tf.keras.layers.Normalization(mean=2.0, variance=1.0)
normalization_layer(tf.constant([[0.], [1.], [2.]]))
<tf.Tensor: shape=(3, 1), dtype=float32, numpy= array([[-2.], [-1.], [ 0.]], dtype=float32)>
分桶和 One-Hot 編碼數值特徵
連續浮點輸入的另一個常見轉換是分桶,然後轉換為固定範圍的整數。
在 Feature Columns 中,可以使用 tf.feature_column.bucketized_column
達成此目的
numeric_col = tf1.feature_column.numeric_column('col')
bucketized_col = tf1.feature_column.bucketized_column(numeric_col, [1, 4, 5])
call_feature_columns(bucketized_col, {'col': tf.constant([1., 2., 3., 4., 5.])})
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/3043215186.py:2: bucketized_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. <tf.Tensor: shape=(5, 4), dtype=float32, numpy= array([[0., 1., 0., 0.], [0., 1., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]], dtype=float32)>
在 Keras 中,可以使用 tf.keras.layers.Discretization
取代
discretization_layer = tf.keras.layers.Discretization(bin_boundaries=[1, 4, 5])
one_hot_layer = tf.keras.layers.CategoryEncoding(
num_tokens=4, output_mode='one_hot')
one_hot_layer(discretization_layer([1., 2., 3., 4., 5.]))
<tf.Tensor: shape=(5, 4), dtype=float32, numpy= array([[0., 1., 0., 0.], [0., 1., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]], dtype=float32)>
使用詞彙表進行 One-Hot 編碼字串資料
處理字串特徵通常需要詞彙表查詢,才能將字串轉換為索引。以下是使用 Feature Columns 查詢字串,然後對索引進行 One-Hot 編碼的範例
vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
'sizes',
vocabulary_list=['small', 'medium', 'large'],
num_oov_buckets=0)
indicator_col = tf1.feature_column.indicator_column(vocab_col)
call_feature_columns(indicator_col, {'sizes': ['small', 'medium', 'large']})
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/2845961037.py:1: categorical_column_with_vocabulary_list (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. <tf.Tensor: shape=(3, 3), dtype=float32, numpy= array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], dtype=float32)>
使用 Keras 預處理層,請使用 tf.keras.layers.StringLookup
層,且 output_mode
設定為 'one_hot'
string_lookup_layer = tf.keras.layers.StringLookup(
vocabulary=['small', 'medium', 'large'],
num_oov_indices=0,
output_mode='one_hot')
string_lookup_layer(['small', 'medium', 'large'])
<tf.Tensor: shape=(3, 3), dtype=float32, numpy= array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], dtype=float32)>
使用詞彙表嵌入字串資料
對於較大的詞彙表,通常需要嵌入才能獲得良好的效能。以下是使用 Feature Columns 嵌入字串特徵的範例
vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
'col',
vocabulary_list=['small', 'medium', 'large'],
num_oov_buckets=0)
embedding_col = tf1.feature_column.embedding_column(vocab_col, 4)
call_feature_columns(embedding_col, {'col': ['small', 'medium', 'large']})
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/999372599.py:5: embedding_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. <tf.Tensor: shape=(3, 4), dtype=float32, numpy= array([[ 0.19490445, -0.41044798, -0.5603343 , 0.32616043], [-0.03856034, 0.45223498, 0.15508214, 0.02957107], [-0.20157112, 0.27702358, -0.079571 , -0.6845111 ]], dtype=float32)>
使用 Keras 預處理層,可以結合 tf.keras.layers.StringLookup
層和 tf.keras.layers.Embedding
層來達成此目的。StringLookup
的預設輸出會是整數索引,可以直接饋送至嵌入。
string_lookup_layer = tf.keras.layers.StringLookup(
vocabulary=['small', 'medium', 'large'], num_oov_indices=0)
embedding = tf.keras.layers.Embedding(3, 4)
embedding(string_lookup_layer(['small', 'medium', 'large']))
<tf.Tensor: shape=(3, 4), dtype=float32, numpy= array([[-0.01309206, -0.01742087, 0.04137394, 0.03209827], [ 0.009628 , 0.03649005, 0.0101964 , -0.02682712], [ 0.02947212, 0.04911048, 0.00290644, -0.00840418]], dtype=float32)>
加總加權類別資料
在某些情況下,您需要處理類別資料,其中每個類別的出現都帶有關聯的權重。在 Feature Columns 中,可以使用 tf.feature_column.weighted_categorical_column
處理。與 indicator_column
配對時,這會產生加總每個類別權重的效果。
ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])
categorical_col = tf1.feature_column.categorical_column_with_identity(
'ids', num_buckets=20)
weighted_categorical_col = tf1.feature_column.weighted_categorical_column(
categorical_col, 'weights')
indicator_col = tf1.feature_column.indicator_column(weighted_categorical_col)
call_feature_columns(indicator_col, {'ids': ids, 'weights': weights})
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/3529191023.py:6: weighted_categorical_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version. Instructions for updating: Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4033: sparse_merge (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: No similar op available at this time. <tf.Tensor: shape=(1, 20), dtype=float32, numpy= array([[0. , 0. , 0. , 0. , 0. , 1.2, 0. , 0. , 0. , 0. , 0. , 1.5, 0. , 0. , 0. , 0. , 0. , 2. , 0. , 0. ]], dtype=float32)>
在 Keras 中,可以藉由將 count_weights
輸入傳遞至 tf.keras.layers.CategoryEncoding
(且 output_mode='count'
) 來完成此操作。
ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])
# Using sparse output is more efficient when `num_tokens` is large.
count_layer = tf.keras.layers.CategoryEncoding(
num_tokens=20, output_mode='count', sparse=True)
tf.sparse.to_dense(count_layer(ids, count_weights=weights))
<tf.Tensor: shape=(1, 20), dtype=float32, numpy= array([[0. , 0. , 0. , 0. , 0. , 1.2, 0. , 0. , 0. , 0. , 0. , 1.5, 0. , 0. , 0. , 0. , 0. , 2. , 0. , 0. ]], dtype=float32)>
嵌入加權類別資料
您可能想要改為嵌入加權類別輸入。在 Feature Columns 中,embedding_column
包含 combiner
引數。如果任何樣本包含類別的多個項目,則會根據引數設定 (預設為 'mean'
) 組合這些項目。
ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])
categorical_col = tf1.feature_column.categorical_column_with_identity(
'ids', num_buckets=20)
weighted_categorical_col = tf1.feature_column.weighted_categorical_column(
categorical_col, 'weights')
embedding_col = tf1.feature_column.embedding_column(
weighted_categorical_col, 4, combiner='mean')
call_feature_columns(embedding_col, {'ids': ids, 'weights': weights})
<tf.Tensor: shape=(1, 4), dtype=float32, numpy= array([[ 0.3714184 , -0.01248874, -0.3219157 , -0.39734876]], dtype=float32)>
在 Keras 中,tf.keras.layers.Embedding
沒有 combiner
選項,但您可以使用 tf.keras.layers.Dense
達成相同的效果。上述 embedding_column
只會根據類別權重線性組合嵌入向量。雖然一開始不明顯,但這與將您的類別輸入表示為大小為 (num_tokens)
的稀疏權重向量,並將它們乘以形狀為 (embedding_size, num_tokens)
的 Dense
核心完全相同。
ids = tf.constant([[5, 11, 5, 17, 17]])
weights = tf.constant([[0.5, 1.5, 0.7, 1.8, 0.2]])
# For `combiner='mean'`, normalize your weights to sum to 1. Removing this line
# would be equivalent to an `embedding_column` with `combiner='sum'`.
weights = weights / tf.reduce_sum(weights, axis=-1, keepdims=True)
count_layer = tf.keras.layers.CategoryEncoding(
num_tokens=20, output_mode='count', sparse=True)
embedding_layer = tf.keras.layers.Dense(4, use_bias=False)
embedding_layer(count_layer(ids, count_weights=weights))
<tf.Tensor: shape=(1, 4), dtype=float32, numpy= array([[-0.24909994, -0.02287644, 0.06182466, -0.11370523]], dtype=float32)>
完整訓練範例
為了展示完整的訓練工作流程,首先準備一些具有三種不同類型特徵的資料
features = {
'type': [0, 1, 1],
'size': ['small', 'small', 'medium'],
'weight': [2.7, 1.8, 1.6],
}
labels = [1, 1, 0]
predict_features = {'type': [0], 'size': ['foo'], 'weight': [-0.7]}
定義 TensorFlow 1 和 TensorFlow 2 工作流程的一些通用常數
vocab = ['small', 'medium', 'large']
one_hot_dims = 3
embedding_dims = 4
weight_mean = 2.0
weight_variance = 1.0
使用 Feature Columns
Feature Columns 必須在建立時以清單形式傳遞至估算器,並會在訓練期間隱含地呼叫。
categorical_col = tf1.feature_column.categorical_column_with_identity(
'type', num_buckets=one_hot_dims)
# Convert index to one-hot; e.g., [2] -> [0,0,1].
indicator_col = tf1.feature_column.indicator_column(categorical_col)
# Convert strings to indices; e.g., ['small'] -> [1].
vocab_col = tf1.feature_column.categorical_column_with_vocabulary_list(
'size', vocabulary_list=vocab, num_oov_buckets=1)
# Embed the indices.
embedding_col = tf1.feature_column.embedding_column(vocab_col, embedding_dims)
normalizer_fn = lambda x: (x - weight_mean) / math.sqrt(weight_variance)
# Normalize the numeric inputs; e.g., [2.0] -> [0.0].
numeric_col = tf1.feature_column.numeric_column(
'weight', normalizer_fn=normalizer_fn)
estimator = tf1.estimator.DNNClassifier(
feature_columns=[indicator_col, embedding_col, numeric_col],
hidden_units=[1])
def _input_fn():
return tf1.data.Dataset.from_tensor_slices((features, labels)).batch(1)
estimator.train(_input_fn)
WARNING:tensorflow:From /tmpfs/tmp/ipykernel_19805/1997355744.py:17: DNNClassifier.__init__ (from tensorflow_estimator.python.estimator.canned.dnn) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py:807: Estimator.__init__ (from tensorflow_estimator.python.estimator.estimator) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1844: RunConfig.__init__ (from tensorflow_estimator.python.estimator.run_config) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. INFO:tensorflow:Using default config. WARNING:tensorflow:Using temporary folder as model directory: /tmpfs/tmp/tmpm02fidwe INFO:tensorflow:Using config: {'_model_dir': '/tmpfs/tmp/tmpm02fidwe', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} INFO:tensorflow:Calling model_fn. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/dnn.py:446: dnn_logit_fn_builder (from tensorflow_estimator.python.estimator.canned.dnn) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/adagrad.py:138: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/model_fn.py:250: EstimatorSpec.__new__ (from tensorflow_estimator.python.estimator.model_fn) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. INFO:tensorflow:Done calling model_fn. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1416: NanTensorHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1419: LoggingTensorHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/basic_session_run_hooks.py:232: SecondOrStepTimer.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/estimator.py:1456: CheckpointSaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. INFO:tensorflow:Create CheckpointSaverHook. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:579: StepCounterHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:586: SummarySaverHook.__init__ (from tensorflow.python.training.basic_session_run_hooks) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. INFO:tensorflow:Graph was finalized. INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. 2024-01-17 02:27:09.322220: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1: type_id: TFT_OPTIONAL args { type_id: TFT_PRODUCT args { type_id: TFT_TENSOR args { type_id: TFT_INT64 } } } is neither a subtype nor a supertype of the combined inputs preceding it: type_id: TFT_OPTIONAL args { type_id: TFT_PRODUCT args { type_id: TFT_TENSOR args { type_id: TFT_INT32 } } } for Tuple type infernce function 0 while inferring type of node 'dnn/zero_fraction/cond/output/_18' INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0... INFO:tensorflow:Saving checkpoints for 0 into /tmpfs/tmp/tmpm02fidwe/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0... WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1455: SessionRunArgs.__new__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1454: SessionRunContext.__init__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/training/monitored_session.py:1474: SessionRunValues.__new__ (from tensorflow.python.training.session_run_hook) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. INFO:tensorflow:loss = 0.6289518, step = 0 INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3... INFO:tensorflow:Saving checkpoints for 3 into /tmpfs/tmp/tmpm02fidwe/model.ckpt. INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3... INFO:tensorflow:Loss for final step: 0.81661654. <tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f4edc48e8b0>
在模型上執行推論時,Feature Columns 也會用於轉換輸入資料。
def _predict_fn():
return tf1.data.Dataset.from_tensor_slices(predict_features).batch(1)
next(estimator.predict(_predict_fn))
INFO:tensorflow:Calling model_fn. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/head.py:596: ClassificationOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/head.py:1307: RegressionOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow_estimator/python/estimator/canned/head.py:1309: PredictOutput.__init__ (from tensorflow.python.saved_model.model_utils.export_output) is deprecated and will be removed in a future version. Instructions for updating: Use tf.keras instead. INFO:tensorflow:Done calling model_fn. INFO:tensorflow:Graph was finalized. INFO:tensorflow:Restoring parameters from /tmpfs/tmp/tmpm02fidwe/model.ckpt-3 INFO:tensorflow:Running local_init_op. INFO:tensorflow:Done running local_init_op. {'logits': array([0.57964283], dtype=float32), 'logistic': array([0.6409852], dtype=float32), 'probabilities': array([0.35901475, 0.6409852 ], dtype=float32), 'class_ids': array([1]), 'classes': array([b'1'], dtype=object), 'all_class_ids': array([0, 1], dtype=int32), 'all_classes': array([b'0', b'1'], dtype=object)}
使用 Keras 預處理層
Keras 預處理層在呼叫位置方面更具彈性。層可以直接套用至張量、在 tf.data
輸入管線內部使用,或直接建置到可訓練的 Keras 模型中。
在本範例中,您將在 tf.data
輸入管線內套用預處理層。若要執行此操作,您可以定義個別的 tf.keras.Model
來預處理您的輸入特徵。此模型不可訓練,但它是將預處理層分組的便利方式。
inputs = {
'type': tf.keras.Input(shape=(), dtype='int64'),
'size': tf.keras.Input(shape=(), dtype='string'),
'weight': tf.keras.Input(shape=(), dtype='float32'),
}
# Convert index to one-hot; e.g., [2] -> [0,0,1].
type_output = tf.keras.layers.CategoryEncoding(
one_hot_dims, output_mode='one_hot')(inputs['type'])
# Convert size strings to indices; e.g., ['small'] -> [1].
size_output = tf.keras.layers.StringLookup(vocabulary=vocab)(inputs['size'])
# Normalize the numeric inputs; e.g., [2.0] -> [0.0].
weight_output = tf.keras.layers.Normalization(
axis=None, mean=weight_mean, variance=weight_variance)(inputs['weight'])
outputs = {
'type': type_output,
'size': size_output,
'weight': weight_output,
}
preprocessing_model = tf.keras.Model(inputs, outputs)
您現在可以在呼叫 tf.data.Dataset.map
時套用此模型。請注意,傳遞至 map
的函式會自動轉換為 tf.function
,且適用於撰寫 tf.function
程式碼的常見注意事項 (無副作用)。
# Apply the preprocessing in tf.data.Dataset.map.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)
dataset = dataset.map(lambda x, y: (preprocessing_model(x), y),
num_parallel_calls=tf.data.AUTOTUNE)
# Display a preprocessed input sample.
next(dataset.take(1).as_numpy_iterator())
({'type': array([[1., 0., 0.]], dtype=float32), 'size': array([1]), 'weight': array([0.70000005], dtype=float32)}, array([1], dtype=int32))
接下來,您可以定義個別的 Model
,其中包含可訓練的層。請注意,此模型的輸入現在如何反映預處理的特徵類型和形狀。
inputs = {
'type': tf.keras.Input(shape=(one_hot_dims,), dtype='float32'),
'size': tf.keras.Input(shape=(), dtype='int64'),
'weight': tf.keras.Input(shape=(), dtype='float32'),
}
# Since the embedding is trainable, it needs to be part of the training model.
embedding = tf.keras.layers.Embedding(len(vocab), embedding_dims)
outputs = tf.keras.layers.Concatenate()([
inputs['type'],
embedding(inputs['size']),
tf.expand_dims(inputs['weight'], -1),
])
outputs = tf.keras.layers.Dense(1)(outputs)
training_model = tf.keras.Model(inputs, outputs)
您現在可以使用 tf.keras.Model.fit
訓練 training_model
。
# Train on the preprocessed data.
training_model.compile(
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
training_model.fit(dataset)
3/3 [==============================] - 1s 5ms/step - loss: 0.8194 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1705458433.603835 19973 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. <keras.src.callbacks.History at 0x7f4e700e7a30>
最後,在推論時,將這些個別階段組合成單一模型來處理原始特徵輸入可能會很有用。
inputs = preprocessing_model.input
outputs = training_model(preprocessing_model(inputs))
inference_model = tf.keras.Model(inputs, outputs)
predict_dataset = tf.data.Dataset.from_tensor_slices(predict_features).batch(1)
inference_model.predict(predict_dataset)
1/1 [==============================] - 0s 102ms/step array([[-0.95717776]], dtype=float32)
這個組合模型可以儲存為 .keras
檔案以供日後使用。
inference_model.save('model.keras')
restored_model = tf.keras.models.load_model('model.keras')
restored_model.predict(predict_dataset)
1/1 [==============================] - 0s 79ms/step array([[-0.95717776]], dtype=float32)
Feature Column 等效表
為了方便參考,以下是 Feature Columns 和 Keras 預處理層之間的大致對應關係
* output_mode
可以傳遞至 tf.keras.layers.CategoryEncoding
、tf.keras.layers.StringLookup
、tf.keras.layers.IntegerLookup
和 tf.keras.layers.TextVectorization
。
† tf.keras.layers.TextVectorization
可以直接處理自由形式的文字輸入 (例如,整個句子或段落)。這並非 TensorFlow 1 中類別序列處理的一對一替代方案,但可以為臨時文字預處理提供便利的替代方案。
後續步驟
- 如需 Keras 預處理層的詳細資訊,請前往使用預處理層指南。
- 如需將預處理層套用至結構化資料的更深入範例,請參閱使用 Keras 預處理層分類結構化資料教學課程。