Working with RNNs

Authors: Scott Zhu, Francois Chollet

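Introduction

Recurrent neural networks (RNNs) are a class of neural networks that is powerful for modeling sequence data such as time series or natural language.

Schematically, an RNN layer uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.

The Keras RNN API is designed with a focus on:

Ease of use: the built-in keras.layers.RNN, keras.layers.LSTM, and keras.layers.GRU layers let you quickly build recurrent models without having to make difficult configuration choices.
Ease of customization: you can also define your own RNN cell layer (the inner part of the for loop) with custom behavior, and use it with the generic keras.layers.RNN layer (the for loop itself). This lets you quickly prototype different research ideas in a flexible way with minimal code.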
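
To make the for-loop picture concrete, here is a minimal NumPy sketch of what a simple recurrent layer computes. It is an illustration only, not how Keras implements its layers; the function name simple_rnn_forward and the weight names W_x, W_h, and b are assumptions made for this example.

```python
import numpy as np


def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a tanh RNN over `inputs` of shape (timesteps, input_dim)."""
    state = np.zeros(W_h.shape[0])  # internal state, starts at zero
    for x_t in inputs:  # the "for loop" over timesteps
        # The new state mixes the current input with the previous state.
        state = np.tanh(x_t @ W_x + state @ W_h + b)
    return state  # a summary of every timestep seen so far


rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 3, 4
inputs = rng.normal(size=(timesteps, input_dim))
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

final_state = simple_rnn_forward(inputs, W_x, W_h, b)
print(final_state.shape)  # (4,)
```

The body of the loop is the part a cell defines, and the loop itself is the part the generic RNN layer provides.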

Setup

import numpy as np
import tensorflow as tf
import keras
from keras import layers

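Built-in RNN layers: a simple example

There are three built-in RNN layers in Keras:

keras.layers.SimpleRNN, a fully-connected RNN where the output from the previous timestep is fed to the next timestep.
keras.layers.GRU, first proposed in Cho et al., 2014.
keras.layers.LSTM, first proposed in Hochreiter & Schmidhuber, 1997.

In early 2015, Keras had the first reusable open-source Python implementations of LSTM and GRU.

Here is a simple example of a Sequential model that processes sequences of integers, embeds each integer into a 64-dimensional vector, then processes the sequence of vectors using an LSTM layer.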

model = keras.Sequential()
# Add an Embedding layer expecting input vocab of size 1000, and
# output embedding dimension of size 64.
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# Add a LSTM layer with 128 internal units.
model.add(layers.LSTM(128))

# Add a Dense layer with 10 units.
model.add(layers.Dense(10))

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 64)          64000     
                                                                 
 lstm (LSTM)                 (None, 128)               98816     
                                                                 
 dense (Dense)               (None, 10)                1290      
                                                                 
=================================================================
Total params: 164106 (641.04 KB)
Trainable params: 164106 (641.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
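
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and a bias); the variable names are chosen for this example only.

```python
vocab_size, embed_dim, lstm_units, dense_units = 1000, 64, 128, 10

# Embedding: one embed_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with an input kernel, a recurrent kernel, and a bias.
lstm_params = 4 * (embed_dim * lstm_units + lstm_units * lstm_units + lstm_units)

# Dense: a weight matrix plus a bias vector.
dense_params = lstm_units * dense_units + dense_units

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 64000 98816 1290 164106
```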

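Built-in RNNs support a number of useful features:

Recurrent dropout, via the dropout and recurrent_dropout arguments
Ability to process an input sequence in reverse, via the go_backwards argument
Loop unrolling (which can lead to a large speedup when processing short sequences on CPU), via the unroll argument
...and more.

For more information, see the RNN API documentation.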

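Outputs and states

By default, the output of an RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, and it contains information about the entire input sequence. The shape of this output is (batch_size, units), where units corresponds to the units argument passed to the layer's constructor.

An RNN layer can also return the entire sequence of outputs for each sample (one vector per timestep per sample) if you set return_sequences=True. The shape of this output is (batch_size, timesteps, units).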

model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# The output of GRU will be a 3D tensor of shape (batch_size, timesteps, 256)
model.add(layers.GRU(256, return_sequences=True))

# The output of SimpleRNN will be a 2D tensor of shape (batch_size, 128)
model.add(layers.SimpleRNN(128))

model.add(layers.Dense(10))

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, None, 64)          64000     
                                                                 
 gru (GRU)                   (None, None, 256)         247296    
                                                                 
 simple_rnn (SimpleRNN)      (None, 128)               49280     
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 361866 (1.38 MB)
Trainable params: 361866 (1.38 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
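
The relationship between the two output modes can be checked with a toy NumPy RNN. This is a sketch under simplifying assumptions (a plain tanh recurrence without bias); rnn_outputs is a hypothetical helper, not a Keras function.

```python
import numpy as np


def rnn_outputs(inputs, W_x, W_h, return_sequences=False):
    """Toy tanh RNN over a batch of shape (batch, timesteps, input_dim)."""
    batch_size = inputs.shape[0]
    state = np.zeros((batch_size, W_h.shape[0]))
    per_step = []
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t, :] @ W_x + state @ W_h)
        per_step.append(state)
    if return_sequences:
        # One vector per sample per timestep.
        return np.stack(per_step, axis=1)
    # Only the vector for the last timestep.
    return per_step[-1]


rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, units = 2, 6, 3, 4
x = rng.normal(size=(batch_size, timesteps, input_dim))
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))

last = rnn_outputs(x, W_x, W_h)
full = rnn_outputs(x, W_x, W_h, return_sequences=True)
print(last.shape)  # (2, 4)
print(full.shape)  # (2, 6, 4)
```

In this sketch the default output is simply the last timestep of the full sequence, that is, full[:, -1, :].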

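In addition, an RNN layer can return its final internal state(s). The returned states can be used to resume the RNN execution later, or to initialize another RNN. This setting is commonly used in the encoder-decoder sequence-to-sequence model, where the encoder final state is used as the initial state of the decoder.

To configure an RNN layer to return its internal state, set the return_state parameter to True when creating the layer. Note that LSTM has 2 state tensors, but GRU only has one.

To configure the initial state of the layer, call the layer with the additional keyword argument initial_state. Note that the shape of the state needs to match the unit size of the layer, as in the example below.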

encoder_vocab = 1000
decoder_vocab = 2000

encoder_input = layers.Input(shape=(None,))
encoder_embedded = layers.Embedding(input_dim=encoder_vocab, output_dim=64)(
    encoder_input
)

# Return states in addition to output
output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")(
    encoder_embedded
)
encoder_state = [state_h, state_c]

decoder_input = layers.Input(shape=(None,))
decoder_embedded = layers.Embedding(input_dim=decoder_vocab, output_dim=64)(
    decoder_input
)

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(64, name="decoder")(
    decoder_embedded, initial_state=encoder_state
)
output = layers.Dense(10)(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_1 (InputLayer)        [(None, None)]               0         []                            
                                                                                                  
 input_2 (InputLayer)        [(None, None)]               0         []                            
                                                                                                  
 embedding_2 (Embedding)     (None, None, 64)             64000     ['input_1[0][0]']             
                                                                                                  
 embedding_3 (Embedding)     (None, None, 64)             128000    ['input_2[0][0]']             
                                                                                                  
 encoder (LSTM)              [(None, 64),                 33024     ['embedding_2[0][0]']         
                              (None, 64),                                                         
                              (None, 64)]                                                         
                                                                                                  
 decoder (LSTM)              (None, 64)                   33024     ['embedding_3[0][0]',         
                                                                     'encoder[0][1]',             
                                                                     'encoder[0][2]']             
                                                                                                  
 dense_2 (Dense)             (None, 10)                   650       ['decoder[0][0]']             
                                                                                                  
==================================================================================================
Total params: 258698 (1010.54 KB)
Trainable params: 258698 (1010.54 KB)
Non-trainable params: 0 (0.00 Byte)
__________________________________________________________________________________________________
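
To show why an LSTM hands over two state tensors, here is a toy NumPy LSTM used as an encoder and a decoder. It is a sketch, not the Keras implementation: lstm_forward and random_lstm_weights are hypothetical helpers, the gate ordering is an assumption of this example, and the weights are random.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(inputs, kernel, recurrent_kernel, bias, initial_state=None):
    """Toy LSTM over (batch, timesteps, input_dim); returns (output, h, c)."""
    batch_size = inputs.shape[0]
    units = recurrent_kernel.shape[0]
    if initial_state is None:
        h = np.zeros((batch_size, units))
        c = np.zeros((batch_size, units))
    else:
        h, c = initial_state
    for t in range(inputs.shape[1]):
        z = inputs[:, t, :] @ kernel + h @ recurrent_kernel + bias
        i, f, g, o = np.split(z, 4, axis=-1)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h, h, c  # the last output equals the final hidden state h


def random_lstm_weights(rng, input_dim, units):
    return (
        rng.normal(scale=0.1, size=(input_dim, 4 * units)),
        rng.normal(scale=0.1, size=(units, 4 * units)),
        np.zeros(4 * units),
    )


rng = np.random.default_rng(0)
batch_size, input_dim, units = 2, 8, 64
encoder_weights = random_lstm_weights(rng, input_dim, units)
decoder_weights = random_lstm_weights(rng, input_dim, units)

encoder_inputs = rng.normal(size=(batch_size, 7, input_dim))
decoder_inputs = rng.normal(size=(batch_size, 5, input_dim))

# The encoder's final hidden state h and cell state c seed the decoder.
_, state_h, state_c = lstm_forward(encoder_inputs, *encoder_weights)
decoder_output, _, _ = lstm_forward(
    decoder_inputs, *decoder_weights, initial_state=(state_h, state_c)
)
print(state_h.shape, state_c.shape, decoder_output.shape)
```

Both states have shape (batch_size, units), which is why the initial state must match the unit size of the layer that receives it.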

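RNN layers and RNN cells

In addition to the built-in RNN layers, the RNN API also provides cell-level APIs. Unlike RNN layers, which process whole batches of input sequences, an RNN cell only processes a single timestep.

The cell is the inside of the for loop of an RNN layer. Wrapping a cell inside a keras.layers.RNN layer gives you a layer capable of processing batches of sequences, e.g. RNN(LSTMCell(10)).

Mathematically, RNN(LSTMCell(10)) produces the same result as LSTM(10). In fact, the implementation of this layer in TF v1.x was just creating the corresponding RNN cell and wrapping it in an RNN layer. However, using the built-in GRU and LSTM layers enables the use of CuDNN, and you may see better performance.

There are three built-in RNN cells, each of them corresponding to the matching RNN layer.

keras.layers.SimpleRNNCell corresponds to the SimpleRNN layer.
keras.layers.GRUCell corresponds to the GRU layer.
keras.layers.LSTMCell corresponds to the LSTM layer.

The cell abstraction, together with the generic keras.layers.RNN class, makes it very easy to implement custom RNN architectures for your research.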
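
The split between a cell and the layer that wraps it can be sketched in plain NumPy. TinyRNNCell and run_rnn below are hypothetical names invented for this illustration; they mirror the roles of a cell and of the generic RNN layer, but they are not the Keras classes.

```python
import numpy as np


class TinyRNNCell:
    """Hypothetical cell: the body of the for loop, one timestep at a time."""

    def __init__(self, input_dim, units, seed=0):
        rng = np.random.default_rng(seed)
        self.units = units
        self.kernel = rng.normal(scale=0.1, size=(input_dim, units))
        self.recurrent_kernel = rng.normal(scale=0.1, size=(units, units))

    def step(self, x_t, state):
        new_state = np.tanh(x_t @ self.kernel + state @ self.recurrent_kernel)
        return new_state, new_state  # (output, new state)


def run_rnn(cell, inputs):
    """Hypothetical wrapper: the for loop itself, over (batch, timesteps, dim)."""
    state = np.zeros((inputs.shape[0], cell.units))
    output = state
    for t in range(inputs.shape[1]):
        output, state = cell.step(inputs[:, t, :], state)
    return output


cell = TinyRNNCell(input_dim=3, units=10)
batch = np.random.default_rng(1).normal(size=(4, 7, 3))
print(run_rnn(cell, batch).shape)  # (4, 10)
```

Swapping in a different cell changes the per-step math without touching the loop, which is the design idea behind pairing custom cells with the generic layer.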

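Cross-batch statefulness

When processing very long sequences (possibly infinite), you may want to use the pattern of cross-batch statefulness.

Normally, the internal state of an RNN layer is reset every time it sees a new batch (i.e. every sample seen by the layer is assumed to be independent of the past). The layer will only maintain a state while processing a given sample.

If you have very long sequences though, it is useful to break them into shorter sequences, and to feed these shorter sequences sequentially into an RNN layer without resetting the layer's state. That way, the layer can retain information about the entirety of the sequence, even though it only sees one sub-sequence at a time.

You can do this by setting stateful=True in the constructor.

If you have a sequence s = [t0, t1, ... t1546, t1547], you would split it into e.g.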

s1 = [t0, t1, ... t99]
s2 = [t100, ... t199]
...
s16 = [t1500, ... t1547]

Then you can process it via:

lstm_layer = layers.LSTM(64, stateful=True)
# sub_sequences is the list [s1, s2, ..., s16] of sub-sequences described above.
for s in sub_sequences:
    output = lstm_layer(s)

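When you want to clear the state, you can use layer.reset_states().

Note: In this setup, sample i in a given batch is assumed to be the continuation of sample i in the previous batch. This means that all batches should contain the same number of samples (batch size). E.g. if a batch contains [sequence_A_from_t0_to_t100, sequence_B_from_t0_to_t100], the next batch should contain [sequence_A_from_t101_to_t200, sequence_B_from_t101_to_t200].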
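
The idea behind carrying state across sub-sequences can be verified with a toy NumPy RNN: feeding the chunks one after another while keeping the state gives the same final state as processing the whole sequence at once. This is a sketch, not Keras code; rnn_chunk is a hypothetical helper and the chunk length of 100 is an assumption.

```python
import numpy as np


def rnn_chunk(inputs, state, W_x, W_h):
    """Process one chunk of shape (batch, timesteps, dim) starting from `state`."""
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t, :] @ W_x + state @ W_h)
    return state


rng = np.random.default_rng(0)
batch_size, total_steps, input_dim, units = 2, 1548, 3, 4
chunk_len = 100  # assumed sub-sequence length
long_batch = rng.normal(size=(batch_size, total_steps, input_dim))
W_x = rng.normal(scale=0.5, size=(input_dim, units))
W_h = rng.normal(scale=0.5, size=(units, units))

# Split along the time axis; the last chunk is shorter than the others.
sub_sequences = [
    long_batch[:, start : start + chunk_len]
    for start in range(0, total_steps, chunk_len)
]

# "Stateful" processing: the state is carried from one chunk to the next.
state = np.zeros((batch_size, units))
for s in sub_sequences:
    state = rnn_chunk(s, state, W_x, W_h)

# Reference: process the whole sequence in one go.
full_state = rnn_chunk(long_batch, np.zeros((batch_size, units)), W_x, W_h)

print(len(sub_sequences))  # 16
print(np.allclose(state, full_state))  # True
```

Row i of every chunk continues row i of the previous chunk, which is why every batch must keep the same samples in the same order.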

Here is a complete example:

paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)
output = lstm_layer(paragraph3)

# reset_states() will reset the cached state to the original initial_state.
# If no initial_state was provided, zero-states will be used by default.
lstm_layer.reset_states()

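RNN State Reuse

The recorded states of the RNN layer are not included in layer.weights. If you would like to reuse the state from an RNN layer, you can retrieve the state values through layer.states and use them as the initial state for a new layer via the Keras functional API, like new_layer(inputs, initial_state=layer.states), or via model subclassing.

Please also note that a Sequential model might not be usable in this case, since it only supports layers with a single input and output; the extra input of the initial state makes it impossible to use here.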

paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)

existing_state = lstm_layer.states

new_lstm_layer = layers.LSTM(64)
new_output = new_lstm_layer(paragraph3, initial_state=existing_state)

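Bidirectional RNNs

For sequences other than time series (e.g. text), it is often the case that an RNN model can perform better if it not only processes the sequence from start to end, but also backwards. For example, to predict the next word in a sentence, it is often useful to have the context around the word, not only the words that come before it.

Keras provides an easy API for you to build such bidirectional RNNs: the keras.layers.Bidirectional wrapper.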

model = keras.Sequential()

model.add(
    layers.Bidirectional(layers.LSTM(64, return_sequences=True), input_shape=(5, 10))
)
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(10))

model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bidirectional (Bidirection  (None, 5, 128)            38400     
 al)                                                             
                                                                 
 bidirectional_1 (Bidirecti  (None, 64)                41216     
 onal)                                                           
                                                                 
 dense_3 (Dense)             (None, 10)                650       
                                                                 
=================================================================
Total params: 80266 (313.54 KB)
Trainable params: 80266 (313.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

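Under the hood, Bidirectional copies the RNN layer passed in and flips the go_backwards field of the newly copied layer, so that it processes the inputs in reverse order.

By default, the output of the Bidirectional RNN is the concatenation of the forward layer output and the backward layer output. If you need a different merging behavior, e.g. summation, change the merge_mode parameter in the Bidirectional wrapper constructor. For more details about Bidirectional, check the API docs.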
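
The effect of the merge step on the output shape can be sketched with two toy NumPy RNNs, one reading the sequence forward and one reading it reversed. This is an illustration under simplifying assumptions (tanh recurrence, last output only); bidirectional, rnn_last_output, and make_weights are hypothetical helpers, and only the "concat" and "sum" modes are handled here.

```python
import numpy as np


def rnn_last_output(inputs, W_x, W_h):
    """Toy tanh RNN; returns the last output for (batch, timesteps, dim) input."""
    state = np.zeros((inputs.shape[0], W_h.shape[0]))
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t, :] @ W_x + state @ W_h)
    return state


def bidirectional(inputs, forward_weights, backward_weights, merge_mode="concat"):
    forward = rnn_last_output(inputs, *forward_weights)
    # The backward copy reads the same sequence in reverse time order.
    backward = rnn_last_output(inputs[:, ::-1, :], *backward_weights)
    if merge_mode == "concat":
        return np.concatenate([forward, backward], axis=-1)
    if merge_mode == "sum":
        return forward + backward
    raise ValueError(f"Unsupported merge_mode in this sketch: {merge_mode}")


rng = np.random.default_rng(0)
input_dim, units = 10, 32


def make_weights():
    return (
        rng.normal(scale=0.1, size=(input_dim, units)),
        rng.normal(scale=0.1, size=(units, units)),
    )


forward_weights, backward_weights = make_weights(), make_weights()
x = rng.normal(size=(4, 5, input_dim))

print(bidirectional(x, forward_weights, backward_weights).shape)  # (4, 64)
print(bidirectional(x, forward_weights, backward_weights, "sum").shape)  # (4, 32)
```

Concatenation doubles the feature dimension, which matches the (None, 64) output reported for the Bidirectional LSTM(32) layer in the summary above, while summation keeps it at the number of units.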

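Performance optimization and CuDNN kernels

In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage CuDNN kernels by default when a GPU is available. With this change, the prior keras.layers.CuDNNLSTM/CuDNNGRU layers have been deprecated, and you can build your model without worrying about the hardware it will run on.

Since the CuDNN kernel is built with certain assumptions, the layer will not be able to use the CuDNN kernel if you change the defaults of the built-in LSTM or GRU layers. E.g.:

Changing the activation function from tanh to something else.
Changing the recurrent_activation function from sigmoid to something else.
Using recurrent_dropout > 0.
Setting unroll to True, which forces LSTM/GRU to decompose the inner tf.while_loop into an unrolled for loop.
Setting use_bias to False.
Using masking when the input data is not strictly right padded (if the mask corresponds to strictly right padded data, CuDNN can still be used. This is the most common case).

For the detailed list of constraints, see the documentation for the LSTM and GRU layers.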
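
As a reading aid, the conditions listed above can be written down as a small checker. This is a hypothetical helper, not part of Keras, and it encodes only the items named in this section; the actual decision is made inside the layer and may involve further constraints.

```python
import numpy as np


def is_right_padded(mask):
    """True if every row of a boolean mask is all True, then all False."""
    m = np.asarray(mask).astype(int)
    # Right padding means the mask never goes from 0 back to 1.
    return bool(np.all(m[:, 1:] <= m[:, :-1]))


def meets_listed_cudnn_conditions(
    activation="tanh",
    recurrent_activation="sigmoid",
    recurrent_dropout=0.0,
    unroll=False,
    use_bias=True,
    mask=None,
):
    """Hypothetical helper mirroring the conditions listed in this section."""
    return (
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
        and (mask is None or is_right_padded(mask))
    )


right_padded = np.array([[True, True, False], [True, False, False]])
left_padded = np.array([[False, True, True], [True, True, True]])

print(meets_listed_cudnn_conditions())  # True
print(meets_listed_cudnn_conditions(recurrent_dropout=0.2))  # False
print(meets_listed_cudnn_conditions(mask=right_padded))  # True
print(meets_listed_cudnn_conditions(mask=left_padded))  # False
```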

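Using CuDNN kernels when available

Let's build a simple LSTM model to demonstrate the performance difference.

We'll use as input sequences the sequence of rows of MNIST digits (treating each row of pixels as a timestep), and we'll predict the digit's label.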

batch_size = 64
# Each MNIST image batch is a tensor of shape (batch_size, 28, 28).
# Each input sequence will be of size (28, 28) (height is treated like time).
input_dim = 28

units = 64
output_size = 10  # labels are from 0 to 9


# Build the RNN model
def build_model(allow_cudnn_kernel=True):
    # CuDNN is only available at the layer level, and not at the cell level.
    # This means `LSTM(units)` will use the CuDNN kernel,
    # while RNN(LSTMCell(units)) will run on non-CuDNN kernel.
    if allow_cudnn_kernel:
        # The LSTM layer with default options uses CuDNN.
        lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    else:
        # Wrapping a LSTMCell in a RNN layer will not use CuDNN.
        lstm_layer = keras.layers.RNN(
            keras.layers.LSTMCell(units), input_shape=(None, input_dim)
        )
    model = keras.models.Sequential(
        [
            lstm_layer,
            keras.layers.BatchNormalization(),
            keras.layers.Dense(output_size),
        ]
    )
    return model

Let's load the MNIST dataset:

mnist = keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]

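Let's create a model instance and train it.

We choose sparse_categorical_crossentropy as the loss function for the model. The output of the model has shape [batch_size, 10]. The target for the model is an integer vector, where each integer is in the range 0 to 9.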
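
For intuition about this loss, here is a NumPy sketch of sparse categorical cross-entropy computed from logits, that is, from unnormalized scores with integer class labels. The function name is an assumption made for this example, and the sketch shows the formula rather than the Keras implementation.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels and unnormalized scores."""
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to the true class of each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()


logits = np.zeros((2, 10))  # a model that is undecided between the 10 digits
labels = np.array([5, 0])  # integer targets in the range 0 to 9
loss = sparse_categorical_crossentropy_from_logits(logits, labels)
print(np.isclose(loss, np.log(10)))  # True
```

An undecided model assigns probability 1/10 to every digit, so its loss is log(10), roughly 2.3; values well below that indicate the model is learning.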

model = build_model(allow_cudnn_kernel=True)

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)


model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

938/938 [==============================] - 7s 5ms/step - loss: 0.9965 - accuracy: 0.6845 - val_loss: 0.5699 - val_accuracy: 0.8181
<keras.src.callbacks.History at 0x7f71d8117c10>

Now, let's compare to a model that does not use the CuDNN kernel:

noncudnn_model = build_model(allow_cudnn_kernel=False)
noncudnn_model.set_weights(model.get_weights())
noncudnn_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)
noncudnn_model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

938/938 [==============================] - 20s 20ms/step - loss: 0.4268 - accuracy: 0.8698 - val_loss: 0.3017 - val_accuracy: 0.9145
<keras.src.callbacks.History at 0x7f71d84e5520>

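When running on a machine with an NVIDIA GPU and CuDNN installed, the model built with CuDNN is much faster to train compared to the model that uses the regular TensorFlow kernel (about 5 ms/step versus 20 ms/step in the training logs above).

The same CuDNN-enabled model can also be used to run inference in a CPU-only environment. The tf.device annotation below is just forcing the device placement. The model will run on CPU by default if no GPU is available.

You simply don't have to worry about the hardware you're running on anymore. Isn't that pretty cool?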

import matplotlib.pyplot as plt

with tf.device("CPU:0"):
    cpu_model = build_model(allow_cudnn_kernel=True)
    cpu_model.set_weights(model.get_weights())
    result = tf.argmax(cpu_model.predict_on_batch(tf.expand_dims(sample, 0)), axis=1)
    print(
        "Predicted result is: %s, target result is: %s" % (result.numpy(), sample_label)
    )
    plt.imshow(sample, cmap=plt.get_cmap("gray"))

Predicted result is: [3], target result is: 5

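RNNs with list/dict inputs, or nested inputs

Nested structures allow implementers to include more information within a single timestep. For example, a video frame could have audio and video input at the same time. The data shape in this case could be: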

[batch, timestep, {"video": [height, width, channel], "audio": [frequency]}]

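In another example, handwriting data could have both coordinates x and y for the current position of the pen, as well as pressure information. So the data representation could be: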

[batch, timestep, {"location": [x, y], "pressure": [force]}]
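
In array terms, the notation above can be read as one array per feature, all sharing the leading (batch, timestep) axes. The NumPy sketch below uses random values and assumed sizes purely for illustration.

```python
import numpy as np

batch, timestep = 4, 50
rng = np.random.default_rng(0)

# One array per feature; all of them share the leading (batch, timestep) axes.
handwriting = {
    "location": rng.random((batch, timestep, 2)),  # x and y of the pen
    "pressure": rng.random((batch, timestep, 1)),  # force applied by the pen
}

# What a cell would receive at a single timestep t: the same structure,
# with the time axis removed.
t = 0
step_input = {name: values[:, t] for name, values in handwriting.items()}
print({name: values.shape for name, values in step_input.items()})
# {'location': (4, 2), 'pressure': (4, 1)}
```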

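The following code provides an example of how to build a custom RNN cell that accepts such structured inputs.

Define a custom cell that supports nested input/output

See Making new Layers & Models via subclassing for details on writing your own layers.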

@keras.saving.register_keras_serializable()
class NestedCell(keras.layers.Layer):
    def __init__(self, unit_1, unit_2, unit_3, **kwargs):
        self.unit_1 = unit_1
        self.unit_2 = unit_2
        self.unit_3 = unit_3
        self.state_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        self.output_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        super().__init__(**kwargs)

    def build(self, input_shapes):
        # expect input_shape to contain 2 items, [(batch, i1), (batch, i2, i3)]
        i1 = input_shapes[0][1]
        i2 = input_shapes[1][1]
        i3 = input_shapes[1][2]

        self.kernel_1 = self.add_weight(
            shape=(i1, self.unit_1), initializer="uniform", name="kernel_1"
        )
        self.kernel_2_3 = self.add_weight(
            shape=(i2, i3, self.unit_2, self.unit_3),
            initializer="uniform",
            name="kernel_2_3",
        )

    def call(self, inputs, states):
        # inputs should be in [(batch, input_1), (batch, input_2, input_3)]
        # state should be in shape [(batch, unit_1), (batch, unit_2, unit_3)]
        input_1, input_2 = tf.nest.flatten(inputs)
        s1, s2 = states

        output_1 = tf.matmul(input_1, self.kernel_1)
        output_2_3 = tf.einsum("bij,ijkl->bkl", input_2, self.kernel_2_3)
        state_1 = s1 + output_1
        state_2_3 = s2 + output_2_3

        output = (output_1, output_2_3)
        new_states = (state_1, state_2_3)

        return output, new_states

    def get_config(self):
        return {"unit_1": self.unit_1, "unit_2": self.unit_2, "unit_3": self.unit_3}
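
To see what one call of this cell computes, the following NumPy sketch reproduces the arithmetic of NestedCell.call for a single timestep, using random stand-ins for the weights and zero initial states. It also checks that the einsum contraction is equivalent to flattening the two input axes and doing an ordinary matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, i1, i2, i3 = 2, 32, 64, 32
unit_1, unit_2, unit_3 = 10, 20, 30

input_1 = rng.random((batch, i1))
input_2 = rng.random((batch, i2, i3))
kernel_1 = rng.random((i1, unit_1))
kernel_2_3 = rng.random((i2, i3, unit_2, unit_3))
s1 = np.zeros((batch, unit_1))
s2 = np.zeros((batch, unit_2, unit_3))

# Same arithmetic as NestedCell.call, written with NumPy.
output_1 = input_1 @ kernel_1
# "bij,ijkl->bkl" contracts the two input axes (i, j) and keeps (k, l).
output_2_3 = np.einsum("bij,ijkl->bkl", input_2, kernel_2_3)
state_1 = s1 + output_1
state_2_3 = s2 + output_2_3

# The einsum is equivalent to flattening (i, j) and doing a matrix product.
flat = input_2.reshape(batch, i2 * i3) @ kernel_2_3.reshape(i2 * i3, unit_2 * unit_3)
print(output_1.shape, output_2_3.shape)  # (2, 10) (2, 20, 30)
print(np.allclose(output_2_3, flat.reshape(batch, unit_2, unit_3)))  # True
```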

Build an RNN model with nested input/output

Let's build a Keras model that uses a keras.layers.RNN layer and the custom cell we just defined.

unit_1 = 10
unit_2 = 20
unit_3 = 30

i1 = 32
i2 = 64
i3 = 32
batch_size = 64
num_batches = 10
timestep = 50

cell = NestedCell(unit_1, unit_2, unit_3)
rnn = keras.layers.RNN(cell)

input_1 = keras.Input((None, i1))
input_2 = keras.Input((None, i2, i3))

outputs = rnn((input_1, input_2))

model = keras.models.Model([input_1, input_2], outputs)

model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

Train the model with randomly generated data

Since there isn't a good candidate dataset for this model, we use random NumPy data for demonstration.

input_1_data = np.random.random((batch_size * num_batches, timestep, i1))
input_2_data = np.random.random((batch_size * num_batches, timestep, i2, i3))
target_1_data = np.random.random((batch_size * num_batches, unit_1))
target_2_data = np.random.random((batch_size * num_batches, unit_2, unit_3))
input_data = [input_1_data, input_2_data]
target_data = [target_1_data, target_2_data]

model.fit(input_data, target_data, batch_size=batch_size)

10/10 [==============================] - 1s 27ms/step - loss: 0.7623 - rnn_1_loss: 0.2873 - rnn_1_1_loss: 0.4750 - rnn_1_accuracy: 0.1016 - rnn_1_1_accuracy: 0.0350
<keras.src.callbacks.History at 0x7f734c8e2d30>

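With the Keras keras.layers.RNN layer, you are only expected to define the math logic for an individual step within the sequence, and the keras.layers.RNN layer will handle the sequence iteration for you. It's an incredibly powerful way to quickly prototype new kinds of RNNs (e.g. an LSTM variant).

For more details, please visit the API docs.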