TensorFlow basics

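This guide provides a quick overview of TensorFlow basics. Each section of this document summarizes a larger topic, and you can find links to the full guides at the end of each section.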

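TensorFlow is an end-to-end platform for machine learning. It supports the following:

Multidimensional-array based numeric computation (similar to NumPy)
GPU and distributed processing
Automatic differentiation
Model construction, training, and export
And more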

Tensors

TensorFlow operates on multidimensional arrays, or tensors, represented as tf.Tensor objects. Here is a two-dimensional tensor:

import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

print(x)
print(x.shape)
print(x.dtype)

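The most important attributes of a tf.Tensor are its shape and dtype:

Tensor.shape: tells you the size of the tensor along each of its axes.
Tensor.dtype: tells you the type of all the elements in the tensor.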
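
As a small illustration of these two attributes, the sketch below shows how TensorFlow infers a dtype from the Python values it is given and how tf.cast converts between types. The names ints, floats, and as_float are made up for this example.

```python
import tensorflow as tf

# dtype is inferred from the Python values passed in.
ints = tf.constant([[1, 2, 3], [4, 5, 6]])   # Python ints -> tf.int32
floats = tf.constant([1.0, 2.0])             # Python floats -> tf.float32

# tf.cast returns a new tensor with the requested dtype; the shape is unchanged.
as_float = tf.cast(ints, tf.float32)

print(ints.shape, ints.dtype)
print(floats.shape, floats.dtype)
print(as_float.shape, as_float.dtype)
```

Because every element of a tensor shares one dtype, mixing tensors of different dtypes in an operation generally requires an explicit cast like the one above.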

TensorFlow implements standard mathematical operations on tensors, as well as many operations specialized for machine learning.

For example:

x + x

5 * x

x @ tf.transpose(x)

tf.concat([x, x, x], axis=0)

tf.nn.softmax(x, axis=-1)

tf.reduce_sum(x)

tf.convert_to_tensor([1,2,3])

tf.reduce_sum([1,2,3])
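
To make one of these operations concrete, here is a plain-Python sketch of the arithmetic behind tf.nn.softmax(x, axis=-1) for the tensor x defined earlier. It is an illustration only, not TensorFlow's implementation; the helper softmax_row and the list x_rows are hypothetical names.

```python
import math

def softmax_row(row):
  """Softmax of one row: exponentiate, then normalize so the row sums to 1."""
  # Subtracting the row maximum is a common numerical-stability trick;
  # it does not change the result.
  shifted = [v - max(row) for v in row]
  exps = [math.exp(v) for v in shifted]
  total = sum(exps)
  return [e / total for e in exps]

x_rows = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]

# axis=-1 means "normalize along the last axis", so each row is handled independently.
result = [softmax_row(row) for row in x_rows]
for row in result:
  print([round(p, 4) for p in row])
```

Both rows produce the same probabilities, because softmax depends only on the differences between the elements of a row, and those differences are identical here.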

Running large calculations on a CPU can be slow. When properly configured, TensorFlow can use accelerator hardware such as GPUs to execute operations much faster.

if tf.config.list_physical_devices('GPU'):
  print("TensorFlow **IS** using the GPU")
else:
  print("TensorFlow **IS NOT** using the GPU")

Refer to the Tensor guide for details.

Variables

Normal tf.Tensor objects are immutable. To store model weights (or other mutable state) in TensorFlow, use a tf.Variable.

var = tf.Variable([0.0, 0.0, 0.0])

var.assign([1, 2, 3])

var.assign_add([1, 1, 1])
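
The following sketch walks through the in-place update methods and shows what the variable holds after each step. The name counter is made up for this example, and assign_sub is included because the training loop later in this guide relies on it.

```python
import tensorflow as tf

counter = tf.Variable([0.0, 0.0, 0.0])

# assign overwrites the stored values in place; the Variable object is reused.
counter.assign([1.0, 2.0, 3.0])

# assign_add and assign_sub update the stored values element-wise.
counter.assign_add([1.0, 1.0, 1.0])   # now [2.0, 3.0, 4.0]
counter.assign_sub([0.5, 0.5, 0.5])   # now [1.5, 2.5, 3.5]

# Using a variable in an expression yields an ordinary (immutable) tf.Tensor.
doubled = counter * 2.0

print(counter.numpy())
print(doubled.numpy())
```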

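Refer to the Variables guide for details.

Automatic differentiation

Gradient descent and related algorithms are a cornerstone of modern machine learning.

To enable this, TensorFlow implements automatic differentiation (autodiff), which uses calculus to compute gradients. Typically you'll use this to calculate the gradient of a model's error or loss with respect to its weights.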

x = tf.Variable(1.0)

def f(x):
  y = x**2 + 2*x - 5
  return y

f(x)

At x = 1.0, y = f(x) = (1**2 + 2*1 - 5) = -2.

The derivative of y is y' = f'(x) = (2*x + 2) = 4. TensorFlow can calculate this automatically:

with tf.GradientTape() as tape:
  y = f(x)

g_x = tape.gradient(y, x)  # g(x) = dy/dx

g_x
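
One way to sanity-check a gradient like this is to compare it against a numerical approximation. The sketch below does that in plain Python with a central difference; the helper central_difference and its step size h are assumptions made for this illustration, not part of TensorFlow.

```python
def f(x):
  return x**2 + 2*x - 5

def central_difference(func, x, h=1e-5):
  """Approximate func'(x) numerically with a symmetric difference quotient."""
  return (func(x + h) - func(x - h)) / (2 * h)

numeric = central_difference(f, 1.0)
analytic = 2 * 1.0 + 2   # f'(x) = 2*x + 2 evaluated at x = 1.0

print(numeric, analytic)
```

The two values should agree to many decimal places. Autodiff differs from this approach in that it applies exact derivative rules to each operation rather than approximating with small steps.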

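This simplified example only takes the derivative with respect to a single scalar (x), but TensorFlow can compute the gradient with respect to any number of non-scalar tensors simultaneously.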
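
As a minimal sketch of the multi-tensor case, the example below takes the gradient of a scalar loss with respect to a weight matrix and a bias vector in a single call. The names w, b, and inputs, and the toy loss itself, are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import tensorflow as tf

# Hypothetical parameters of a tiny linear model: a weight matrix and a bias vector.
w = tf.Variable(tf.ones([3, 2]))
b = tf.Variable(tf.zeros([2]))
inputs = tf.constant([[1.0, 2.0, 3.0]])

with tf.GradientTape() as tape:
  out = inputs @ w + b            # shape (1, 2)
  loss = tf.reduce_sum(out**2)    # scalar

# Passing a list of sources returns a list of gradients with matching shapes.
grad_w, grad_b = tape.gradient(loss, [w, b])

print(grad_w.shape)  # same shape as w
print(grad_b.shape)  # same shape as b
```

Each returned gradient has the same shape as the tensor it was taken with respect to, which is what makes element-wise updates of the form variable minus learning rate times gradient possible.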

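Refer to the Autodiff guide for details.

Graphs and tf.function

While you can use TensorFlow interactively like any Python library, TensorFlow also provides tools for:

Performance optimization: to speed up training and inference.
Export: so you can save your model when it's done training.

These tools require that you use tf.function to separate your pure-TensorFlow code from Python.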

@tf.function
def my_func(x):
  print('Tracing.\n')
  return tf.reduce_sum(x)

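The first time you run the tf.function, although it executes in Python, it captures a complete, optimized graph representing the TensorFlow computations done within the function.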

x = tf.constant([1, 2, 3])
my_func(x)

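On subsequent calls, TensorFlow only executes the optimized graph, skipping any non-TensorFlow steps. Below, note that my_func doesn't print "Tracing." because print is a Python function, not a TensorFlow function.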

x = tf.constant([10, 9, 8])
my_func(x)

A graph may not be reusable for inputs with a different signature (shape and dtype), so TensorFlow generates a new graph instead:

x = tf.constant([10.0, 9.1, 8.2], dtype=tf.float32)
my_func(x)
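
The tracing behavior described above can be observed by recording a Python side effect, which only runs while a graph is being traced. This is a minimal sketch; the names trace_log and total are made up for the example.

```python
import tensorflow as tf

trace_log = []  # filled by Python code that only runs while tracing

@tf.function
def total(x):
  # This append is a Python side effect, so it runs once per trace,
  # not once per call.
  trace_log.append((x.shape, x.dtype))
  return tf.reduce_sum(x)

total(tf.constant([1, 2, 3]))         # first call: traces for int32, shape (3,)
total(tf.constant([10, 9, 8]))        # same signature: reuses the existing graph
total(tf.constant([10.0, 9.1, 8.2]))  # float32: traces a new graph

print(len(trace_log))  # 2
```

Three calls produce only two traces, one per distinct input signature.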

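These captured graphs provide two benefits:

In many cases they provide a significant speedup in execution (though not in this trivial example).
You can export these graphs, using tf.saved_model, to run on other systems such as a server or a mobile device, with no Python installation required.

Refer to Intro to graphs for more details.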

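Modules, layers, and models

tf.Module is a class for managing your tf.Variable objects and the tf.function objects that operate on them. The tf.Module class is necessary to support two significant features:

You can save and restore the values of your variables using tf.train.Checkpoint. This is useful during training because it is quick to save and restore a model's state.
You can import and export the tf.Variable values and the tf.function graphs using tf.saved_model. This allows you to run your model independently of the Python program that created it.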
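
The first of these features can be sketched as follows. This is a minimal illustration under stated assumptions: the Scaler class, the 'ckpt' file prefix, and the temporary directory are all hypothetical choices for the example.

```python
import os
import tempfile

import tensorflow as tf

class Scaler(tf.Module):
  """Hypothetical module holding a single variable."""

  def __init__(self, value):
    super().__init__()
    self.weight = tf.Variable(value)

scaler = Scaler(3.0)
checkpoint = tf.train.Checkpoint(model=scaler)

# save() writes the variable values and returns the path prefix of the checkpoint.
prefix = os.path.join(tempfile.mkdtemp(), 'ckpt')
saved_path = checkpoint.save(prefix)

# Change the variable, then restore the saved value.
scaler.weight.assign(100.0)
checkpoint.restore(saved_path)

print(scaler.weight.numpy())
```

A checkpoint stores only variable values, so restoring it requires the Python code that defines the module. A SavedModel also stores the traced graphs, which is why it can run without that code.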

Here is a complete example exporting a simple tf.Module object:

class MyModule(tf.Module):
  def __init__(self, value):
    self.weight = tf.Variable(value)

  @tf.function
  def multiply(self, x):
    return x * self.weight

mod = MyModule(3)
mod.multiply(tf.constant([1, 2, 3]))

Save the Module:

save_path = './saved'
tf.saved_model.save(mod, save_path)

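The resulting SavedModel is independent of the code that created it. You can load a SavedModel from Python, other language bindings, or TensorFlow Serving. You can also convert it to run with TensorFlow Lite or TensorFlow JS.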

reloaded = tf.saved_model.load(save_path)
reloaded.multiply(tf.constant([1, 2, 3]))

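The tf.keras.layers.Layer and tf.keras.Model classes build on tf.Module, providing additional functionality and convenience methods for building, training, and saving models. Some of these are demonstrated in the next section.

Refer to Intro to modules for details.

Training loops

Now put this all together to build a basic model and train it from scratch.

First, create some example data. This generates a cloud of points that loosely follows a quadratic curve: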

import matplotlib
from matplotlib import pyplot as plt

matplotlib.rcParams['figure.figsize'] = [9, 6]

x = tf.linspace(-2, 2, 201)
x = tf.cast(x, tf.float32)

def f(x):
  y = x**2 + 2*x - 5
  return y

y = f(x) + tf.random.normal(shape=[201])

plt.plot(x.numpy(), y.numpy(), '.', label='Data')
plt.plot(x, f(x), label='Ground truth')
plt.legend();

Create a quadratic model with randomly initialized weights and a bias:

class Model(tf.Module):

  def __init__(self):
    # Randomly generate weight and bias terms
    rand_init = tf.random.uniform(shape=[3], minval=0., maxval=5., seed=22)
    # Initialize model parameters
    self.w_q = tf.Variable(rand_init[0])
    self.w_l = tf.Variable(rand_init[1])
    self.b = tf.Variable(rand_init[2])

  @tf.function
  def __call__(self, x):
    # Quadratic Model : quadratic_weight * x^2 + linear_weight * x + bias
    return self.w_q * (x**2) + self.w_l * x + self.b

First, observe your model's performance before training:

quad_model = Model()

def plot_preds(x, y, f, model, title):
  plt.figure()
  plt.plot(x, y, '.', label='Data')
  plt.plot(x, f(x), label='Ground truth')
  plt.plot(x, model(x), label='Predictions')
  plt.title(title)
  plt.legend()

plot_preds(x, y, f, quad_model, 'Before training')

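Now, define a loss for your model:

Given that this model is intended to predict continuous values, the mean squared error (MSE) is a good choice for the loss function. Given a vector of predictions, \(\hat{y}\), and a vector of true targets, \(y\), the MSE is defined as the mean of the squared differences between the predicted values and the ground truth.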

\(MSE = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i -y_i)^2\)

def mse_loss(y_pred, y):
  return tf.reduce_mean(tf.square(y_pred - y))
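
To see the arithmetic behind mse_loss on a tiny input, here is a plain-Python sketch of the same formula. The helper mse and the three sample values are made up for illustration.

```python
def mse(y_pred, y_true):
  """Mean of the squared differences between predictions and targets."""
  squared_errors = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
  return sum(squared_errors) / len(squared_errors)

sample_pred = [2.0, 0.0, 3.0]
sample_true = [1.0, 2.0, 3.0]

# The squared errors are 1, 4, and 0, so the mean is 5/3.
print(mse(sample_pred, sample_true))
```

Squaring makes large errors count disproportionately, so a single prediction that is off by 2 contributes four times as much as one that is off by 1.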

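Write a basic training loop for the model. The loop uses the MSE loss function and its gradients with respect to the model parameters to iteratively update those parameters. Training on mini-batches is memory efficient and can speed up convergence. The tf.data.Dataset API has useful functions for batching and shuffling.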

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.shuffle(buffer_size=x.shape[0]).batch(batch_size)

# Set training parameters
epochs = 100
learning_rate = 0.01
losses = []

# Format training loop
for epoch in range(epochs):
  for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
      batch_loss = mse_loss(quad_model(x_batch), y_batch)
    # Update parameters with respect to the gradient calculations
    grads = tape.gradient(batch_loss, quad_model.variables)
    for g, v in zip(grads, quad_model.variables):
      v.assign_sub(learning_rate*g)
  # Keep track of model loss per epoch
  loss = mse_loss(quad_model(x), y)
  losses.append(loss)
  if epoch % 10 == 0:
    print(f'Mean squared error for epoch {epoch}: {loss.numpy():0.3f}')

# Plot model results
print("\n")
plt.plot(range(epochs), losses)
plt.xlabel("Epoch")
plt.ylabel("Mean Squared Error (MSE)")
plt.title('MSE loss vs training iterations');
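
The parameter update in the loop above, v.assign_sub(learning_rate*g), is plain gradient descent. As a worked sketch (the symbol \(\eta\) for the learning rate is notation introduced here, and \(m\) is read as the number of samples in the current batch), writing the model as \(\hat{y}_i = w_q x_i^2 + w_l x_i + b\) and applying the chain rule to the MSE gives the gradients that tape.gradient computes automatically:

\(\frac{\partial MSE}{\partial w_q} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,x_i^2\)

\(\frac{\partial MSE}{\partial w_l} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,x_i\)

\(\frac{\partial MSE}{\partial b} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\)

Each step then replaces every parameter \(\theta \in \{w_q, w_l, b\}\) with \(\theta - \eta\,\frac{\partial MSE}{\partial \theta}\).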

Now, observe your model's performance after training:

plot_preds(x, y, f, quad_model, 'After training')

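That works, but remember that implementations of common training utilities are available in the tf.keras module, so consider using those before writing your own. To start with, the Model.compile and Model.fit methods implement a training loop for you:

Begin by creating a Sequential model in Keras using tf.keras.Sequential. One of the simplest Keras layers is the dense layer, which can be instantiated with tf.keras.layers.Dense. The dense layer is able to learn multidimensional linear relationships of the form \(\mathrm{Y} = \mathrm{W}\mathrm{X} + \vec{b}\). In order to learn a nonlinear equation of the form \(w_1x^2 + w_2x + b\), the dense layer's input should be a data matrix with \(x^2\) and \(x\) as features. The lambda layer, tf.keras.layers.Lambda, can be used to perform this stacking transformation.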

new_model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.stack([x, x**2], axis=1)),
    tf.keras.layers.Dense(units=1, kernel_initializer=tf.random.normal)])
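
To see what the stacking transformation inside the Lambda layer produces, the sketch below applies the same tf.stack call to a three-element vector. The names samples and features are made up for this example.

```python
import tensorflow as tf

samples = tf.constant([1.0, 2.0, 3.0])

# Stack the values and their squares as columns:
# each row is one sample, each column is one feature.
features = tf.stack([samples, samples**2], axis=1)

print(features.shape)
print(features.numpy())
```

The result has shape (3, 2), so a dense layer with a single unit learns one weight per column plus a bias, which matches the quadratic form described above.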

new_model.compile(
    loss=tf.keras.losses.MSE,
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01))

history = new_model.fit(x, y,
                        epochs=100,
                        batch_size=32,
                        verbose=0)

new_model.save('./my_new_model')

Observe your Keras model's performance after training:

plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylim([0, max(plt.ylim())])
plt.ylabel('Loss [Mean Squared Error]')
plt.title('Keras training progress');

plot_preds(x, y, f, new_model, 'After Training: Keras')

Refer to Basic training loops and the Keras guide for more details.