Use a GPU

TensorFlow code and tf.keras models will transparently run on a single GPU with no code changes required.

The simplest way to run on multiple GPUs, on one or many machines, is to use distribution strategies.

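This guide is for users who have tried these approaches and found that they need fine-grained control over how TensorFlow uses the GPU. To learn how to debug performance issues in single- and multi-GPU scenarios, see the Optimize TensorFlow GPU Performance guide.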

Setup

Ensure you have the latest TensorFlow GPU release installed.

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
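As a slightly broader sanity check than counting GPUs, the sketch below also reports whether the installed TensorFlow binary was built with CUDA and lists every physical and logical device it can see. It relies only on tf.test.is_built_with_cuda and the tf.config listing functions; the printed format is just for illustration.

```python
import tensorflow as tf

# Was this TensorFlow binary compiled with CUDA (GPU) support?
built_with_cuda = tf.test.is_built_with_cuda()
print("Built with CUDA:", built_with_cuda)

# Physical devices: the hardware TensorFlow detected on this machine.
physical = tf.config.list_physical_devices()
for dev in physical:
    print("physical:", dev.name, dev.device_type)

# Logical devices: what ops can actually be placed on. Listing them
# initializes the runtime, so make any tf.config changes before this call.
logical = tf.config.list_logical_devices()
for dev in logical:
    print("logical:", dev.name, dev.device_type)
```

If built_with_cuda is False, no GPU will ever be listed, regardless of the hardware installed.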

Overview

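TensorFlow supports running computations on a variety of device types, including CPU and GPU. They are represented with string identifiers, for example:

"/device:CPU:0": The CPU of your machine.
"/GPU:0": Shorthand notation for the first GPU of your machine that is visible to TensorFlow.
"/job:localhost/replica:0/task:0/device:GPU:1": Fully qualified name of the second GPU of your machine that is visible to TensorFlow.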
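To see how these strings break down into their parts, tf.DeviceSpec.from_string can parse them. This is a small sketch; parsing only interprets the string and does not check that the device actually exists on your machine.

```python
import tensorflow as tf

# Parse the fully qualified name of the second GPU into its components.
full = tf.DeviceSpec.from_string("/job:localhost/replica:0/task:0/device:GPU:1")
print(full.job, full.replica, full.task, full.device_type, full.device_index)

# The shorthand form only fills in the device type and index.
short = tf.DeviceSpec.from_string("/GPU:0")
print(short.device_type, short.device_index)
```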

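If a TensorFlow operation has both CPU and GPU implementations, by default the GPU device is prioritized when the operation is assigned. For example, tf.matmul has both CPU and GPU kernels, and on a system with devices CPU:0 and GPU:0, the GPU:0 device is selected to run tf.matmul unless you explicitly request to run it on another device.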

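If a TensorFlow operation has no corresponding GPU implementation, the operation falls back to the CPU device. For example, since tf.cast only has a CPU kernel, on a system with devices CPU:0 and GPU:0, the CPU:0 device is selected to run tf.cast, even if it is requested to run on the GPU:0 device.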
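Besides reading placement logs, you can check where an eager result landed through its device attribute. A minimal sketch, assuming default placement: with at least one visible GPU the matmul should report GPU:0, otherwise CPU:0.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)  # no explicit device: TensorFlow picks one

# Every eager tensor records the fully qualified device that produced it.
print(c.device)
expected = "GPU:0" if gpus else "CPU:0"
print("Placed on the preferred device:", c.device.endswith(expected))
```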

Logging device placement

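To find out which devices your operations and tensors are assigned to, put tf.debugging.set_log_device_placement(True) as the first statement of your program. Enabling device placement logging causes any tensor allocations or operations to be printed.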

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

The above code will print an indication that the MatMul op was executed on GPU:0.

Manual device placement

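If you would like a particular operation to run on a device of your choice instead of the one selected automatically for you, you can use with tf.device to create a device context, and all the operations within that context will run on the same designated device.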

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Run on the GPU
c = tf.matmul(a, b)
print(c)

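You will see that a and b are now assigned to CPU:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (GPU:0 in this example) and automatically copies tensors between devices if required.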
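The same placement can be confirmed programmatically instead of from the log. This sketch repeats the example above and prints each tensor's device attribute; a and b should report CPU:0, while c reports whatever device the runtime picked.

```python
import tensorflow as tf

# Pin the inputs to the CPU explicitly.
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Leave the matmul unplaced; the runtime chooses a device and copies inputs if needed.
c = tf.matmul(a, b)

for name, t in [("a", a), ("b", b), ("c", c)]:
    print(name, "->", t.device)
```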

Limiting GPU memory growth

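By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.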

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
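TensorFlow also honors the CUDA_VISIBLE_DEVICES environment variable mentioned above. A minimal sketch, assuming the variable is set in the script before TensorFlow is imported (once CUDA has been initialized, changing it has no effect); here only the first GPU, if any, is exposed.

```python
import os

# Must be set before TensorFlow (and therefore CUDA) is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print(len(gpus), "GPU(s) visible to TensorFlow")
```

Unlike tf.config.set_visible_devices, this hides the other GPUs from the CUDA driver itself, so they are invisible to every library in the process.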

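In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two methods to control this.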

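The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as the runtime allocations need: it starts out allocating very little memory, and as the program runs and more GPU memory is needed, the GPU memory region is extended for the TensorFlow process. Memory is not released, since releasing it can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code before allocating any tensors or executing any ops.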

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
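To confirm the setting took effect, tf.config.experimental.get_memory_growth reads it back. The sketch also queries current and peak usage with tf.config.experimental.get_memory_info, which is assumed to be available (it exists in TensorFlow 2.5 and later); on a machine without GPUs both steps are simply skipped.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    # Must run before the GPU is initialized (before any op or tensor).
    tf.config.experimental.set_memory_growth(gpu, True)

# Read the setting back for each GPU.
growth = {gpu.name: tf.config.experimental.get_memory_growth(gpu) for gpu in gpus}
print(growth)

if gpus:
    # Allocate something, then report current and peak bytes in use on GPU:0.
    x = tf.random.uniform((1024, 1024))
    print(tf.config.experimental.get_memory_info('GPU:0'))
```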

Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
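A minimal sketch of the environment-variable route, assuming it is set from within the script before TensorFlow is imported; exporting it in the shell that launches the program works the same way.

```python
import os

# Set before importing TensorFlow so the GPU allocator sees it at startup.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf

print("Allow growth:", os.environ.get("TF_FORCE_GPU_ALLOW_GROWTH"))
print(len(tf.config.list_physical_devices('GPU')), "GPU(s) found")
```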

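The second method is to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit (memory_limit, in megabytes) on the total memory to allocate on the GPU.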

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

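This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process. It is common practice for local development when the GPU is shared with other applications, such as a workstation GUI.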
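To verify the cap, tf.config.get_logical_device_configuration returns the configurations attached to a physical device. A short sketch repeating the 1 GB limit from above; MEMORY_LIMIT_MB is just an illustrative name, and on a machine without GPUs nothing is configured.

```python
import tensorflow as tf

MEMORY_LIMIT_MB = 1024  # memory_limit is expressed in megabytes

gpus = tf.config.list_physical_devices('GPU')
configs = None
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=MEMORY_LIMIT_MB)])
    # Read the configuration back: a list of LogicalDeviceConfiguration.
    configs = tf.config.get_logical_device_configuration(gpus[0])
    print(configs)
```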

Using a single GPU on a multi-GPU system

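If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: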

tf.debugging.set_log_device_placement(True)

try:
  # Specify an invalid GPU device
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)

If the device you have specified does not exist, you will get a RuntimeError: .../device:GPU:2 unknown device.
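One way to avoid that error without changing global settings is to check the logical GPU list before choosing a device string. The sketch below uses a hypothetical helper, pick_device, that is not part of TensorFlow: it returns the requested GPU if it exists and falls back to /CPU:0 otherwise.

```python
import tensorflow as tf

def pick_device(gpu_index):
    """Hypothetical helper: '/GPU:<gpu_index>' if that GPU exists, else '/CPU:0'."""
    gpus = tf.config.list_logical_devices('GPU')
    if 0 <= gpu_index < len(gpus):
        return f"/GPU:{gpu_index}"
    return "/CPU:0"

device = pick_device(2)
with tf.device(device):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
print(device, c.device)
```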

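If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one doesn't exist, you can call tf.config.set_soft_device_placement(True).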

tf.config.set_soft_device_placement(True)
tf.debugging.set_log_device_placement(True)

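# Request a GPU that doesn't exist; with soft placement enabled, TensorFlow
# falls back to an existing, supported device instead of raising an error
with tf.device('/device:GPU:2'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
  c = tf.matmul(a, b)

print(c)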
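Because soft placement is a process-wide flag, it can be useful to check its current value, for example in test setup. A minimal sketch using tf.config.get_soft_device_placement to read back the setting after enabling it:

```python
import tensorflow as tf

tf.config.set_soft_device_placement(True)

# Read the process-wide flag back to confirm it is enabled.
enabled = tf.config.get_soft_device_placement()
print("Soft placement enabled:", enabled)
```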

Using multiple GPUs

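Developing for multiple GPUs allows a model to scale with the additional resources. If you are developing on a system with a single GPU, you can simulate multiple GPUs with virtual devices. This enables easy testing of multi-GPU setups without requiring additional resources.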

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Once multiple logical GPUs are available to the runtime, you can utilize them with tf.distribute.Strategy or with manual placement.
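On a machine with no GPU at all, the same idea can be applied to the CPU: tf.config.set_logical_device_configuration can split the physical CPU into several logical CPUs, a common way to exercise multi-device code in tests. A minimal sketch, assuming it runs in a fresh process before any op executes; both logical devices share the same physical CPU, so this checks correctness, not speed.

```python
import tensorflow as tf

cpus = tf.config.list_physical_devices('CPU')
# Split the physical CPU into two logical CPUs (no memory_limit is set for CPU).
# Like the GPU examples, this must happen before the runtime is initialized.
tf.config.set_logical_device_configuration(
    cpus[0],
    [tf.config.LogicalDeviceConfiguration(),
     tf.config.LogicalDeviceConfiguration()])

logical_cpus = tf.config.list_logical_devices('CPU')
print([d.name for d in logical_cpus])
```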

With `tf.distribute.Strategy`

The best practice for using multiple GPUs is to use tf.distribute.Strategy. Here is a simple example:

tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

This program will run a copy of your model on each GPU, splitting the input data between them, also known as "data parallelism".
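To see the replication itself without a GPU, the sketch below builds a MirroredStrategy over two logical CPUs (an assumption made so it runs anywhere), runs the same function on every replica with strategy.run, and sums the per-replica results with strategy.reduce. Each replica returns its replica ID plus one, so two replicas should sum to 1.0 + 2.0 = 3.0.

```python
import tensorflow as tf

# Assumption: no GPUs needed; split the CPU into two logical devices first.
cpus = tf.config.list_physical_devices('CPU')
tf.config.set_logical_device_configuration(
    cpus[0],
    [tf.config.LogicalDeviceConfiguration(),
     tf.config.LogicalDeviceConfiguration()])

devices = [d.name for d in tf.config.list_logical_devices('CPU')]
strategy = tf.distribute.MirroredStrategy(devices)
print("Replicas:", strategy.num_replicas_in_sync)

@tf.function
def replica_fn():
    # Each replica sees its own id (0 or 1) and returns id + 1.
    ctx = tf.distribute.get_replica_context()
    return tf.cast(ctx.replica_id_in_sync_group, tf.float32) + 1.0

per_replica = strategy.run(replica_fn)
# Combine the per-replica values into a single tensor.
total = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)
print(float(total))
```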

For more information about distribution strategies, see the Distributed training with TensorFlow guide.

Manual placement

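tf.distribute.Strategy works under the hood by replicating computation across devices. You can implement replication manually by constructing your model on each GPU. For example: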

tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)
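The same manual replication can be tried on a machine without GPUs by targeting logical CPUs instead. A sketch under that assumption: each of the two logical CPUs computes the matmul, and the results are summed on the first device, so every entry is doubled.

```python
import tensorflow as tf

# Assumption: split the CPU into two logical devices so this runs without GPUs.
cpus = tf.config.list_physical_devices('CPU')
tf.config.set_logical_device_configuration(
    cpus[0],
    [tf.config.LogicalDeviceConfiguration(),
     tf.config.LogicalDeviceConfiguration()])

devices = tf.config.list_logical_devices('CPU')
partials = []
for dev in devices:
    # Replicate the computation on each logical device.
    with tf.device(dev.name):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        partials.append(tf.matmul(a, b))

# Gather the replicas' results on the first device and add them up.
with tf.device(devices[0].name):
    matmul_sum = tf.add_n(partials)
print(matmul_sum.numpy())
```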