Customization basics: tensors and operations

This is an introductory TensorFlow tutorial that shows how to:

Import the required package.
Create and use tensors.
Use GPU acceleration.
Build a data pipeline with tf.data.Dataset.

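Import TensorFlow

To get started, import the tensorflow module. As of TensorFlow 2, eager execution is turned on by default. Eager execution gives TensorFlow a more interactive frontend: operations are evaluated immediately and return concrete values. You will explore it in more detail later.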

import tensorflow as tf
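
As a quick check of the eager-execution behavior described above, the following minimal sketch confirms that eager mode is active and that an operation returns a concrete value right away. The variable names are illustrative only.

```python
import tensorflow as tf

# In TensorFlow 2, eager execution is enabled by default.
eager_enabled = tf.executing_eagerly()
print(eager_enabled)

# Operations run immediately and return concrete values,
# so the result can be read back as a NumPy scalar right away.
total = tf.math.add(1, 2)
print(total.numpy())
```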

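Tensors

A tensor is a multi-dimensional array. Like NumPy ndarray objects, tf.Tensor objects have a data type and a shape. In addition, a tf.Tensor can reside in accelerator memory (such as a GPU). TensorFlow offers a rich library of operations (for example, tf.math.add, tf.linalg.matmul, and tf.linalg.inv) that consume and produce tf.Tensor objects. These operations automatically convert built-in Python types. For example: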

print(tf.math.add(1, 2))
print(tf.math.add([1, 2], [3, 4]))
print(tf.math.square(5))
print(tf.math.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.math.square(2) + tf.math.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)

Each tf.Tensor has a shape and a data type:

x = tf.linalg.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>
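
Because shape and data type are fixed properties of a tensor, changing either one produces a new tensor. The sketch below illustrates this with tf.cast and tf.reshape; the variable names x_float and x_column are made up for the example.

```python
import tensorflow as tf

x = tf.linalg.matmul([[1]], [[2, 3]])  # int32 tensor of shape (1, 2)

# tf.cast returns a new tensor with the requested dtype.
x_float = tf.cast(x, tf.float32)
print(x_float.dtype)

# tf.reshape returns a new tensor with the same elements in a new shape.
x_column = tf.reshape(x, [2, 1])
print(x_column.shape)
```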

The most obvious differences between NumPy arrays and tf.Tensor objects are:

Tensors can be backed by accelerator memory (such as a GPU or TPU).
Tensors are immutable.
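
To make the immutability point concrete, here is a minimal sketch. It shows that item assignment on a tf.Tensor is rejected, and then two common alternatives: building a new tensor with tf.tensor_scatter_nd_update, or using a tf.Variable when mutable state is needed. The variable names are illustrative.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# Item assignment is not supported on a tf.Tensor.
try:
  t[0] = 10
  assignment_failed = False
except TypeError:
  assignment_failed = True
print("In-place assignment rejected:", assignment_failed)

# Option 1: build a new tensor with the updated element; t is unchanged.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(updated.numpy())

# Option 2: use a tf.Variable when mutable state is needed.
v = tf.Variable([1, 2, 3])
v[0].assign(10)
print(v.numpy())
```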

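NumPy compatibility

Converting between a TensorFlow tf.Tensor and a NumPy ndarray is easy:

TensorFlow operations automatically convert NumPy ndarrays to tensors.
NumPy operations automatically convert tensors to NumPy ndarrays.

Tensors are explicitly converted to NumPy ndarrays using their .numpy() method. These conversions are typically cheap, because the array and the tf.Tensor share the underlying memory representation when possible. Sharing is not always possible, however: a tf.Tensor may be hosted in GPU memory while NumPy arrays are always backed by host memory, so the conversion then involves a copy from GPU to host memory.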

import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.math.multiply(ndarray, 42)
print(tensor)


print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to NumPy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
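
The float64 dtype in the output above comes from NumPy's default. As a small illustrative sketch (variable names are made up), the following compares the dtype of a tensor converted from an ndarray with one created from a plain Python float, and converts back with np.asarray:

```python
import numpy as np
import tensorflow as tf

# NumPy uses float64 by default, and the conversion preserves that dtype.
from_numpy = tf.convert_to_tensor(np.ones([2, 2]))
print(from_numpy.dtype)

# A Python float converted directly by TensorFlow becomes float32.
from_python = tf.constant(1.0)
print(from_python.dtype)

# np.asarray, like .numpy(), gives back an ndarray.
back = np.asarray(from_numpy)
print(type(back), back.dtype)
```

If a mixed-dtype computation is intended, an explicit tf.cast keeps the choice of precision visible in the code.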

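GPU acceleration

Many TensorFlow operations are accelerated using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to use the GPU or the CPU for an operation, copying the tensor between CPU and GPU memory if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed. For example: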

x = tf.random.uniform([3, 3])

print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))

print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

Is there a GPU available: 
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
Is the Tensor on GPU #0:  
True

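Device names

The Tensor.device property provides the fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which the program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. The string ends with GPU:<N> if the tensor is placed on the N-th GPU on the host.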
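
To see what the device name encodes, the sketch below prints it for a tensor and splits it into components with tf.DeviceSpec.from_string. The tf.device context manager is used here only to pin the tensor to the CPU so that the example runs on any machine; the exact job, replica, and task values depend on the environment.

```python
import tensorflow as tf

with tf.device("CPU:0"):
  x = tf.constant([1.0, 2.0])

# The full device name; for this tensor it ends in "CPU:0".
print(x.device)

# tf.DeviceSpec.from_string splits the name into its components.
spec = tf.DeviceSpec.from_string(x.device)
print(spec.job, spec.replica, spec.task, spec.device_type, spec.device_index)
```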

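Explicit device placement

In TensorFlow, placement refers to how individual operations are assigned to (placed on) a device for execution. As mentioned, when no explicit guidance is provided, TensorFlow automatically decides which device executes an operation and copies tensors to that device if needed.

However, TensorFlow operations can be explicitly placed on specific devices using the tf.device context manager. For example: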

import time

def time_matmul(x):
  start = time.time()
  for loop in range(10):
    tf.linalg.matmul(x, x)

  result = time.time()-start

  print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
  print("On GPU:")
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)

On CPU:
10 loops: 42.76ms
On GPU:
10 loops: 300.72ms
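
Timings like the ones above can include one-time initialization cost on the first call, and GPU work may still be in flight when the Python loop finishes. The following is a hedged variant, not the tutorial's own method: the helper time_matmul_warm is a hypothetical name, and it assumes that one warm-up call plus reading the last result back with .numpy() is enough to get a steadier measurement.

```python
import time
import tensorflow as tf

def time_matmul_warm(x, loops=10):
  # Warm-up call so one-time initialization is not counted.
  tf.linalg.matmul(x, x).numpy()

  start = time.perf_counter()
  for _ in range(loops):
    result = tf.linalg.matmul(x, x)
  # Reading the value back waits for the last result to be computed.
  result.numpy()
  return 1000 * (time.perf_counter() - start)

# Use the first GPU if one is visible, otherwise fall back to the CPU.
device = "GPU:0" if tf.config.list_physical_devices("GPU") else "CPU:0"
with tf.device(device):
  x = tf.random.uniform([1000, 1000])
  elapsed_ms = time_matmul_warm(x)
print("{}: 10 loops took {:0.2f}ms".format(device, elapsed_ms))
```

Actual numbers vary with hardware and TensorFlow build, so no expected output is given.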

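Datasets

This section uses the tf.data.Dataset API to build a pipeline for feeding data to your model. tf.data.Dataset is used to build performant, complex input pipelines from simple, reusable pieces that feed your model's training or evaluation loops. (For details, refer to the tf.data: Build TensorFlow input pipelines guide.)

Create a source `Dataset`

Create a source dataset using one of the factory functions, such as tf.data.Dataset.from_tensors or tf.data.Dataset.from_tensor_slices, or using objects that read from files, such as tf.data.TextLineDataset or tf.data.TFRecordDataset. For more information, refer to the Reading input data section of the tf.data: Build TensorFlow input pipelines guide.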

ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Create a temporary text file
import tempfile
_, filename = tempfile.mkstemp()

with open(filename, 'w') as f:
  f.write("""Line 1
Line 2
Line 3
  """)

ds_file = tf.data.TextLineDataset(filename)
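
The two factory functions named above differ in how they split their input. As a minimal sketch (the sample data and variable names are made up), from_tensors treats the whole input as one element, while from_tensor_slices yields one element per entry along the first axis:

```python
import tensorflow as tf

data = [[1, 2], [3, 4], [5, 6]]

# from_tensors wraps the whole input as a single element.
ds_whole = tf.data.Dataset.from_tensors(data)
# from_tensor_slices yields one element per row along the first axis.
ds_rows = tf.data.Dataset.from_tensor_slices(data)

print(ds_whole.element_spec)
print(ds_rows.element_spec)

n_whole = len(list(ds_whole))
n_rows = len(list(ds_rows))
print(n_whole, n_rows)  # 1 3
```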

Apply transformations

Use transformation functions such as tf.data.Dataset.map, tf.data.Dataset.batch, and tf.data.Dataset.shuffle to apply transformations to dataset records.

ds_tensors = ds_tensors.map(tf.math.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)
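
Since shuffle makes the output order vary between runs, it can help to look at map and batch on their own first. The sketch below leaves shuffle out so that the result is reproducible; it uses tf.data.Dataset.range as a stand-in source, and the names are illustrative.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(6)   # 0, 1, 2, 3, 4, 5
ds = ds.map(lambda v: v * v)    # square each element
ds = ds.batch(4)                # group into batches of up to 4 elements

batches = [b.tolist() for b in ds.as_numpy_iterator()]
print(batches)  # [[0, 1, 4, 9], [16, 25]]
```

The last batch is smaller because six elements do not divide evenly by four; passing drop_remainder=True to batch would discard it instead.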

Iterate

tf.data.Dataset objects support iteration to loop over records:

print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)

Elements of ds_tensors:
tf.Tensor([4 9], shape=(2,), dtype=int32)
tf.Tensor([ 1 25], shape=(2,), dtype=int32)
tf.Tensor([16 36], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'Line 3' b'  '], shape=(2,), dtype=string)
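
The last file batch above contains a whitespace-only element because the file ends with a line of two spaces, and every string comes back as bytes. One possible way to handle both, sketched here under the assumption that blank lines should be dropped and the text is UTF-8, is to filter with tf.strings and decode while iterating with as_numpy_iterator. The file is recreated so that the example is self-contained.

```python
import os
import tempfile
import tensorflow as tf

# Write a small text file whose last line contains only whitespace.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
  f.write("Line 1\nLine 2\nLine 3\n  ")
  path = f.name

ds = tf.data.TextLineDataset(path)
# Keep only lines that are non-empty after stripping whitespace.
ds = ds.filter(lambda line: tf.strings.length(tf.strings.strip(line)) > 0)

# as_numpy_iterator yields bytes for string elements; decode them to str.
lines = [raw.decode("utf-8") for raw in ds.as_numpy_iterator()]
print(lines)  # ['Line 1', 'Line 2', 'Line 3']

os.remove(path)
```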