import collections
import tensorflow as tf
tf.compat.v2.enable_v2_behavior()
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
Basic concepts

There are three important concepts associated with TensorFlow Distributions shapes:

- Event shape describes the shape of a single draw from the distribution; it may be dependent across dimensions. For scalar distributions, the event shape is []. For a 5-dimensional MultivariateNormal, the event shape is [5].
- Batch shape describes independent, not identically distributed draws, also known as a "batch" of distributions.
- Sample shape describes independent, identically distributed draws of batches from the distribution family.

The event shape and the batch shape are properties of a Distribution object, whereas the sample shape is associated with a specific call to sample or log_prob.

This notebook's purpose is to illustrate these concepts through examples, so if they're not immediately obvious, don't worry!

For another conceptual overview of these concepts, see this blog post.
A note on TensorFlow Eager.

This entire notebook is written using TensorFlow Eager. None of the concepts presented rely on Eager, although with Eager, distribution batch and event shapes are evaluated (and therefore known) when the Distribution object is created in Python, whereas in graph (non-Eager) mode, it is possible to define distributions whose event and batch shapes are undetermined until the graph is run.
Scalar distributions

As noted above, a Distribution object has defined event and batch shapes. We'll start with a utility for describing distributions:
def describe_distributions(distributions):
  print('\n'.join([str(d) for d in distributions]))
In this section we'll explore scalar distributions: distributions with an event shape of []. A typical example is the Poisson distribution, specified by a rate:
poisson_distributions = [
    tfd.Poisson(rate=1., name='One Poisson Scalar Batch'),
    tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons'),
    tfd.Poisson(rate=[[1., 10., 100.,], [2., 20., 200.]],
                name='Two-by-Three Poissons'),
    tfd.Poisson(rate=[1.], name='One Poisson Vector Batch'),
    tfd.Poisson(rate=[[1.]], name='One Poisson Expanded Batch')
]
describe_distributions(poisson_distributions)
tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32)
tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32)
tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32)
tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32)
The Poisson distribution is a scalar distribution, so its event shape is always []. If we specify more rates, these show up in the batch shape. The final pair of examples is interesting: there's only a single rate, but because that rate is embedded in a numpy array with non-empty shape, that shape becomes the batch shape.
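Since event and batch shapes are properties of the Distribution object (as noted above), we can also read them off directly instead of parsing the printed representation; a quick check:

d = poisson_distributions[2]  # the 'Two-by-Three Poissons' example above
print(d.event_shape)  # ==> ()
print(d.batch_shape)  # ==> (2, 3)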
The standard Normal distribution is also a scalar. Its event shape is [], just like the Poisson, but we'll play with it to see our first example of broadcasting. The Normal is specified using loc and scale parameters:
normal_distributions = [
    tfd.Normal(loc=0., scale=1., name='Standard'),
    tfd.Normal(loc=[0.], scale=1., name='Standard Vector Batch'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=1., name='Different Locs'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=[[1.], [5.]],
               name='Broadcasting Scale')
]
describe_distributions(normal_distributions)
tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32)
tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32)
tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32)
The interesting example above is the Broadcasting Scale distribution. The loc parameter has shape [4], and the scale parameter has shape [2, 1]. Using Numpy broadcasting rules, the batch shape is [2, 4]. An equivalent (but less elegant and not-recommended) way to define the "Broadcasting Scale" distribution would be:
describe_distributions(
    [tfd.Normal(loc=[[0., 1., 2., 3], [0., 1., 2., 3.]],
                scale=[[1., 1., 1., 1.], [5., 5., 5., 5.]])])
tfp.distributions.Normal("Normal", batch_shape=[2, 4], event_shape=[], dtype=float32)
We can see why the broadcasting notation is useful, although it's also a source of headaches and bugs.
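When in doubt, we can compute the broadcast batch shape ourselves before building the distribution; a minimal sketch using tf.broadcast_static_shape, which applies the same NumPy-style rules:

# loc has shape [4] and scale has shape [2, 1]; the batch shape is their
# NumPy-style broadcast, [2, 4].
print(tf.broadcast_static_shape(tf.TensorShape([4]),
                                tf.TensorShape([2, 1])))  # ==> (2, 4)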
Sampling scalar distributions
There are two main things we can do with distributions: we can sample from them and we can compute log_probs. Let's explore sampling first. The basic rule is that when we sample from a distribution, the resulting tensor has shape [sample_shape, batch_shape, event_shape], where batch_shape and event_shape are provided by the Distribution object, and sample_shape is provided by the call to sample. For scalar distributions, event_shape = [], so the tensor returned from sample will have shape [sample_shape, batch_shape]. Let's try it:
def describe_sample_tensor_shape(sample_shape, distribution):
  print('Sample shape:', sample_shape)
  print('Returned sample tensor shape:',
        distribution.sample(sample_shape).shape)

def describe_sample_tensor_shapes(distributions, sample_shapes):
  for distribution in distributions:
    print(distribution)
    for sample_shape in sample_shapes:
      describe_sample_tensor_shape(sample_shape, distribution)
    print()
sample_shapes = [1, 2, [1, 5], [3, 4, 5]]
describe_sample_tensor_shapes(poisson_distributions, sample_shapes)
tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1,)
Sample shape: 2
Returned sample tensor shape: (2,)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5)

tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 3)
Sample shape: 2
Returned sample tensor shape: (2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 3)

tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1)

tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1, 1)
describe_sample_tensor_shapes(normal_distributions, sample_shapes)
tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1,)
Sample shape: 2
Returned sample tensor shape: (2,)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5)

tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1)

tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 4)
Sample shape: 2
Returned sample tensor shape: (2, 4)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 4)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 4)

tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 4)
Sample shape: 2
Returned sample tensor shape: (2, 2, 4)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 4)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 4)
That's about all there is to say about sample: returned sample tensors have shape [sample_shape, batch_shape, event_shape].
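As a sanity check, a minimal sketch that verifies the shape decomposition directly (TensorShape supports + as concatenation):

# The returned shape decomposes as sample_shape + batch_shape + event_shape.
d = tfd.Poisson(rate=[[1., 10., 100.], [2., 20., 200.]])
s = [3, 4]
assert d.sample(s).shape == tf.TensorShape(s) + d.batch_shape + d.event_shape  # (3, 4, 2, 3)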
Computing log_prob for scalar distributions
Now let's take a look at log_prob, which is somewhat trickier. log_prob takes as input a (non-empty) tensor representing the location(s) at which to compute the log_prob for the distribution. In the most straightforward case, this tensor will have a shape of the form [sample_shape, batch_shape, event_shape], where batch_shape and event_shape match the batch and event shapes of the distribution. Recall once more that for scalar distributions, event_shape = [], so the input tensor has shape [sample_shape, batch_shape]. In this case, we get back a tensor of shape [sample_shape, batch_shape]:
three_poissons = tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons')
three_poissons
<tfp.distributions.Poisson 'Three_Poissons' batch_shape=[3] event_shape=[] dtype=float32>
three_poissons.log_prob([[1., 10., 100.], [100., 10., 1]]) # sample_shape is [2].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -2.0785608,   -3.2223587],
       [-364.73938  ,   -2.0785608,  -95.39484  ]], dtype=float32)>
three_poissons.log_prob([[[[1., 10., 100.], [100., 10., 1.]]]]) # sample_shape is [1, 1, 2].
<tf.Tensor: shape=(1, 1, 2, 3), dtype=float32, numpy=
array([[[[  -1.       ,   -2.0785608,   -3.2223587],
         [-364.73938  ,   -2.0785608,  -95.39484  ]]]], dtype=float32)>
Note that in the first example, the input and output have shape [2, 3], and in the second example, they have shape [1, 1, 2, 3].
If it weren't for broadcasting, that would be all there is to say. Here's the rule once broadcasting is taken into account. We describe it in full generality and note the simplifications for scalar distributions (a code sketch of the rule follows this list):

- Define n = len(batch_shape) + len(event_shape). (For scalar distributions, len(event_shape) = 0.)
- If the input tensor t has fewer than n dimensions, pad its shape by adding dimensions of size 1 on the left until it has exactly n dimensions. Call the resulting tensor t'.
- Broadcast the n rightmost dimensions of t' against the [batch_shape, event_shape] of the distribution you're computing a log_prob for. In more detail: for the dimensions where t' already matches the distribution, do nothing, and for the dimensions where t' has a singleton, replicate that singleton the appropriate number of times. Any other situation is an error. (For scalar distributions, we only broadcast against batch_shape, since event_shape = [].)
- Now we can finally compute the log_prob. The resulting tensor will have shape [sample_shape, batch_shape], where sample_shape is defined to be any dimensions of t or t' to the left of the n rightmost dimensions: sample_shape = shape(t)[:-n].
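In code, the rule looks something like the following. This is an illustrative sketch only (predicted_log_prob_shape is a hypothetical helper, not part of TFP):

def predicted_log_prob_shape(distribution, t_shape):
  """Illustrative sketch: applies the rule above to predict log_prob's shape."""
  n = len(distribution.batch_shape) + len(distribution.event_shape)
  t_shape = list(t_shape)
  # Pad t's shape on the left with 1s until it has at least n dimensions.
  while len(t_shape) < n:
    t_shape.insert(0, 1)
  # Broadcast the rightmost n dimensions against [batch_shape, event_shape];
  # dimensions must either match or be singletons, anything else is an error.
  dist_shape = list(distribution.batch_shape) + list(distribution.event_shape)
  for t_dim, d_dim in zip(t_shape[len(t_shape) - n:], dist_shape):
    if t_dim != d_dim and t_dim != 1:
      raise ValueError('Incompatible shapes')
  # Everything to the left is the sample shape; the result has shape
  # [sample_shape, batch_shape].
  sample_shape = t_shape[:len(t_shape) - n]
  return sample_shape + list(distribution.batch_shape)

print(predicted_log_prob_shape(three_poissons, [2, 3]))        # ==> [2, 3]
print(predicted_log_prob_shape(three_poissons, [1, 1, 2, 3]))  # ==> [1, 1, 2, 3]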
This might be a mess if you don't know what it means, so let's work some examples:
three_poissons.log_prob([10.])
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-16.104412 , -2.0785608, -69.05272 ], dtype=float32)>
The tensor [10.] (with shape [1]) is broadcast across the batch_shape of 3, so we evaluate all three Poissons' log probability at the value 10.
three_poissons.log_prob([[[1.], [10.]], [[100.], [1000.]]])
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[-1.0000000e+00, -7.6974149e+00, -9.5394836e+01],
        [-1.6104412e+01, -2.0785608e+00, -6.9052719e+01]],

       [[-3.6473938e+02, -1.4348087e+02, -3.2223587e+00],
        [-5.9131279e+03, -3.6195427e+03, -1.4069575e+03]]], dtype=float32)>
In the example above, the input tensor has shape [2, 2, 1], while the distribution object has a batch shape of 3. So for each of the [2, 2] sample dimensions, the single value provided gets broadcast to each of the three Poissons.
A potentially useful way to think about it: because three_poissons has batch_shape = [3], a call to log_prob must take a tensor whose last dimension is either 1 or 3; anything else is an error. (The numpy broadcasting rules treat the special case of a scalar as being totally equivalent to a tensor of shape [1].)
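To see the "anything else is an error" clause in action, we can try a tensor whose last dimension is neither 1 nor 3 (a sketch; the exact exception type may vary across TF versions, so we catch broadly):

# A trailing dimension of 2 is neither 1 nor 3, so broadcasting against
# batch_shape=[3] fails.
try:
  three_poissons.log_prob([1., 2.])
except Exception as e:  # typically an InvalidArgumentError in Eager mode
  print('Error:', type(e).__name__)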
Let's test our chops by playing with a more complicated Poisson distribution, with batch_shape = [2, 3]:
poisson_2_by_3 = tfd.Poisson(
    rate=[[1., 10., 100.,], [2., 20., 200.]],
    name='Two-by-Three Poissons')
poisson_2_by_3.log_prob(1.)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>
poisson_2_by_3.log_prob([1.]) # Exactly equivalent to above, demonstrating the scalar special case.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 1., 1.], [1., 1., 1.]]) # Another way to write the same thing. No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 10., 100.]]) # Input is [1, 3] broadcast to [2, 3].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ -1.       ,  -2.0785608,  -3.2223587],
       [ -1.3068528,  -5.14709  , -33.90767  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 10., 100.], [1., 10., 100.]]) # Equivalent to above. No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ -1.       ,  -2.0785608,  -3.2223587],
       [ -1.3068528,  -5.14709  , -33.90767  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 1., 1.], [2., 2., 2.]]) # No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -14.701683 , -190.09653  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1.], [2.]]) # Equivalent to above. Input shape [2, 1] broadcast to [2, 3].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -14.701683 , -190.09653  ]], dtype=float32)>
The above examples involved broadcasting over the batch, but the sample shape was empty. Suppose we have a collection of values, and we want to get the log probability of each value at each point in the batch. We could do it manually:
poisson_2_by_3.log_prob([[[1., 1., 1.], [1., 1., 1.]], [[2., 2., 2.], [2., 2., 2.]]]) # Input shape [2, 2, 3].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>
Or we could let broadcasting handle the last batch dimension:
poisson_2_by_3.log_prob([[[1.], [1.]], [[2.], [2.]]]) # Input shape [2, 2, 1].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>
We can also (perhaps somewhat less naturally) let broadcasting handle just the first batch dimension:
poisson_2_by_3.log_prob([[[1., 1., 1.]], [[2., 2., 2.]]]) # Input shape [2, 1, 3].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>
Or we could let broadcasting handle both batch dimensions:
poisson_2_by_3.log_prob([[[1.]], [[2.]]]) # Input shape [2, 1, 1].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>
The above worked fine when we had only two values we wanted, but suppose we had a long list of values we wanted to evaluate at every batch point. For that, the following notation, which adds extra dimensions of size 1 to the right side of the shape, is extremely useful:
poisson_2_by_3.log_prob(tf.constant([1., 2.])[..., tf.newaxis, tf.newaxis])
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>
This is an instance of strided slice notation, and it's worth knowing about.
Going back to three_poissons for completeness, the same example looks like:
three_poissons.log_prob([[1.], [10.], [50.], [100.]])
<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [ -16.104412 ,   -2.0785608,  -69.05272  ],
       [-149.47777  ,  -43.34851  ,  -18.219261 ],
       [-364.73938  , -143.48087  ,   -3.2223587]], dtype=float32)>
three_poissons.log_prob(tf.constant([1., 10., 50., 100.])[..., tf.newaxis]) # Equivalent to above.
<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [ -16.104412 ,   -2.0785608,  -69.05272  ],
       [-149.47777  ,  -43.34851  ,  -18.219261 ],
       [-364.73938  , -143.48087  ,   -3.2223587]], dtype=float32)>
Multivariate distributions

We now turn to multivariate distributions, which have non-empty event shape. Let's look at multinomial distributions.
multinomial_distributions = [
    # Multinomial is a vector-valued distribution: if we have k classes,
    # an individual sample from the distribution has k values in it, so the
    # event_shape is `[k]`.
    tfd.Multinomial(total_count=100., probs=[.5, .4, .1],
                    name='One Multinomial'),
    tfd.Multinomial(total_count=[100., 1000.], probs=[.5, .4, .1],
                    name='Two Multinomials Same Probs'),
    tfd.Multinomial(total_count=100., probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Same Counts'),
    tfd.Multinomial(total_count=[100., 1000.],
                    probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Different Everything')
]
describe_distributions(multinomial_distributions)
tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32)
tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32)
tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32)
tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32)
Note how in the last three examples, the batch_shape is always [2], but we can use broadcasting to either have a shared total_count or a shared probs (or neither), because under the hood they are broadcast to have the same shape.
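For instance, "Two Multinomials Same Probs" behaves the same as if we had written out both rows of probs by hand; a quick sketch of the broadcast:

# Tiling probs explicitly gives the same batch shape as letting
# broadcasting do the work in 'Two Multinomials Same Probs' above.
explicit = tfd.Multinomial(total_count=[100., 1000.],
                           probs=[[.5, .4, .1], [.5, .4, .1]])
print(explicit.batch_shape)  # ==> (2,)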
Sampling is straightforward, given what we already know:
describe_sample_tensor_shapes(multinomial_distributions, sample_shapes)
tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 3)
Sample shape: 2
Returned sample tensor shape: (2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 3)

tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)
Computing log probabilities is equally straightforward. Let's work an example with diagonal Multivariate Normal distributions. (Multinomials are not very broadcast-friendly, since the constraints on the counts and probabilities mean broadcasting will often produce inadmissible values.) We'll use a batch of 2 3-dimensional distributions with the same mean but different scales (standard deviations):
two_multivariate_normals = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_diag=tf.ones([2, 3]) * [[1.], [2.]])
two_multivariate_normals
<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2] event_shape=[3] dtype=float32>
Now let's evaluate the log probability of each batch point at its mean and at a shifted mean:
two_multivariate_normals.log_prob([[[1., 2., 3.]], [[3., 4., 5.]]]) # Input has shape [2,1,3].
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-2.7568154, -4.836257 ],
       [-8.756816 , -6.336257 ]], dtype=float32)>
Exactly equivalently, we can use strided slice notation to insert an extra shape=1 dimension in the middle of a constant:
two_multivariate_normals.log_prob(
    tf.constant([[1., 2., 3.], [3., 4., 5.]])[:, tf.newaxis, :])  # Equivalent to above.
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-2.7568154, -4.836257 ],
       [-8.756816 , -6.336257 ]], dtype=float32)>
On the other hand, if we don't insert the extra dimension, we pass [1., 2., 3.] to the first batch point and [3., 4., 5.] to the second:
two_multivariate_normals.log_prob(tf.constant([[1., 2., 3.], [3., 4., 5.]]))
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2.7568154, -6.336257 ], dtype=float32)>
Shape manipulation techniques

The Reshape bijector

The Reshape bijector can be used to reshape the event_shape of a distribution. Let's see an example:
six_way_multinomial = tfd.Multinomial(total_count=1000., probs=[.3, .25, .2, .15, .08, .02])
six_way_multinomial
<tfp.distributions.Multinomial 'Multinomial' batch_shape=[] event_shape=[6] dtype=float32>
We created a multinomial with an event shape of [6]. The Reshape bijector allows us to treat this as a distribution with an event shape of [2, 3].
A Bijector represents a differentiable, one-to-one function on an open subset of \({\mathbb R}^n\). Bijectors are used in conjunction with TransformedDistribution, which models a distribution \(p(y)\) in terms of a base distribution \(p(x)\) and a Bijector that represents \(Y = g(X)\). Let's see it in action:
transformed_multinomial = tfd.TransformedDistribution(
    distribution=six_way_multinomial,
    bijector=tfb.Reshape(event_shape_out=[2, 3]))
transformed_multinomial
<tfp.distributions.TransformedDistribution 'reshapeMultinomial' batch_shape=[] event_shape=[2, 3] dtype=float32>
six_way_multinomial.log_prob([500., 100., 100., 150., 100., 50.])
<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>
transformed_multinomial.log_prob([[500., 100., 100.], [150., 100., 50.]])
<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>
This is the only thing the Reshape bijector can do: it cannot turn event dimensions into batch dimensions or vice versa.
The Independent distribution

The Independent distribution is used to treat a collection of independent, not-necessarily-identical (aka a batch of) distributions as a single distribution. More concisely, Independent allows converting dimensions in batch_shape to dimensions in event_shape. We'll illustrate by example:
two_by_five_bernoulli = tfd.Bernoulli(
    probs=[[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]],
    name="Two By Five Bernoulli")
two_by_five_bernoulli
<tfp.distributions.Bernoulli 'Two_By_Five_Bernoulli' batch_shape=[2, 5] event_shape=[] dtype=int32>
We can think of this as a two-by-five array of coins with the associated probabilities of heads. Let's evaluate the probability of a particular, arbitrary set of ones and zeros:
pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]
two_by_five_bernoulli.log_prob(pattern)
<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[-2.9957323 , -0.10536051, -0.16251892, -1.609438  , -0.2876821 ],
       [-0.35667497, -0.4307829 , -0.91629076, -0.79850775, -0.6931472 ]], dtype=float32)>
We can use Independent to turn this into two different "sets of five Bernoullis", which is useful if we want to consider a "row" of coin flips coming up in a given pattern as a single outcome:
two_sets_of_five = tfd.Independent(
    distribution=two_by_five_bernoulli,
    reinterpreted_batch_ndims=1,
    name="Two Sets Of Five")
two_sets_of_five
<tfp.distributions.Independent 'Two_Sets_Of_Five' batch_shape=[2] event_shape=[5] dtype=int32>
Mathematically, we're computing the log probability of each "set" of five by summing the log probabilities of the five "independent" coin flips in the set, which is where the distribution gets its name:
two_sets_of_five.log_prob(pattern)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-5.160732 , -3.1954036], dtype=float32)>
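We can verify this numerically: summing the per-coin log probabilities from the batched distribution over the last axis reproduces the Independent result:

# The Independent log_prob is the sum over the reinterpreted dimension.
print(tf.reduce_sum(two_by_five_bernoulli.log_prob(pattern), axis=-1))
# ==> tf.Tensor([-5.160732  -3.1954036], shape=(2,), dtype=float32),
# matching two_sets_of_five.log_prob(pattern) above.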
We can go even further and use Independent to create a distribution where individual events are a set of two-by-five Bernoullis:
one_set_of_two_by_five = tfd.Independent(
    distribution=two_by_five_bernoulli, reinterpreted_batch_ndims=2,
    name="One Set Of Two By Five")
one_set_of_two_by_five.log_prob(pattern)
<tf.Tensor: shape=(), dtype=float32, numpy=-8.356134>
It's worth noting that from the perspective of sample, using Independent changes nothing:
describe_sample_tensor_shapes(
    [two_by_five_bernoulli,
     two_sets_of_five,
     one_set_of_two_by_five],
    [[3, 5]])
tfp.distributions.Bernoulli("Two_By_Five_Bernoulli", batch_shape=[2, 5], event_shape=[], dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

tfp.distributions.Independent("Two_Sets_Of_Five", batch_shape=[2], event_shape=[5], dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

tfp.distributions.Independent("One_Set_Of_Two_By_Five", batch_shape=[], event_shape=[2, 5], dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)
As a parting exercise for the reader, we suggest considering the differences and similarities between a vector batch of Normal distributions and a MultivariateNormalDiag distribution from a sampling and log-probability perspective. How can we use Independent to construct a MultivariateNormalDiag from a batch of Normals? (Note that MultivariateNormalDiag is not actually implemented this way.)
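As a hint, here is a minimal sketch of one such construction (batch_of_normals and mvn_like are hypothetical names; checking that it matches MultivariateNormalDiag for sampling and log_prob is left as the exercise):

# Reinterpret a batch of three independent Normals as a single 3-dimensional
# distribution; compare against
# tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_diag=[1., 1., 1.]).
batch_of_normals = tfd.Normal(loc=[1., 2., 3.], scale=[1., 1., 1.])
mvn_like = tfd.Independent(batch_of_normals, reinterpreted_batch_ndims=1)
print(mvn_like)  # ==> batch_shape=[], event_shape=[3]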