Understanding TensorFlow Distributions Shapes

import collections

import tensorflow as tf
tf.compat.v2.enable_v2_behavior()

import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

Basics

There are three important concepts associated with TensorFlow Distributions shapes:

  • Event shape describes the shape of a single draw from the distribution; it may be dependent across dimensions. For scalar distributions, the event shape is []. For a 5-dimensional MultivariateNormal, the event shape is [5].
  • Batch shape describes independent, not identically distributed draws, aka a "batch" of distributions.
  • Sample shape describes independent, identically distributed draws of batches from the distribution family.

The event shape and the batch shape are properties of a Distribution object, whereas the sample shape is associated with a specific call to sample or log_prob.
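
For example (a minimal sketch using a batch of two scalar Normals defined just for illustration):

d = tfd.Normal(loc=[0., 1.], scale=1.)  # a batch of two scalar Normals
print(d.event_shape)        # () -- a property of the Distribution object
print(d.batch_shape)        # (2,) -- likewise a property of the object
print(d.sample([3]).shape)  # (3, 2): sample_shape [3], then batch_shape [2]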

The purpose of this notebook is to illustrate these concepts through examples, so if any of this isn't immediately obvious, don't worry!

For another conceptual overview of these concepts, see this blog post.

A note on TensorFlow Eager.

This entire notebook is written using TensorFlow Eager. None of the concepts presented rely on Eager, although with Eager, distribution batch and event shapes are evaluated (and therefore known) when the Distribution object is created in Python, whereas in graph (non-Eager) mode, it is possible to define distributions whose event and batch shapes are undetermined until the graph is run.

Scalar Distributions

As we noted above, a Distribution object has defined event and batch shapes. We'll start with a utility for describing distributions:

def describe_distributions(distributions):
  print('\n'.join([str(d) for d in distributions]))

In this section we'll explore scalar distributions: distributions whose event shape is []. A typical example is the Poisson distribution, specified by a rate:

poisson_distributions = [
    tfd.Poisson(rate=1., name='One Poisson Scalar Batch'),
    tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons'),
    tfd.Poisson(rate=[[1., 10., 100.,], [2., 20., 200.]],
                name='Two-by-Three Poissons'),
    tfd.Poisson(rate=[1.], name='One Poisson Vector Batch'),
    tfd.Poisson(rate=[[1.]], name='One Poisson Expanded Batch')
]

describe_distributions(poisson_distributions)
tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32)
tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32)
tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32)
tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32)

The Poisson distribution is a scalar distribution, so its event shape is always []. If we specify more rates, these show up in the batch shape. The final pair of examples is interesting: there's only a single rate, but because that rate is embedded in a numpy array with non-empty shape, that shape becomes the batch shape.

The standard Normal distribution is also a scalar. Its event shape is [], just like for the Poisson, but we'll play with it to see our first example of broadcasting. The Normal is specified using loc and scale parameters:

normal_distributions = [
    tfd.Normal(loc=0., scale=1., name='Standard'),
    tfd.Normal(loc=[0.], scale=1., name='Standard Vector Batch'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=1., name='Different Locs'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=[[1.], [5.]],
               name='Broadcasting Scale')
]

describe_distributions(normal_distributions)
tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32)
tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32)
tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32)

The interesting example above is the Broadcasting Scale distribution. The loc parameter has shape [4], and the scale parameter has shape [2, 1]. Using Numpy broadcasting rules, the batch shape is [2, 4]. An equivalent (but less elegant and not-recommended) way to define the "Broadcasting Scale" distribution would be:

describe_distributions(
    [tfd.Normal(loc=[[0., 1., 2., 3.], [0., 1., 2., 3.]],
                scale=[[1., 1., 1., 1.], [5., 5., 5., 5.]])])
tfp.distributions.Normal("Normal", batch_shape=[2, 4], event_shape=[], dtype=float32)

We can see why the broadcasting notation is useful, but it's also a source of headaches and bugs.
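
As a cautionary sketch (the shapes here are hypothetical, chosen to fail): parameter shapes that cannot broadcast against each other are an error, which, depending on the TFP version, may surface at construction time or at the first call to sample or log_prob:

try:
  d = tfd.Normal(loc=[0., 1., 2.], scale=[1., 5.])  # [3] and [2] do not broadcast
  d.sample()
except Exception as e:
  print('Error:', type(e).__name__)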

Sampling Scalar Distributions

There are two main things we can do with distributions: we can sample from them and we can compute log_probs. Let's explore sampling first. The basic rule is that when we sample from a distribution, the resulting Tensor has shape [sample_shape, batch_shape, event_shape], where batch_shape and event_shape are provided by the Distribution object, and sample_shape is provided by the call to sample. For scalar distributions, event_shape = [], so the Tensor returned from sample will have shape [sample_shape, batch_shape]. Let's try it out:

def describe_sample_tensor_shape(sample_shape, distribution):
    print('Sample shape:', sample_shape)
    print('Returned sample tensor shape:',
          distribution.sample(sample_shape).shape)

def describe_sample_tensor_shapes(distributions, sample_shapes):
    for distribution in distributions:
      print(distribution)
      for sample_shape in sample_shapes:
        describe_sample_tensor_shape(sample_shape, distribution)
      print()

sample_shapes = [1, 2, [1, 5], [3, 4, 5]]
describe_sample_tensor_shapes(poisson_distributions, sample_shapes)
tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1,)
Sample shape: 2
Returned sample tensor shape: (2,)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5)

tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 3)
Sample shape: 2
Returned sample tensor shape: (2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 3)

tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1)

tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1, 1)
describe_sample_tensor_shapes(normal_distributions, sample_shapes)
tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1,)
Sample shape: 2
Returned sample tensor shape: (2,)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5)

tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1)

tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 4)
Sample shape: 2
Returned sample tensor shape: (2, 4)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 4)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 4)

tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 4)
Sample shape: 2
Returned sample tensor shape: (2, 2, 4)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 4)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 4)

That's about all there is to say about sample: returned sample tensors have shape [sample_shape, batch_shape, event_shape].
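
As a quick sanity check of this rule (a sketch reusing the two-by-three Poisson batch from above):

d = poisson_distributions[2]  # 'Two-by-Three Poissons', batch_shape=[2, 3]
s = [3, 4]
expected = tf.TensorShape(s).concatenate(d.batch_shape).concatenate(d.event_shape)
assert d.sample(s).shape == expected  # (3, 4, 2, 3)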

Computing log_prob For Scalar Distributions

Now let's take a look at log_prob, which is somewhat trickier. log_prob takes as input a (non-empty) tensor representing the location(s) at which to compute the log_prob for the distribution. In the most straightforward case, this tensor will have a shape of the form [sample_shape, batch_shape, event_shape], where batch_shape and event_shape match the batch and event shapes of the distribution. Recall once more that for scalar distributions, event_shape = [], so the input tensor has shape [sample_shape, batch_shape]. In this case, we get back a tensor of shape [sample_shape, batch_shape]:

three_poissons = tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons')
three_poissons
<tfp.distributions.Poisson 'Three_Poissons' batch_shape=[3] event_shape=[] dtype=float32>
three_poissons.log_prob([[1., 10., 100.], [100., 10., 1]])  # sample_shape is [2].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -2.0785608,   -3.2223587],
       [-364.73938  ,   -2.0785608,  -95.39484  ]], dtype=float32)>
three_poissons.log_prob([[[[1., 10., 100.], [100., 10., 1.]]]])  # sample_shape is [1, 1, 2].
<tf.Tensor: shape=(1, 1, 2, 3), dtype=float32, numpy=
array([[[[  -1.       ,   -2.0785608,   -3.2223587],
         [-364.73938  ,   -2.0785608,  -95.39484  ]]]], dtype=float32)>

Note how in the first example, the input and output have shape [2, 3], and in the second example they have shape [1, 1, 2, 3].

If it weren't for broadcasting, that would be all there is to say. Here are the rules once we account for broadcasting. We'll describe them in full generality, noting simplifications for scalar distributions; a short code sketch after the list mirrors these rules.

  1. Define n = len(batch_shape) + len(event_shape). (For scalar distributions, len(event_shape) = 0.)
  2. If the input tensor t has fewer than n dimensions, pad its shape by adding dimensions of size 1 on the left until it has exactly n dimensions. Call the resulting tensor t'.
  3. Broadcast the n rightmost dimensions of t' against the [batch_shape, event_shape] of the distribution you're computing a log_prob for. In more detail: for the dimensions where t' already matches the distribution, do nothing, and for the dimensions where t' has a singleton, replicate that singleton the appropriate number of times. Any other situation is an error. (For scalar distributions, we only broadcast against batch_shape, since event_shape = [].)
  4. Now we're finally able to compute the log_prob. The resulting tensor will have shape [sample_shape, batch_shape], where sample_shape is defined to be any dimensions of t or t' to the left of the n rightmost dimensions: sample_shape = shape(t)[:-n].
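
Here is the code sketch promised above. It is plain Python, not part of TFP; expected_log_prob_shape is a name we made up, and it simply mirrors rules 1-4 to predict the shape that log_prob returns:

def expected_log_prob_shape(t_shape, batch_shape, event_shape):
  n = len(batch_shape) + len(event_shape)             # rule 1
  t_shape = [1] * (n - len(t_shape)) + list(t_shape)  # rule 2: pad on the left
  sample_shape = t_shape[:len(t_shape) - n]           # rule 4
  for t_dim, d_dim in zip(t_shape[len(sample_shape):],
                          list(batch_shape) + list(event_shape)):
    if t_dim != 1 and t_dim != d_dim:                 # rule 3
      raise ValueError('Shapes are incompatible.')
  return sample_shape + list(batch_shape)

print(expected_log_prob_shape([2, 2, 1], batch_shape=[3], event_shape=[]))  # [2, 2, 3]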

This might be a mess if you don't know what it means, so let's work some examples:

three_poissons.log_prob([10.])
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-16.104412 ,  -2.0785608, -69.05272  ], dtype=float32)>

The tensor [10.] (with shape [1]) is broadcast across the batch_shape of 3, so we evaluate all three Poissons' log probability at the value 10.

three_poissons.log_prob([[[1.], [10.]], [[100.], [1000.]]])
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[-1.0000000e+00, -7.6974149e+00, -9.5394836e+01],
        [-1.6104412e+01, -2.0785608e+00, -6.9052719e+01]],

       [[-3.6473938e+02, -1.4348087e+02, -3.2223587e+00],
        [-5.9131279e+03, -3.6195427e+03, -1.4069575e+03]]], dtype=float32)>

In the above example, the input tensor has shape [2, 2, 1], while the distributions object has a batch shape of 3. So for each of the [2, 2] sample dimensions, the single value provided gets broadcast to each of the three Poissons.

One potentially useful way to think about it: because three_poissons has batch_shape = [3], a call to log_prob must take a tensor whose last dimension is either 1 or 3; anything else is an error. (The numpy broadcasting rules treat the special case of a scalar as being totally equivalent to a tensor of shape [1].)
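
For instance (a sketch; the exact exception type may vary across versions), a last dimension of 2 fails:

try:
  three_poissons.log_prob([1., 2.])  # last dimension is 2: neither 1 nor 3
except Exception as e:
  print('Error:', type(e).__name__)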

Let's test our chops by playing around with the more complicated Poisson distribution with batch_shape = [2, 3]:

poisson_2_by_3 = tfd.Poisson(
    rate=[[1., 10., 100.,], [2., 20., 200.]],
    name='Two-by-Three Poissons')
poisson_2_by_3.log_prob(1.)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>
poisson_2_by_3.log_prob([1.])  # Exactly equivalent to above, demonstrating the scalar special case.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 1., 1.], [1., 1., 1.]])  # Another way to write the same thing. No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 10., 100.]])  # Input is [1, 3] broadcast to [2, 3].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ -1.       ,  -2.0785608,  -3.2223587],
       [ -1.3068528,  -5.14709  , -33.90767  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 10., 100.], [1., 10., 100.]])  # Equivalent to above. No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ -1.       ,  -2.0785608,  -3.2223587],
       [ -1.3068528,  -5.14709  , -33.90767  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 1., 1.], [2., 2., 2.]])  # No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -14.701683 , -190.09653  ]], dtype=float32)>
poisson_2_by_3.log_prob([[1.], [2.]])  # Equivalent to above. Input shape [2, 1] broadcast to [2, 3].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [  -1.3068528,  -14.701683 , -190.09653  ]], dtype=float32)>

The above examples involved broadcasting over the batch, but the sample shape was empty. Suppose we have a collection of values, and we want to get the log probability of each value at each point in the batch. We could do it manually:

poisson_2_by_3.log_prob([[[1., 1., 1.], [1., 1., 1.]], [[2., 2., 2.], [2., 2., 2.]]])  # Input shape [2, 2, 3].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>

Or we could let broadcasting handle the last batch dimension:

poisson_2_by_3.log_prob([[[1.], [1.]], [[2.], [2.]]])  # Input shape [2, 2, 1].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>

We can also (perhaps somewhat less naturally) let broadcasting handle just the first batch dimension:

poisson_2_by_3.log_prob([[[1., 1., 1.]], [[2., 2., 2.]]])  # Input shape [2, 1, 3].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>

Or we could let broadcasting handle both batch dimensions:

poisson_2_by_3.log_prob([[[1.]], [[2.]]])  # Input shape [2, 1, 1].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>

The above worked fine when we had only two values we wanted, but suppose we had a long list of values we wanted to evaluate at every batch point. For that, the following notation, which adds extra dimensions of size 1 to the right side of the shape, is extremely useful:

poisson_2_by_3.log_prob(tf.constant([1., 2.])[..., tf.newaxis, tf.newaxis])
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39484  ],
        [  -1.3068528,  -17.004269 , -194.70169  ]],

       [[  -1.6931472,   -6.087977 ,  -91.48282  ],
        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>

This is an instance of strided slice notation, which is worth knowing.

Going back to three_poissons for completeness, the same example looks like:

three_poissons.log_prob([[1.], [10.], [50.], [100.]])
<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [ -16.104412 ,   -2.0785608,  -69.05272  ],
       [-149.47777  ,  -43.34851  ,  -18.219261 ],
       [-364.73938  , -143.48087  ,   -3.2223587]], dtype=float32)>
three_poissons.log_prob(tf.constant([1., 10., 50., 100.])[..., tf.newaxis])  # Equivalent to above.
<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39484  ],
       [ -16.104412 ,   -2.0785608,  -69.05272  ],
       [-149.47777  ,  -43.34851  ,  -18.219261 ],
       [-364.73938  , -143.48087  ,   -3.2223587]], dtype=float32)>

Multivariate distributions

We now turn to multivariate distributions, which have non-empty event shape. Let's look at multinomial distributions.

multinomial_distributions = [
    # Multinomial is a vector-valued distribution: if we have k classes,
    # an individual sample from the distribution has k values in it, so the
    # event_shape is `[k]`.
    tfd.Multinomial(total_count=100., probs=[.5, .4, .1],
                    name='One Multinomial'),
    tfd.Multinomial(total_count=[100., 1000.], probs=[.5, .4, .1],
                    name='Two Multinomials Same Probs'),
    tfd.Multinomial(total_count=100., probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Same Counts'),
    tfd.Multinomial(total_count=[100., 1000.],
                    probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Different Everything')
]

describe_distributions(multinomial_distributions)
tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32)
tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32)
tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32)
tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32)

Note how in the last three examples, the batch_shape is always [2], but we can use broadcasting to either have a shared total_count or a shared probs (or neither), because under the hood they are broadcast to have the same shape.
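
For instance, "Two Multinomials Same Probs" is equivalent to (though more compact than) writing the shared probs out explicitly, a quick sketch:

describe_distributions(
    [tfd.Multinomial(total_count=[100., 1000.],
                     probs=[[.5, .4, .1], [.5, .4, .1]])])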

Sampling is straightforward, given the things we already know:

describe_sample_tensor_shapes(multinomial_distributions, sample_shapes)
tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 3)
Sample shape: 2
Returned sample tensor shape: (2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 3)

tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

Computing log probabilities is equally straightforward. Let's work an example with diagonal Multivariate Normal distributions. (Multinomials are not very broadcast friendly, since the constraints on the counts and probabilities mean broadcasting will often produce inadmissible values.) We'll use a batch of 2 3-dimensional distributions with the same mean but different scales (standard deviations):

two_multivariate_normals = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_diag=tf.ones([2, 3]) * [[1.], [2.]])
two_multivariate_normals
<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2] event_shape=[3] dtype=float32>

Now let's evaluate the log probability of each batch point at its mean and at a shifted mean:

two_multivariate_normals.log_prob([[[1., 2., 3.]], [[3., 4., 5.]]])  # Input has shape [2,1,3].
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-2.7568154, -4.836257 ],
       [-8.756816 , -6.336257 ]], dtype=float32)>

Exactly equivalently, we can use strided slice notation to insert an extra shape=1 dimension in the middle of a constant:

two_multivariate_normals.log_prob(
    tf.constant([[1., 2., 3.], [3., 4., 5.]])[:, tf.newaxis, :])  # Equivalent to above.
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-2.7568154, -4.836257 ],
       [-8.756816 , -6.336257 ]], dtype=float32)>

On the other hand, if we don't insert the extra dimension, we pass [1., 2., 3.] to the first batch point and [3., 4., 5.] to the second:

two_multivariate_normals.log_prob(tf.constant([[1., 2., 3.], [3., 4., 5.]]))
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2.7568154, -6.336257 ], dtype=float32)>

Shape Manipulation Techniques

The Reshape Bijector

The Reshape bijector can be used to reshape the event_shape of a distribution. Let's see an example:

six_way_multinomial = tfd.Multinomial(total_count=1000., probs=[.3, .25, .2, .15, .08, .02])
six_way_multinomial
<tfp.distributions.Multinomial 'Multinomial' batch_shape=[] event_shape=[6] dtype=float32>

We created a multinomial with an event shape of [6]. The Reshape bijector allows us to treat this as a distribution with an event shape of [2, 3].

A Bijector represents a differentiable, one-to-one function on an open subset of \({\mathbb R}^n\). Bijectors are used in conjunction with TransformedDistribution, which models a distribution \(p(y)\) in terms of a base distribution \(p(x)\) and a Bijector that represents \(Y = g(X)\). Let's see this in action:

transformed_multinomial = tfd.TransformedDistribution(
    distribution=six_way_multinomial,
    bijector=tfb.Reshape(event_shape_out=[2, 3]))
transformed_multinomial
<tfp.distributions.TransformedDistribution 'reshapeMultinomial' batch_shape=[] event_shape=[2, 3] dtype=float32>
six_way_multinomial.log_prob([500., 100., 100., 150., 100., 50.])
<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>
transformed_multinomial.log_prob([[500., 100., 100.], [150., 100., 50.]])
<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>

This is the only thing the Reshape bijector can do: it cannot turn event dimensions into batch dimensions or vice-versa.
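
In particular, the total number of event elements must be preserved. Asking Reshape for an incompatible event shape is an error (a sketch; depending on the TFP version, the error may surface at construction or on first use):

try:
  bad = tfd.TransformedDistribution(
      distribution=six_way_multinomial,
      bijector=tfb.Reshape(event_shape_out=[4]))  # 6 elements cannot become 4
  bad.log_prob([500., 100., 100., 300.])
except Exception as e:
  print('Error:', type(e).__name__)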

The Independent Distribution

The Independent distribution is used to treat a collection of independent, not-necessarily-identical (aka a batch of) distributions as a single distribution. More concisely, Independent allows converting dimensions in batch_shape to dimensions in event_shape. We'll illustrate by example:

two_by_five_bernoulli = tfd.Bernoulli(
    probs=[[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]],
    name="Two By Five Bernoulli")
two_by_five_bernoulli
<tfp.distributions.Bernoulli 'Two_By_Five_Bernoulli' batch_shape=[2, 5] event_shape=[] dtype=int32>

We can think of this as a two-by-five array of coins with the associated probabilities of heads. Let's evaluate the probability of a particular, arbitrary set of ones and zeros:

pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]
two_by_five_bernoulli.log_prob(pattern)
<tf.Tensor: shape=(2, 5), dtype=float32, numpy=
array([[-2.9957323 , -0.10536051, -0.16251892, -1.609438  , -0.2876821 ],
       [-0.35667497, -0.4307829 , -0.91629076, -0.79850775, -0.6931472 ]],
      dtype=float32)>

We can use Independent to turn this into two different "sets of five Bernoullis", which is useful if we want to consider a "row" of coin flips coming up in a given pattern as a single outcome:

two_sets_of_five = tfd.Independent(
    distribution=two_by_five_bernoulli,
    reinterpreted_batch_ndims=1,
    name="Two Sets Of Five")
two_sets_of_five
<tfp.distributions.Independent 'Two_Sets_Of_Five' batch_shape=[2] event_shape=[5] dtype=int32>

Mathematically, we're computing the log probability of each "set" of five by summing the log probabilities of the five "independent" coin flips in the set, which is where the distribution gets its name:

two_sets_of_five.log_prob(pattern)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-5.160732 , -3.1954036], dtype=float32)>
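
We can verify this numerically (a quick sketch): summing the per-coin log probabilities computed above along the last axis reproduces exactly these values:

print(tf.reduce_sum(two_by_five_bernoulli.log_prob(pattern), axis=-1))
# tf.Tensor([-5.160732  -3.1954036], shape=(2,), dtype=float32)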

We can go even further and use Independent to create a distribution where individual events are a set of two-by-five Bernoullis:

one_set_of_two_by_five = tfd.Independent(
    distribution=two_by_five_bernoulli, reinterpreted_batch_ndims=2,
    name="One Set Of Two By Five")
one_set_of_two_by_five.log_prob(pattern)
<tf.Tensor: shape=(), dtype=float32, numpy=-8.356134>

It's worth noting that from the perspective of sample, using Independent changes nothing:

describe_sample_tensor_shapes(
    [two_by_five_bernoulli,
     two_sets_of_five,
     one_set_of_two_by_five],
    [[3, 5]])
tfp.distributions.Bernoulli("Two_By_Five_Bernoulli", batch_shape=[2, 5], event_shape=[], dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

tfp.distributions.Independent("Two_Sets_Of_Five", batch_shape=[2], event_shape=[5], dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

tfp.distributions.Independent("One_Set_Of_Two_By_Five", batch_shape=[], event_shape=[2, 5], dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

As a parting exercise for the reader, we suggest considering the differences and similarities between a vector batch of Normal distributions and a MultivariateNormalDiag distribution from sampling and log probability perspectives. How can we use Independent to construct a MultivariateNormalDiag from a batch of Normals? (Note that MultivariateNormalDiag is not actually implemented this way.)
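
If you want to check your answer, here is one possible sketch (the variable names are ours, for illustration only):

batch_of_normals = tfd.Normal(loc=[1., 2., 3.], scale=[1., 2., 3.])  # batch_shape=[3]
mvn_from_normals = tfd.Independent(batch_of_normals, reinterpreted_batch_ndims=1)
mvn_diag = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_diag=[1., 2., 3.])
x = [1.5, 2.5, 3.5]
print(mvn_from_normals.log_prob(x), mvn_diag.log_prob(x))  # identical values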