Note: TensorFlow Lite is now part of Google AI Edge. The latest documentation is at ai.google.dev/edge/lite.

TensorFlow operation fusion

Overview

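This page describes the design of, and the steps needed for, converting composite operations in TensorFlow to fused operations in TensorFlow Lite. The infrastructure is general purpose and supports converting any composite operation in TensorFlow to a corresponding fused operation in TensorFlow Lite.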

One example use of this infrastructure is the fusion of TensorFlow RNN operations into TensorFlow Lite, which is documented separately.

What are fused operations

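An operation in TensorFlow is either a primitive operation (for example, tf.add) or a composition of other primitive operations (for example, tf.einsum). A primitive operation shows up as a single node in the TensorFlow graph, while a composite operation is a collection of nodes in the graph. Executing a composite operation is equivalent to executing each of its constituent primitive operations.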

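A fused operation is a single operation that subsumes all the computation performed by each primitive operation within the corresponding composite operation.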

Benefits of fused operations

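Fused operations exist to maximize the performance of their underlying kernel implementations by optimizing the overall computation and reducing the memory footprint. This is especially valuable for low-latency inference workloads and resource-constrained mobile platforms.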

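Fused operations also provide a higher-level interface for defining complex transformations such as quantization, which would otherwise be infeasible or very hard to perform at a more granular level.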

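For these reasons, TensorFlow Lite has many fused operations. They typically correspond to composite operations in the source TensorFlow program. Examples of composite operations in TensorFlow that are implemented as a single fused operation in TensorFlow Lite include various RNN operations such as unidirectional and bidirectional sequence LSTM, convolution (conv2d, bias add, relu), and fully connected (matmul, bias add, relu). In TensorFlow Lite, LSTM quantization is currently implemented only in the fused LSTM operations.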
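To make the memory argument concrete, here is a minimal pure-Python sketch (not TensorFlow code) of a fully connected layer written once as three primitive steps (matmul, bias add, relu) and once as a single fused loop. The function names and the list-of-lists representation are assumptions made for illustration; real fused kernels are implemented natively.

```python
def fully_connected_composite(x, weights, bias):
    """Three primitive steps, each materializing a full intermediate list."""
    # matmul: x has n entries, weights has n rows and len(bias) columns.
    matmul_out = [
        sum(x[k] * weights[k][j] for k in range(len(x)))
        for j in range(len(bias))
    ]
    # bias add: a second pass over the intermediate result.
    bias_out = [v + b for v, b in zip(matmul_out, bias)]
    # relu: a third pass.
    return [max(v, 0.0) for v in bias_out]


def fully_connected_fused(x, weights, bias):
    """One pass per output element; no intermediate lists are kept."""
    out = []
    for j in range(len(bias)):
        acc = bias[j]
        for k in range(len(x)):
            acc += x[k] * weights[k][j]
        out.append(max(acc, 0.0))
    return out


x = [1.0, 2.0]
weights = [[0.5, 1.0, -1.0],
           [0.25, 2.0, 0.5]]
bias = [0.0, 1.0, -0.5]

composite_result = fully_connected_composite(x, weights, bias)
fused_result = fully_connected_fused(x, weights, bias)
```

Both functions compute the same values; the fused version simply avoids storing the matmul and bias-add intermediates, which is the kind of saving a fused kernel can exploit.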

Challenges with fused operations

Converting composite operations in TensorFlow to fused operations in TensorFlow Lite is a hard problem, for the following reasons:

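Composite operations are represented in the TensorFlow graph as a set of primitive operations without a well-defined boundary. Identifying the subgraph that corresponds to such a composite operation (for example, via pattern matching) can be very challenging.
More than one TensorFlow implementation may target the same fused TensorFlow Lite operation. For example, there are many LSTM implementations in TensorFlow (Keras, Babelfish/lingvo, and so on). Each is composed of different primitive operations, yet all of them can be converted to the same fused LSTM operation in TensorFlow Lite.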

As a result, conversion to fused operations has proven quite challenging.
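The two difficulties above can be illustrated with a small pure-Python sketch. It is a toy model only: the op sequences are hypothetical flat lists rather than real TensorFlow graphs, and `find_pattern` is an illustrative helper, not part of any converter.

```python
def find_pattern(graph_ops, pattern):
    """Return the start index of the first contiguous match, or -1."""
    n = len(pattern)
    for start in range(len(graph_ops) - n + 1):
        if graph_ops[start:start + n] == pattern:
            return start
    return -1


# Hypothetical op sequences: two implementations of the same dense layer,
# built from different primitive operations.
impl_a = ["MatMul", "BiasAdd", "Relu"]
impl_b = ["Transpose", "MatMul", "Add", "Maximum"]

# A pattern written with only the first implementation in mind.
dense_pattern = ["MatMul", "BiasAdd", "Relu"]

match_a = find_pattern(impl_a, dense_pattern)
match_b = find_pattern(impl_b, dense_pattern)
```

The pattern finds the first implementation but misses the second, even though both compute the same layer. Every new variant would need its own pattern, which is why an explicit function boundary is more robust than matching subgraphs.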

Converting from a composite operation to a TFLite custom operation (recommended)

Wrap the composite operation in a `tf.function`

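In many cases, some part of the model can be mapped to a single operation in TFLite. This can improve performance when an optimized implementation is written for that specific operation. To create a fused operation in TFLite, identify the part of the graph that represents the fused operation and wrap it in a tf.function whose "experimental_implements" attribute carries the attribute tfl_fusable_op with the value true. If the custom operation takes attributes, pass them as part of the same "experimental_implements".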

Example:

import tensorflow as tf

def get_implements_signature():
  implements_signature = [
    # 'name' will be used as a name for the operation.
    'name: "my_custom_fused_op"',
    # attr "tfl_fusable_op" is required to be set with true value.
    'attr {key: "tfl_fusable_op" value { b: true } }',
    # Example attribute "example_option" that the op accepts.
    'attr {key: "example_option" value { i: %d } }' % 10
  ]
  return ' '.join(implements_signature)

@tf.function(experimental_implements=get_implements_signature())
def my_custom_fused_op(input_1, input_2):
  # An empty function that represents pre/post processing example that
  # is not represented as part of the Tensorflow graph.
  output_1 = tf.constant(0.0, dtype=tf.float32, name='first_output')
  output_2 = tf.constant(0.0, dtype=tf.float32, name='second_output')
  return output_1, output_2

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()
    self.conv_1 = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3))
    self.conv_2 = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3))

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[1, 28, 28, 3], dtype=tf.float32),
      tf.TensorSpec(shape=[1, 28, 28, 3], dtype=tf.float32),
  ])
  def simple_eval(self, input_a, input_b):
    return my_custom_fused_op(self.conv_1(input_a), self.conv_2(input_b))

Note that you don't need to set allow_custom_ops on the converter, because the tfl_fusable_op attribute already implies it.
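When an op takes several attributes, building the signature string by hand gets error-prone. The following is a minimal sketch of a helper that produces the same text as get_implements_signature in the example above. The helper names `format_attr` and `build_implements_signature` are hypothetical, and only the two value kinds that appear in the example (boolean `b` and integer `i`) are handled.

```python
def format_attr(key, value):
    """Render one attribute in the text form used by the example above."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        rendered = "b: %s" % ("true" if value else "false")
    elif isinstance(value, int):
        rendered = "i: %d" % value
    else:
        raise TypeError("unsupported attribute type: %r" % type(value))
    return 'attr {key: "%s" value { %s } }' % (key, rendered)


def build_implements_signature(name, **attrs):
    """Join the op name, the required tfl_fusable_op flag, and extra attrs."""
    parts = ['name: "%s"' % name, format_attr("tfl_fusable_op", True)]
    parts.extend(format_attr(key, value) for key, value in attrs.items())
    return " ".join(parts)


signature = build_implements_signature("my_custom_fused_op", example_option=10)
```

The resulting string can be passed as the experimental_implements argument in the same way as the hand-written signature.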

Implement the custom op and register it with the TFLite Interpreter

Implement your fused operation as a TFLite custom operation; see the custom operator instructions.

Note that the name used to register the op must match the name specified in the name attribute of the implements signature.

For the op in the example above, the registration looks like this:

  TfLiteRegistration reg = {};
  // This name must match the name specified in the implements signature.
  static constexpr char kOpName[] = "my_custom_fused_op";
  reg.custom_name = kOpName;
  reg.prepare = [](TfLiteContext* context, TfLiteNode* node) -> TfLiteStatus {
    // Add your code.
    return kTfLiteOk;
  };
  reg.invoke = [](TfLiteContext* context, TfLiteNode* node) -> TfLiteStatus {
    // Add your code.
    return kTfLiteOk;
  };
  reg.builtin_code = kTfLiteBuiltinCustom;
  resolver->AddCustom(kOpName, &reg);

Converting from a composite to a fused operation (advanced)

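The overall architecture for converting a TensorFlow composite operation to a TensorFlow Lite fused operation is shown below:

[Figure: architecture of the composite-to-fused operation conversion]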

Wrap the composite operation in a `tf.function`

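In the TensorFlow model source code, identify the composite operation and abstract it into a tf.function with the experimental_implements function annotation. See the embedding lookup example. The function defines the interface, and its arguments should be used to implement the conversion logic.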

Write conversion code

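The conversion code is written against the interface of the function that carries the implements annotation. See the example fusion for embedding lookup. Conceptually, the conversion code replaces the composite implementation of this interface with the fused one.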

Plug your conversion code into the prepare-composite-functions pass.

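In more advanced usages, it is possible to implement complex transformations of the composite operation's operands in order to derive the operands of the fused operation. See the Keras LSTM conversion code as an example.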

Convert to TensorFlow Lite

Use the TFLiteConverter.from_saved_model API to convert to TensorFlow Lite.
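A minimal sketch of that conversion step follows, assuming the model has already been exported as a SavedModel. The helper name `convert_saved_model` and both path arguments are hypothetical; only `tf.lite.TFLiteConverter.from_saved_model` and `convert` come from the TensorFlow API.

```python
def convert_saved_model(saved_model_dir, output_path):
    """Convert a SavedModel directory to a .tflite file and return its size."""
    # Imported lazily so this helper can be defined without loading TensorFlow.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()  # the serialized model as bytes
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)
```

A call might look like convert_saved_model("/tmp/my_saved_model", "/tmp/model.tflite"), where both paths are placeholders for your own locations.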

Under the hood

This section describes, at a high level, the overall design of the conversion to fused operations in TensorFlow Lite.

Composing operations in TensorFlow

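Using tf.function with the experimental_implements function attribute, users can explicitly compose new operations out of TensorFlow primitive operations and specify the interface that the resulting composite operation implements. This is very useful because it provides: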

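A well-defined boundary for the composite operation in the underlying TensorFlow graph.
An explicit specification of the interface that this operation implements. The arguments of the tf.function correspond to the arguments of this interface.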
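The two properties above can be mimicked in plain Python. The sketch below is a toy stand-in for the annotation, not TensorFlow code: the `implements` decorator and the `emb_lookup` function are hypothetical names, and the tag is stored as an ordinary function attribute.

```python
import inspect


def implements(name):
    """Toy stand-in for tf.function(experimental_implements=name)."""
    def decorator(fn):
        # The tag marks the boundary: everything inside fn is the composite op.
        fn.implements = name
        return fn
    return decorator


@implements("embedding_lookup")
def emb_lookup(embs, ids_vec):
    return [embs[i] for i in ids_vec]


# Tooling can now read the interface name and its arguments directly,
# without inspecting the body.
interface_name = emb_lookup.implements
interface_args = list(inspect.signature(emb_lookup).parameters)
```

A conversion tool only needs the tag and the argument list to decide which fused operation to emit; the function body can then be replaced wholesale.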

As an example, consider a composite operation defined to implement embedding lookup. It maps to a fused operation in TensorFlow Lite.

    @tf.function(
        experimental_implements="embedding_lookup")
    def EmbFprop(embs, ids_vec):
      """Embedding forward prop.

      Effectively, it computes:
        num = size of ids_vec
        rets = zeros([num, embedding dim])
        for i in range(num):
          rets[i, :] = embs[ids_vec[i], :]
        return rets

      Args:
        embs: The embedding matrix.
        ids_vec: A vector of int32 embedding ids.

      Returns:
        The result of embedding lookups. A matrix of shape
        [num ids in ids_vec, embedding dims].
      """
      num = tf.shape(ids_vec)[0]
      rets = inplace_ops.empty([num] + emb_shape_suf, py_utils.FPropDtype(p))

      def EmbFpropLoop(i, embs, ids_vec, rets):
        # row_id = ids_vec[i]
        row_id = tf.gather(ids_vec, i)
        # row = embs[row_id]
        row = tf.reshape(tf.gather(embs, row_id), [1] + emb_shape_suf)
        # rets[i] = row
        rets = inplace_ops.alias_inplace_update(rets, [i], row)
        return embs, ids_vec, rets

      _, _, rets = functional_ops.For(
          start=0,
          limit=num,
          delta=1,
          inputs=[embs, ids_vec, rets],
          body=EmbFpropLoop,
          rewrite_with_while=compiled)
      if len(weight_shape) > 2:
        rets = tf.reshape(rets, [num, symbolic.ToStatic(p.embedding_dim)])
      return rets

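Once models use composite operations through tf.function as illustrated above, it becomes possible to build a general infrastructure that identifies such operations and converts them to fused TensorFlow Lite operations.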
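For reference, the computation described in the docstring of the example above can be written in a few lines of plain Python. This is a sketch of the semantics only, using nested lists instead of tensors; the name `embedding_lookup_reference` is hypothetical.

```python
def embedding_lookup_reference(embs, ids_vec):
    """Compute rets[i, :] = embs[ids_vec[i], :] for every i."""
    rets = []
    for row_id in ids_vec:
        rets.append(list(embs[row_id]))  # copy the selected embedding row
    return rets


embs = [[0.1, 0.2], [1.1, 1.2], [2.1, 2.2]]
result = embedding_lookup_reference(embs, [2, 0, 2])
```

Any fused implementation of the embedding_lookup interface is expected to return the same rows in the same order.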

Extending the TensorFlow Lite converter

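An earlier release of the TensorFlow Lite converter only supported importing TensorFlow models as a graph in which all variables were replaced with their corresponding constant values. This does not work for operation fusion, because such graphs have all functions inlined so that the variables can be turned into constants.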

To make use of tf.function with the experimental_implements feature during conversion, the functions need to be preserved until later in the conversion process.

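Therefore, we implemented a new workflow for importing and converting TensorFlow models in the converter to support the composite operation fusion use case.

This workflow lets us perform operation fusion on the functions that represent the composite operations before function inlining and variable freezing take place.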
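The importance of this ordering can be seen in a small toy model. The sketch below is pure Python with a hypothetical module layout (a dict of functions whose bodies are lists of op names and calls) and is not the converter's data structure.

```python
def inline_calls(module, entry="main"):
    """Flatten the entry function by replacing calls with callee bodies."""
    flat = []
    for op in module[entry]["body"]:
        if isinstance(op, tuple) and op[0] == "call":
            flat.extend(inline_calls(module, op[1]))
        else:
            flat.append(op)
    return flat


def fuse_then_inline(module, fused_ops, entry="main"):
    """Replace annotated function bodies with one fused op, then inline."""
    fused_module = {}
    for name, func in module.items():
        tag = func.get("implements")
        body = [fused_ops[tag]] if tag in fused_ops else list(func["body"])
        fused_module[name] = {"implements": tag, "body": body}
    return inline_calls(fused_module, entry)


module = {
    "main": {"implements": None,
             "body": ["Conv2D", ("call", "emb_fprop"), "Softmax"]},
    "emb_fprop": {"implements": "embedding_lookup",
                  "body": ["Shape", "Gather", "Reshape", "InplaceUpdate"]},
}
fused_ops = {"embedding_lookup": "TFL.EmbeddingLookup"}

inlined_first = inline_calls(module)
fused_first = fuse_then_inline(module, fused_ops)
```

Inlining first leaves only a flat list of primitive ops with no marker of where the composite began or ended. Fusing first yields a single fused op in its place.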

Implementing operation fusion

Let's look at the operation fusion pass in more detail. This pass does the following:

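Loop through all functions in the MLIR module.
If a function has the tf._implements attribute, call the appropriate operation fusion utility based on the attribute value.
The operation fusion utility operates on the function's operands and attributes (which serve as the interface for the conversion) and replaces the body of the function with an equivalent body that contains the fused operation.
In many cases, the replaced body contains operations other than the fused operation. These correspond to static transforms on the function's operands that produce the operands of the fused operation. Because these computations can all be constant folded, they do not appear in the exported flatbuffer, where only the fused operation remains.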
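The steps above can be sketched as a toy pass in pure Python. This is an illustration under simplifying assumptions, not the MLIR implementation: functions are plain dicts, bodies are lists of tuples, and `FUSION_UTILITIES` and `prepare_composite_functions` are hypothetical names.

```python
def fuse_embedding_lookup(func):
    """Rewrite the body using the function arguments as the interface."""
    embs, ids_vec = func["args"]
    # The fused op takes the ids first, then the embedding table.
    func["body"] = [("tfl.embedding_lookup", ids_vec, embs)]


# Maps the value of the tf._implements attribute to a fusion utility.
FUSION_UTILITIES = {"embedding_lookup": fuse_embedding_lookup}


def prepare_composite_functions(module):
    """Toy model of the pass: rewrite every annotated function in place."""
    rewritten = []
    for func in module:                              # visit every function
        tag = func["attrs"].get("tf._implements")    # check the attribute
        utility = FUSION_UTILITIES.get(tag)
        if utility is None:
            continue                                 # leave it untouched
        utility(func)                                # replace the body
        rewritten.append(func["name"])
    return rewritten


module = [
    {"name": "main", "attrs": {}, "args": ["x"],
     "body": [("tf.Identity", "x")]},
    {"name": "EmbFprop", "attrs": {"tf._implements": "embedding_lookup"},
     "args": ["embs", "ids_vec"],
     "body": [("tf.Shape", "ids_vec"), ("tf.Gather", "embs", "ids_vec")]},
]
rewritten = prepare_composite_functions(module)
```

Only the annotated function is rewritten. Adding support for another composite operation amounts to registering one more entry in the dispatch table.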

Here is a code snippet from the pass that shows the main workflow:

void PrepareCompositeFunctionsPass::ConvertTFImplements(FuncOp func,
                                                        StringAttr attr) {
  if (attr.getValue() == "embedding_lookup") {
    func.eraseBody();
    func.addEntryBlock();
    // Convert the composite embedding_lookup function body to a
    // TFLite fused embedding_lookup op.
    ConvertEmbeddedLookupFunc convert_embedded_lookup(func);
    if (failed(convert_embedded_lookup.VerifySignature())) {
      return signalPassFailure();
    }
    convert_embedded_lookup.RewriteFunc();
  } else if (attr.getValue() == mlir::TFL::kKerasLstm) {
    func.eraseBody();
    func.addEntryBlock();
    OpBuilder builder(func.getBody());
    if (failed(ConvertKerasLSTMLayer(func, &builder))) {
      return signalPassFailure();
    }
  }
  // Other fusions can plug in here as additional `else if` branches.
}

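Here is a code snippet that shows how this composite operation is mapped to a fused operation in TensorFlow Lite, using the function as the conversion interface.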

  void RewriteFunc() {
    Value lookup = func_.getArgument(1);
    Value value = func_.getArgument(0);
    auto output_type = func_.getType().getResult(0);

    OpBuilder builder(func_.getBody());
    auto op = builder.create<mlir::TFL::EmbeddingLookupOp>(
        func_.getLoc(), output_type, lookup, value);

    builder.create<mlir::ReturnOp>(func_.getLoc(), op.getResult());
  }
