Note: TensorFlow Lite is now part of Google AI Edge. The latest documentation has moved to ai.google.dev/edge/lite.

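Implementing a custom delegate

What is a TensorFlow Lite delegate?

A TensorFlow Lite delegate lets you run your model (in part or in whole) on another executor. This mechanism can use a variety of on-device accelerators, such as the GPU or the Edge TPU (Tensor Processing Unit), for inference. It gives developers a flexible way, decoupled from the default TFLite runtime, to speed up inference.

The diagram below summarizes delegates; the following sections give more details.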

[Figure: TFLite Delegates]

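When should I create a custom delegate?

TensorFlow Lite has a variety of delegates for target accelerators (such as GPU, DSP, and EdgeTPU) and frameworks (such as Android NNAPI).

Creating your own delegate is useful in the following scenarios:

You want to integrate a new ML inference engine that is not supported by any existing delegate.
You have your own hardware accelerator that improves runtime for known scenarios.
You are developing CPU optimizations (such as operator fusion) that can speed up certain models.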

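How do delegates work?

Consider a simple model graph like the one below, and a delegate "MyDelegate" that has faster implementations of the Conv2D and Mean operations.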

[Figure: Original graph]

After applying "MyDelegate", the original TensorFlow Lite graph is updated as follows:

[Figure: Graph with delegate]

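TensorFlow Lite obtains the graph above by splitting the original graph according to two rules:

Specific operations that the delegate can handle are put into a partition, while still satisfying the dependencies of the original computation workflow among operations.
Each to-be-delegated partition has only input and output nodes that are not handled by the delegate.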

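Each partition handled by a delegate is replaced in the original graph by a delegate node (also called a delegate kernel), which evaluates the partition when it is invoked.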
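A minimal sketch of the two rules, under a strong simplifying assumption: the model is a linear chain of ops, so a partition is just a maximal run of consecutive supported nodes. `Node`, `PartitionChain`, and the `"DelegateKernel"` placeholder are hypothetical names used only for illustration; the real TFLite partitioner works on general graphs and is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified graph model: a linear chain in which node i feeds
// node i + 1. Real TFLite graphs are DAGs, and the real partitioner must also
// respect branching dependencies; this sketch ignores that.
struct Node {
  std::string op;
};

// Replaces each maximal run of delegate-supported nodes with a single
// "DelegateKernel" node and keeps unsupported nodes unchanged.
std::vector<std::string> PartitionChain(
    const std::vector<Node>& nodes,
    const std::vector<std::string>& supported_ops) {
  std::vector<std::string> result;
  bool in_partition = false;
  for (const Node& node : nodes) {
    const bool supported =
        std::find(supported_ops.begin(), supported_ops.end(), node.op) !=
        supported_ops.end();
    if (supported) {
      // Open a new partition only when entering it from a non-delegated node.
      if (!in_partition) result.push_back("DelegateKernel");
      in_partition = true;
    } else {
      result.push_back(node.op);
      in_partition = false;
    }
  }
  return result;
}
```

For example, with supported ops `{"Conv2D", "Mean"}`, the chain Conv2D → Mean → Add becomes `{"DelegateKernel", "Add"}`, while Conv2D → Add → Mean becomes `{"DelegateKernel", "Add", "DelegateKernel"}`: two delegated partitions separated by a CPU node.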

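Depending on the model, the final graph can end up with one or more nodes; more than one node means that the delegate does not support some of the ops. In general, you don't want multiple partitions handled by the delegate, because every switch from the delegate back to the main graph adds the overhead of passing results from the delegated subgraph to the main graph through memory copies (for example, GPU to CPU). When there are many memory copies, this overhead can offset the performance gains.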
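One rough way to reason about this overhead is to count how often execution crosses the boundary between the delegate and the CPU. The sketch below does that for a linear chain; `CountBoundaryCrossings` is a hypothetical helper, and the count is only a proxy, since it ignores tensor sizes and any copies at the graph's own inputs and outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: on_delegate[i] says whether node i of a linear chain
// (in execution order) runs on the delegate. Every change between adjacent
// nodes is a delegate/CPU boundary, where results may have to be copied
// between memories (for example GPU -> CPU).
int CountBoundaryCrossings(const std::vector<bool>& on_delegate) {
  int crossings = 0;
  for (std::size_t i = 1; i < on_delegate.size(); ++i) {
    if (on_delegate[i] != on_delegate[i - 1]) ++crossings;
  }
  return crossings;
}
```

A single delegated partition followed by one CPU op gives one crossing, whereas alternating delegate and CPU nodes (`{true, false, true, false}`) gives three, each a potential memory copy.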

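Implementing your own custom delegate

The preferred way to add a delegate is to use the SimpleDelegate API.

To create a new delegate, you need to implement two interfaces and provide your own implementations of their methods.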

1 - `SimpleDelegateInterface`

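This class represents the capabilities of the delegate: which operations are supported, and a factory for creating the kernel that encapsulates the delegated graph. For more details, see the interface defined in this C++ header file. The comments in the code explain each API in detail.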

2 - `SimpleDelegateKernelInterface`

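This class encapsulates the logic for initializing, preparing, and running a delegated partition.

It has the following methods (see definition):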

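Init(...): called once to perform any one-time initialization.
Prepare(...): called for each different instance of this node (this happens if you have multiple delegated partitions). This is usually where you want to allocate memory, since it is called every time tensors are resized.
Eval(...): called for inference, each time the interpreter is invoked.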
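The order in which these three methods run can be modeled with a small, dependency-free sketch. `KernelInterface`, `CountingKernel`, `RunLifecycle`, and `Status` are hypothetical stand-ins (the real methods take `TfLiteContext*` and related arguments and return `TfLiteStatus`); the sketch only illustrates the assumed call pattern: one `Init`, a `Prepare` before the first run and after each resize, and one `Eval` per inference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for TfLiteStatus (hypothetical).
enum class Status { kOk, kError };

// Hypothetical mirror of the kernel lifecycle; NOT the TFLite API.
class KernelInterface {
 public:
  virtual ~KernelInterface() = default;
  virtual Status Init() = 0;     // once, for one-time setup
  virtual Status Prepare() = 0;  // before first use and after each resize
  virtual Status Eval() = 0;     // once per inference
};

// Test double that just counts how often each phase runs.
class CountingKernel : public KernelInterface {
 public:
  Status Init() override { ++init_calls; return Status::kOk; }
  Status Prepare() override { ++prepare_calls; return Status::kOk; }
  Status Eval() override { ++eval_calls; return Status::kOk; }
  int init_calls = 0;
  int prepare_calls = 0;
  int eval_calls = 0;
};

// Runs one inference per entry of `resize_before`. The first inference always
// needs Prepare; later ones re-run it only when resize_before[i] is true.
// Stops at the first phase that does not return kOk.
Status RunLifecycle(KernelInterface& kernel,
                    const std::vector<bool>& resize_before) {
  if (kernel.Init() != Status::kOk) return Status::kError;
  for (std::size_t i = 0; i < resize_before.size(); ++i) {
    if (i == 0 || resize_before[i]) {
      if (kernel.Prepare() != Status::kOk) return Status::kError;
    }
    if (kernel.Eval() != Status::kOk) return Status::kError;
  }
  return Status::kOk;
}
```

Running four inferences with a resize before the third yields one `Init`, two `Prepare`, and four `Eval` calls. As we understand the SimpleDelegate framework, each delegated partition gets its own kernel instance, so these counts would apply per partition.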

Example

In this example, you will create a very simple delegate that supports only two operation types, ADD and SUB, and only with float32 tensors.

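// Assumed includes (paths relative to the TensorFlow source tree):
#include <memory>
#include <vector>

#include "tensorflow/lite/builtin_ops.h"
#include "tensorflow/lite/delegates/utils/simple_delegate.h"
#include "tensorflow/lite/kernels/internal/tensor_ctypes.h"  // GetTensorData
#include "tensorflow/lite/kernels/kernel_util.h"             // NumElements

// The classes below are assumed to live inside `namespace tflite`. In a real
// source file, MyDelegateKernel (shown next) must be defined before
// MyDelegate, because CreateDelegateKernelInterface() instantiates it.

// MyDelegate implements the interface of SimpleDelegateInterface.
// This holds the Delegate capabilities.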
class MyDelegate : public SimpleDelegateInterface {
 public:
  bool IsNodeSupportedByDelegate(const TfLiteRegistration* registration,
                                 const TfLiteNode* node,
                                 TfLiteContext* context) const override {
    // Only supports Add and Sub ops.
    if (kTfLiteBuiltinAdd != registration->builtin_code &&
        kTfLiteBuiltinSub != registration->builtin_code)
      return false;
    // This delegate only supports float32 types.
    for (int i = 0; i < node->inputs->size; ++i) {
      auto& tensor = context->tensors[node->inputs->data[i]];
      if (tensor.type != kTfLiteFloat32) return false;
    }
    return true;
  }

  TfLiteStatus Initialize(TfLiteContext* context) override { return kTfLiteOk; }

  const char* Name() const override {
    static constexpr char kName[] = "MyDelegate";
    return kName;
  }

  std::unique_ptr<SimpleDelegateKernelInterface> CreateDelegateKernelInterface()
      override {
    return std::make_unique<MyDelegateKernel>();
  }
};

Next, create your own delegate kernel by inheriting from SimpleDelegateKernelInterface:

// My delegate kernel.
class MyDelegateKernel : public SimpleDelegateKernelInterface {
 public:
  TfLiteStatus Init(TfLiteContext* context,
                    const TfLiteDelegateParams* params) override {
    // Save index to all nodes which are part of this delegate.
    inputs_.resize(params->nodes_to_replace->size);
    outputs_.resize(params->nodes_to_replace->size);
    builtin_code_.resize(params->nodes_to_replace->size);
    for (int i = 0; i < params->nodes_to_replace->size; ++i) {
      const int node_index = params->nodes_to_replace->data[i];
      // Get this node information.
      TfLiteNode* delegated_node = nullptr;
      TfLiteRegistration* delegated_node_registration = nullptr;
      TF_LITE_ENSURE_EQ(
          context,
          context->GetNodeAndRegistration(context, node_index, &delegated_node,
                                          &delegated_node_registration),
          kTfLiteOk);
      inputs_[i].push_back(delegated_node->inputs->data[0]);
      inputs_[i].push_back(delegated_node->inputs->data[1]);
      outputs_[i].push_back(delegated_node->outputs->data[0]);
      builtin_code_[i] = delegated_node_registration->builtin_code;
    }
    return kTfLiteOk;
  }

  TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) override {
    return kTfLiteOk;
  }

  TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) override {
    // Evaluate the delegated graph by looping over all the delegated nodes.
    // All the nodes are either ADD or SUB operations, and the number of nodes
    // equals `inputs_.size()`. inputs_[i] is the list of input tensor indices
    // of node `i`, and outputs_[i] is the list of its output tensor indices.
    // The implementation is intentionally simple, since it is for
    // demonstration only.

    for (size_t i = 0; i < inputs_.size(); ++i) {
      // Get the node input tensors.
      // Add/Sub operation accepts 2 inputs.
      auto& input_tensor_1 = context->tensors[inputs_[i][0]];
      auto& input_tensor_2 = context->tensors[inputs_[i][1]];
      auto& output_tensor = context->tensors[outputs_[i][0]];
      TF_LITE_ENSURE_EQ(
          context,
          ComputeResult(context, builtin_code_[i], &input_tensor_1,
                        &input_tensor_2, &output_tensor),
          kTfLiteOk);
    }
    return kTfLiteOk;
  }

 private:
  // Computes 'input_tensor_1' + 'input_tensor_2' (ADD) or
  // 'input_tensor_1' - 'input_tensor_2' (SUB), depending on 'builtin_code',
  // and stores the result in 'output_tensor'.
  TfLiteStatus ComputeResult(TfLiteContext* context, int builtin_code,
                             const TfLiteTensor* input_tensor_1,
                             const TfLiteTensor* input_tensor_2,
                             TfLiteTensor* output_tensor) {
    if (NumElements(input_tensor_1) != NumElements(input_tensor_2) ||
        NumElements(input_tensor_1) != NumElements(output_tensor)) {
      return kTfLiteDelegateError;
    }
    // This code assumes no activation, and no broadcasting needed (both inputs
    // have the same size).
    auto* input_1 = GetTensorData<float>(input_tensor_1);
    auto* input_2 = GetTensorData<float>(input_tensor_2);
    auto* output = GetTensorData<float>(output_tensor);
    for (int i = 0; i < NumElements(input_tensor_1); ++i) {
      if (builtin_code == kTfLiteBuiltinAdd)
        output[i] = input_1[i] + input_2[i];
      else
        output[i] = input_1[i] - input_2[i];
    }
    return kTfLiteOk;
  }

  // Holds the indices of the input/output tensors.
  // inputs_[i] is list of all input tensors to node at index 'i'.
  // outputs_[i] is list of all output tensors to node at index 'i'.
  std::vector<std::vector<int>> inputs_, outputs_;
  // Holds the builtin code of the ops.
  // builtin_code_[i] is the type of node at index 'i'
  std::vector<int> builtin_code_;
};
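Because the arithmetic in `ComputeResult` does not depend on the interpreter, you may find it convenient to factor it into a plain function and unit-test it in isolation. The sketch below is a hypothetical, TFLite-free version of that logic operating on `std::vector<float>`; `BinaryOp` and `ComputeElementwise` are illustrative names, not part of any TFLite API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for kTfLiteBuiltinAdd / kTfLiteBuiltinSub.
enum class BinaryOp { kAdd, kSub };

// Dependency-free version of MyDelegateKernel::ComputeResult's arithmetic.
// Like the kernel, it rejects mismatched sizes (no broadcasting) and applies
// no fused activation.
bool ComputeElementwise(BinaryOp op, const std::vector<float>& in1,
                        const std::vector<float>& in2,
                        std::vector<float>& out) {
  if (in1.size() != in2.size() || in1.size() != out.size()) return false;
  for (std::size_t i = 0; i < in1.size(); ++i) {
    out[i] = (op == BinaryOp::kAdd) ? in1[i] + in2[i] : in1[i] - in2[i];
  }
  return true;
}
```

Returning `false` on a size mismatch mirrors the kernel returning `kTfLiteDelegateError` instead of writing out of bounds.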

Benchmarking and evaluating the new delegate

TFLite has a set of tools that you can use to quickly test a TFLite model.

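Model Benchmark Tool: takes a TFLite model, generates random inputs, and then runs the model repeatedly for a specified number of runs. At the end, it prints aggregated latency statistics.
Inference Diff Tool: for a given model, generates random Gaussian data and passes it through two different TFLite interpreters, one running the single-threaded CPU kernels and the other using a user-defined spec. It measures the absolute difference between the output tensors of the two interpreters on a per-element basis. This tool is also helpful for debugging accuracy issues.
There are also task-specific evaluation tools for image classification and object detection. These tools can be found here.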
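The core metric of the Inference Diff Tool, the per-element absolute difference between a reference output and a delegate output, is easy to reproduce when writing your own checks. The sketch below is not the tool's implementation: `GaussianInput` and `AbsDiff` are hypothetical helpers, and the mean, standard deviation, and seeding used here are assumptions rather than the real tool's settings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Generates `n` standard-normal values from a fixed seed, so the same input
// can be fed to both interpreters (mean 0 / stddev 1 is an assumption).
std::vector<float> GaussianInput(std::size_t n, unsigned seed) {
  std::mt19937 gen(seed);
  std::normal_distribution<float> dist(0.0f, 1.0f);
  std::vector<float> values(n);
  for (float& x : values) x = dist(gen);
  return values;
}

// Per-element absolute difference between a reference output (e.g. the
// single-threaded CPU run) and a test output (e.g. the delegate run).
// Assumes both outputs have the same number of elements.
std::vector<float> AbsDiff(const std::vector<float>& reference,
                           const std::vector<float>& test) {
  std::vector<float> diff(reference.size());
  for (std::size_t i = 0; i < reference.size(); ++i) {
    diff[i] = std::fabs(reference[i] - test[i]);
  }
  return diff;
}
```

In a real check you would feed the same `GaussianInput` to a CPU-only interpreter and to one using your delegate, then compare, for example, the largest entry of `AbsDiff` against a tolerance you choose.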

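In addition, TFLite has a large set of kernel and op unit tests that you can reuse to test the new delegate more thoroughly and to make sure the regular TFLite execution path is not broken.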

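To reuse TFLite tests and tooling for a new delegate, you can use either of the following two options:

Use the delegate registrar mechanism.
Use the external delegate mechanism.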

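Choosing the best approach

Both approaches require a few changes, as detailed below. However, the first approach links the delegate statically and requires rebuilding the testing, benchmarking, and evaluation tools. In contrast, the second approach builds the delegate as a shared library and requires you to expose create/delete methods from that shared library.

As a result, the external delegate mechanism works with TFLite's prebuilt TensorFlow Lite tooling binaries. However, it is less explicit, and it can be harder to set up in automated integration tests. Use the delegate registrar approach for better clarity.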

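Option 1: Use the delegate registrar

The delegate registrar keeps a list of delegate providers, each of which offers an easy way to create TFLite delegates from command-line flags, which makes them convenient for tooling. To plug a new delegate into all the TensorFlow Lite tools mentioned above, you first create a new delegate provider like this one, and then make only a few changes to the BUILD rules. A full example of this integration process is shown below (the code can be found here).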
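A registrar like this is commonly built on static initialization: a global object's initializer adds the provider to a process-wide list before `main` runs. The standalone sketch below shows that general technique; `Provider`, `Registry`, `REGISTER_PROVIDER`, and `DummyProvider` are hypothetical, and we have not verified that TFLite's `REGISTER_DELEGATE_PROVIDER` is implemented exactly this way.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Illustrative provider interface (hypothetical, not TFLite's DelegateProvider).
class Provider {
 public:
  virtual ~Provider() = default;
  virtual std::string Name() const = 0;
  // Returns true if the provider's flag is set, i.e. it would create a delegate.
  virtual bool Enabled(const std::map<std::string, bool>& flags) const = 0;
};

// Process-wide list of providers that tools can iterate over.
class Registry {
 public:
  static Registry& Get() {
    static Registry registry;  // constructed on first use
    return registry;
  }
  void Add(std::unique_ptr<Provider> provider) {
    providers_.push_back(std::move(provider));
  }
  const std::vector<std::unique_ptr<Provider>>& providers() const {
    return providers_;
  }

 private:
  std::vector<std::unique_ptr<Provider>> providers_;
};

// Registers T during static initialization. If the linker discards the object
// file holding this variable because nothing references it, registration
// silently never happens.
#define REGISTER_PROVIDER(T)         \
  static const bool kRegistered##T = \
      (Registry::Get().Add(std::make_unique<T>()), true)

class DummyProvider : public Provider {
 public:
  std::string Name() const override { return "DummyDelegate"; }
  bool Enabled(const std::map<std::string, bool>& flags) const override {
    auto it = flags.find("use_dummy_delegate");
    return it != flags.end() && it->second;
  }
};
REGISTER_PROVIDER(DummyProvider);
```

A tool can then walk `Registry::Get().providers()` and create whichever delegates the parsed flags enable. Because nothing refers to `kRegisteredDummyProvider` directly, code like this placed in a static library could be dropped by the linker, which is the problem the `alwayslink = 1` setting in the BUILD rule later in this section guards against.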

Assume you have a delegate that implements the SimpleDelegate API, along with extern "C" APIs to create and delete this "dummy" delegate, as shown below:

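// Returns default options for DummyDelegate.
DummyDelegateOptions TfLiteDummyDelegateOptionsDefault();

// Creates a new delegate instance that needs to be destroyed with
// `TfLiteDummyDelegateDelete` when the delegate is no longer used by TFLite.
// When `options` is `nullptr`, the defaults from
// `TfLiteDummyDelegateOptionsDefault()` are used.
TfLiteDelegate* TfLiteDummyDelegateCreate(const DummyDelegateOptions* options);

// Destroys a delegate created with a `TfLiteDummyDelegateCreate` call.
void TfLiteDummyDelegateDelete(TfLiteDelegate* delegate);

// C++ convenience wrapper (used by the provider below) that returns a
// std::unique_ptr which calls TfLiteDummyDelegateDelete automatically.
inline std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>
TfLiteDummyDelegateCreateUnique(const DummyDelegateOptions* options) {
  return std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>(
      TfLiteDummyDelegateCreate(options), TfLiteDummyDelegateDelete);
}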

To integrate DummyDelegate with the Benchmark Tool and the Inference Tool, define a DelegateProvider as follows:

class DummyDelegateProvider : public DelegateProvider {
 public:
  DummyDelegateProvider() {
    default_params_.AddParam("use_dummy_delegate",
                             ToolParam::Create<bool>(false));
  }

  std::vector<Flag> CreateFlags(ToolParams* params) const final;

  void LogParams(const ToolParams& params) const final;

  TfLiteDelegatePtr CreateTfLiteDelegate(const ToolParams& params) const final;

  std::string GetName() const final { return "DummyDelegate"; }
};
REGISTER_DELEGATE_PROVIDER(DummyDelegateProvider);

std::vector<Flag> DummyDelegateProvider::CreateFlags(ToolParams* params) const {
  std::vector<Flag> flags = {CreateFlag<bool>("use_dummy_delegate", params,
                                              "use the dummy delegate.")};
  return flags;
}

void DummyDelegateProvider::LogParams(const ToolParams& params) const {
  TFLITE_LOG(INFO) << "Use dummy test delegate : ["
                   << params.Get<bool>("use_dummy_delegate") << "]";
}

TfLiteDelegatePtr DummyDelegateProvider::CreateTfLiteDelegate(
    const ToolParams& params) const {
  if (params.Get<bool>("use_dummy_delegate")) {
    auto default_options = TfLiteDummyDelegateOptionsDefault();
    return TfLiteDummyDelegateCreateUnique(&default_options);
  }
  return TfLiteDelegatePtr(nullptr, [](TfLiteDelegate*) {});
}
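The `TfLiteDelegatePtr(nullptr, [](TfLiteDelegate*) {})` line can look odd. Assuming, as that construction suggests, that `TfLiteDelegatePtr` is a `std::unique_ptr` with a function-pointer deleter, the standalone sketch below shows why a deleter is passed even for the "no delegate" case. `FakeDelegate`, `FakeDelegatePtr`, `DeleteFakeDelegate`, and `CreateIfEnabled` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for TfLiteDelegate.
struct FakeDelegate {
  int id;
};

// Counts deleter calls so the behavior can be observed.
int g_deleted = 0;
void DeleteFakeDelegate(FakeDelegate* delegate) {
  ++g_deleted;
  delete delegate;
}

// A unique_ptr whose deleter is a plain function pointer. Such a unique_ptr
// cannot be default-constructed, so even the "no delegate" result needs a
// deleter; a captureless lambda converts to the required function pointer.
using FakeDelegatePtr = std::unique_ptr<FakeDelegate, void (*)(FakeDelegate*)>;

FakeDelegatePtr CreateIfEnabled(bool use_delegate) {
  if (use_delegate) {
    return FakeDelegatePtr(new FakeDelegate{1}, DeleteFakeDelegate);
  }
  // unique_ptr never calls its deleter on a null pointer, so this no-op
  // deleter is never actually invoked.
  return FakeDelegatePtr(nullptr, [](FakeDelegate*) {});
}
```

Returning a null pointer with a no-op deleter lets callers treat "flag off" uniformly: they check the pointer for null and otherwise rely on the deleter for cleanup.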

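The BUILD rule definition is important, because you need to make sure the library is always linked and not discarded at link time by the optimizer.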

#### The following are for using the dummy test delegate in TFLite tooling ####
cc_library(
    name = "dummy_delegate_provider",
    srcs = ["dummy_delegate_provider.cc"],
    copts = tflite_copts(),
    deps = [
        ":dummy_delegate",
        "//tensorflow/lite/tools/delegates:delegate_provider_hdr",
    ],
    alwayslink = 1, # This is required so the optimizer doesn't optimize the library away.
)

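Now add these wrapper rules to your BUILD file to build versions of the Benchmark Tool, the Inference Diff Tool, and the other evaluation tools that can run with your own delegate.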

cc_binary(
    name = "benchmark_model_plus_dummy_delegate",
    copts = tflite_copts(),
    linkopts = task_linkopts(),
    deps = [
        ":dummy_delegate_provider",
        "//tensorflow/lite/tools/benchmark:benchmark_model_main",
    ],
)

cc_binary(
    name = "inference_diff_plus_dummy_delegate",
    copts = tflite_copts(),
    linkopts = task_linkopts(),
    deps = [
        ":dummy_delegate_provider",
        "//tensorflow/lite/tools/evaluation/tasks:task_executor_main",
        "//tensorflow/lite/tools/evaluation/tasks/inference_diff:run_eval_lib",
    ],
)

cc_binary(
    name = "imagenet_classification_eval_plus_dummy_delegate",
    copts = tflite_copts(),
    linkopts = task_linkopts(),
    deps = [
        ":dummy_delegate_provider",
        "//tensorflow/lite/tools/evaluation/tasks:task_executor_main",
        "//tensorflow/lite/tools/evaluation/tasks/imagenet_image_classification:run_eval_lib",
    ],
)

cc_binary(
    name = "coco_object_detection_eval_plus_dummy_delegate",
    copts = tflite_copts(),
    linkopts = task_linkopts(),
    deps = [
        ":dummy_delegate_provider",
        "//tensorflow/lite/tools/evaluation/tasks:task_executor_main",
        "//tensorflow/lite/tools/evaluation/tasks/coco_object_detection:run_eval_lib",
    ],
)

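You can also plug this delegate provider into the TFLite kernel tests, as described here.

Option 2: Use the external delegate

In this alternative, you first create an external delegate adaptor, external_delegate_adaptor.cc, as shown below. Note that, as mentioned earlier, this approach is slightly less preferred than Option 1.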

TfLiteDelegate* CreateDummyDelegateFromOptions(char** options_keys,
                                               char** options_values,
                                               size_t num_options) {
  DummyDelegateOptions options = TfLiteDummyDelegateOptionsDefault();

  // Parse key-values options to DummyDelegateOptions.
  // You can achieve this by mimicking them as command-line flags.
  // Use the array form of unique_ptr so the buffer is freed with delete[].
  std::unique_ptr<const char*[]> argv(new const char*[num_options + 1]);
  constexpr char kDummyDelegateParsing[] = "dummy_delegate_parsing";
  argv.get()[0] = kDummyDelegateParsing;

  // reserve() prevents reallocation, keeping the c_str() pointers below valid.
  std::vector<std::string> option_args;
  option_args.reserve(num_options);
  for (size_t i = 0; i < num_options; ++i) {
    option_args.emplace_back("--");
    option_args.rbegin()->append(options_keys[i]);
    option_args.rbegin()->push_back('=');
    option_args.rbegin()->append(options_values[i]);
    argv.get()[i + 1] = option_args.rbegin()->c_str();
  }

  // Define command-line flags.
  // ...
  std::vector<tflite::Flag> flag_list = {
      tflite::Flag::CreateFlag(...),
      ...,
      tflite::Flag::CreateFlag(...),
  };

  int argc = num_options + 1;
  if (!tflite::Flags::Parse(&argc, argv.get(), flag_list)) {
    return nullptr;
  }

  return TfLiteDummyDelegateCreate(&options);
}

#ifdef __cplusplus
extern "C" {
#endif  // __cplusplus

// Defines two symbols that need to be exported to use the TFLite external
// delegate. See tensorflow/lite/delegates/external for details.
TFL_CAPI_EXPORT TfLiteDelegate* tflite_plugin_create_delegate(
    char** options_keys, char** options_values, size_t num_options,
    void (*report_error)(const char*)) {
  return tflite::tools::CreateDummyDelegateFromOptions(
      options_keys, options_values, num_options);
}

TFL_CAPI_EXPORT void tflite_plugin_destroy_delegate(TfLiteDelegate* delegate) {
  TfLiteDummyDelegateDelete(delegate);
}

#ifdef __cplusplus
}
#endif  // __cplusplus
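The first half of `CreateDummyDelegateFromOptions` turns key/value pairs into command-line-style arguments. The sketch below isolates that step as a hypothetical `BuildArgv` helper that owns its strings through a `std::vector` instead of a raw `new[]` buffer; it is an illustration, not part of the TFLite adaptor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns parallel key/value arrays into "--key=value"
// strings and an argv-style array whose slot 0 holds a placeholder program
// name. `storage` owns the strings; the returned pointers stay valid only
// while `storage` is alive and not modified.
std::vector<const char*> BuildArgv(const char* const* keys,
                                   const char* const* values,
                                   std::size_t num_options,
                                   std::vector<std::string>& storage) {
  storage.clear();
  // Reserving up front guarantees push_back never reallocates, so c_str()
  // pointers taken in earlier iterations are not invalidated by later ones.
  storage.reserve(num_options);
  std::vector<const char*> argv;
  argv.reserve(num_options + 1);
  argv.push_back("dummy_delegate_parsing");
  for (std::size_t i = 0; i < num_options; ++i) {
    storage.push_back(std::string("--") + keys[i] + "=" + values[i]);
    argv.push_back(storage.back().c_str());
  }
  return argv;
}
```

The resulting `argv.data()` (a `const char**`) and `argv.size()` correspond to the `argv` and `argc` that the adaptor hands to `tflite::Flags::Parse`.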

Now create the corresponding BUILD target to build a dynamic library, as shown below:

cc_binary(
    name = "dummy_external_delegate.so",
    srcs = [
        "external_delegate_adaptor.cc",
    ],
    linkshared = 1,
    linkstatic = 1,
    deps = [
        ":dummy_delegate",
        "//tensorflow/lite/c:common",
        "//tensorflow/lite/tools:command_line_flags",
        "//tensorflow/lite/tools:logging",
    ],
)

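After you create this external delegate .so file, you can build binaries, or use prebuilt ones, to run with the new delegate, as long as the binary is linked with the external_delegate_provider library, which supports command-line flags as described here. Note: this external delegate provider is already linked into the existing testing and tooling binaries.

See the description here for an illustration of how to benchmark the dummy delegate through this external delegate approach. You can use similar commands for the testing and evaluation tools mentioned earlier.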
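When running the tools with an external delegate, options are passed as a single string. As we understand the benchmark tool's `--external_delegate_options` flag, the format is `key1:value1;key2:value2`; please verify this against the tool's documentation. The hypothetical `ParseOptions` sketch below splits such a string into key/value pairs, which presumably end up as the `options_keys` and `options_values` passed to `tflite_plugin_create_delegate`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser for "key1:value1;key2:value2". Splits on ';', then on
// the first ':' of each entry; empty entries and entries without ':' are
// skipped.
std::vector<std::pair<std::string, std::string>> ParseOptions(
    const std::string& options) {
  std::vector<std::pair<std::string, std::string>> result;
  std::size_t start = 0;
  while (start <= options.size()) {
    std::size_t end = options.find(';', start);
    if (end == std::string::npos) end = options.size();
    const std::string entry = options.substr(start, end - start);
    const std::size_t colon = entry.find(':');
    if (!entry.empty() && colon != std::string::npos) {
      result.emplace_back(entry.substr(0, colon), entry.substr(colon + 1));
    }
    start = end + 1;
  }
  return result;
}
```

Splitting on the first `:` only keeps values that themselves contain a colon intact (`b:x:y` yields key `b` and value `x:y`); whether the real tool behaves the same way for such values is an assumption.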

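Note that the external delegate is the C++ implementation behind delegates in the TensorFlow Lite Python binding, as shown here. Therefore, the dynamic external delegate adaptor library created here can be used directly with the TensorFlow Lite Python APIs.

Resources

Download links for nightly prebuilt TFLite tooling binaries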

OS	ARCH	BINARY_NAME
Linux	x86_64	benchmark_model inference_diff imagenet_image_classification_eval coco_object_detection_eval
Linux	arm	benchmark_model inference_diff imagenet_image_classification_eval coco_object_detection_eval
Linux	aarch64	benchmark_model inference_diff imagenet_image_classification_eval coco_object_detection_eval
Android	arm	benchmark_model benchmark_model.apk inference_diff imagenet_image_classification_eval coco_object_detection_eval
Android	aarch64	benchmark_model benchmark_model.apk inference_diff imagenet_image_classification_eval coco_object_detection_eval