
Using ONNX BERT Model for Text-based Q&A in your Mobile .NET Apps

In the first article of this series, I explained how to load, evaluate, and convert a PyTorch-trained BERT QnA model to an ONNX-compatible format. In the second article, I explained how to load an ONNX model, perform processing tasks on both its input and output, and connect the prediction pipeline to the frontend UI. In this article, I will give a high-level overview of how to do the same for the converted ONNX BERT model trained using PyTorch.

Getting Started

The BERT QnA model aims to recognize the context of each text input by reading it in both the forward and backward directions, and to provide a human-readable text answer as output. This codebase provides a more interactive feel by allowing users to input text as a question and context, to which the BERT ONNX model will reply with a text-based answer.

NuGet Packages Used

The following NuGet packages were installed in the Uno Platform application project for this codebase:

  1. Microsoft.ML.OnnxRuntime

  2. SkiaSharp

  3. BERTTokenizers

INFERENCING WITH PyTorchBertQnA


The PyTorchBertQnA C# file houses the class-based code for loading, processing, and prediction tasks.

STEP 1: Load Embedded Resources

In the first article of this series, we mentioned that the large .onnx file that was created would need to be added to the Uno Platform codebase for the example to work fully. To achieve this, add either the generated .onnx file obtained from running the Jupyter notebook in Part 1 of this series, or the .onnx file downloaded using this link, to the Content folder, as shown below:

Once you have added the .onnx file, you need to change its build action to embedded resource, as shown below:

In the PyTorchBertQnA C# file, a private function called InitTask handles fetching the ONNX BERT model stored as an embedded resource, loading it from a memory stream, and storing it in the appropriate C# variables. With the byte array obtained from loading the model, an ONNX Runtime inference session is created, which will be used to make predictions in the subsequent code snippets.
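As a rough sketch of what InitTask could look like, assuming the model is embedded with a resource name like MyApp.Content.bert.onnx and stored in a _session field (both names are assumptions, not the sample's actual identifiers):

```csharp
using System.Reflection;
using Microsoft.ML.OnnxRuntime;

private InferenceSession _session;

private async Task InitTask()
{
    // The resource name is an assumption; list the real names with
    // GetType().Assembly.GetManifestResourceNames() if unsure.
    const string resourceName = "MyApp.Content.bert.onnx";

    var assembly = GetType().Assembly;
    using var stream = assembly.GetManifestResourceStream(resourceName)
        ?? throw new FileNotFoundException(resourceName);

    // Copy the embedded resource into a byte array...
    using var memory = new MemoryStream();
    await stream.CopyToAsync(memory);

    // ...and create the ONNX Runtime inference session from it.
    _session = new InferenceSession(memory.ToArray());
}
```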

STEP 2: Pre Processing the Input

The function called PreProcessInput accepts two parameters, both of type string: the question being asked and the context from which the answer can be gleaned.

From the code snippet above, the first task combines the question and context strings by concatenating both into a single JSON-formatted string. Once this is done, I used a BertUncasedLargeTokenizer to tokenize the JSON string into tokens and extract their encodings. The last step was to encapsulate the encoding's input IDs, attention masks and type IDs in a model I created and return it from the function.
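A minimal sketch of this step, assuming the BERTTokenizers package's BertUncasedLargeTokenizer and a hypothetical BertInput record standing in for the model, could look like this:

```csharp
using System.Linq;
using BERTTokenizers;

// Hypothetical model wrapping the three encoding arrays.
public record BertInput(long[] InputIds, long[] AttentionMask, long[] TypeIds);

private BertInput PreProcessInput(string question, string context)
{
    // Concatenate question and context into a single JSON-formatted string.
    var sentence = $"{{\"question\": \"{question}\", \"context\": \"{context}\"}}";

    // Tokenize the string and extract the encodings.
    var tokenizer = new BertUncasedLargeTokenizer();
    var tokens = tokenizer.Tokenize(sentence);
    var encoded = tokenizer.Encode(tokens.Count(), sentence);

    // Encapsulate the input IDs, attention masks and type IDs in the model.
    return new BertInput(
        encoded.Select(t => t.InputIds).ToArray(),
        encoded.Select(t => t.AttentionMask).ToArray(),
        encoded.Select(t => t.TokenTypeIds).ToArray());
}
```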

The three properties of the model shown above can be used to create the three DenseTensor inputs needed by the ONNX BERT model.

STEP 3: Prediction

After creating the DenseTensor inputs, they are fed to the inference session, and the output is the set of results that will need to be queried to obtain the human-readable value, as shown by the code snippet below.
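A sketch of this step, building on the BertInput model and _session field from the earlier snippets; the input names input_ids, input_mask and segment_ids are assumptions that should be checked against your exported model:

```csharp
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var shape = new[] { 1, bertInput.InputIds.Length };

// Wrap each array from the pre-processing model in a named DenseTensor.
// The input names are an assumption; inspect _session.InputMetadata for yours.
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input_ids",
        new DenseTensor<long>(bertInput.InputIds, shape)),
    NamedOnnxValue.CreateFromTensor("input_mask",
        new DenseTensor<long>(bertInput.AttentionMask, shape)),
    NamedOnnxValue.CreateFromTensor("segment_ids",
        new DenseTensor<long>(bertInput.TypeIds, shape)),
};

// Run the inference session; the results hold the start and end logits.
using var results = _session.Run(inputs);
```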

STEP 4: Post Processing

The interpretation of the results is achieved in this step. First, the start and end logits are extracted as lists of floats. This is followed by obtaining the index of the maximum value in each list. Once the indices are obtained, a three-level LINQ query is used to select the predicted tokens and store them in a list of strings, which is then converted to a single string passed to the frontend. The code snippet below illustrates this:
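As an illustrative sketch, assuming the model's output names are start_logits and end_logits (check _session.OutputMetadata for the real names) and that `tokens` is the token list produced during pre-processing:

```csharp
using System.Linq;

// Extract the start and end logits as lists of floats.
var startLogits = results.First(r => r.Name == "start_logits")
                         .AsEnumerable<float>().ToList();
var endLogits = results.First(r => r.Name == "end_logits")
                       .AsEnumerable<float>().ToList();

// The indices of the maximum values mark the predicted answer span.
var startIndex = startLogits.IndexOf(startLogits.Max());
var endIndex = endLogits.IndexOf(endLogits.Max());

// Select the predicted tokens inside the span and join them into the answer.
var predictedTokens = tokens
    .Skip(startIndex)
    .Take(endIndex - startIndex + 1)
    .Select(t => t.Token)
    .ToList();

var answer = string.Join(" ", predictedTokens);
```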

STEP 5: Connecting to the Frontend

Building on the previous article, I used the second tab bar item to connect the inference session for this article with the frontend. The UX for the BERT QnA NLP comprehension sample is a three-rowed Grid consisting of a text block for displaying the page description and a vertical stack panel housing two text boxes, one for inputting the question and the other for inputting the context. The last row houses the button used to execute the inference session. You can use ChatGPT to generate context summaries for this sample to try it out. Below are the relevant screenshots.
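To show how the pieces connect, a hypothetical click handler for that button could drive the whole pipeline; the control names (QuestionBox, ContextBox, AnswerBlock) and the RunInference helper (steps 3 and 4 combined) are assumptions, not the sample's actual identifiers:

```csharp
using Microsoft.UI.Xaml; // or Windows.UI.Xaml, depending on the Uno version

// Hypothetical handler wiring the question and context inputs to the
// pre-processing, inference and post-processing steps sketched above.
private void AskButton_Click(object sender, RoutedEventArgs e)
{
    var bertInput = PreProcessInput(QuestionBox.Text, ContextBox.Text);
    AnswerBlock.Text = RunInference(bertInput);
}
```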

About Uno Platform

For those new to the Uno Platform, it allows for creating pixel-perfect, single-source C# and XAML apps that run natively on Windows, iOS, Android, macOS, Linux and Web via WebAssembly. In addition, it offers Figma integration for design-development handoff and a set of extensions to bootstrap your projects. Uno Platform is free, open-source (Apache 2.0), and available on GitHub.

Next Steps

To upgrade to the latest release of Uno Platform, update your packages to 4.6 via the Visual Studio NuGet package manager! If you are new to Uno Platform, following our official getting started guide is the best way to begin. (5 minutes to complete)

