
MLPerf Tiny — K230 DUT Implementation

A DUT (Device Under Test) implementation for measuring K230 KPU inference performance using the MLPerf Tiny benchmark framework.

Currently supports Image Classification (CIFAR-10, ResNet-8).

K230 and MLPerf Tiny

MLPerf Tiny typically targets MCU-class devices in the 10-250 MHz, sub-50 mW range. The K230 falls outside this class, but by implementing a DUT that conforms to the submitter API, we can reuse the standard measurement procedures provided by the official harness.

Prerequisites

  • K230 SDK must be built (toolchain extracted, MPP libraries compiled)
  • SDK placed at k230_sdk/ in the repository root
  • CMake 3.16 or later
  • UART connection (115200 bps) — for communication with the MLPerf Tiny legacy harness

Building the SDK

For K230 SDK build instructions, see SDK Build.

Overall Workflow

[Host PC]                         [K230 DUT]
                                    |
1. git submodule update             |
2. cmake configure/build            |
3. deploy (SCP)                     |
                                    |
4. Start DUT (UART)          -->  main loop
                                    |
5. runner (Python)           -->  UART command processing
   name%                      <--  m-name-dut-[...]
   db load N%                 <--  m-[Expecting N bytes]
   db HEXDATA%                <--  m-load-done
   infer N W%                 <--  m-results-[...]
   results%                   <--  m-results-[...]
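The load handshake in steps 4-5 can be sketched in Python. The helper names below are illustrative, and the uppercase-hex payload encoding is an assumption based on the `db HEXDATA%` form shown above:

```python
def frame_command(cmd: str) -> bytes:
    """Terminate a DUT command with '%', as the legacy UART harness expects."""
    return (cmd + "%").encode("ascii")

def frame_load(payload: bytes) -> list[bytes]:
    """Build the two-step load sequence: size announcement, then hex payload.

    The DUT answers the first frame with m-[Expecting N bytes] and the
    second with m-load-done.
    """
    return [
        frame_command(f"db load {len(payload)}"),
        frame_command("db " + payload.hex().upper()),  # hex casing assumed
    ]
```

Each returned frame would be written to the UART in order, waiting for the corresponding `m-` response between writes.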

Build Instructions

1. Fetch submodule

git submodule update --init mlperf_tiny

2. Configure

cmake -B build/mlperf_tiny -S apps/mlperf_tiny \
  -DCMAKE_TOOLCHAIN_FILE="$(pwd)/cmake/toolchain-k230-rtsmart.cmake"

3. Build

cmake --build build/mlperf_tiny

4. Verify

file build/mlperf_tiny/mlperf_tiny

Expected output:

mlperf_tiny: ELF 64-bit LSB executable, UCB RISC-V, ...

Deploying and Running on K230

deploy target

Build, convert kmodel, and transfer in one step:

cmake --build build/mlperf_tiny --target deploy

The deploy target depends on the kmodel target (convert_kmodel.py TFLite → kmodel conversion), so the kmodel is automatically generated if not yet present.

Manual transfer

scp build/mlperf_tiny/mlperf_tiny root@<K230_IP>:/sharefs/mlperf_tiny/
scp /path/to/model.kmodel root@<K230_IP>:/sharefs/mlperf_tiny/model.kmodel

Running on K230 bigcore (msh)

msh /> /sharefs/mlperf_tiny/mlperf_tiny /sharefs/mlperf_tiny/model.kmodel

Expected output on successful startup:

m-timestamp-mode-performance
m-lap-us-XXXXXXXX
m-init-done
m-ready
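The `m-lap-us-XXXXXXXX` lines carry the timestamp counter used for latency measurement. A minimal sketch of turning two lap lines into an elapsed time, assuming the counter is a decimal microsecond value (verify the encoding against your harness build):

```python
def parse_lap_us(line: str) -> int:
    """Extract the microsecond counter from an 'm-lap-us-<N>' line."""
    prefix = "m-lap-us-"
    if not line.startswith(prefix):
        raise ValueError(f"not a lap line: {line!r}")
    return int(line[len(prefix):])  # decimal microseconds assumed

def latency_ms(start_line: str, end_line: str) -> float:
    """Elapsed time between two lap lines, in milliseconds."""
    return (parse_lap_us(end_line) - parse_lap_us(start_line)) / 1000.0
```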

Manual UART Command Testing

Connect to the bigcore serial port (/dev/ttyACM1, 115200 bps) using minicom or similar, and send the following commands (each terminated with %):

| Command | Description | Expected Response |
|---|---|---|
| name% | Show device name | m-name-dut-[unspecified] |
| profile% | Show profile | m-profile-[...] / m-model-[ic01] |
| help% | Show help | Command list |
| db load 3072% | Allocate input buffer (32x32x3) | m-[Expecting 3072 bytes] |
| infer 1 0% | Run 1 inference (0 warmup) | m-results-[...] |
| results% | Show last results | m-results-[...] |
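Responses such as `m-results-[...]` can be parsed on the host side. The sketch below assumes the bracketed body is a comma-separated list of per-class scores; the exact payload format is defined by the harness, so treat this as illustrative:

```python
def parse_results(line: str) -> list[float]:
    """Parse an 'm-results-[v0,v1,...]' line into class scores (format assumed)."""
    body = line.removeprefix("m-results-").strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"unexpected results line: {line!r}")
    return [float(v) for v in body[1:-1].split(",")]

def predicted_class(line: str) -> int:
    """Index of the highest-scoring class in a results line."""
    scores = parse_results(line)
    return max(range(len(scores)), key=scores.__getitem__)
```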

Runner-based Measurement

Use the MLPerf Tiny Python runner for automated measurement:

cd mlperf_tiny/benchmark/runner
python main.py --port /dev/ttyACM1 --baud 115200

Runner requirements

The runner is part of the legacy UART harness. MLCommons is transitioning to a new runner, so procedures may change in future versions.

CMake Targets

| Target | Command | Description |
|---|---|---|
| (default) | cmake --build build/mlperf_tiny | Build C++ binary |
| deploy | cmake --build build/mlperf_tiny --target deploy | Build + SCP transfer to K230 |
| run | cmake --build build/mlperf_tiny --target run | Execute on K230 via serial |

CMake Options

| Variable | Default | Description |
|---|---|---|
| MLPERF_BENCHMARK | ic | Benchmark type |
| MLPERF_KMODEL | build/.../model.kmodel | Path to kmodel file for deployment (auto-generated by kmodel target) |

Source Files

| File | Description |
|---|---|
| src/main.cc | Entry point — kmodel path argument, UART main loop |
| src/submitter_implemented.cc | K230/nncase implementation of th_* functions |

Troubleshooting

UART Communication Failure

  • Verify baud rate is 115200 bps
  • Verify using the bigcore serial port (/dev/ttyACM1)
  • Ensure minicom/picocom is not occupying the port

kmodel Load Failure

  • Verify the kmodel file path is correct
  • Check nncase version compatibility with the kmodel

VB Initialization Failure

  • The current implementation omits VB initialization
  • If the nncase runtime requires VB, add VB configuration to InitPlatform() in submitter_implemented.cc

kmodel Conversion

Use convert_kmodel.py to convert a TFLite model to a kmodel for the K230 KPU. The conversion pipeline is a two-stage process: TFLite → ONNX → kmodel.

Install dependencies

pip install tf2onnx tensorflow-cpu onnxsim nncase nncase-kpu

Run conversion

python convert_kmodel.py

This script performs the following steps:

  1. Converts the TFLite model to ONNX format using tf2onnx
  2. Optimizes the ONNX model with onnxsim
  3. Compiles the ONNX model to kmodel using the nncase compiler (targeting K230 KPU)
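Step 3 might look like the following configuration sketch using the nncase Python API. The option and method names follow the published nncase examples and may differ between nncase versions, so check them against the installed release:

```python
import nncase

def compile_kmodel(onnx_path: str, kmodel_path: str) -> None:
    # Option names per nncase 2.x examples; verify against your nncase version.
    compile_options = nncase.CompileOptions()
    compile_options.target = "k230"  # compile for the K230 KPU
    compiler = nncase.Compiler(compile_options)

    import_options = nncase.ImportOptions()
    with open(onnx_path, "rb") as f:
        compiler.import_onnx(f.read(), import_options)

    compiler.compile()
    with open(kmodel_path, "wb") as f:
        f.write(compiler.gencode_tobytes())
```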

The generated kmodel file can be deployed to the K230 using the deploy procedure described above.

Golden Inference Test

golden_test.py compares TFLite reference inference results against K230 DUT inference results to verify the correctness of model conversion and device implementation.

Usage

python golden_test.py

How it works

  1. Retrieves input images from the CIFAR-10 test dataset
  2. Runs reference inference using the TFLite interpreter
  3. Automatically launches the K230 DUT and sends the same inputs via UART
  4. Compares DUT inference results against reference results
  5. Reports accuracy and agreement metrics

The DUT launch and communication are fully automated — no manual DUT startup is required.
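The agreement metric in step 5 can be computed as a simple top-1 match rate. This sketch assumes both sides have been reduced to per-sample predicted class indices (function name is illustrative):

```python
def top1_agreement(dut_preds: list[int], ref_preds: list[int]) -> float:
    """Fraction of samples where the DUT and the TFLite reference
    predict the same class."""
    if len(dut_preds) != len(ref_preds):
        raise ValueError("prediction lists must have equal length")
    matches = sum(d == r for d, r in zip(dut_preds, ref_preds))
    return matches / len(dut_preds)
```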

Runner-based Benchmark

run_benchmark.py runs a standard accuracy benchmark using the upstream MLPerf Tiny runner.

Usage

python run_benchmark.py

How it works

  1. Generates an IC (Image Classification) evaluation dataset from CIFAR-10
  2. Runs a 200-sample accuracy benchmark
  3. Performs measurement conforming to the upstream MLPerf Tiny runner protocol

Results Summary

| Metric | Result | Target |
|---|---|---|
| Accuracy | 87.5% | 85% |
| Latency | ~2.3ms | |
| Agreement with reference | 99% | |
  • Accuracy achieved 87.5%, exceeding the 85% target
  • Inference latency of approximately 2.3ms demonstrates the K230 KPU's high-speed inference capability
  • 99% agreement with the TFLite reference confirms the correctness of the kmodel conversion and DUT implementation