
[Question] How can I reproduce the expected inference time shown in AI Hub with the ai-hub-apps? #149

Open
4570235 opened this issue Jan 9, 2025 · 1 comment


4570235 commented Jan 9, 2025

I exported two models from AI Hub with the following commands:

python -m qai_hub_models.models.quicksrnetmedium.export --width 960 --height 540 --scale-factor 2
python -m qai_hub_models.models.quicksrnetmedium_quantized.export --width 960 --height 540 --scale-factor 2

Then I integrated these models into the Super Resolution sample app from ai-hub-apps and tested the app on a phone with the Snapdragon 8 Gen 3 chipset. However, I could not reproduce the expected inference time shown in AI Hub; the gap is quite significant.

| Model | Inference time on my phone (Gen 3, ms) | Inference time on AI Hub (Gen 3, ms) |
|---|---|---|
| quicksrnetmedium_quantized_540x960_2x.tflite | 41 | 17.3 |
| quicksrnetmedium_540x960_2x.tflite | 32 | 19.5 |

Moreover, on my phone, the inference time of the quantized model was substantially longer than that of the unquantized model, which came as a surprise.

So my question is: how can I optimize the sample app to improve performance?

The execution results of the export script are attached for your convenience.

(myenv) ➜  ai-hub-models git:(handley) ✗ python -m qai_hub_models.models.quicksrnetmedium.export --width 960 --height 540 --scale-factor 2
Optimizing model quicksrnetmedium to run on-device
Uploading tmpsp6xgvvq.pt
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 230k/230k [00:01<00:00, 197kB/s]
Scheduled compile job (jp14r432p) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jp14r432p/

Profiling model quicksrnetmedium on a hosted device.
Waiting for compile job (jp14r432p) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
Scheduled profile job (j57yqy6l5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/j57yqy6l5/

Running inference for quicksrnetmedium on a hosted device with example inputs.
Uploading dataset: 5.34MB [00:02, 2.38MB/s]
Scheduled inference job (jp4lzl8v5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jp4lzl8v5/

quicksrnetmedium.tflite: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 203k/203k [00:00<00:00, 413kB/s]
Downloaded model to /Users/handleychen/Github/quic/ai-hub-models/build/quicksrnetmedium/quicksrnetmedium.tflite
Waiting for profile job (j57yqy6l5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS

------------------------------------------------------------
Performance results on-device for Quicksrnetmedium.
------------------------------------------------------------
Device                          : Samsung Galaxy S24 (Family) (14)
Runtime                         : TFLITE
Estimated inference time (ms)   : 19.5
Estimated peak memory usage (MB): [46, 77]
Total # Ops                     : 17
Compute Unit(s)                 : NPU (14 ops) CPU (3 ops)
------------------------------------------------------------
More details: https://app.aihub.qualcomm.com/jobs/j57yqy6l5/

Waiting for inference job (jp4lzl8v5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
tmpy_dieez7.h5: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12.9M/12.9M [00:02<00:00, 6.37MB/s]

Comparing on-device vs. local-cpu inference for Quicksrnetmedium.
+----------------+--------------------+--------+
| output_name    | shape              |   psnr |
+================+====================+========+
| upscaled_image | (1, 1080, 1920, 3) |  64.89 |
+----------------+--------------------+--------+

- psnr: Peak Signal-to-Noise Ratio (PSNR). >30 dB is typically considered good.

More details: https://app.aihub.qualcomm.com/jobs/jp4lzl8v5/

Run compiled model on a hosted device on sample data using:
python /Users/handleychen/Github/quic/ai-hub-models/qai_hub_models/models/quicksrnetmedium/demo.py --on-device --hub-model-id mqe7k2kvm --device "Samsung Galaxy S24 (Family)"

(myenv) ➜  ai-hub-models git:(handley) ✗
(myenv) ➜  ai-hub-models git:(handley) ✗ python -m qai_hub_models.models.quicksrnetmedium_quantized.export --width 960 --height 540 --scale-factor 2
Quantizing model quicksrnetmedium_quantized with 100 samples.
Uploading tmpsg2di0tj.pt
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 230k/230k [00:01<00:00, 192kB/s]
Scheduled compile job (jgz313vk5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jgz313vk5/

Uploading dataset: 142MB [00:17, 8.46MB/s]
Scheduled quantize job (jgdxjx0ep) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jgdxjx0ep/

Optimizing model quicksrnetmedium_quantized to run on-device
Waiting for quantize job (jgdxjx0ep) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
Scheduled compile job (jpxkwkm15) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jpxkwkm15/

Profiling model quicksrnetmedium_quantized on a hosted device.
Scheduled profile job (j5mnjn4wp) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/j5mnjn4wp/

Running inference for quicksrnetmedium_quantized on a hosted device with example inputs.
Uploading dataset: 5.34MB [00:01, 2.90MB/s]
Scheduled inference job (jgn6j6xr5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jgn6j6xr5/

quicksrnetmedium_quantized.tflite: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63.0k/63.0k [00:00<00:00, 329kB/s]
Downloaded model to /Users/handleychen/Github/quic/ai-hub-models/build/quicksrnetmedium_quantized/quicksrnetmedium_quantized.tflite
Waiting for profile job (j5mnjn4wp) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS

------------------------------------------------------------
Performance results on-device for Quicksrnetmedium_Quantized.
------------------------------------------------------------
Device                          : Samsung Galaxy S24 (Family) (14)
Runtime                         : TFLITE
Estimated inference time (ms)   : 17.3
Estimated peak memory usage (MB): [5, 33]
Total # Ops                     : 19
Compute Unit(s)                 : NPU (16 ops) CPU (3 ops)
------------------------------------------------------------
More details: https://app.aihub.qualcomm.com/jobs/j5mnjn4wp/

Waiting for inference job (jgn6j6xr5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
tmpm599kno5.h5: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.53M/9.53M [00:01<00:00, 5.50MB/s]

Comparing on-device vs. local-cpu inference for Quicksrnetmedium_Quantized.
+----------------+--------------------+--------+
| output_name    | shape              |   psnr |
+================+====================+========+
| upscaled_image | (1, 1080, 1920, 3) |  26.38 |
+----------------+--------------------+--------+

- psnr: Peak Signal-to-Noise Ratio (PSNR). >30 dB is typically considered good.

More details: https://app.aihub.qualcomm.com/jobs/jgn6j6xr5/

Run compiled model on a hosted device on sample data using:
python /Users/handleychen/Github/quic/ai-hub-models/qai_hub_models/models/quicksrnetmedium_quantized/demo.py --on-device --hub-model-id mmr3poprm --device "Samsung Galaxy S24 (Family)"

(myenv) ➜  ai-hub-models git:(handley) ✗
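A side note on the PSNR column in the comparison tables above: the quantized model's 26.38 dB falls below the >30 dB guideline the script itself prints, so the quantized export also loses noticeable quality here, not just speed. PSNR is a standard metric and can be recomputed locally to cross-check the reported numbers; a minimal sketch (the `psnr` helper below is my own illustration, not a `qai_hub_models` API):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    # Compute mean squared error in float64 to avoid uint8 overflow.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # PSNR = 20*log10(MAX) - 10*log10(MSE)
    return 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)
```

For example, two uint8 images differing by exactly 1 at every pixel give an MSE of 1 and therefore a PSNR of about 48.13 dB.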

4570235 commented Jan 10, 2025

Additional Information:

  • I have examined the configuration of the sample app and made sure every option matches what is shown on the job page, including the SDK version, TensorFlow Lite options, and QNN delegate options.
  • The inference time on my phone was measured after warm-up runs and averaged over multiple iterations to avoid bias.
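The warm-up-then-average methodology described above can be sketched as a small timing harness. This is an illustration of the measurement approach only, not code from the sample app: `run_inference` is a placeholder for whatever invokes a single on-device inference (e.g. one interpreter run in the app), and the warm-up/iteration counts are arbitrary:

```python
import statistics
import time

def benchmark(run_inference, warmup: int = 5, iterations: int = 50) -> float:
    """Return the mean latency in ms of `run_inference`, discarding warm-up runs."""
    # Warm-up runs let caches, delegates, and clocks settle before measuring.
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)
```

Reporting the median alongside the mean (or dropping outliers) can further reduce the impact of thermal throttling during long runs.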
