
[Question] How can I reproduce the expected inference time shown in AI Hub with the ai-hub-apps? #149

Open
4570235 opened this issue Jan 9, 2025 · 1 comment


4570235 commented Jan 9, 2025

I exported two models from AI Hub with the following commands:

python -m qai_hub_models.models.quicksrnetmedium.export --width 960 --height 540 --scale-factor 2
python -m qai_hub_models.models.quicksrnetmedium_quantized.export --width 960 --height 540 --scale-factor 2

Then I integrated these models into the Super Resolution sample app from ai-hub-apps and tested the app on a phone with the Snapdragon 8 Gen 3 chipset. However, I could not reproduce the expected inference time shown in AI Hub; the gap is quite significant.

| Model | Inference time on my phone (Gen 3, ms) | Inference time on AI Hub (Gen 3, ms) |
|---|---|---|
| quicksrnetmedium_quantized_540x960_2x.tflite | 41 | 17.3 |
| quicksrnetmedium_540x960_2x.tflite | 32 | 19.5 |

Moreover, on my phone, the inference time of the quantized model was substantially longer than that of the unquantized model, which came as a surprise.

So my question is: how can I optimize the sample app to improve performance?

The execution results of the export script are attached for your convenience.

(myenv) ➜  ai-hub-models git:(handley) ✗ python -m qai_hub_models.models.quicksrnetmedium.export --width 960 --height 540 --scale-factor 2
Optimizing model quicksrnetmedium to run on-device
Uploading tmpsp6xgvvq.pt
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 230k/230k [00:01<00:00, 197kB/s]
Scheduled compile job (jp14r432p) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jp14r432p/

Profiling model quicksrnetmedium on a hosted device.
Waiting for compile job (jp14r432p) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
Scheduled profile job (j57yqy6l5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/j57yqy6l5/

Running inference for quicksrnetmedium on a hosted device with example inputs.
Uploading dataset: 5.34MB [00:02, 2.38MB/s]
Scheduled inference job (jp4lzl8v5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jp4lzl8v5/

quicksrnetmedium.tflite: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 203k/203k [00:00<00:00, 413kB/s]
Downloaded model to /Users/handleychen/Github/quic/ai-hub-models/build/quicksrnetmedium/quicksrnetmedium.tflite
Waiting for profile job (j57yqy6l5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS

------------------------------------------------------------
Performance results on-device for Quicksrnetmedium.
------------------------------------------------------------
Device                          : Samsung Galaxy S24 (Family) (14)
Runtime                         : TFLITE
Estimated inference time (ms)   : 19.5
Estimated peak memory usage (MB): [46, 77]
Total # Ops                     : 17
Compute Unit(s)                 : NPU (14 ops) CPU (3 ops)
------------------------------------------------------------
More details: https://app.aihub.qualcomm.com/jobs/j57yqy6l5/

Waiting for inference job (jp4lzl8v5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
tmpy_dieez7.h5: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12.9M/12.9M [00:02<00:00, 6.37MB/s]

Comparing on-device vs. local-cpu inference for Quicksrnetmedium.
+----------------+--------------------+--------+
| output_name    | shape              |   psnr |
+================+====================+========+
| upscaled_image | (1, 1080, 1920, 3) |  64.89 |
+----------------+--------------------+--------+

- psnr: Peak Signal-to-Noise Ratio (PSNR). >30 dB is typically considered good.

More details: https://app.aihub.qualcomm.com/jobs/jp4lzl8v5/

Run compiled model on a hosted device on sample data using:
python /Users/handleychen/Github/quic/ai-hub-models/qai_hub_models/models/quicksrnetmedium/demo.py --on-device --hub-model-id mqe7k2kvm --device "Samsung Galaxy S24 (Family)"

(myenv) ➜  ai-hub-models git:(handley) ✗
(myenv) ➜  ai-hub-models git:(handley) ✗ python -m qai_hub_models.models.quicksrnetmedium_quantized.export --width 960 --height 540 --scale-factor 2
Quantizing model quicksrnetmedium_quantized with 100 samples.
Uploading tmpsg2di0tj.pt
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 230k/230k [00:01<00:00, 192kB/s]
Scheduled compile job (jgz313vk5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jgz313vk5/

Uploading dataset: 142MB [00:17, 8.46MB/s]
Scheduled quantize job (jgdxjx0ep) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jgdxjx0ep/

Optimizing model quicksrnetmedium_quantized to run on-device
Waiting for quantize job (jgdxjx0ep) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
Scheduled compile job (jpxkwkm15) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jpxkwkm15/

Profiling model quicksrnetmedium_quantized on a hosted device.
Scheduled profile job (j5mnjn4wp) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/j5mnjn4wp/

Running inference for quicksrnetmedium_quantized on a hosted device with example inputs.
Uploading dataset: 5.34MB [00:01, 2.90MB/s]
Scheduled inference job (jgn6j6xr5) successfully. To see the status and results:
    https://app.aihub.qualcomm.com/jobs/jgn6j6xr5/

quicksrnetmedium_quantized.tflite: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63.0k/63.0k [00:00<00:00, 329kB/s]
Downloaded model to /Users/handleychen/Github/quic/ai-hub-models/build/quicksrnetmedium_quantized/quicksrnetmedium_quantized.tflite
Waiting for profile job (j5mnjn4wp) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS

------------------------------------------------------------
Performance results on-device for Quicksrnetmedium_Quantized.
------------------------------------------------------------
Device                          : Samsung Galaxy S24 (Family) (14)
Runtime                         : TFLITE
Estimated inference time (ms)   : 17.3
Estimated peak memory usage (MB): [5, 33]
Total # Ops                     : 19
Compute Unit(s)                 : NPU (16 ops) CPU (3 ops)
------------------------------------------------------------
More details: https://app.aihub.qualcomm.com/jobs/j5mnjn4wp/

Waiting for inference job (jgn6j6xr5) completion. Type Ctrl+C to stop waiting at any time.
    ✅ SUCCESS
tmpm599kno5.h5: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.53M/9.53M [00:01<00:00, 5.50MB/s]

Comparing on-device vs. local-cpu inference for Quicksrnetmedium_Quantized.
+----------------+--------------------+--------+
| output_name    | shape              |   psnr |
+================+====================+========+
| upscaled_image | (1, 1080, 1920, 3) |  26.38 |
+----------------+--------------------+--------+

- psnr: Peak Signal-to-Noise Ratio (PSNR). >30 dB is typically considered good.

More details: https://app.aihub.qualcomm.com/jobs/jgn6j6xr5/

Run compiled model on a hosted device on sample data using:
python /Users/handleychen/Github/quic/ai-hub-models/qai_hub_models/models/quicksrnetmedium_quantized/demo.py --on-device --hub-model-id mmr3poprm --device "Samsung Galaxy S24 (Family)"

(myenv) ➜  ai-hub-models git:(handley) ✗
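A side note on the PSNR column in the comparison tables above: the quantized model's 26.38 dB falls below the >30 dB guideline the script itself prints, so the quantized export also loses noticeable quality here, not just speed. PSNR is a standard metric and can be recomputed locally to cross-check the reported numbers; a minimal sketch (the `psnr` helper below is my own illustration, not a `qai_hub_models` API):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    # Compute mean squared error in float64 to avoid uint8 overflow.
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # PSNR = 20*log10(MAX) - 10*log10(MSE)
    return 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)
```

For example, two uint8 images differing by exactly 1 at every pixel give an MSE of 1 and therefore a PSNR of about 48.13 dB.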

4570235 commented Jan 10, 2025

Additional Information:

  • I have examined the configuration of the sample app and made sure every option matches what is shown on the job page, including the SDK version, TensorFlow Lite options, and QNN delegate options.
  • The inference time on my phone was measured after warm-up runs and averaged over multiple iterations to avoid bias.
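The warm-up-then-average methodology described above can be sketched as a small timing harness. This is an illustration of the measurement approach only, not code from the sample app: `run_inference` is a placeholder for whatever invokes a single on-device inference (e.g. one interpreter run in the app), and the warm-up/iteration counts are arbitrary:

```python
import statistics
import time

def benchmark(run_inference, warmup: int = 5, iterations: int = 50) -> float:
    """Return the mean latency in ms of `run_inference`, discarding warm-up runs."""
    # Warm-up runs let caches, delegates, and clocks settle before measuring.
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)
```

Reporting the median alongside the mean (or dropping outliers) can further reduce the impact of thermal throttling during long runs.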
