Check Mobilenet V4 Large on iPhones #865
Comments
@freedomtan please share how to check the model accuracy for MobileNet V4. Which dataset do I need to use, and are there any specific steps to set up the accuracy test on an iOS device? Thanks |
To validate the accuracy of image classification models, we use the full ImageNet 2012 validation dataset (50,000 images) from https://www.image-net.org/index.php. |
@freedomtan I've tried the accuracy test with the CoreML backend and the TF backend for the Image Classification tasks v1 and v2. In each case it crashes after reaching 100%: EXC_BAD_ACCESS (code=1, address=0x27c8) in compute accuracy. I'm going to check what the problem is. |
I've found that the validation results were expected in a different format from the one I had (only the category number, without the image name). I can run the accuracy test now, but it gives 0.05% accuracy, so it might again be a dataset issue. When I tried the dataset from our tests it gives 100%, but we have only 10 images there. |
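The label format described above (one category number per entry, no image name) can be illustrated with a minimal sketch. It assumes a plain-text ground-truth file with one category number per line and predictions listed in the same order; the function names and sample values are hypothetical and are not taken from the app's code.

```python
def parse_ground_truth(text):
    """Parse ground truth given as one category number per line (no image names)."""
    return [int(line) for line in text.splitlines() if line.strip()]


def top1_accuracy(predicted, expected):
    """Return the fraction of samples whose predicted category matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("prediction and label counts differ")
    if not expected:
        raise ValueError("no labels given")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical labels and predictions, for illustration only.
labels = parse_ground_truth("65\n970\n230\n809\n")
predictions = [65, 970, 231, 809]
accuracy = top1_accuracy(predictions, labels)
print(f"top-1 accuracy: {accuracy:.2%}")
```

An accuracy near or below chance level (0.1% for 1,000 classes) is consistent with labels and predictions being misaligned, for example by file ordering or by an index offset, though other causes are possible.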
@RSMNYS I don't get it. Is this the original MobileNet EdgeTPU model we had, or a new V4 one? As far as I can remember, we checked that we get the expected accuracy number for the original one. Please check
|
FYR, on an iPhone 13, for MobileNet EdgeTPU I got 76.21% running a binary built from the latest master branch. |
Thanks, all works. For iPhone 14 Pro, I have the same 76.21%. Will try with the ImageNet V2 and different optimised models based on it. |
All tests were done on iPhone 14 Pro
Also, during the tests I noticed a performance drop when the device is warm (after several tests); sometimes it drops from 300 to 200 qps. Please also check the screenshot, which shows the tests for MobilenetV4_Large.mlpackage (8-bit quantization) only. You can see how much the performance can differ. @freedomtan Here is the link to the models: https://github.com/RSMNYS/mobile_models/tree/main/v4_0/CoreML |
@RSMNYS thermal throttling is a well-known issue on cell phones. A typical way to get the numbers we want is to cool down the device before you run a new test :-) |
Please try the first 3 items and ensure that there is no thermal throttling, e.g., cold start, wait for 5 mins, and then measure the performance numbers. Note that currently we don't allow model pruning (sparsity above) for submission. If we want to allow that, we need to change our rules. |
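The "cool down, then measure" routine can be sketched as a small helper. This is only an illustration under assumptions: `run_once` is a hypothetical callable that runs one benchmark and returns its throughput in qps, and the 300-second default mirrors the 5-minute wait suggested above. The `sleep` parameter is injectable so the example below runs without actually waiting.

```python
import time


def measure_with_cooldown(run_once, runs=3, cooldown_s=300, sleep=time.sleep):
    """Call run_once() `runs` times, pausing cooldown_s seconds between runs."""
    results = []
    for i in range(runs):
        if i > 0:
            sleep(cooldown_s)  # let the device cool before the next run
        results.append(run_once())
    return results


def relative_drop(results):
    """Return the drop of the slowest run relative to the first (cold) run."""
    return (results[0] - min(results)) / results[0]


# Illustration with a fake benchmark and a no-op sleep (no real waiting).
fake_qps = iter([300.0, 200.0])
results = measure_with_cooldown(lambda: next(fake_qps), runs=2, sleep=lambda s: None)
drop = relative_drop(results)
print(f"throughput drop: {drop:.0%}")
```

A large relative drop between the cold run and later runs, like the 300 to 200 qps case reported in this thread, suggests the later numbers should be re-measured after cooling.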
All tests were done on iPhone 14 Pro
|
These numbers look reasonable now. But let's see if we can further improve it. Let's check if @colbybanbury can comment on this. |
MobilenetV4 was made public last week, see https://arxiv.org/abs/2404.10518 or https://arxiv.org/html/2404.10518v1 |
The V4 paper results use an iPhone 13 and fp16 quantization. The model was also derived from a PyTorch equivalent so that it uses the (batch, channel, height, width) tensor format, which I measured to be slightly faster. I recommend using fp16 on iPhones older than the 15 Pro, which added int8-int8 compute. Happy to help if needed! |
@RSMNYS
|
@freedomtan can you please point us to where we can get the MobileNet V4 PyTorch model? Currently we have only the TF Lite one. |
The PyTorch model has yet to be officially released. Sorry for the delay! The TensorFlow model should still get similar latency results, but let me know if I can help with anything. |
@freedomtan to try it on iPhone 13 again. |
As I got before, on iPhone 13, it's about 220 qps |
Let's try to get a PyTorch model (with weights from the TensorFlow model). |
@colbybanbury can you please tell us if you use mlmodel or mlpackage CoreML models in your tests? |
I used MLPackage |
@RSMNYS With Xcode 16.0 beta and iOS 18 + MLPackage targeting iOS 15 or later, it's possible to get per-op time. Please check https://developer.apple.com/videos/play/wwdc2024/10161/?time=927 |
Per-op profiling is actually possible on iOS 17.4+ / macOS 14.4+. I wrote a little command-line program and tested it on my MacBook Pro M1, see https://github.com/freedomtan/coreml_modelc_profling |
FWIW There's still no official weights from the paper authors, but I've trained a number of PyTorch native MobileNetV4 models and made them available in |
@rwightman: FYI, thanks to @colbybanbury, one of the co-authors of the paper, we did have MobileNetV4-Conv-Large saved_model, and tflites, see https://github.com/mlcommons/mobile_open/tree/main/vision/mobilenetV4 |
@RSMNYS
import timm
import torch
import coremltools as ct

torch_model = timm.create_model("hf-hub:timm/mobilenetv4_conv_large.e600_r384_in1k", pretrained=True)
torch_model.eval()

# Trace the model with random data.
example_input = torch.rand(1, 3, 384, 384)
traced_model = torch.jit.trace(torch_model, example_input)
out = traced_model(example_input)

model = ct.convert(
    traced_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(shape=example_input.shape)]
)
model.save("mobilenetv4.mlpackage")

This model takes around 3.10 ms (> 300 qps) on my iPhone 13. This matches what @colbybanbury and the other authors reported in the paper. Please try to see if we can get the same performance with the TF saved_model. Thanks @rwightman |
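The relation between the quoted latency and the qps figure can be checked with a one-line conversion. This sketch assumes queries are issued one at a time, so throughput is simply the reciprocal of per-query latency; the function name is hypothetical.

```python
def latency_ms_to_qps(latency_ms):
    """Convert per-query latency in milliseconds to queries per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# 3.10 ms is the latency quoted above for the iPhone 13.
qps = latency_ms_to_qps(3.10)
print(f"3.10 ms per query is about {qps:.1f} qps")
```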
@RSMNYS and @anhappdev I used random data as calibration data. Then I got. unit: ms
Maybe we can use "real" calibration data to check if quantized int8 models could meet accuracy thresholds. |
I will try to do that. |
@freedomtan Good to hear that. For quantization, some weights quantize 'better' (less performance drop) than others, and the training hparams have an impact. I'd be curious to know how the timm weights I've trained so far fare in that regard. |
@freedomtan can you share the versions of coremltools, torch and tensorflow please? So far I'm getting the same 300 qps on my iPhone 14 Pro with iOS 18 beta |
coremltools: 7.2 |
@freedomtan to share the models he converted. |
@RSMNYS, https://drive.google.com/drive/folders/1rR7SsqO2ZfVI7whn8ky1biuZRIB-5AMC?usp=sharing |
Thanks for sharing. Only the int8 model gives me 373 qps on iPhone 14 Pro with iOS 18 beta. The other model still gets <= 300 qps. |
@RSMNYS that's hard to believe. Did you fully charge the phone and avoid thermal throttling? How do you test it? @anhappdev Do you have any results? |
I upgraded the iPhone 14 Pro I tested to iOS 18. Still got similar results in Xcode 16 beta 1. |
I'm using Xcode 15.4. Will upgrade today and see. Thanks |
With Xcode 16 beta, I built the app with the iOS 18 SDK, using your model. The maximum I get is 300 qps for Image Classification V2. I'm also not able to run this model in Xcode; I get an error: data could not be read because it's missing. |
Most likely you didn't use Xcode 16 beta 1 to run profiling. |
@RSMNYS I downloaded the models I shared to test with iPhone 14 Pro + iOS 18.0 beta. I got the same results. I use iPhone 14 Pro w/ iOS 18 (actually 17.5.1 is fine). To run model performance profiling on iOS 18 devices, you need Xcode 16 beta. |
I added a
import timm
import torch
import coremltools as ct

# Load the pretrained model
torch_model = timm.create_model("hf-hub:timm/mobilenetv4_conv_large.e600_r384_in1k", pretrained=True)
torch_model.eval()

# Inspect the model
print("num_classes", torch_model.num_classes)
print("data_config", timm.data.resolve_model_data_config(torch_model))

# Define a wrapper that accepts NHWC input and permutes it to NCHW for the model
class WrappedModel(torch.nn.Module):
    def __init__(self, model):
        super(WrappedModel, self).__init__()
        self.model = model

    def forward(self, x):
        # Permute from NHWC to NCHW
        x = x.permute(0, 3, 1, 2)
        x = self.model(x)
        return x

wrapped_model = WrappedModel(torch_model)
wrapped_model.eval()

# Trace the wrapped model with random data (NHWC layout)
example_input = torch.rand(1, 384, 384, 3)
traced_model = torch.jit.trace(wrapped_model, example_input)
out = traced_model(example_input)

# Convert the traced model to CoreML
ml_model = ct.convert(
    traced_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="images", shape=(1, 384, 384, 3))],
    outputs=[ct.TensorType(name="Softmax")],
)
ml_model.short_description = "hf-hub:timm/mobilenetv4_conv_large.e600_r384_in1k"

# Save the CoreML model
ml_model.save("mobilenetv4.mlpackage")
|
2.30 ms is what I expected. It is curious that adding a transpose (permute in PyTorch is compiled / translated into the MIL transpose op) slows down the other ops. The models converted from TensorFlow usually come with some transpose op(s). Maybe removing the leading transpose op could increase inference speed? @RSMNYS please check if you can reproduce @anhappdev's results. |
@freedomtan @anhappdev With Anh's model I have 245.8 qps for performance. And the accuracy is 82.43%. |
@RSMNYS How about the original (2.30 ms) one? |
I ran two models (w/ and w/o the leading transpose op) on an iPhone 14 Pro running iOS 18 and got numbers close to what @anhappdev reported. The permutation/transpose op is interesting. If the transpose op is the problem (e.g., if removing the leading transpose from the TF-based MobileNet V4 model we used helps), then we can simply do the NHWC -> NCHW transformation in the preprocessing stage (see https://github.com/mlcommons/mobile_app_open/blob/master/flutter/cpp/datasets/imagenet.cc#L97-L99). |
I compared the latency of each op and saw that the transpose op itself takes only 59 microseconds, but every other op (conv, relu, ...) also takes longer compared to the version without the transpose op. I first guessed that it has something to do with the memory layout, but adding |
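As an illustration of moving the layout change into preprocessing, here is a minimal sketch that reorders a flat image buffer. It is not the code in imagenet.cc; it assumes a batch size of 1 (so NHWC -> NCHW reduces to HWC -> CHW) and a row-major flat buffer, and the function name is hypothetical.

```python
def hwc_to_chw(pixels, height, width, channels):
    """Reorder a flat HWC (interleaved) buffer into a flat CHW (planar) buffer."""
    if len(pixels) != height * width * channels:
        raise ValueError("buffer size does not match the given dimensions")
    out = [0] * len(pixels)
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                src = (h * width + w) * channels + c  # index in HWC layout
                dst = (c * height + h) * width + w    # index in CHW layout
                out[dst] = pixels[src]
    return out


# A 1x2 RGB image: pixel 0 = (1, 2, 3), pixel 1 = (4, 5, 6).
hwc = [1, 2, 3, 4, 5, 6]
chw = hwc_to_chw(hwc, height=1, width=2, channels=3)
print(chw)
```

With the input already in NCHW order, the converted model would not need a leading transpose op; whether that recovers the faster latency on device would still need to be measured.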
Yes, other ops are slowed down. That's why it's intriguing. |
@RSMNYS will try to remove the leading transpose op to see if we can get performance numbers matching PyTorch model. |
Hi guys! So here are 2 graphs for the models (CoreML) with and without transpose layer: |
@RSMNYS to check if we can reduce the inference latency of the MobileBERT model. |
Quantization
Here is the Python script to convert, quantize, and test the model:
Performance is tested with Xcode 16.0b1 on iPhone 14 Pro (iOS 17.5.1). Accuracy is tested on a MacBook with the Python script above. |
I added a script to test the accuracy on a MacBook. The comment above is updated with the accuracy number. |
@anhappdev I cannot download your models, it says something like "No preview available. File is in owner's trash", and I cannot find how to download it.
|
@freedomtan I updated the link. You can download the models from here: I am not sure why the w8a8 model I converted runs on the CPU. Can you share your conversion script? |
@anhappdev I got expected numbers with your w8a8 one. Xcode 16 beta 2/iOS 18 beta 3
|
Currently, I got
Roughly, > 300 qps for iPhone 13 should be possible.
#821 (comment)