Continue updating the Apple backend #741
Comments
@freedomtan for further improvements, should we use the saved models from your repo (MobileBERT, MobileDet), or can we use models from TensorFlow Hub (at least for MobileBERT: https://tfhub.dev/tensorflow/mobilebert_en_uncased_L-24_H-128_B-512_A-4_F-4_OPT)? As far as I can see, the saved models were built with TF 1. However, the inputs of the new model differ from ours. |
I am not proud of my repo :-) We should check the accuracy of the models. As far as I can tell, https://tfhub.dev/tensorflow/mobilebert_en_uncased_L-24_H-128_B-512_A-4_F-4_OPT is not for SQuAD (hence not compatible). |
Hi guys! I've converted MobileBERT to the *.mlpackage format using coremltools 7 and TensorFlow 2.12, and also optimised the model with quantization. Currently I have a problem using the *.mlpackage format in our application. The problem arises during on-device compilation to produce the mlmodelc. I've tried compiling on the Mac itself and then using the compiled model, but then there is an issue loading its content. I'm working to resolve this so we can see how accurate the optimised model is. While working on the task I found the following issues/possible improvements:
|
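For reference, the int8 weight quantization step mentioned above could look roughly like the sketch below. It is not the exact script used in this thread: the function name and file paths are hypothetical, and it assumes coremltools 7 with its `coremltools.optimize.coreml` module and an already converted ML program.

```python
def quantize_weights_int8(package_path: str, output_path: str):
    """Quantize the weights of an existing .mlpackage to int8 (sketch).

    Both paths are hypothetical examples, e.g. "MobileBERT.mlpackage".
    """
    # ML programs are saved as .mlpackage directories, so check early.
    if not output_path.endswith(".mlpackage"):
        raise ValueError("output_path must end with .mlpackage")

    # Imported lazily so the function can be defined without coremltools.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    mlmodel = ct.models.MLModel(package_path)
    # Symmetric linear quantization; the default weight dtype is int8.
    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric")
    config = cto.OptimizationConfig(global_config=op_config)
    quantized = cto.linear_quantize_weights(mlmodel, config=config)
    quantized.save(output_path)
    return quantized
```

Accuracy should still be re-measured after this step, since int8 weights trade precision for size.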
@RSMNYS I don't really get what you ran into. From my past experiences, if we can make the And for performance, please check if you got latency improvement in Xcode's / Instrument's Core ML Performance Report first. |
The thing is that it works for main (when running the tests), and it loads the ML program with no issues. But when I try it in the app, the error says it can't read the spec. I'll continue with this today. |
Hi guys! Here are the inference results for the original MobileBERT (mlmodel) and the newly converted models (mlpackage and optimized mlpackage). For the optimized one we used the default int8 quantized data type. As we can see, the converted mlpackage has worse results than the original mlmodel. We need to check what the problem could be. The quantized model behaves as expected: it uses a lower-precision data type (int8 rather than float16), which explains the worse results. An MLPackage is a directory, not a single file, so to get a fingerprint for it we need to archive it. To handle the archive with the mlpackage correctly I've adjusted the archive_cached_helper, so the app can now load the mlpackage and run inference. I still have some difficulties with the model path after the app restarts, because the logic returns only the path to the archive's folder. In our case we have: https://github.com/RSMNYS/mobile_models/raw/main/v3_0/CoreML/MobileBERT.zip. After download and unarchiving, the model is saved to ../raw/main/v3_0/CoreML/MobileBERT/MobileBERT.mlpackage. After an app restart (the app uses cached resources) the app returns this model_path: ../raw/main/v3_0/CoreML/MobileBERT, which is not correct. I think we can resolve this by introducing a new property in the pbtxt settings, model_name, so that we can compose the model_path correctly and support model types that are not a single file but a package (directory). Let's discuss, please. |
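To make the model_path proposal concrete, here is a minimal Python sketch of the resolution logic. It is not the app's actual code: `resolve_model_path` is a hypothetical helper, and `model_name` stands for the proposed pbtxt property, which does not exist yet.

```python
from pathlib import Path
from typing import Optional


def resolve_model_path(extracted_dir: Path, model_name: Optional[str] = None) -> Path:
    """Return the model path inside a cached, already extracted archive folder.

    extracted_dir: folder the archive was unpacked into (e.g. .../CoreML/MobileBERT)
    model_name:    hypothetical setting, e.g. "MobileBERT.mlpackage"
    """
    if model_name:
        candidate = extracted_dir / model_name
        if not candidate.exists():
            raise FileNotFoundError(f"{candidate} not found in cache")
        return candidate

    # Without model_name, fall back to a single *.mlpackage directory if present.
    packages = sorted(p for p in extracted_dir.glob("*.mlpackage") if p.is_dir())
    if len(packages) == 1:
        return packages[0]
    if len(packages) > 1:
        raise ValueError(f"ambiguous: several .mlpackage directories in {extracted_dir}")

    # Nothing package-like found: return the folder itself (the behavior described above).
    return extracted_dir
```

With an explicit `model_name` the result is the same on first download and after a restart, because it no longer depends on what the cache helper happens to return.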
@RSMNYS Please check model performance with the Xcode Performance tab and/or Core ML Instruments first. For a performance benchmark, it's hard to ask people to believe that we have an "improved" model that is 1 - (92.14/121.71) = 24% slower than the original one. |
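The 24% figure can be checked quickly. This sketch assumes both numbers are throughput values for the same test (the comment does not state the unit); the variable names are illustrative only.

```python
# Assumed: throughput of the original .mlmodel and of the converted .mlpackage.
original_throughput = 121.71
converted_throughput = 92.14

# Relative drop of the converted model compared with the original.
relative_drop = 1 - converted_throughput / original_throughput
print(f"{relative_drop:.1%}")  # prints "24.3%"
```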
@RSMNYS Can you try renaming ‘MobileBERT.zip’ to ‘MobileBERT.mlpackage.zip’? |
@RSMNYS Please share your forked repo or the .mlpackage model. |
@freedomtan here is the forked repo: https://github.com/RSMNYS/mobile_models/raw/main/v3_0/CoreML/MobileBERT.mlpackage.zip |
Let's check something basic.
|
@freedomtan here is my test with the converted model: Sometimes Xcode fails to create the report, but I believe this is an Xcode issue, because it sometimes shows me the wrong operating system, although everything is listed correctly in the result. Could you please share the General tab for the converted model, the results, and your Xcode version? |
@RSMNYS I meant "CPU and ANE". "GPU and ANE" is the reason why your model is slower.
|
@freedomtan to post profiling results showing how the old Core ML model performs on a couple of devices. |
@RSMNYS With coremltools's converter, you can try to convert a TF model to MIL by setting the
|
I dug into it a bit over the past weekend. Some information that may be useful.
And then, it should be possible to tweak |
@freedomtan I did some more testing with MobileBERT.mlpackage. I set different precisions for the model, Float16 and Float32, and here are the results. FLOAT16, all units: 8.33 ms, with 1900 operations on the NE and 8 ops on the GPU. FLOAT32, all units: 31.11 ms, with all operations running on the GPU only. I also found this description: "ML programs use a GPU runtime that is backed by the Metal Performance Shaders Graph framework." So could it be that the mlpackage is optimised to run its operations on the GPU (to utilise parallel execution)? Since NLP models are sequential in nature, running on the GPU is not so beneficial for them in terms of QPS. We can check other models (vision) to see whether the operations are faster in that case. Checking more. |
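For anyone reproducing the Float16/Float32 comparison, a conversion sketch with coremltools follows. It is an illustration under assumptions, not the script used here: the function name and paths are hypothetical, and the source is assumed to be a TensorFlow SavedModel directory. `compute_precision` and `compute_units` are parameters of `ct.convert`.

```python
def convert_to_mlpackage(saved_model_dir: str, output_path: str, use_fp16: bool = True):
    """Convert a TensorFlow SavedModel to an ML program (.mlpackage) (sketch)."""
    # ML programs must be saved with the .mlpackage extension.
    if not output_path.endswith(".mlpackage"):
        raise ValueError("output_path must end with .mlpackage")

    # Imported lazily so the function can be defined without coremltools.
    import coremltools as ct

    precision = ct.precision.FLOAT16 if use_fp16 else ct.precision.FLOAT32
    mlmodel = ct.convert(
        saved_model_dir,
        source="tensorflow",
        convert_to="mlprogram",
        compute_precision=precision,
        # Keep the GPU out of the picture, as suggested earlier in the thread.
        compute_units=ct.ComputeUnit.CPU_AND_NE,
    )
    mlmodel.save(output_path)
    return mlmodel
```

Calling it once with `use_fp16=True` and once with `use_fp16=False` gives the two packages to compare in the Xcode performance report.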
@RSMNYS CPU: fp16, fp32 (and maybe bf16) I recommend
for .mlmodel and .mlpackage we discussed,
As you can see, running on GPU is slower. With the MIL program and Netron, we can find out what the 10 and 8 ops in the mlmodel and mlpackage, respectively, are
Maybe we can change mlprogram manually to check what stopped the 8 ops from running on CPU. |
some ideas we can try to improve the apple backend
in the WWDC 2023 "Use Core ML Tools for machine learning model compression", https://developer.apple.com/wwdc23/10047, Apple folks claimed that Apple's new quantization scheme could help reduce inference latency
some models definitely still have room for improvement, e.g.,