c++ performance vs c performance #358
Hi @wjbbupt, could you give me an example of the code that you used to generate this flamegraph?
@reicheratwork Two versions,
@reicheratwork From analyzing the source code, the C++ reader and writer use a different serialization and deserialization mechanism than the C ones. I am not sure whether there is a historical reason for this; my understanding is that the C++ binding should call into the C API.
Aha, thanks.
@reicheratwork According to actual tests, serialization in the C++ version takes about 4-5 times as long as in the C version. This is where the C++ version performs worse than C.
There are two things going on here.
I think the main performance penalties on the writing (serialization) side are from differences in implementation caused by the second point.
This requires C and C++ to use two different types of serializers to translate the data from their language-native (C/C++) representation to their wire representation. And because the C++ representations of the exchanged types are part of a specification (this one), I think it will be very difficult to adapt the C serializers for use by the C++ binding.
As for things being nested, I don't really understand what you mean by this. Could you please expand on it?
@wjbbupt I have been working on some performance improvements on the C++ serialization side and will have a PR ready for that soon. Could I count on you to also poke around and review it?
@reicheratwork The premise of the test is that both versions use the same IDL. Differences on the read side: Therefore, the difference on the pub side is mainly reflected in the serialization. According to your conclusion, this is needed to adapt to the language differences, but if you check read() and take() on the sub side, you will find that C and C++ use different sets of interfaces.
"@wjbbupt I have been working on some performance improvements on the C++ serialization side, and will have a PR ready for that soon, could I count on you to also poke around and review it?" If you can, I can use your PR to run some demo tests. I am very willing to help improve the performance. We have a common goal, thank you.
@reicheratwork could you give #361 a try and see how it improves the performance?
I have a suggestion;
@reicheratwork I am also integrating the optimized version with the release version. Unfortunately, fixing the compilation problems requires too many modifications, and they have not been resolved yet.
@wjbbupt I ran everything through valgrind's callgrind and did a little comparison between the C and C++ performance: callgrind_files.zip (you can view these files in kcachegrind).
I came to the following conclusions regarding the performance of the parts that "really differ" between the C and C++ implementations (excluding putting the data on the network, setting values in samples by the program before writing, etc.):
C receiving side:
C publishing side:
C++ receiving side:
C++ publishing side:
The main analysis (per sample):
Main places for improvements in C++ (in my eyes):
@wjbbupt could you have a look at these callgrind files and give some of your insights?
@wjbbupt I made some changes to the deserialization code. It should now copy sequences of base types by calling std::vector::assign instead of std::vector::resize followed by memcpy. Could you give that a look?
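To make the difference between the two copy strategies concrete, here is a minimal sketch. It is not the code from the PR: the helper names `copy_resize_memcpy` and `copy_assign` are hypothetical, and the source buffer is assumed to already hold correctly typed, correctly aligned elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper showing the older approach: resize, then memcpy.
// resize() value-initializes any newly added elements, and memcpy then
// overwrites them, so the new elements are written twice.
template <typename T>
void copy_resize_memcpy(std::vector<T>& dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value, "base types only");
    assert(src != nullptr || n == 0);
    dst.resize(n);
    if (n != 0) {
        std::memcpy(dst.data(), src, n * sizeof(T));
    }
}

// Hypothetical helper showing the newer approach: assign from a pointer range.
// assign() copies straight from the source range, with no value-initialization
// pass first.
template <typename T>
void copy_assign(std::vector<T>& dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value, "base types only");
    assert(src != nullptr || n == 0);
    dst.assign(src, src + n);
}
```

Both helpers leave the destination vector with the same contents. Whether assign is measurably faster depends on the standard library implementation and on the sequence length, so it is worth confirming with a profiler such as callgrind.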
@wjbbupt I made some small changes, as the previous improvement neglected to do the endianness correction after copying a sequence of base types.
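The endianness correction mentioned here can be sketched as follows. This is an illustration only, not the code from the PR: `read_u32_sequence`, `byte_swap32` and `host_is_little_endian` are hypothetical names, and the example handles only 32-bit unsigned elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Detects the host byte order at run time by inspecting the first byte of 1.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverses the byte order of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical reader: bulk-copies `count` 32-bit values from the wire buffer,
// then swaps every element if the wire byte order differs from the host's.
// Skipping the second step is the kind of bug described above.
inline std::vector<std::uint32_t> read_u32_sequence(const unsigned char* wire,
                                                    std::size_t count,
                                                    bool wire_is_little_endian) {
    assert(wire != nullptr || count == 0);
    std::vector<std::uint32_t> out(count);
    if (count != 0) {
        std::memcpy(out.data(), wire, count * sizeof(std::uint32_t));
    }
    if (wire_is_little_endian != host_is_little_endian()) {
        std::transform(out.begin(), out.end(), out.begin(), byte_swap32);
    }
    return out;
}
```

The bulk copy stays a single memcpy, and the swap pass only runs when sender and receiver disagree on byte order, so the common same-endianness case pays nothing extra.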
@reicheratwork
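The insert on the std::set in the finish_member function is done to later check the struct's completeness, which is necessary when reading an @appendable or @mutable datatype. This check is now only done when it is necessary; see commit de4843d in PR #361.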
@reicheratwork "The insert on the std::set in the finish_member function is done to later check the struct's completeness, which is necessary when reading an @appendable or @mutable datatype, this check is now only done when it is necessary, check commit de4843d in PR #361." I have already studied this. The test results show that there is still a big performance gap between C++ and C, so there must be room for optimization.
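The idea of only tracking members when a completeness check is actually needed can be sketched like this. It is a simplified illustration and not the implementation in cyclonedds-cxx: the `MemberTracker` class, the `Extensibility` enum and the `is_complete` method are hypothetical, and only the name `finish_member` is borrowed from the discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for the extensibility kind of a datatype.
enum class Extensibility { Final, Appendable, Mutable };

// Hypothetical tracker: records which member ids were read, but only for
// types where members may be missing or reordered on the wire.
class MemberTracker {
public:
    explicit MemberTracker(Extensibility ext)
        : track_(ext != Extensibility::Final) {}

    // Called after each member is read; skips the std::set insert (and its
    // allocation) entirely for final types.
    void finish_member(std::uint32_t member_id) {
        if (track_) {
            seen_.insert(member_id);
        }
    }

    // True when every required member id was seen. In this sketch final types
    // are treated as always complete, because nothing was tracked for them.
    bool is_complete(const std::set<std::uint32_t>& required) const {
        if (!track_) {
            return true;
        }
        return std::includes(seen_.begin(), seen_.end(),
                             required.begin(), required.end());
    }

    std::size_t tracked_count() const { return seen_.size(); }

private:
    bool track_;
    std::set<std::uint32_t> seen_;
};
```

Each std::set insert typically allocates a node, so skipping it for final types removes one allocation per member per sample on the read path.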
You are correct that the differences in performance are disappointing. Maybe there are some differences between the C and C++ throughput examples that can explain them; I will examine this.
Experiments show that C++ is 2-3 times slower than C. The flame graphs show that serialization to CDR on the pub side consumes a lot of time. On the sub side, deserialization is faster and take is faster, so I want to analyze both sides together. The flame graphs follow.
C pub:
C sub:
C++ pub:
C++ sub:
What I want to confirm is the reason for the slow C++ performance. I suspect it is:
These are some of my doubts, and I would like to discuss and study them with you.