Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialization performance improvements #361

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

reicheratwork
Copy link
Contributor

These improvements could solve part of issue #358

Further performance improvements by:

Expanding the buffer in place while writing
Not checking the completeness of the written struct (as it is already complete as far as we know)

@wjbbupt : could you give this a look?

@zijian-teng
Copy link
Contributor

Tested on my Ubuntu 18.04 in VirtualBox( 6 cores of Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, 12G/32G RAM):
Testing with the CXX throughput example ./cxxThroughputPublisher 68 0 1 0, and received with C throughput example ./ThroughputSubscriber

  • 2 commits regarding removing completeness check did improve the throughput about 10%
  • 72a1afc has almost no impact on the throughput
  • af5e2af decrease the perfomance by ~10% unfortunately on my environment :(

@reicheratwork
Copy link
Contributor Author

Tested on my Ubuntu 18.04 in VirtualBox( 6 cores of Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, 12G/32G RAM): Testing with the CXX throughput example ./cxxThroughputPublisher 68 0 1 0, and received with C throughput example ./ThroughputSubscriber

  • 2 commits regarding removing completeness check did improve the throughput about 10%
  • 72a1afc has almost no impact on the throughput
  • af5e2af decrease the perfomance by ~10% unfortunately on my environment :(

@tez1szh Sorry to hear that these changes did not result in the performance gains we had hoped for.

All changes were made by running the throughput example in valgrind with the callgrind option set, and then examining the points in the serialization functions which were of most influence on the performance
[72a1afc] is a more efficient copy into a sequence of simple types (char, long, ...) as this skips the in place default construction of these types and immediately copies the data into the memory, if this has no negative effect on performance, i would say, leave it, as it will have an effect for large sequences of simple types
[af5e2af] was an attempt to see what the effect of skipping the step of determining the serialized size of the entity and replacing it with dynamically increasing the serialization buffer size, while this may perform worse in your case, in other cases, such as deeply nested structures, the penalty of copying the buffer on an increase may be smaller than basically doing 2 serializations, where 1 just skips the actual data copying step, maybe this is something that could be set in a QoS setting, or on a DataWriter or something?

I will have a further look at this

Removed the completeness checks on structs being written, as
this is nonsensical for writing, as the compiler will ensure that
all members that are mandatory (such as key members) are present
in the struct passed for writing
For the reading step, this validation will still need to be done
as the data here does not originate from the structs being read into

Signed-off-by: Martijn Reicher <[email protected]>
Because std::vector::resize adds an unnecessary initialization step
this performs worse than using assign

Signed-off-by: Martijn Reicher <[email protected]>
In read operations, the completeness checking can be skipped for
final extensibility structs

Signed-off-by: Martijn Reicher <[email protected]>
@reicheratwork reicheratwork force-pushed the performance_improvements branch from de4843d to 1c27b67 Compare May 3, 2023 15:23
@reicheratwork
Copy link
Contributor Author

Did some rebasing and reorganising of the commits, I think it is ready for review now.

@reicheratwork reicheratwork marked this pull request as ready for review May 4, 2023 07:38
Copy link
Contributor

@eboasson eboasson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me, but I do have a few questions, likely out of ignorance ...

* Depending on the implementation and mode header length fields may be completed.
* This function can be overridden in cdr streaming implementations.
*
* @param[in] props Properties of the member to finish.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So presumably the _unchecked refers to it not outputting a set of members that were completed successfully (the member_ids) in the original one. I presume that for both functions the return value is false iff some failure occurred, and that the nice bit of the non-unchecked version is that you get more detailed information on what actually happened.

Ok, that makes sense.

What I do wonder is if it would not make more sense to overload them? If the only difference is getting some additional information out, then it seems like that would more elegant and introduce less risk of head-scratching.

Would you agree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _unchecked versions of the finish_member and finish_struct functions do not check the member or struct for completeness (whether all members are present) when doing a streaming operation.
These functions do the streaming steps when finishing a struct or a member, like reading/inserting headers etc.
The check for completeness is not necessary, unless reading a non-final extensibility struct, since there we have no control over the set of members present.
This check uses std::set as a container of the member ids that were read, and this was a significant performance drain, and therefore was split out for those datatype+streaming operation combinations that needed it.

Overloading is indeed an option, i will look in to this, this should also simplify the code generator.
Don't know why I overlooked this when writing this, probably went into this with a little too much C code attitude.

*
* @return Whether the struct is complete and correct.
*/
virtual bool finish_struct_unchecked(const entity_properties_t &props) {(void)props; return true;}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar reasoning to the finish_member_unchecked applies here (perhaps there are other places as well, I won't repeat this if there are, that'd be silly).

I am cool with being wrong, mind 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the explanation is included in the previous comment.

@@ -55,7 +55,7 @@ bool xcdr_v1_stream::start_member(const entity_properties_t &prop, bool is_set)
return cdr_stream::start_member(prop, is_set);
}

bool xcdr_v1_stream::finish_member(const entity_properties_t &prop, member_id_set &member_ids, bool is_set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that finish_member is no longer needed because cdr_stream::finish_member takes care of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the functionality which was duplicated here is now absorbed into the base class's version of the function.

@@ -300,11 +300,8 @@ bool xcdr_v1_stream::finish_struct(const entity_properties_t &props, const membe
if (list_necessary(props))
return move_final_list_entry();
break;
case stream_mode::read:
return check_struct_completeness(props, member_ids);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't yet figure out how check_struct_completeness role is now being fulfilled (also applies to xcdr_v2_stream, obviously).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants