trace-load: improve trace load lookup #688

Open: wants to merge 2 commits into master from improve-trace-load

Conversation

@alexmohr (Contributor) commented Oct 16, 2024

Store the trace load in the context instead of doing a search on each message.

The increased load was detected by our performance team.
Please note that our internal version is not built on master but on a previous release, so I had to adjust the patch a little.
I wanted to upstream this change asap, although it still needs to go through some final verification on our end, because it is a very important amendment to the existing implementation and should be a huge improvement from a performance viewpoint.

@minminlittleshrimp Please review this and also give it a test in your setup.

I'll be on vacation starting at the end of next week, so response times will be a lot longer than usual and I won't be able to make changes before my return at the start of December.
Furthermore, I will be switching teams internally at the start of next year, so this might be one of my last contributions for some time.
It has been really fun working together with you :)

Solves #685.

The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0

Alexander Mohr, [email protected], Mercedes-Benz Tech Innovation GmbH, imprint

@alexmohr (Contributor, Author) commented Oct 16, 2024

quick side note on these changes:

strncpy(daemon.preconfigured_trace_load_settings[0].apid, "APP0", DLT_ID_SIZE);

While not strictly required with the compiler used in the pipeline, it crashed locally with the following gcc version:

gcc (Ubuntu 10.5.0-1ubuntu1~20.04) 10.5.0
Copyright (C) 2020 Free Software Foundation, Inc.
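
For context, DLT IDs such as the apid above are fixed-width 4-byte fields (DLT_ID_SIZE) that are not necessarily null-terminated, which is why the size-bounded copy matters. A minimal stand-alone sketch of the pattern (trace_load_settings_t is a made-up name here, not the dlt-daemon type):

#include <string.h>

#define DLT_ID_SIZE 4

typedef struct {
    char apid[DLT_ID_SIZE]; /* fixed-width ID, no null terminator guaranteed */
} trace_load_settings_t;

int main(void) {
    trace_load_settings_t s;
    /* strncpy writes at most DLT_ID_SIZE bytes and so can never run past
     * the fixed-width field, regardless of the source string's length. */
    strncpy(s.apid, "APP0", DLT_ID_SIZE);
    return (s.apid[0] == 'A') ? 0 : 1;
}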

@alexmohr alexmohr marked this pull request as draft October 16, 2024 17:52
Issue-ID: APRICOT-631667
Signed-off-by: Alexander Mohr <[email protected]>
@alexmohr alexmohr marked this pull request as ready for review October 20, 2024 11:44
@alexmohr force-pushed the improve-trace-load branch 6 times, most recently from 70ce605 to fb8c315 on October 23, 2024
@Bichdao021195 (Contributor)


Hi @alexmohr
Thanks for your pull request. Could you provide example steps for verifying the trace load improvement with your code change, if possible?

@Bichdao021195 (Contributor)


Hi @alexmohr
I noticed that the trace load settings address is stored in the DLT context, allowing searches on each message to use the dlt_daemon_context_find API instead of directly calling dlt_find_runtime_trace_load_settings. Could you please explain why this approach helps reduce CPU load? We still need to search for the context on each message in order to use the trace load settings.

@alexmohr (Contributor, Author) commented Oct 31, 2024

Testing can be done the same way as for the original PR. I can't provide better tests at the moment because I'm not in the office until December.

Using this approach is faster because this API uses a binary search instead of a linear search. Most of the improvement is on the client side, though, where the total number of search operations is reduced drastically by storing the trace load limits in the context.
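
To make the difference concrete, here is a minimal self-contained sketch (assumed names throughout: settings_t, find_linear, and find_bsearch are not the real dlt-daemon identifiers) contrasting the per-message linear scan with a bsearch over a table sorted by application ID:

#include <stdlib.h>
#include <string.h>

#define DLT_ID_SIZE 4

typedef struct {
    char apid[DLT_ID_SIZE]; /* fixed-width application ID */
    int  soft_limit;
    int  hard_limit;
} settings_t;

/* O(n): scan every entry for every logged message. */
settings_t *find_linear(settings_t *arr, size_t n, const char *apid) {
    for (size_t i = 0; i < n; i++)
        if (memcmp(arr[i].apid, apid, DLT_ID_SIZE) == 0)
            return &arr[i];
    return NULL;
}

int cmp_apid(const void *key, const void *elem) {
    return memcmp(key, ((const settings_t *)elem)->apid, DLT_ID_SIZE);
}

/* O(log n): requires arr to be sorted by apid once up front. */
settings_t *find_bsearch(settings_t *arr, size_t n, const char *apid) {
    return bsearch(apid, arr, n, sizeof(settings_t), cmp_apid);
}

int main(void) {
    settings_t arr[] = { { "APP0", 10, 20 }, { "APP1", 10, 20 } };
    settings_t *a = find_linear(arr, 2, "APP1");
    settings_t *b = find_bsearch(arr, 2, "APP1");
    return (a == b && b != NULL) ? 0 : 1;
}

Caching the found pointer in the context, as the snippet below does, removes even the O(log n) lookup from the hot path after the first message.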

dlt_ll_ts_type* ll_ts = &dlt_user.dlt_ll_ts[log->handle->log_level_pos];
if (ll_ts->trace_load_settings == NULL) {
    ll_ts->trace_load_settings = dlt_find_runtime_trace_load_settings(
@Bichdao021195 (Contributor) commented Nov 5, 2024

The trace load settings should be checked for null before being accessed.

alexmohr (Contributor, Author) replied:

Can you elaborate on this? The null check for the trace load settings is right above, and this if block is only executed when the pointer is null.

alexmohr (Contributor, Author) replied:

Do you mean checking ll_ts for null? I think that check would be superfluous.

Bichdao021195 (Contributor) replied:

After carefully checking: it is superfluous to check ll_ts->trace_load_settings for null here, because this pointer is already null-checked (and a log is printed) inside the function dlt_check_trace_load.

@@ -162,7 +162,10 @@ typedef struct
     int user_handle;           /**< connection handle for connection to user application */
     char *context_description; /**< context description */
     int8_t storage_log_level;  /**< log level set for offline logstorage */
-    bool predefined;           /**< set to true if this context is predefined by runtime configuration file */
+    bool predefined;
Bichdao021195 (Contributor):

"Could you explain the purpose of removing the comment of the bool predefined variable? If it was removed intentionally, kindly please consider reverting this line if it is not necessary."

alexmohr (Contributor, Author) replied:

As far as I could tell it was not used, but maybe I missed something?

@@ -1636,6 +1636,13 @@ int main(int argc, char *argv[])
dlt_gateway_deinit(&daemon_local.pGateway, daemon_local.flags.vflag);

dlt_daemon_free(&daemon, daemon_local.flags.vflag);
#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
Bichdao021195 (Contributor):

These lines of code should be wrapped in a single function such as dlt_traceload_free.

alexmohr (Contributor, Author) replied:

Will do once I'm back in the office in December :)
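
For illustration, the suggested helper could take roughly this shape (a hypothetical sketch; daemon_like_t stands in for the real DltDaemon, whose fields differ):

#include <stdlib.h>

/* Hypothetical daemon struct; only the ownership of the settings matters. */
typedef struct {
    void *preconfigured_trace_load_settings;
    int   preconfigured_trace_load_settings_count;
} daemon_like_t;

/* One cleanup entry point instead of open-coded frees in main(). */
void dlt_traceload_free(daemon_like_t *daemon) {
    free(daemon->preconfigured_trace_load_settings);
    daemon->preconfigured_trace_load_settings = NULL;
    daemon->preconfigured_trace_load_settings_count = 0;
}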

@@ -1308,6 +1306,15 @@ DltReturnValue dlt_register_app(const char *apid, const char *description)

#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
strncpy(trace_load_settings[0].apid, dlt_user.appID, DLT_ID_SIZE);
if (!trace_load_context.contextID[0])
Bichdao021195 (Contributor):

It would make sense to register the special context only inside dlt_register_app, because the dlt_register_app API serves the purpose of registering the app.

alexmohr (Contributor, Author) replied:

It is required, as logging without calling the register API first is technically possible and is happening within our project. This change is implemented in a way that is fully compliant with the existing code.

@@ -1308,6 +1306,15 @@ DltReturnValue dlt_register_app(const char *apid, const char *description)

#ifdef DLT_TRACE_LOAD_CTRL_ENABLE
strncpy(trace_load_settings[0].apid, dlt_user.appID, DLT_ID_SIZE);
alexmohr (Contributor, Author) suggested:

Suggested change:
-strncpy(trace_load_settings[0].apid, dlt_user.appID, DLT_ID_SIZE);
+pthread_rwlock_wrlock(&trace_load_rw_lock);
+strncpy(trace_load_settings[0].apid, dlt_user.appID, DLT_ID_SIZE);
+pthread_rwlock_unlock(&trace_load_rw_lock);

Writing the settings needs protection to prevent segfaults.
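
As a minimal sketch of the pattern behind this suggestion (the globals and set_apid/get_apid are made-up names; only the read/write-lock discipline matters):

#include <pthread.h>
#include <string.h>

#define DLT_ID_SIZE 4

static char shared_apid[DLT_ID_SIZE];
static pthread_rwlock_t trace_load_rw_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writers take the exclusive write lock so concurrent readers can never
 * observe a half-written ID. */
void set_apid(const char *apid) {
    pthread_rwlock_wrlock(&trace_load_rw_lock);
    strncpy(shared_apid, apid, DLT_ID_SIZE);
    pthread_rwlock_unlock(&trace_load_rw_lock);
}

/* Readers take the shared read lock, so reads do not block each other. */
void get_apid(char out[DLT_ID_SIZE]) {
    pthread_rwlock_rdlock(&trace_load_rw_lock);
    memcpy(out, shared_apid, DLT_ID_SIZE);
    pthread_rwlock_unlock(&trace_load_rw_lock);
}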

@alexmohr (Contributor, Author) commented Dec 2, 2024

[attached image: graph of the call and lock paths involved in the deadlock]
This is a possible deadlock that still needs to be fixed. I'm working on a fix internally and will provide it along with the other changes as soon as it works on our side.
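
The attached graph is not reproduced here. Purely as an illustration of the general shape of such a bug (this is not the actual DLT code), a deadlock like this typically arises when two paths take the same locks in opposite order; run under contention, the program below may hang:

#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Path 1 locks a, then b. */
static void *path1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Path 2 locks b, then a: the inverted order that makes a deadlock possible. */
static void *path2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, path1, NULL);
    pthread_create(&t2, NULL, path2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}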

@alexmohr (Contributor, Author) commented Dec 6, 2024

Fixed the deadlock and addressed the findings.
@Bichdao021195 can you give it another round of review?

@Bichdao021195 (Contributor)


Thanks @alexmohr for your contribution on DLT.
If possible, could you also guide me on how to set up and monitor the trace-load improvements in your changes? I'd like to observe how the patch improves the performance of the trace-load feature. Which indicators should we monitor?
Additionally, are there any steps to reproduce the deadlock issue?

@alexmohr (Contributor, Author)

@Bichdao021195 just pushed some more changes for scenarios in which we had deadlocks in the fleet.

Which indicators should we monitor?

CPU time spent in libdlt, especially in dlt_find_runtime_trace_load_settings, was high prior to this change, as an expensive search was done for every message. This change drastically reduces the load by caching the values.
The improvement shows best when a lot of messages are logged and filtering by context ID is done.

Additionally, are there any steps to reproduce the deadlock issue?

The main issue was in the DLT exit path, where an internal application got stuck due to the scenario in the graph I provided.
It's possible to reproduce this with out-of-the-box tools like dlt-adaptor-stdin as well, by running them in a loop and waiting for a run that does not exit.
