esp_zb_init causes heap corruption/ overrides something important? (TZ-1469) #534

Open
3 tasks done
damian-kurek-wizzdev opened this issue Jan 13, 2025 · 10 comments

@damian-kurek-wizzdev

Answers checklist.

  • I have read the documentation ESP Zigbee SDK Programming Guide and tried the debugging tips, the issue is not addressed there.
  • I have updated ESP Zigbee libs (esp-zboss-lib and esp-zigbee-lib) to the latest version, with corresponding IDF version, and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

5.1.2

esp-zigbee-lib version.

1.6.0

esp-zboss-lib version.

1.6.0

Espressif SoC revision.

esp32c6

What is the expected behavior?

No crash

What is the actual behavior?

I integrated the Zigbee SDK into an existing code base and started getting stack overflows at random lines in a thread that has not been touched for over a year. After debugging the issue for some time, I noticed that if I comment out esp_zb_init, everything works fine.

    esp_zb_cfg_t zb_nwk_cfg = {};
    zb_nwk_cfg.esp_zb_role = ESP_ZB_DEVICE_TYPE_ED;
    zb_nwk_cfg.install_code_policy = INSTALLCODE_POLICY_ENABLE;
    zb_nwk_cfg.nwk_cfg.zed_cfg.ed_timeout = ED_AGING_TIMEOUT;
    zb_nwk_cfg.nwk_cfg.zed_cfg.keep_alive = ED_KEEP_ALIVE;
    esp_zb_init(&zb_nwk_cfg);

I don't need to do anything else with Zigbee (for example, run the Zigbee loop); calling esp_zb_init alone is enough to trigger the crash in a different thread.

Steps to reproduce.

It is hard to say how to reproduce it. I have 18 threads running, so you probably need firmware that uses a lot of threads for Zigbee to overwrite one of them.

More Information.

I will come back to debugging this tomorrow and may find more information. If you need me to check something, just write a comment.

I am attaching the core dump from UART.
core_dump.txt
I tried enabling the heap corruption detection tools but did not get anywhere with them.

@github-actions github-actions bot changed the title esp_zb_init causes heap corruption/ overrides something important? esp_zb_init causes heap corruption/ overrides something important? (TZ-1469) Jan 13, 2025
@damian-kurek-wizzdev
Author

I checked again and couldn't find anything more. Commenting out the Zigbee init fixes everything.

@xieqinan
Contributor

ERROR A stack overflow in task CloudReporter has been detected.

This log indicates that the CloudReporter task's stack overflowed. Have you tried increasing its stack size?

@damian-kurek-wizzdev
Author

I have not tried that, but this stack size has been unchanged for 2 years and tested by 300000+ users, and when I comment out the Zigbee init it doesn't overflow. I will increase the stack and check.

@damian-kurek-wizzdev
Author

damian-kurek-wizzdev commented Jan 14, 2025

Additionally, this task crashes while it is getting NTP time right after startup, so there is nothing that could overflow its stack.

@damian-kurek-wizzdev
Author

damian-kurek-wizzdev commented Jan 14, 2025

For more data, I experimented with ZigBee initialized and not initialized without changing any of my code.

    auto mark = uxTaskGetStackHighWaterMark(cloudControl.reporterTaskHandle);
    auto mark2 = uxTaskGetStackHighWaterMark2(cloudControl.reporterTaskHandle);
    LOG_ERROR("Watermark 2 = %d 1=%d", mark2, mark);

This snippet reads the task's stack high water mark, which, as far as I understand, is the minimum amount of stack that has been left unused. I put it at the start of the task and right before the point where it fails. I noticed that the failure happens when the task yields the CPU in vTaskDelay.
All this task does now is print a log, print the watermark, sleep for 2 s, check whether NTP time is synchronized, and go back to sleep. It looks like it crashes after going to sleep.
My findings without Zigbee init:
image
With Zigbee init:
image
Zigbee init happens in between, and the thread randomly gets a stack overflow.

@damian-kurek-wizzdev
Author

damian-kurek-wizzdev commented Jan 14, 2025

More interesting stuff happens:
image
image
I tripled the stack size, and when Zigbee is initialized my thread uses an additional 1 KB+ of memory on the stack.

@damian-kurek-wizzdev
Author

damian-kurek-wizzdev commented Jan 14, 2025

I think I figured it out. The Zigbee SDK does something with the UART logger that makes a log call allocate more memory once the Zigbee SDK has been initialized. To reproduce it, try starting a thread like this:

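// task1 is a task handle defined elsewhere; LOG_ERROR and SLEEP_MS are project macros.
void loop3(void* param)
{
    while (true)
    {
        // Minimum free stack ever observed for task1 (ESP-IDF reports this in bytes).
        auto mark  = uxTaskGetStackHighWaterMark(task1);
        auto mark2 = uxTaskGetStackHighWaterMark2(task1);
        // Note: mark2 is printed first, then mark.
        LOG_ERROR("Task 3 Watermark = %lu 1=%u", mark2, mark);
        SLEEP_MS(1000);
    }
}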

Wait for some time to check the watermark, then start Zigbee. The logger should allocate additional memory after Zigbee is initialized.

image
image
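
The before/after comparison described in this comment could be sketched roughly as follows. This is a host-side illustration only, not code from the project: `WatermarkTracker`, `observe`, and `drop_after` are hypothetical names, and the sample values are made up. On the target, each sample would come from `uxTaskGetStackHighWaterMark`.

```cpp
#include <cstdint>

// Hypothetical helper: remembers the first watermark sample as a baseline
// and reports how far later samples fall below it.
struct WatermarkTracker {
    std::uint32_t baseline = 0;
    bool has_baseline = false;

    // Returns how many bytes this sample is below the baseline.
    // The first call only records the baseline and returns 0.
    constexpr std::uint32_t observe(std::uint32_t free_bytes) {
        if (!has_baseline) {
            baseline = free_bytes;
            has_baseline = true;
            return 0;
        }
        return free_bytes < baseline ? baseline - free_bytes : 0;
    }
};

// Convenience wrapper: one sample taken before an event (for example,
// before the Zigbee init call) and one taken after it.
constexpr std::uint32_t drop_after(std::uint32_t before, std::uint32_t after) {
    WatermarkTracker tracker{};
    tracker.observe(before);
    return tracker.observe(after);
}

// Made-up numbers: 2500 bytes free before, 1400 bytes free after.
static_assert(drop_after(2500, 1400) == 1100, "watermark dropped by 1100 bytes");
static_assert(drop_after(2500, 2500) == 0, "no change means no drop");
```

A non-zero drop for a task whose code path did not change between the two samples is the symptom reported here.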

@xieqinan
Contributor

@damian-kurek-wizzdev

Based on your description, do you suspect that esp_zb_init() allocates some extra shared memory that affects other tasks? Because the output of LOG_ERROR("Watermark 2 = %d 1=%d", mark2, mark); is not very clear, I cannot fully grasp the point. Could you please give a more detailed summary of your findings? If we confirm this issue, we will fix it ASAP.

@damian-kurek-wizzdev
Author

damian-kurek-wizzdev commented Jan 15, 2025

I will create an example of that behaviour.
Explanation
uxTaskGetStackHighWaterMark, uxTaskGetStackHighWaterMark2
Water mark documentation
These functions return the minimum number of bytes that have ever been left free on the task's stack. For example, if the stack is 4000 B and you use at most 1500 B, the watermark will show 2500. So in our example, where a separate thread logs the same message repeatedly, I expect the number of bytes left on the stack to stay constant, but stack usage increases (the watermark drops) after initializing the Zigbee stack for some reason.
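
As a rough illustration of the watermark idea, here is a minimal host-side simulation. It assumes the usual FreeRTOS approach of painting the stack with a fill byte (0xA5) and counting how many bytes at the far end are still untouched. `FakeStack`, `use_stack`, `high_water_mark`, and `watermark_after` are hypothetical names for this sketch, not FreeRTOS or ESP-IDF APIs.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t kFillByte = 0xA5;   // fill pattern FreeRTOS paints stacks with
constexpr std::size_t kStackSize = 4000;   // matches the 4000 B example above

struct FakeStack {
    std::uint8_t bytes[kStackSize];
};

// A freshly created task stack: every byte holds the fill pattern.
constexpr FakeStack make_painted_stack() {
    FakeStack s{};
    for (std::size_t i = 0; i < kStackSize; ++i) s.bytes[i] = kFillByte;
    return s;
}

// Simulates the task reaching a depth of `used` bytes. The stack grows
// downward from the top of the buffer, and overwritten bytes stay overwritten.
constexpr void use_stack(FakeStack& s, std::size_t used) {
    if (used > kStackSize) used = kStackSize;
    for (std::size_t i = 0; i < used; ++i) s.bytes[kStackSize - 1 - i] = 0x00;
}

// Counts untouched fill bytes from the low end: the minimum free space so far.
constexpr std::size_t high_water_mark(const FakeStack& s) {
    std::size_t free_bytes = 0;
    while (free_bytes < kStackSize && s.bytes[free_bytes] == kFillByte) ++free_bytes;
    return free_bytes;
}

// Reaches two depths in sequence and returns the resulting watermark.
constexpr std::size_t watermark_after(std::size_t depth1, std::size_t depth2) {
    FakeStack s = make_painted_stack();
    use_stack(s, depth1);
    use_stack(s, depth2);
    return high_water_mark(s);
}

static_assert(watermark_after(1500, 200) == 2500, "shallower use later does not raise it");
static_assert(watermark_after(1500, 2600) == 1400, "deeper use later lowers it");
```

Because the watermark is a running minimum, it can only stay the same or drop. A drop in a task that keeps executing the same code therefore means something in its call path started using more stack.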

@damian-kurek-wizzdev
Author

damian-kurek-wizzdev commented Jan 15, 2025

https://github.com/damian-kurek-wizzdev/zigbee/tree/zigbee_logs_problem
You can copy the 3 logging tasks into any of your projects. Run the 3 tasks and wait for some time, e.g. 10-20 seconds. Then initialize the Zigbee stack with esp_zb_init and check that the logging task's stack uses more memory for some reason.
@xieqinan
Check main/main.cpp for the logging tasks.
This project won't compile without changing the Zigbee thermostat default value, since the C++ compiler won't accept 0xffff as an int16_t.
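
The compile error mentioned above is presumably a narrowing error: in C++ brace initialization, the constant 0xffff (65535) does not fit in an `int16_t`. A minimal sketch of the problem and one possible workaround follows. `ThermostatDefaults` and `kInvalidValue` are hypothetical stand-ins, not names from the Zigbee SDK.

```cpp
#include <cstdint>

// Hypothetical stand-in for an SDK structure holding a signed 16-bit default.
struct ThermostatDefaults {
    std::int16_t local_temperature;
};

// This form is rejected by a C++11-or-later compiler, because 65535 does
// not fit in int16_t and brace initialization forbids narrowing:
//   ThermostatDefaults bad{0xffff};

// One workaround: make the conversion explicit. The result is
// implementation-defined before C++20, but is -1 on two's complement targets.
constexpr std::int16_t kInvalidValue = static_cast<std::int16_t>(0xffff);
constexpr ThermostatDefaults kDefaults{kInvalidValue};

static_assert(kDefaults.local_temperature == -1, "0xffff as a signed 16-bit value");
```

C is more permissive about this conversion, which may be why the same value builds in a C project but not in a C++ one.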
