-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grafana Integration #401
Comments
Hi |
Hey guys. I'm also interested in seeing what can be done here to improve diagnostics. My company has done an approach like this for many years (though not with telegraf/grafana but a similar stack). And we can visualize diagnostics metrics. It might be worth discussing the future of rosdiagnostics and figuring out what the scope is. I'd personally think this could be implemented generically to handle any stack. |
@nnarain Could you please explain more how you would set it up to handle any stack? I guess any implementation (be that prometheus, telegraf or straight to influxdb) would need it's own configuration. |
So my take on it would be a new composable node that consumes the aggregated diagnostics topic and forwards it to the desired endpoint (telegraf, elastic, a network sockets, etc). I personally wouldn't do this in the aggregator node as to not add new dependency for those that don't want to use a particular metrics stack. Maybe something like "diagnostics_telegraf". |
Sending data to either InfluxDB itself or Telegraf works by sending a small http request, with the data formatted in a special text as such:
The only extra dependency we have to add to the aggregator node is curl. Personally I do not see this as a problem to include. @ct2034 What do you think? |
Ya I'd imagine a lot of these tools just use JSON. So along the lines of what I mentioned earlier it might be a composable node that converts the DiagnosticArray into a JSON payload and sends it to an endpoint. It sounds like a good use of composition to me. But it depends on what is and is not in scope of the aggregator |
I have thought about this again. Yes, it is only a dependency to curl. But I think it should be a separate package just to separate the concerns more clearly. Then we would also be able to support other backends down the line. And it is a functionality that I think is not in the default feature set that one expects from diagnostics and so it should be in its own package. |
Allright that seals it. I have some time on my hand to start work on this, will post the fork here when i have something up and running. I'll start by naming the package "diagnostics_remote" and the node "telegraf" to start with. Any input on this naming is appreciated. |
Sounds good. :) Looking forward to look at what you came up with. For the package naming, I am thinking about something like:
I wanted to find something a little more descriptive. The node naming sounds good. Then we could have other node names for other backends. I think that makes sense. |
I'll start with diagnostics_remote_logging, if anything better comes up in this thread I'll change it |
I've prepared a working version of the diagnostics code, available at https://github.com/dwffls/diagnostics. The conversion logic for diagnostics messages to the InfluxDB line protocol is in a separate header file for reusability, such as in nodes sending data directly to InfluxDB. TestingSet up InfluxDB (e.g., InfluxDB Cloud and a local Telegraf instance. I've followed this guide. Finaly add this to [[inputs.http_listener_v2]]
service_address = "tcp://:8186"
paths = ["/telegraf"]
data_format = "influx" Once set up, data should appear in the InfluxDB UI. Feedback on the code and or it's structure is welcome! |
I'm really interested in this topic. Here is the roscon talk @dwffls talks about : https://vimeo.com/1024971769 You can see the @dwffls, I will definitely have a look at your repo 👍 |
@dwffls I tried you repo on Humble and I run into the following issue:
And.... I receive In addition to it, in the documentation of |
Could you pull the repository again? Ive added some more error handling to when it posts to Telegraf. As to the the whole
As we are now using the full influxdb input we could change the node to be a full "influxdb" node with an example in the readme to use telegraf as a proxy. Kind of on the fence about this one... Edit: I've started the "rewrite" on a seperate branch to send it directly to influxdb as an option. Readme will follow with instructions for both telegraf and influxdb itself Let me know if anything else doesn't work. |
I've switched to the |
@dwffls I still run into the same issue but at least I have a more verbose log :
I'm not sure about what I'm doing in terminal 1. I don't know if I can just remap diagnostics to diagnostics_agg (they expect the same message type so I guess it is ok ?). |
@dwffls I dive deeper into it :
Unfortunately, in terminal 2 (telegraf), I receive only toplevel_state data :
So it looks like I don't receive diagnostics_agg data (the topic exists and data is published on it). Here is an example message :
|
@avanmalleghem Just came back from holidays, so only just had time to look at your bug. I had a problem with escaping string values for the key value pairs. It's now fixed and the tested with the setup you showed above. Could you test again with the If this works I guess we are ready for a pull request. |
I am sorry but I don't know what you are referencing? Is there a branch I should look at? |
Yeah it should be under the |
During @ct2034's talk at ROSCon 2024 the idea was started to visualize /diagnostics messages in a Grafana dashboard.
I wanted to restart this conversation here in the form of a feature request.
As I'd like to contribute to this feature I guess the first thing to tackle is the structure of this integration.
My own suggestion is to implement this in the diagnostics_aggregator and piggy back of the publishing of the /diagnostics_agg topic being send. This data would then be sent to Telegraf (taking inspiration from another talk at ROSCon) to be later used in grafana.
Happy to hear feedback!
The text was updated successfully, but these errors were encountered: