Add registry to Filebeat's diagnostic #41795
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
I'll implement those changes.
The Fleet server upload API used to get diagnostics as an action probably has a limit; I'm not sure what it is by default off the top of my head. I would expect it is ~100 MB, so if we go with 20 MB we need to make sure we don't push ourselves past that in real use cases. For reference, the few gzipped diagnostics I have lying around are all <5 MB.
The Fleet settings aren't super clear on what the upload file size limit is; maybe there isn't one? The files are chunked as part of the implementation. https://github.com/elastic/fleet-server/blob/4132a505f0c759e332a890fdf701b8ff238cf91a/fleet-server.reference.yml#L200-L214
This pull request is now in conflicts. Could you fix it? 🙏
LGTM apart from comments
filebeat/input/v2/compat/compat.go
Outdated
// func (r *runner) Diagnostics() []diagnostics.DiagnosticSetup {
// 	fmt.Println("================================================== Diagnostics called!")
// 	setup := diagnostics.DiagnosticSetup{
// 		Name: "registry collector",
// 		Description: "Collect Filebeat's registry",
// 		Filename: "registry.tar.gz",
// 		ContentType: "application/octet-stream",
// 		Callback: getRegistry,
// 	}
//
// 	return []diagnostics.DiagnosticSetup{setup}
// }
I don't think we need this commented out code. Correct?
Requesting changes until comments get addressed.
All review comments have been addressed. The failure from TestRegistryIsInDiagnostics has been fixed.
for _, lst := range preFilesList {
	var path string
	if filepath.Separator == '\\' {
Can you avoid init()? This appears OK but we have constant problems with init in agentbeat and beats receivers, I wouldn't want this to encourage more init.
Can you just use the `os.PathSeparator` constant instead of doing this detection here? The path separator would be known at compile time based on the target OS.
I put it in an `init()` because these are just local variables for the specific purpose of archiving the registry, so they don't have to be rebuilt every time diagnostics are requested.
I agree with avoiding `init()` and any global state, given all the issues it can bring (and does). I believe this case does not fall into that category; however, I can just redo it every time it's needed, since the performance implications are negligible.
> Can you just use the os.PathSeparator constant instead of doing this detection here?
No, because the detection here is to actually escape the path separator in the regexp. If I just use `filepath.Join`, it won't be escaped in the final string and the regexp won't compile.
Did you test this with a real agent uploading diagnostics that include the registry to Fleet? If not, can you? I want to make sure this actually works end to end before merging it.
Yes, I have tested it end-to-end multiple times and even demoed it after the On-Week. The only thing I haven't manually tested is the limit when the registry is too large. I can try testing this at some point before merging.
This commit moves the code that populates `registryFileRegExps` from an `init()` function to a place before `matchRegistyFiles` is called. Because generating diagnostics is very sporadic, there is no meaningful performance difference between generating it once in `init()` and generating it every time it's needed.
logger := logp.L().Named("diagnostics")
buf := bytes.Buffer{}
dataPath := paths.Resolve(paths.Data, "")
registryPath := filepath.Join(dataPath, "registry")
Is this always guaranteed to exist if there isn't a filestream or log input?
Yes, the registry is created very early during Filebeat's startup. Even when using the benchmark input, the registry folder, `meta.json` and `log.json` are created.
Given that they exist even in this very simple case, I'd rather have them in the diagnostics than try to be clever and detect when the registry is actually being used by one of the inputs.
		return fmt.Errorf("cannot get full path from '%s': '%w'", src, err)
	}

	tarFile, err := os.Create(dst)
This will be empty if there are no matches. Can we avoid including an empty .tar file, so people don't look into it and wonder whether it is supposed to be empty?
If the registry directory exists without a log or filestream input, the regexes will never match for agent Filebeat sub-processes that don't run either of those input types, so this will be common.
As I mentioned on #41795 (comment), the registry folder is always there and contains some data. Other inputs can also use the registry; off the top of my head, `log`, `filestream` and `journald` all use it.
The tar file from this "empty" registry is 3.5kb and the gzipped one is 228 bytes. It's so small that I don't see any problem with having it in the diagnostics. It also works as a sort of "snapshot" of the registry, even if empty, which is helpful information to have.
I don't care about the size impact, but rather the analysis impact of knowing when it's worth decompressing and looking at.
A lot of inputs don't use the registry at all. My suspicion is that people and tooling will waste time looking at the empty ones, because there's no other way to know if it's useful besides memorizing which inputs use the registry.
This commit adds a test for when Filebeat's registry is empty; that's done by using the benchmark input. The benchmark input is fixed so it can run under Elastic-Agent.
Proposed commit message
This commit adds Filebeat's registry folder to the Elastic-Agent diagnostics. It's called `registry.tar.gz` and includes all registry files on `${path.home}/registry`.

The registry is first archived into a temporary tar file. The temporary file is created by calling `os.CreateTemp` and will use the OS's temporary folder. Then it's gzipped in memory and returned to Elastic-Agent; finally, the temporary file is removed from the disk.

If the final gzipped file is more than 20 MB, it is skipped due to its large size.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
None.
Author's Checklist
How to test this PR locally
The easiest way to test it locally is to package the Elastic-Agent with your local build of Beats by setting `EXTERNAL=false` when calling `mage build` in the Elastic-Agent folder.
Then, deploy the Elastic-Agent, let it ingest some logs and request the diagnostics. Extract the diagnostics and look for `registry.tar.gz` in the `components/filestream-*` folders.
You can also run the integration tests:
Related issues
## Use cases
## Screenshots
Logs
Full log entries
Only the `message` field: