
Use simple parser instead of regex for parsing resolv.conf #5439

Open · wants to merge 1 commit into master
Conversation

geoand commented Jan 10, 2025

tsegismont left a comment

Thank you @geoand

Have you had a chance to write a micro-benchmark and compare results?

geoand commented Jan 10, 2025

Not yet

geoand commented Jan 13, 2025

Actually I am not sure what a microbenchmark would look like in this case as I've removed the old methods and replaced them with a single one that extracts the necessary data in one pass.
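To make the "one pass" idea concrete, here is a hedged sketch of what such an extraction could look like (illustrative code, not the actual PR change; the class name is made up, though `ndots` and `rotate` are real resolv.conf options):

```java
// Illustrative one-pass parser: scans the resolv.conf content once and
// extracts the "ndots" value and "rotate" flag from "options" lines,
// instead of running separate regexes over the whole file.
final class ResolvConfParser {

  static final int DEFAULT_NDOTS = 1;

  private int ndots = DEFAULT_NDOTS;
  private boolean rotate = false;

  static ResolvConfParser parse(String content) {
    ResolvConfParser result = new ResolvConfParser();
    for (String line : content.split("\n")) {
      String trimmed = line.trim();
      // Comments in resolv.conf start with '#' or ';'
      if (trimmed.isEmpty() || trimmed.charAt(0) == '#' || trimmed.charAt(0) == ';') {
        continue;
      }
      if (trimmed.startsWith("options")) {
        for (String token : trimmed.substring("options".length()).trim().split("\\s+")) {
          if (token.startsWith("ndots:")) {
            try {
              result.ndots = Integer.parseInt(token.substring("ndots:".length()));
            } catch (NumberFormatException ignored) {
              // keep the default on malformed input
            }
          } else if (token.equals("rotate")) {
            result.rotate = true;
          }
        }
      }
    }
    return result;
  }

  int ndots() { return ndots; }
  boolean rotate() { return rotate; }
}
```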

@franz1981 how do you usually handle cases like this? Moreover, do you think a microbenchmark is even necessary in this case?

franz1981 commented Jan 13, 2025

That's how I do it: https://github.com/franz1981/java-puzzles/blob/74d60bbb8639ccbc1158046ce8d5575720bca96c/src/main/java/red/hat/puzzles/cstatic/StaticInt.java

You can use `-prof jfr:dir=profile-results;configName=jfr-profile.jfc`

And use JMC to read the class loading and compilation events, or just use both `-prof cl` and `-prof comp` instead (which are easier to use, tbh)

These are very tricky benchmarks since:

  • class loading can be dominated by file I/O, so having enough forks is necessary to make it hit the OS page cache; but that means it will be less representative of a real use case, since the OS page cache would appear like a super fast disk (which is not what happens on containers with rate-limited disks, e.g. on AWS)
  • beware of whatever is loaded by the JMH infrastructure, which can affect measurements: e.g. if what you want to benchmark is loading an expensive Java util class A, but the JMH infrastructure loads it too, the code under test will appear faster if that loading was the dominant factor, even though the same class might not be loaded in the real use case

Which means: make sure the effect you want to measure is big enough that observing it reproduces it with fidelity, and that it is not absorbed by the infrastructure cost.

geoand commented Jan 14, 2025

In this case, what I think @tsegismont was asking for was just a comparison of the parsing from a String.
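As a rough illustration of that kind of comparison (hypothetical code, not the actual Vert.x implementation; the regex is merely in the spirit of the original), both approaches can be checked to agree on the same input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative comparison: a regex-based extraction of "ndots" vs a
// plain indexOf/character scan over the same resolv.conf content.
final class ParseComparison {

  // Hypothetical regex, in the spirit of the pre-change implementation
  private static final Pattern NDOTS = Pattern.compile("ndots:[ \\t\\f]*(\\d+)(?=\\s|$)");

  static int ndotsWithRegex(String content) {
    Matcher m = NDOTS.matcher(content);
    return m.find() ? Integer.parseInt(m.group(1)) : 1;
  }

  static int ndotsWithScan(String content) {
    int idx = content.indexOf("ndots:");
    if (idx < 0) {
      return 1; // resolv.conf default
    }
    int start = idx + "ndots:".length();
    // skip optional whitespace between "ndots:" and the value
    while (start < content.length()
        && (content.charAt(start) == ' ' || content.charAt(start) == '\t')) {
      start++;
    }
    int end = start;
    while (end < content.length() && Character.isDigit(content.charAt(end))) {
      end++;
    }
    return end > start ? Integer.parseInt(content.substring(start, end)) : 1;
  }
}
```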

geoand commented Jan 14, 2025

Here is what a CPU time flamegraph looks like currently:

[flamegraph image: orig]

And with this patch backported to 4.5.11 it looks like this:

[flamegraph image: patched]

geoand commented Jan 14, 2025

It does seem, however, that this is causing more allocations, so I'll see what I can do.

@geoand geoand marked this pull request as draft January 14, 2025 12:34
geoand commented Jan 14, 2025

I pushed an update and now things look rather nice.

Here is the CPU time flamegraph:

[flamegraph image: cpu-patched]

vs the original one:

[flamegraph image: cpu-orig]

And now the allocation flamegraph is also better, see:

[flamegraph image: alloc-patched]

vs the original one:

[flamegraph image: alloc-orig]

@geoand geoand marked this pull request as ready for review January 14, 2025 13:54
geoand commented Jan 14, 2025

Ideally I would want this in Vert.x 4 as well :)

tsegismont commented:
> Actually I am not sure what a microbenchmark would look like in this case as I've removed the old methods and replaced them with a single one that extracts the necessary data in one pass.

It would have two methods comparing the code inside if (isLinux()) { ... } before and after the change.
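A skeleton of such a two-method benchmark might look like the following (a sketch under assumptions: `RegexImpl` and `SinglePassImpl` are hypothetical stand-ins for the pre- and post-change code, which is not reproduced here):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical JMH skeleton: one benchmark method per implementation,
// both fed the same resolv.conf content read once at setup time.
@State(Scope.Benchmark)
public class ResolvConfBenchmark {

  private String content;

  @Setup
  public void readResolvConf() throws Exception {
    content = new String(Files.readAllBytes(Paths.get("/etc/resolv.conf")), StandardCharsets.UTF_8);
  }

  @Benchmark
  public Object regexBased() {
    return RegexImpl.parse(content);      // stand-in for the old regex-based extraction
  }

  @Benchmark
  public Object singlePass() {
    return SinglePassImpl.parse(content); // stand-in for the new one-pass parser
  }
}
```
Returning the parsed result from each `@Benchmark` method keeps JMH from dead-code-eliminating the work under test.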

> I pushed an update and now things look rather nice.

> Here is the CPU time flamegraph:

I'm not an expert, but it seems both diagrams use AddressResolver init as the wide base. So the conclusion that performance has been improved is drawn from the number of samples? (i.e. 60 instead of 118)

geoand commented Jan 17, 2025

> So the conclusion that performance has been improved is drawn from the number of samples? (i.e. 60 instead of 118)

Correct.

> I'm not an expert, but it seems both diagrams use AddressResolver init as the wide base

I could have also used a base lower down the call stack, which would show that AddressResolver#&lt;clinit&gt; is now narrower.

Successfully merging this pull request may close these issues.

HostnameResolver class initialization is pretty heavy
3 participants