Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential slowness when dealing with large number of configuration values #11

Open
ThomasWaldmann opened this issue May 9, 2019 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@ThomasWaldmann
Copy link

the way the code does it right now is that it does everything per configuration key:

  • a full argparse
  • checking env vars (no big issue, this is quite fast as it is a dict already)
  • a full config file read / parse

if you have a lot of configuration keys, this will get slow.

@ThomasWaldmann
Copy link
Author

as argparse already has a quite extensive infrastructure, quite some projects will have a more or less big argparse parser setup already (including help strings, typing, defaults, etc.).

maybe you could start from doing argparse just once and then finding out what attributes it has and then checking the env vars / config based on that (also only once for all).

@ThomasWaldmann
Copy link
Author

@naorlivne
Copy link
Owner

Not sure scalability is the right phrasing for the issue you raise as it suggest that after X amount of configuration keys parse_it will cease working... I agree with your assessment that the more configuration keys needed the longer it will take so gonna change the title to something more descriptive of the problem

@naorlivne naorlivne changed the title scalability issue? potential slowness when dealing with large number of configuration values May 12, 2019
@naorlivne naorlivne added enhancement New feature or request help wanted Extra attention is needed labels May 12, 2019
@naorlivne
Copy link
Owner

It should also be noted that unless you're dealing with thousands of configuration values it should still be a matter of a fraction of a second to go over them all (or so at least the tests I've ran so far seem to suggest)... that been said there is always room for improvement, as well as the argparse pre-reading suggestion @ThomasWaldmann raised I think it may be wise to run some speed tests to find what part of the process is the slowest and focus on that as without it it's just guess works.

@ThomasWaldmann
Copy link
Author

besides speed, it is also an architectural thing.

maybe it's just not an elegant way to do it like that. it just works by brute force somehow. :-)

@naorlivne
Copy link
Owner

The slowest part seems to be when forcing an envvars to be upper-case, the following unit test takes ~114ms on my laptop:

    def test_parser_read_configuration_variable_force_envvars_uppercase_true(self):
        parser = ParseIt(force_envvars_uppercase=True)
        os.environ["TEST_ENVVAR_ESTIMATE_TRUE_INT"] = "123"
        os.environ["test_envvar_estimate_true_int"] = "456"
        reply = parser.read_configuration_variable("test_envvar_estimate_true_int")
        self.assertEqual(reply, 123)
        self.assertNotEqual(reply, 456)

where as the unit test of reading variables without them being forced to be upper-case takes <1ms on my laptop:

    def test_parser_read_configuration_variable_force_envvars_uppercase_false(self):
        parser = ParseIt(force_envvars_uppercase=False, config_folder_location=test_files_location)
        os.environ["TEST_ENVVAR_ESTIMATE_TRUE_INT"] = "123"
        os.environ["test_envvar_estimate_true_int"] = "456"
        reply = parser.read_configuration_variable("test_envvar_estimate_true_int")
        self.assertNotEqual(reply, 123)
        self.assertEqual(reply, 456)

All other tests (cli args included) takes considerably less time then test_parser_read_configuration_variable_force_envvars_uppercase_true so thinking that should be the focus of speed improvements as it will likely have the biggest pay off.

@ThomasWaldmann
Copy link
Author

uppercasing some string is a quick operation. but you need to be careful that the tests are really the same other than that (hint: they are not) and that normal timing fluctuations on you multitasking machine might be way bigger than what you are measuring.

@naorlivne
Copy link
Owner

Your right about the config_folder_location diffrence being the root cause of the slowness there, if not set it is using the current working directory as returned from os.getcwd() which is apparently takes >100ms, luckily as this is only a one time request when setting the ParseIt object it will not be called for every configuration value being read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants