Skip to content
This repository has been archived by the owner on Jan 18, 2023. It is now read-only.

CMK reconfiguration function errors #300

Open
ylhsiehitri opened this issue May 25, 2021 · 0 comments
Open

CMK reconfiguration function errors #300

ylhsiehitri opened this issue May 25, 2021 · 0 comments

Comments

@ylhsiehitri
Copy link

Hi folks,

I use CMK v1.5.2 release, and Kubernetes v.1.16.7
I tried the CMK reconfigure function but fails, it seems the code has some errors.

My patch for the issues 1~5 below is here: ylhsiehitri@c195694
With the patch, reconfigure function can resize pool successfully now.
But the problem in 5(b) is left to be solved, core affinity in cmk configmap is changed successfully but in OS it is unchanged actually.
(this case could be produced by: create two pods with one exclusive core for each --> make pod-1 completed while pod-2 keeps running --> reconfigure exclusive pool size to be only 1 --> CMK would change pod-2's core affinity to pod-1's, but actually unchanged in OS)

  1. Bug in lock:
    Within reconfigure.reconfigure(), it goes as follows into error loop
    (a) it calls config.lock():
    in the config.lock(), the "owner" annotation derived from configmap is empty, so it is set to be the name of the associated pod
    (b) then, reconfigure_directory() is called:
    (i) In reconfigure_directory(), discover.add_node_er() is called, in which config.lock() is called again
    (ii) now in the config.lock(), the "owner" is not empty (it was just set in (a)), so following the if-else statement, it goes to the "if" section and leads to infinite loop.
    ==>
    This error infinite loop could be solved, just correct the if statement in config.lock():
    if owner != "" and owner != self.owner

  2. Missing argument "namespace"
    These places miss required argument "namespace":
    (a) In reconfigure.reconfigure_directory() that calls discover.add_node_er() and discover.add_node_oir()
    (b) In describe.describe() that calls config.Config()"
    (c) In isolate.isolate() that calls config.Config()
    ==> Fixed, also the function headers of reconfigure.reconfigure() and describe.describe() are also corrected, added with argument "namespace"

  3. Non-default OS environment variable, e.g. NODE_NAME, should be given in pod yaml
    In CMK code, it often uses os.getenv("NODE_NAME") to get environment variable. However it is not given by default.
    ==> Should define env NODE_NAME in pod yaml. I added it in the cmk-isolate-pod.yaml as an example.

  4. (Discussion) Redundant augument
    In reconfigure.reconfigure() function header, the argument "node_name" is redundant since it will be reassigned in the function body
    ==>
    Or, the reconfigure function was designed to allow user to specify node_name?

  5. yaml.safe_load() fails for "Procs" object deserialization
    In reaffinitize.py, it uses yaml.safe_load() to deserialize the yaml-serialized Procs object, but fails
    (a) solved bug:
    source code: yaml.safe_load(config["config"])
    correct to : yaml.safe_load(config.data["config"])
    (b) unsolved:
    We expect the corrected yaml.safe_load() in (a) would return a Procs object, but actually it encounters exception error that it does not know how do the convertion.
    ==>
    Anyone knows how to deal with it simply?
    As far as I know, yaml.safe_load() can do deserialize for object of simple structure. But like the class Procs defined in reconfigure.py, it is not of simple structure: it has an dict, of which the values are Pid object. For such case, python provide yaml.add_constructor() mechnism to define how to reconstruct the object.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant