Clean up code for observer bot
Querela committed Apr 23, 2020
1 parent bc46a4b commit 283c112
Showing 5 changed files with 338 additions and 69 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
venv/
*.egg-info/
__pycache__/
__pycache__/
build/
dist/
64 changes: 64 additions & 0 deletions README.rst
@@ -37,6 +37,8 @@ It registers the following commands:
* ``dbot-file`` - (short-hand) to send a file with a message
* ``dbot-info`` - (short-hand) to send a message with system information
(*extra dependencies have to be installed!*)
* ``dbot-observe`` - a blocking script that runs periodic system checks and notifies about resource shortages
(*requires extra dependencies to be installed*)

Requirements
------------
@@ -147,6 +149,68 @@ You may also run the bot with the python module notation. But it will only run t
    python -m discord_notifier_bot [...]

System Observer Bot
~~~~~~~~~~~~~~~~~~~

As of version **0.2.***, I have included some basic system observation code.
Besides the ``dbot-info`` command that sends a summary about system information to a Discord channel,
an *observation service* with ``dbot-observe`` is included.
The command runs a looping Discord task that checks some predefined system conditions every **5 min**
and sends a notification if a ``badness`` value exceeds a threshold.
This ``badness`` value allows the bot either to notify a channel immediately if a system resource is exhausted, or to notify only after a limit has been exceeded repeatedly.
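
The ``badness`` idea described above can be sketched roughly as follows. This is only a minimal illustration, not the actual ``NotifyBadCounterManager`` from ``discord_notifier_bot.sysinfo``: the class name ``BadCounter``, the ``notify_at`` and ``step`` parameters and the default of 3 are assumptions made for this example.

```python
from collections import defaultdict


class BadCounter:
    """Hypothetical badness tracker (not the project's real class)."""

    def __init__(self, notify_at=3, step=1):
        self.notify_at = notify_at  # assumed badness threshold
        self.step = step  # assumed default increase per failed check
        self.counters = defaultdict(int)
        self.notified = set()

    def increase(self, name, step=None):
        # A failed check raises the badness, capped at the threshold.
        inc = self.step if step is None else step
        self.counters[name] = min(self.notify_at, self.counters[name] + inc)

    def should_notify(self, name):
        # Notify only once the threshold is reached and nobody was told yet.
        return self.counters[name] >= self.notify_at and name not in self.notified

    def mark_notified(self, name):
        self.notified.add(name)

    def decrease(self, name):
        """Lower the badness; return True once when a notified check is back at 0."""
        self.counters[name] = max(0, self.counters[name] - 1)
        if self.counters[name] == 0 and name in self.notified:
            self.notified.discard(name)
            return True
        return False
```

With such a counter, a severe check could pass ``step=notify_at`` to trigger a notification on the first failure, while a milder check with ``step=1`` would need three failed rounds. A check that recovers only partially (for example 3, 2, 3) does not notify a second time, because it never went back to 0.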

The code (checks and limits) can be found in `discord_notifier_bot.sysinfo <https://github.com/Querela/discord-notifier-bot/blob/master/discord_notifier_bot/sysinfo.py>`_.
The current limits are some less-than-educated guesses and are subject to change.
Dynamic configuration is currently not a main issue, so users may need to clone the repo, change values and install the Python package from source:

.. code-block:: bash

    git clone https://github.com/Querela/discord-notifier-bot.git
    cd discord-notifier-bot/
    # [do the modifications in discord_notifier_bot/sysinfo.py]
    python3 -m pip install --user --upgrade --editable .[cpu,gpu]

The system information gathering requires the extra dependencies to be installed, at least ``cpu``, optionally ``gpu``.
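
To give an idea of what changing a check or limit might involve, here is a minimal sketch of a single limit. The attribute names (``name``, ``fn_retrieve``, ``fn_check``, ``threshold``, ``message``) follow the way limits are used in ``bot.py``; everything else is an assumption for illustration, including the ``ObservableLimit`` container, the disk check via ``shutil.disk_usage``, the 10 % threshold and the helper ``run_check``. The real definitions in ``sysinfo.py`` may look different.

```python
import shutil
from collections import namedtuple

# Hypothetical container; only the attribute names are taken from bot.py.
ObservableLimit = namedtuple(
    "ObservableLimit", ["name", "fn_retrieve", "fn_check", "threshold", "message"]
)


def get_disk_free_percent(path="/"):
    """Return the free disk space of ``path`` in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def make_example_limits():
    """Build a dict of limits, keyed by a short identifier (assumed layout)."""
    return {
        "disk_free": ObservableLimit(
            name="Disk space",
            fn_retrieve=get_disk_free_percent,
            # the check is "ok" while the value stays at or above the threshold
            fn_check=lambda value, threshold: value >= threshold,
            threshold=10.0,  # assumed value, in percent
            message="**Low disk space**: {cur_value:.1f}% free (limit: {threshold:.1f}%)",
        )
    }


def run_check(limit):
    """Evaluate one limit; return (ok, notification text or None)."""
    cur_value = limit.fn_retrieve()
    ok = limit.fn_check(cur_value, limit.threshold)
    if ok:
        return ok, None
    return ok, limit.message.format(cur_value=cur_value, threshold=limit.threshold)
```

Keeping retrieval (``fn_retrieve``) and evaluation (``fn_check``) as separate callables means a threshold can be adjusted without touching the code that reads the system value.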

I suggest that you provide a different Discord channel for those notifications and create an extra ``.dbot-observer.conf`` configuration file that can then be used like this:

.. code-block:: bash

    dbot-observe [-d] -c ~/.dbot-observer.conf

Embedded in other scripts
~~~~~~~~~~~~~~~~~~~~~~~~~

Sending messages is rather straightforward.
More complex examples can be found in the CLI entrypoints, see file `discord_notifier_bot.cli <https://github.com/Querela/discord-notifier-bot/blob/master/discord_notifier_bot/cli.py>`_.
Below are some rather basic examples (extracted from the CLI code).

Basic setup (logging + config loading):

.. code-block:: python

    from discord_notifier_bot.cli import setup_logging, load_config

    # logging (rather basic, if needed)
    setup_logging(True)
    # load configuration file (provide filename or None)
    configs = load_config(filename=None)

Sending a message:

.. code-block:: python

    from discord_notifier_bot.bot import send_message

    # message string with basic markdown support
    message = "The **message** to `send`"
    # bot token and channel_id (loaded from configs or hard-coded)
    bot_token, channel_id = configs["token"], configs["channel"]
    # send the message
    send_message(bot_token, channel_id, message)

Bot Creation etc.
-----------------
184 changes: 122 additions & 62 deletions discord_notifier_bot/bot.py
@@ -14,6 +14,7 @@
get_gpu_info,
)
from discord_notifier_bot.sysinfo import make_observable_limits
from discord_notifier_bot.sysinfo import NotifyBadCounterManager

LOGGER = logging.getLogger(__name__)

@@ -33,19 +34,41 @@ async def on_ready(self):
await self.close()


# ---------------------------------------------------------------------------
class SendSingleMessageClient(AbstractSingleActionClient):
def __init__(self, channel_id, message, *args, **kwargs):
super().__init__(*args, **kwargs)
self.channel_id = channel_id
self.message = message

async def do_work(self):
channel = self.get_channel(self.channel_id)
LOGGER.info(f"Channel: {channel} {type(channel)} {repr(channel)}")

def send_message(token, channel_id, message):
class SendMessageClient(AbstractSingleActionClient):
async def do_work(self):
channel = self.get_channel(channel_id)
LOGGER.info(f"Channel: {channel} {type(channel)} {repr(channel)}")
result = await channel.send(self.message)
LOGGER.debug(f"MSG result: {result} {type(result)} {repr(result)}")


class SendSingleFileMessageClient(AbstractSingleActionClient):
def __init__(self, channel_id, file2send, message=None, *args, **kwargs):
super().__init__(*args, **kwargs)
self.channel_id = channel_id
self.file2send = file2send
self.message = message

async def do_work(self):
channel = self.get_channel(self.channel_id)
LOGGER.info(f"Channel: {channel} {type(channel)} {repr(channel)}")

# attach file to message
result = await channel.send(self.message, file=self.file2send)
LOGGER.debug(f"MSG result: {result} {type(result)} {repr(result)}")


# ---------------------------------------------------------------------------

result = await channel.send(message)
LOGGER.debug(f"MSG result: {result} {type(result)} {repr(result)}")

client = SendMessageClient()
def send_message(token, channel_id, message):
client = SendSingleMessageClient(channel_id, message)
client.run(token)


@@ -54,16 +77,7 @@ def send_file(token, channel_id, message, filename):
# wrap file for discord
dfile = discord.File(filename, filename=name)

class SendFileMessageClient(AbstractSingleActionClient):
async def do_work(self):
channel = self.get_channel(channel_id)
LOGGER.info(f"Channel: {channel} {type(channel)} {repr(channel)}")

# attach file to message
result = await channel.send(message, file=dfile)
LOGGER.debug(f"MSG result: {result} {type(result)} {repr(result)}")

client = SendFileMessageClient()
client = SendSingleFileMessageClient(channel_id, dfile, message=message)
client.run(token)


@@ -88,58 +102,70 @@ def make_sysinfo_embed():
# ---------------------------------------------------------------------------


class SystemResourceObserverCog(commands.Cog):
class SystemResourceObserverCog(commands.Cog, name="System Resource Observer"):
def __init__(self, bot, channel_id):
self.bot = bot
self.channel_id = channel_id
self.local_machine_name = get_local_machine_name()

self.limits = dict()
self.notified = defaultdict(int)
self.bad_checker = NotifyBadCounterManager()
self.stats = defaultdict(int)
self.num_good_needed = 3

self.init_limit()
self.init_limits()

def init_limit(self):
def init_limits(self):
# TODO: pack them in an optional file (like Flask configs) and try to load else nothing.
self.limits.update(make_observable_limits())

for name in self.limits.keys():
self.notified[name] = 0

def reset_notifications(self):
for name in self.notified.keys():
self.notified[name] = 0
self.bad_checker.reset()

@tasks.loop(minutes=5.0)
async def observe_system(self):
LOGGER.debug("Running observe system task loop ...")

for name, limit in self.limits.items():
LOGGER.debug(f"Running check: {limit.name}")
try:
cur_value = limit.fn_retrieve()
ok = limit.fn_check(cur_value, limit.threshold)
if not ok:
self.stats["num_limits_reached"] += 1
if not self.notified[name]:
self.stats["num_limits_notified"] += 1
await self.send(
limit.message.format(
cur_value=cur_value, threshold=limit.threshold
)
)
self.notified[name] = self.num_good_needed
else:
# decrease counters
if self.notified[name] > 0:
self.notified[name] = max(0, self.notified[name] - 1)
if self.notified[name] == 0:
await self.send(f"*{limit.name} has recovered*")
except Exception as ex:
LOGGER.debug(f"Failed to evaluate limit: {ex}")

self.stats["num_checks"] += 1
async with self.bot.get_channel(self.channel_id).typing():
# perform checks
for name, limit in self.limits.items():
try:
await self.run_single_check(name, limit)
except Exception as ex:
LOGGER.debug(
f"Failed to evaluate check: {limit.name}, reason: {ex}"
)

self.stats["num_checks"] += 1

async def run_single_check(self, name, limit):
LOGGER.debug(f"Running check: {limit.name}")

cur_value = limit.fn_retrieve()
ok = limit.fn_check(cur_value, limit.threshold)

if not ok:
# check of limit was "bad", now check if we have to notify someone
self.stats["num_limits_reached"] += 1
self.stats[f"num_limits_reached:{name}:{limit.name}"] += 1

# increase badness
self.bad_checker.increase_counter(name, limit)
if self.bad_checker.should_notify(name, limit):
# check if already notified (that limit reached)
# even if shortly recovered but not completely, e. g. 3->2->3 >= 3 (thres) <= 0 (not completely reset)
await self.send(
limit.message.format(cur_value=cur_value, threshold=limit.threshold)
+ f" `@{self.local_machine_name}`"
)
self.bad_checker.mark_notified(name)
self.stats["num_limits_notified"] += 1
else:
if self.bad_checker.decrease_counter(name, limit):
# get one-time True if changed from non-normal to normal
await self.send(
f"*{limit.name} has recovered*" f" `@{self.local_machine_name}`"
)
self.stats["num_normal_notified"] += 1

@observe_system.before_loop
async def before_observe_start(self):
@@ -159,8 +185,8 @@ async def start(self, ctx):
"""Starts the background system observer loop."""
# NOTE: check for is_running() only added in version 1.4.0
if self.observe_system.get_task() is None: # pylint: disable=no-member
await ctx.send("Observer started")
self.observe_system.start() # pylint: disable=no-member
await ctx.send("Observer started")
else:
self.observe_system.restart() # pylint: disable=no-member
await ctx.send("Observer restarted")
@@ -175,8 +201,42 @@ async def stop(self, ctx):
@commands.command(name="observer-status")
async def status(self, ctx):
"""Displays statistics about notifications etc."""
await ctx.send(f"{dict(self.stats)}")
await ctx.send(r"¯\_(ツ)_/¯") # TODO: make fancy ...

if not self.stats:
await ctx.send(f"N/A [`{self.local_machine_name}`] [`not-started`]")
return

len_keys = max(len(k) for k in self.stats.keys())
len_vals = max(
len(str(v))
for v in self.stats.values()
if isinstance(v, (int, float, bool))
)

try:
# pylint: disable=no-member
next_time = self.observe_system.next_iteration - datetime.datetime.now(
datetime.timezone.utc
)
# pylint: enable=no-member
except TypeError:
# if stopped, then ``next_iteration`` is None
next_time = "?"

message = "".join(
[
f"**Observer status for** `{self.local_machine_name}`",
f""" [`{"running" if self.observe_system.next_iteration is not None else "stopped"}`]""", # pylint: disable=no-member
"\n```\n",
"\n".join(
[f"{k:<{len_keys}} {v:>{len_vals}}" for k, v in self.stats.items()]
),
"\n```",
f"\nNext check in `{next_time}`",
]
)

await ctx.send(message)


def run_observer(token, channel_id):
@@ -195,9 +255,13 @@ async def on_ready(): # pylint: disable=unused-variable
f"Type `{observer_bot.command_prefix}help` to display available commands."
)

await observer_bot.change_presence(status=discord.Status.idle)

# TODO: maybe start observe_system task here (if required?)

@observer_bot.event
async def on_disconnect(): # pylint: disable=unused-variable
LOGGER.warning("Bot disconnected!")
LOGGER.warning(f"Bot {observer_bot.user} disconnected!")

@observer_bot.command()
async def ping(ctx): # pylint: disable=unused-variable
@@ -214,10 +278,6 @@ async def info(ctx): # pylint: disable=unused-variable

observer_bot.add_cog(SystemResourceObserverCog(observer_bot, channel_id))

# @commands.command()
# async def test(ctx): pass
# observer_bot.add_command(test)

LOGGER.info("Start observer bot ...")
observer_bot.run(token)
LOGGER.info("Quit observer bot.")
2 changes: 1 addition & 1 deletion discord_notifier_bot/cli.py
@@ -107,7 +107,7 @@ def send_file(bot_token, channel_id, message, filename):
def parse_args(args=None):
parser = argparse.ArgumentParser()

actions = ("message", "file", "observe")
actions = ("message", "file")
if has_extra_cpu() or has_extra_gpu():
actions += ("info",)
