Add initial support for Prometheus metric collection #95
Conversation
Thank you as always, @masahi!
Excited to see this coming; it will help us greatly to better understand the engine behavior. A few minor comments.
buckets_batched_decode_tokens = (1, 10, 30, 50, 75, 100, 125, 150, 175, 200, 250, 300)

for label, buckets in [
    (E2E_LATENCY, buckets_e2e_lat),
How do we handle exceptions in these metrics? Are they included in the e2e latency buckets, for example?
We record a metric only when a request successfully reaches the point in the code where the metric is instrumented. For e2e latency, that's L165 in stage_engne_worker.py. So if an exception is raised, the request doesn't reach that point and hence it won't be recorded.
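As a minimal sketch of that behavior (the metric name, buckets, and wrapper here are hypothetical, not the PR's actual worker code), the histogram is only observed after the work finishes, so a raised exception skips the observation entirely:

import time

from prometheus_client import Histogram

# Hypothetical metric; name and buckets are illustrative only.
E2E_LATENCY_HIST = Histogram(
    "e2e_request_latency_seconds",
    "End-to-end request latency in seconds",
    buckets=(0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 40.0, 60.0),
)

def handle_request(run_step):
    start = time.time()
    output = run_step()                            # if this raises, observe() is never reached
    E2E_LATENCY_HIST.observe(time.time() - start)  # recorded only on the success path
    return output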
Should we count the number of each type of exception raised? That would be helpful for identifying abnormal behavior.
That's possible if we have a particular exception in mind. We need to be aware of where that exception could be raised and add a counter in the except: block. This could be one of the follow-up items we can work on.
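As a rough sketch of that follow-up (the metric name, label, and wrapper are assumptions for illustration, not part of this PR), a labeled Counter could be incremented in the except: block before re-raising:

from prometheus_client import Counter

# Hypothetical counter; one time series per exception class.
ENGINE_EXCEPTIONS = Counter(
    "engine_exceptions_total",
    "Number of exceptions raised while serving requests",
    ["exception_type"],
)

def run_with_exception_count(step):
    try:
        return step()
    except Exception as e:
        # Count by exception class so abnormal behavior shows up per type.
        ENGINE_EXCEPTIONS.labels(exception_type=type(e).__name__).inc()
        raise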
buckets_ttft = (0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0)
buckets_batched_prefill_tokens = (500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000)
buckets_batched_decode_tokens = (1, 10, 30, 50, 75, 100, 125, 150, 175, 200, 250, 300)
For the future, I think tracking memory & compute utilization would be helpful. With my naive understanding, this might be tricky since it may require per-GPU tracking.
If there is a tool to get such information, we can certainly do that. But for memory utilization (as opposed to KV cache utilization), due to memory profiling and KV cache pre-allocation, we always use 90% of the available VRAM.
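If per-GPU tracking is wanted later, one possible approach (an assumption, not something this PR implements) is to poll NVML and export per-GPU Gauges:

import pynvml
from prometheus_client import Gauge

# Hypothetical gauges; names and the per-GPU label scheme are illustrative.
GPU_MEM_USED_RATIO = Gauge("gpu_memory_used_ratio", "Used / total GPU memory", ["gpu"])
GPU_COMPUTE_UTIL = Gauge("gpu_compute_utilization", "GPU compute utilization (0 to 1)", ["gpu"])

def sample_gpu_utilization():
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            GPU_MEM_USED_RATIO.labels(gpu=str(i)).set(mem.used / mem.total)
            GPU_COMPUTE_UTIL.labels(gpu=str(i)).set(util.gpu / 100.0)
    finally:
        pynvml.nvmlShutdown()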
commit de28200 Author: Masahiro Masuda <[email protected]> Date: Fri Nov 17 01:20:36 2023 +0000 wip
Thank you for the awesome addition, @masahi!
This PR instruments the engine to collect Prometheus metrics. The collected metrics can be queried from the new metrics endpoint (for example, curl 'http://127.0.0.1:8000/metrics'). The following metrics have been enabled in this PR, and others can be easily added as needed.
By combining this endpoint with the Prometheus server and Grafana, we can get a nice visualization of metrics over time. See the examples below.
KV cache utilization (from 0 to 1)
E2E latency histogram over time, visualized as a heatmap
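As a minimal sketch of how such a metrics endpoint can be wired up (assuming a FastAPI-style HTTP server, which may not match the engine's actual server code):

from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

app = FastAPI()

@app.get("/metrics")
def metrics() -> Response:
    # Render every metric in the default registry in the Prometheus text format,
    # so curl 'http://127.0.0.1:8000/metrics' (or a Prometheus scrape) can read it.
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)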