Inference Logging and Metrics
Overview
The inference deployment uses a split monitoring setup:
- The LiteLLM Docker Compose stack on duodecillion.ti.bfh.ch includes a prometheus container.
- Prometheus periodically scrapes the LiteLLM /metrics endpoint.
- Grafana runs on a separate external host, dashboard.ti.bfh.ch.
- Grafana queries Prometheus over the network to visualize the metrics.
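To make the scrape step concrete: Prometheus fetches the LiteLLM /metrics endpoint over HTTP and receives plain text in the Prometheus exposition format. The following Python sketch is a deliberately simplified parser, not Prometheus's own implementation, that turns such a response into (name, labels, value) samples. The metric names in the example input are hypothetical placeholders, not the names LiteLLM actually exports.

```python
import re
from typing import Dict, List, Tuple

# One sample line in the text exposition format looks like:
#   metric_name{label="value",...} 42
# Simplifications: escaped quotes or "}" inside label values are not supported,
# and an optional trailing timestamp is ignored.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse the sample lines of a /metrics response, skipping # HELP and # TYPE comments."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        labels = {m["key"]: m["val"] for m in _LABEL_RE.finditer(match["labels"] or "")}
        samples.append((match["name"], labels, float(match["value"])))
    return samples


# Hypothetical scrape output; real LiteLLM metric names and labels will differ.
EXAMPLE_SCRAPE = """\
# HELP example_requests_total Hypothetical request counter
# TYPE example_requests_total counter
example_requests_total{model="demo-model",status="200"} 42
example_requests_total{model="demo-model",status="500"} 3
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(EXAMPLE_SCRAPE):
        print(name, labels, value)
```

Running the module prints one line per sample. When Prometheus stores scraped series, it additionally attaches target labels such as job and instance.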
Deployment Topology
flowchart LR
subgraph DUO["duodecillion.ti.bfh.ch"]
LL["LiteLLM container"]
PR["Prometheus container"]
LL -->|/metrics scrape| PR
end
subgraph DASH["dashboard.ti.bfh.ch"]
GF["Grafana"]
end
GF -->|Prometheus datasource queries| PR
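Grafana's Prometheus datasource reads these metrics through Prometheus's HTTP API. As a rough illustration of one such request, the Python sketch below builds an instant query (GET /api/v1/query) and parses the vector result. The base URL under which NGINX exposes Prometheus is not given in this document, so it is a parameter here. Grafana graph panels typically issue range queries rather than instant queries, so treat this as an approximation of the datasource traffic, not a description of Grafana's internals.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

VectorResult = List[Tuple[Dict[str, str], float]]


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> VectorResult:
    """Extract (labels, value) pairs from an instant-query response of resultType 'vector'."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']!r}")
    # Each entry's "value" is [unix_timestamp, "sample value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> VectorResult:
    """Send the query over the network and parse the JSON response body."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, instant_query(base_url, "up") would return one up series per scrape target, provided the calling host is permitted by the NGINX access rules in front of Prometheus.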
Network Access Control
Prometheus is exposed through an NGINX server block on duodecillion.ti.bfh.ch.
In that NGINX configuration:
- Access to the Prometheus endpoint is restricted.
- The IP address of dashboard.ti.bfh.ch is explicitly allowed.
- Grafana on dashboard.ti.bfh.ch connects to Prometheus through this allowed path.
This setup ensures that Prometheus data is available to Grafana while limiting inbound access to authorized sources only.
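The NGINX configuration itself is not reproduced in this document. Assuming it uses the standard allow and deny directives of NGINX's access module, the Python sketch below models how such an allowlist is evaluated: rules are checked in order, the first match decides, and a request that matches no rule is let through, which is why an allowlist normally ends with a catch-all deny. The address 192.0.2.10 is a documentation-range placeholder, not the real IP of dashboard.ti.bfh.ch.

```python
import ipaddress
from typing import List, Tuple

# A rule is ("allow" | "deny", target), where target is an address, a CIDR range, or "all".
Rule = Tuple[str, str]

# Hypothetical: 192.0.2.10 stands in for the real address of dashboard.ti.bfh.ch.
PROMETHEUS_RULES: List[Rule] = [("allow", "192.0.2.10"), ("deny", "all")]


def is_allowed(client_ip: str, rules: List[Rule]) -> bool:
    """Evaluate allow/deny rules in order; the first matching rule decides.

    If no rule matches, the request is allowed, mirroring the NGINX access
    module's default, so a strict allowlist needs a trailing ("deny", "all").
    """
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        # An IPv6 address never matches an IPv4 network (and vice versa).
        if target == "all" or addr in ipaddress.ip_network(target, strict=False):
            return action == "allow"
    return True


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7"):
        print(ip, "allowed" if is_allowed(ip, PROMETHEUS_RULES) else "denied")
```

In an actual NGINX server block the equivalent would be an allow line for the dashboard host's address followed by deny all; the sketch only illustrates the evaluation order.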