
Inference Logging and Metrics

Overview

The inference deployment uses a split monitoring setup, with metrics collection and visualization running on separate hosts:

The LiteLLM Docker Compose stack on duodecillion.ti.bfh.ch includes a Prometheus container.
Prometheus periodically scrapes the LiteLLM /metrics endpoint.
Grafana runs on a separate external host, dashboard.ti.bfh.ch.
Grafana queries Prometheus over the network to visualize metrics.
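Prometheus reads LiteLLM's metrics in the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by one sample per line. To show what a scrape returns, here is a minimal Python sketch that parses a simplified subset of that format. It is illustrative only: the metric names in `SAMPLE` are hypothetical, not actual LiteLLM metrics. Escape sequences in label values are kept verbatim, and label values containing `}` are not handled. The `prometheus_client` package provides a complete parser (`text_string_to_metric_families`) for real use.

```python
from __future__ import annotations

import re

# One sample line: metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# One label pair inside the braces; escape sequences are not unescaped.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (name, labels, value) for each sample line in a scrape body."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE metadata and blank lines carry no samples
        m = _SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable sample line: {line!r}")
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        # float() also accepts NaN, +Inf and -Inf as used by Prometheus.
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


# Hypothetical scrape body; real LiteLLM metric names will differ.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{model="example-model"} 42
example_up 1
"""

if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```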

Deployment Topology

```mermaid
flowchart LR
    subgraph DUO["duodecillion.ti.bfh.ch"]
        LL["LiteLLM container"]
        PR["Prometheus container"]
        LL -->|/metrics scrape| PR
    end

    subgraph DASH["dashboard.ti.bfh.ch"]
        GF["Grafana"]
    end

    GF -->|Prometheus datasource queries| PR
```
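Grafana's Prometheus datasource sends PromQL queries to the Prometheus HTTP API. The Python sketch below issues the same kind of instant query (`/api/v1/query`) using only the standard library, which may help when checking connectivity from dashboard.ti.bfh.ch. `PROMETHEUS_URL` is a hypothetical placeholder, because the actual NGINX location and port are not documented here. Requests from hosts that are not on the NGINX allowlist should be rejected (NGINX answers denied requests with 403 Forbidden), which `urlopen` raises as an `HTTPError`.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Hypothetical base URL: the real NGINX location/port for Prometheus is an assumption.
PROMETHEUS_URL = "https://duodecillion.ti.bfh.ch/prometheus"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_vector(payload: dict) -> list[tuple[dict[str, str], float]]:
    """Convert a /api/v1/query response of resultType 'vector' into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each result has "metric" (label set) and "value" ([unix_timestamp, "value as string"]).
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list[tuple[dict[str, str], float]]:
    """Run an instant query; expected to succeed only from an allowlisted source IP."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return extract_vector(json.load(resp))


if __name__ == "__main__":
    # Prints the URL only; no network request is made here.
    print(build_query_url(PROMETHEUS_URL, "up"))
```

For example, `instant_query(PROMETHEUS_URL, "up")` returns the `up` value (1 for a healthy scrape) for each target Prometheus scrapes, which is a quick way to confirm that the LiteLLM target is reachable.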

Network Access Control

On duodecillion.ti.bfh.ch, Prometheus is exposed through an NGINX server block that acts as a reverse proxy.

In that NGINX configuration:

Access to the Prometheus endpoint is restricted by source IP address.
The IP address of dashboard.ti.bfh.ch is explicitly allowed.
Grafana on dashboard.ti.bfh.ch therefore reaches Prometheus through this proxy.
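The NGINX configuration itself is not reproduced here. Assuming it uses the standard `allow` and `deny` directives of `ngx_http_access_module`, NGINX checks the rules in order and the first matching rule decides; if no rule matches, the request is allowed, which is why such allowlists usually end with `deny all;`. The Python sketch below models that evaluation. The address `192.0.2.10` is a documentation placeholder, not the real IP of dashboard.ti.bfh.ch.

```python
from __future__ import annotations

import ipaddress

# Hypothetical rules mirroring an NGINX block such as:
#   allow 192.0.2.10;   # placeholder for the dashboard.ti.bfh.ch address
#   deny  all;
RULES: list[tuple[str, str]] = [
    ("allow", "192.0.2.10/32"),
    ("deny", "all"),
]


def is_allowed(client_ip: str, rules: list[tuple[str, str]] = RULES) -> bool:
    """Evaluate allow/deny rules in order; the first matching rule decides."""
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        # "all" matches any client; otherwise test network membership
        # (an IPv6 address is never inside an IPv4 network, and vice versa).
        if target == "all" or addr in ipaddress.ip_network(target):
            return action == "allow"
    return True  # NGINX permits the request when no rule matches


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7"):
        print(ip, "allowed" if is_allowed(ip) else "denied (403)")
```

Dropping the trailing `("deny", "all")` rule makes every other address fall through to the default and be allowed, so the final deny is what actually restricts access.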

This setup makes Prometheus data available to Grafana while limiting inbound access to authorized sources.