Inference Logging and Metrics

Overview

The inference deployment uses a split monitoring setup:

  • The LiteLLM Docker Compose stack on duodecillion.ti.bfh.ch includes a Prometheus container.
  • Prometheus scrapes the LiteLLM /metrics endpoint at its configured scrape interval.
  • Grafana runs on a separate host, dashboard.ti.bfh.ch.
  • Grafana queries Prometheus over the network to visualize the metrics.
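To make the scrape step concrete, here is a minimal sketch that fetches a /metrics endpoint once and parses the Prometheus text exposition format into samples. It is a simplified illustration, not how Prometheus itself is implemented. It makes these assumptions:

  • It skips # HELP and # TYPE lines.
  • It does not unescape label values.
  • It assumes label values never contain "}".

The URL passed to scrape() is up to the reader, because the port LiteLLM serves /metrics on is not documented here.

```python
import re
import urllib.request

# Simplified sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse text exposition output into (name, labels, value) tuples.

    Simplified: skips comment lines (# HELP / # TYPE), ignores the optional
    timestamp, and keeps label values exactly as written (no unescaping).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # skip lines this simplified parser does not understand
        labels = {lm["key"]: lm["val"] for lm in LABEL_RE.finditer(m["labels"] or "")}
        samples.append((m["name"], labels, float(m["value"])))
    return samples


def scrape(url, timeout=5.0):
    """Fetch a /metrics endpoint once, similar to a single Prometheus scrape."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Consider the illustrative lines demo_requests_total{model="example-model",status="200"} 42 and demo_up 1. These are made-up metric names, not actual LiteLLM output. Parsing them yields two samples, with values 42.0 and 1.0.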

Deployment Topology

```mermaid
flowchart LR
    subgraph DUO["duodecillion.ti.bfh.ch"]
        LL["LiteLLM container"]
        PR["Prometheus container"]
        LL -->|/metrics scrape| PR
    end

    subgraph DASH["dashboard.ti.bfh.ch"]
        GF["Grafana"]
    end

    GF -->|Prometheus datasource queries| PR
```
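Grafana's Prometheus datasource reads data through the Prometheus HTTP API. The sketch below issues a single instant query against /api/v1/query and flattens the vector result into a dictionary. Grafana panels typically also use range queries, which this sketch omits. The base URL is a placeholder, because the externally reachable NGINX path and port for Prometheus on duodecillion.ti.bfh.ch are not stated on this page.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response into {sorted label tuple: float value}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each series carries its label set in "metric" and [timestamp, "value"] in "value".
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def instant_query(base_url, promql, timeout=5.0):
    """Run one PromQL instant query, roughly as a Grafana panel would."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

Run from dashboard.ti.bfh.ch, a call such as instant_query(base_url, "up") should succeed. From a host that is not on the NGINX allowlist, the request should be rejected before it reaches Prometheus.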

Network Access Control

Prometheus is exposed through an NGINX server block on duodecillion.ti.bfh.ch.

In that NGINX configuration:

  • Access to the Prometheus endpoint is restricted by client IP address.
  • The IP address of dashboard.ti.bfh.ch is explicitly allowed.
  • Grafana on dashboard.ti.bfh.ch reaches Prometheus through this allowed NGINX endpoint.

This setup makes Prometheus data available to Grafana while limiting inbound access to authorized sources.
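NGINX's access module evaluates allow and deny rules in order and stops at the first match. A denied client receives 403 Forbidden. The sketch below models that first-match logic in Python to show why an "allow <dashboard IP>; deny all;" pair admits only Grafana's host. The rule list is an assumption about the shape of the configuration. The address 192.0.2.10 is a placeholder from the documentation address range, not the real address of dashboard.ti.bfh.ch.

```python
import ipaddress

# Hypothetical rule list mirroring an NGINX "allow <ip>; deny all;" block.
# 192.0.2.10 is a documentation-range placeholder, NOT dashboard.ti.bfh.ch's real IP.
RULES = [
    ("allow", "192.0.2.10"),
    ("deny", "all"),
]


def is_allowed(client_ip, rules=RULES):
    """Return True if client_ip passes the rules, using first-match semantics.

    Rules are checked in order and the first matching rule decides.
    If no rule matches, access is allowed (as with NGINX's access module).
    """
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        # "all" matches every client; otherwise match a single address or CIDR range.
        if target == "all" or addr in ipaddress.ip_network(target, strict=False):
            return action == "allow"
    return True
```

Because "deny all" comes last, every address other than the allowed one falls through to the deny rule, including IPv6 clients.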