I currently use kube-prometheus-stack to monitor several Kubernetes clusters. Each cluster has its own deployment of kube-prometheus-stack, but only one cluster (a) currently has Alertmanager enabled. Cluster (a) also scrapes the /federate endpoint of every other cluster to collect some health metrics and alert on them.

To eliminate a single point of failure in case cluster (a) dies, I want to have a second cluster (b) with alerting enabled that runs in high availability mode together with cluster (a).

What is the best method to achieve that?

Regarding Prometheus:

Give the Prometheus instances in (a) and (b) exactly the same configuration, apart from maybe a label for identification. They should contain the same data and fire the same alerts to both the (a) and (b) Alertmanagers.
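
A rough sketch of what the Prometheus part could look like in the chart values for cluster (a), with cluster (b) mirroring it. The label name `replica` and the hostname are placeholders made up for illustration, and the keys should be checked against the chart version in use. The identifying label is dropped from alerts before they are sent, since Alertmanager only deduplicates alerts whose label sets are identical.

```yaml
# values.yaml for cluster (a); cluster (b) would use "replica: b"
# and point at the Alertmanager of cluster (a) instead.
prometheus:
  prometheusSpec:
    # Hypothetical identification label; any name works.
    externalLabels:
      replica: a
    # Remove the identification label from outgoing alerts so both
    # Prometheus instances send alerts with identical label sets.
    additionalAlertRelabelConfigs:
      - action: labeldrop
        regex: replica
    # The chart already wires Prometheus to the local Alertmanager;
    # this adds the Alertmanager of the other cluster as a second target.
    additionalAlertManagerConfigs:
      - scheme: https
        static_configs:
          - targets:
              - alertmanager.cluster-b.example.com  # placeholder hostname
```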

Regarding Alertmanagers:

Make the (a) and (b) Alertmanagers communicate with each other so that they deduplicate alerts. This can be achieved by setting

  additionalPeers: []
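
Filled in, that setting might look roughly like the following for cluster (a). This is a sketch under assumptions: the peer hostname is a placeholder, and the Alertmanager cluster port (9094 by default) would have to be exposed between the clusters, over both TCP and UDP, for gossip to work.

```yaml
# values.yaml for cluster (a); cluster (b) would list the address
# of the Alertmanager in cluster (a) instead.
alertmanager:
  alertmanagerSpec:
    additionalPeers:
      # Placeholder address of the Alertmanager in cluster (b),
      # reachable on the cluster (gossip) port.
      - alertmanager.cluster-b.example.com:9094
```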

Regarding Grafana:

Is it even possible to make Grafana highly available in this kind of deployment? I know from here that you can set up Grafana for HA by letting both instances use the same database, but how would I do that in my setup?
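
For reference, a minimal sketch of pointing the Grafana subchart at a shared external database, assuming a PostgreSQL instance reachable from both clusters. The host, database name, user, and secret name are all hypothetical, and whether this is the right approach for this setup is exactly what I am unsure about.

```yaml
# values.yaml, identical in clusters (a) and (b)
grafana:
  # Hypothetical, manually created secret that holds GF_DATABASE_PASSWORD,
  # so the password does not have to appear in the values file.
  envFromSecret: grafana-db
  grafana.ini:
    database:
      type: postgres
      host: postgres.example.com:5432  # placeholder shared database
      name: grafana
      user: grafana
```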

Would be happy if someone could provide feedback on this idea...

  • Create a reverse proxy, that will balance between two instances of Prometheus, and direct Grafana to it.
    – markalex
    Commented Jun 1, 2023 at 14:59
  • @markalex for what exactly? Grafana needs access to a common database for HA mode. I guess we need specific setup in the values of the kube-prometheus-stack chart.
    – I. Shm
    Commented Jun 5, 2023 at 5:44

