Summary:

I'm using cAdvisor with Prometheus in multiple Kubernetes (k8s) clusters to monitor network traffic. I use the container_network_receive_bytes_total metric in a query to calculate total network traffic usage, but I'm seeing an unusual issue in one of the clusters.

Problem:

In one of my clusters, I have a non-production database that has been running smoothly for 20 days. Starting yesterday, however, the container_network_receive_bytes_total metric has shown a significant spike, even though I am certain the load has not increased. This is not an isolated incident: I have seen similar spikes several times, always in this particular cluster. I have tried numerous ways to reproduce the spike but have not been able to.

This is the query I'm using:

(
    sum (
        increase (
            container_network_transmit_bytes_total{namespace="TEST"}[2d]
        )
    ) by (node, cluster, namespace, pod)
) / 1000000000
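
As a possible way to narrow this down (a sketch only, I have not confirmed that either variant explains the spike), the same query can be split by the `interface` label that cAdvisor attaches to its network metrics, and `resets()` can count how often each counter restarted within the window. Both assume the series in this cluster carry the same labels as in the query above, and the two expressions are meant to be run separately:

```promql
# Same query as above, additionally split per network interface
# (assumes the `interface` label is present on these series)
(
    sum (
        increase (
            container_network_transmit_bytes_total{namespace="TEST"}[2d]
        )
    ) by (node, cluster, namespace, pod, interface)
) / 1000000000

# Number of counter resets per series over the same 2d window
resets(container_network_transmit_bytes_total{namespace="TEST"}[2d])
```

If the spike shows up on only one interface, or coincides with a nonzero reset count, that would at least indicate where to look next.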

And this is the spike: here

I believe the root cause lies within this cluster itself, and I'm looking for guidance or clues on how to troubleshoot and resolve it.
