Questions tagged [monitoring]
Applications or appliances that observe machines, systems and networks to find problems and notify administrators.
2,471
questions
1
vote
1
answer
2k
views
Prometheus auto scrape metrics from multiple kube-state-metrics in kubernetes?
I want to use one Kubernetes cluster (cluster-0) with multiple kube-state-metrics instances to monitor several other Kubernetes clusters (cluster-1, 2, 3, 4).
In cluster-0, I split them into multiple namespaces like this:
...
0
votes
0
answers
39
views
Is it possible to ensure detection and logging of all attempts to copy data out of a system?
Say I have a server set up for processing sensitive data. The few authorised users of the system are instructed not to copy any of the sensitive data out of the platform, but could in principle do so ...
0
votes
1
answer
830
views
How should "CPU usage per node" be interpreted in Google Cloud monitoring?
In the monitoring tab for Composer (Airflow) on Google Cloud there is a graph showing "CPU usage per node". How should the values in this graph be interpreted? What value would indicate that ...
0
votes
1
answer
314
views
How to monitor all the processes from a user in Windows Server 2012?
I would like to monitor all the processes invoked by a user within a period of time, such as a couple of hours. Some processes may run for less than a second.
What is the best way to do this? ...
0
votes
1
answer
296
views
Using "monit" - how to detect empty reply from http process (apache2)
I would like to monitor empty replies from my apache2 process, as I am running into a problem similar to "Apache gives empty reply".
I am using monit to monitor my processes, so I am going ...
0
votes
0
answers
435
views
I want to monitor SSH with Monit but I got an error
I want to monitor SSH with Monit but I got an error. This setup works on my old Ubuntu 18.04 server but it doesn't work on my new Ubuntu 20.04 server:
ubuntu@ov-xxxx ~ $ cd /var
ubuntu@ov-xxxx /var ...
0
votes
1
answer
3k
views
Windows Event Forwarding and Sysmon
I'm dealing with a bit of an issue relating to WEF and Sysmon.
I have the collector server set up, and 2 domain controllers are configured via GPO to send events to the WEF collector.
It is configured via ...
2
votes
2
answers
4k
views
Zabbix low-level discovery - CPU usage per process - two items with identical keys
I'm trying to use Zabbix to monitor CPU usage by different processes on Windows Server. The processes to monitor are not determined up front. I want to use LLD to monitor the top 3 CPU-demanding processes.
...
0
votes
2
answers
1k
views
haproxy, is it impossible to monitor the http total request time?
Using haproxy HTTP logs with http-server-close of keep-alives, the time counters in the logs (TR, Tt...) are based on the beginning of the TCP connection, which means only the first request, with ...
1
vote
2
answers
776
views
use nethogs in a script
nethogs is a great utility to monitor network traffic by process. However, it is "interactive" and not suitable for use in a script... How can I achieve the following using nethogs or ...
0
votes
1
answer
100
views
How can I monitor cli commands on one machine and execute them on another in real time?
I'm trying to build a highly available topology by replaying all of the CLI commands executed on one machine on another. Doing this will synchronize their configurations. Therefore, ...
1
vote
1
answer
78
views
What monitoring system should I use for offline network [closed]
I'm part of a sysadmin/DevOps team for an application. Currently we have about 25-40 VMs running as different parts of the application in microservices on the OpenShift Container Platform, ...
1
vote
0
answers
39
views
Server Monitoring Tool via file transfer
I am trying to figure out a way to monitor many Windows servers with a monitoring tool like Nagios, Zabbix, or even PRTG. The challenge is that the servers are on board vessels without a reliable internet ...
0
votes
2
answers
2k
views
How to set Munin Critical/Warning alerts when values are under a threshold and not over?
I am trying to do a simple alert in Munin checking SW RAID 1 status where a metric of 2 disks is healthy, 1 disk is Warning and 0 disks is Critical.
All the Munin monitors I've seen are triggered when ...
1
vote
2
answers
752
views
Breaking down one prometheus.yml file?
I am using Prometheus for our monitoring and I have a lot of configs (our prometheus.yml main config file is 8000+ lines long).
I would like to divide this out into logical groupings so that it ...
0
votes
0
answers
369
views
monitor CRL expiration dates for multiple nginx servers
Rephrased question: (Not sure it's really clearer)
I have a small self-written script that monitors multiple servers.
In fact, my script just periodically starts tiny scripts and gathers the ...
0
votes
2
answers
390
views
Cannot access monit web interface
I just installed Monit on my server. I want to access the web interface to manage it, but it is not accessible.
The machine is an instance in AWS, the port is open. I have tried many ...
2
votes
1
answer
766
views
CWAgent Disk Space Alarms
I'm trying to implement an alarm (in CloudFormation) for free disk space using metrics from the CloudWatch agent, and I'm having issues with devices shuffling DeviceID.
I encountered this earlier when ...
0
votes
2
answers
755
views
Systemwide File Access and System Call Monitoring on Linux?
In Windows land, you can run Procmon (Process Monitor) from Sysinternals, which will show you every file access, registry query, etc. system-wide (screenshot attached). You can then backtrack to find ...
1
vote
1
answer
573
views
How does Windows Resource Monitor report the disk I/O related to virtual memory reads/writes?
In Resource Monitor, under Disk > Disk Activity, a list of files is shown along with the disk read/write B/sec being performed on each. When memory is paged to disk (i.e. virtual memory is written), ...
0
votes
1
answer
267
views
What's Azure equivalent of EventBridge working with CloudWatch to consume all alerts?
We're trying to find a way to be notified and consume (using logic apps) all alerts generated via Azure Monitor.
It seems that AWS allows that via EventBridge, so:
"Amazon EventBridge now integrates ...
0
votes
0
answers
292
views
How to configure k8s nginx external auth and exclude health check path?
We started using oauth2-proxy as external authentication for some of our cluster infrastructure components.
Our cluster is using the ingress-nginx controller and the Ingress resources are configured ...
0
votes
1
answer
188
views
Remote monitoring by Nagios Core
I am working on a project using Nagios to remotely monitor a controller that monitors gas leaks, temperature, etc.
How can a Nagios Core in one city communicate and receive supervision information ...
0
votes
1
answer
630
views
get agent nodes to show on master node in icingaweb2
I installed icinga2 and icingaweb2 on the master node.
I installed icinga2 on 3 more servers as agent nodes.
I used the icinga2 node wizard, configured them as agents and allowed them to connect to the master node. ...
6
votes
1
answer
4k
views
Create a CloudWatch Alarm when an ECS service is unable to consistently start tasks successfully
If I release a new Docker image with a bug to my ECS Service, then the service will attempt to start new Tasks but will keep the old version around if the new tasks fail to start.
In that scenario, it ...
5
votes
1
answer
8k
views
GCP VM Disk space alert
How can I configure GCP's monitoring suite to look at % disk utilization (in total space used, not IOPS)?
The only "disk used" metric I see in metrics explorer seems to chart some kind of units per ...
1
vote
1
answer
786
views
Finding wasteful or over-provisioned pods on a "full" but underutilized Kubernetes cluster
I work on a Kubernetes cluster where, right now, about 95% of the CPUs and 90% of the memory have been allocated to pods. However, according to the Kubernetes Dashboard, the overall instantaneous CPU ...
1
vote
1
answer
171
views
Which source of sensor readings is most preferred? IPMI, ACPI, or from the sensor chip itself?
When monitoring systems for their temperature and fan speeds, what source of sensor readings is most preferred?
I can get all the motherboard readings both from IPMI and directly from the Winbond ...
1
vote
1
answer
626
views
smartctl harddisk check doesn't show Attribute
Today I installed smartmontools on my Linux server. After testing my hard drive (RAID 1), it doesn't show the attributes.
After the command
smartctl -a /dev/nvme0n1
I get the result without ...
1
vote
0
answers
871
views
Monitoring SLA/SLO/SLI using Prometheus
I have done much research about monitoring SLI metrics with Prometheus. I have found only how to monitor a cluster using Kubernetes. I'm hoping to find a response here for simple monitoring.
I also ...
0
votes
1
answer
85
views
AWS solution to monitor events from external machines, reported by SNS?
We have a number of robots installed at various locations, servicing customers. All robots get their instructions from a central cloud database with customer data, and each has an SQS queue which ...
-1
votes
1
answer
31
views
Securing System Monitoring wall display PCs
I have several Windows machines which drive dashboards on wall-mounted displays for system and network monitoring. I would like to be able to secure them from unauthorized access or modification. ...
0
votes
0
answers
11k
views
Zabbix active agent can't connect to server - interrupted system call
I'm running the active Zabbix agents on all the servers in my production environment, however two of them aren't able to connect to the Zabbix server. All I've got to go on in the Zabbix logs is
...
0
votes
1
answer
329
views
Two nagios instances in the same machine
After days of surfing on the net and trying by myself, I urgently need your help.
So, I have Nagios Core 4.4.3 installed on my CentOS machine, which I use to monitor the PROD and TEST environments of one ...
0
votes
1
answer
74
views
free server monitoring tool for Java based application
I have tried a couple of options like Nagios, which led to problems after installation: Apache became unresponsive with lots of segmentation faults
child pid 32507 exit signal Segmentation fault (11)...
0
votes
1
answer
327
views
How to filter by status information column in Thruk
I am using Thruk as a monitoring interface. At the top left corner of the page there is a button which opens a stack of filters to apply to hosts, services, etc.
You can easily add ...
2
votes
1
answer
11k
views
Prometheus, group_left, and "on" vs "ignoring"
In Issue #2204, one of the Prometheus developers says:
...in principle you should be favouring ignoring over on to produce generic shareable rules...
I'm confused how the use of ignoring would ...
0
votes
1
answer
43
views
Two Factor Monitoring for ETLs
This name, similar to the 2FA security scheme, comes from a scenario in which I want to be sure periodically that certain ETL triggers are in place.
Not only do I want to monitor whether certain ...
11
votes
3
answers
945
views
Which tool to use when monitoring machines (linux+windows) with one way communication? [closed]
I have 100+ machines which need to be monitored, mostly Linux, but there are some Windows servers too. I want to be informed when the disks are getting full, when the load is high, or a service is ...
0
votes
0
answers
192
views
openshift metrics per namespace
Hello, how can I provide usage metrics like:
- CPU usage per container
- PersistentVolume fill level
- network usage per container/pod
for an OpenShift cluster, but individually per namespace/project ...
0
votes
4
answers
171
views
Simple host monitoring solution
I am looking for a host monitoring solution for an infrastructure I have to manage.
Since this infrastructure is on-premise, I would like to have a client-server architecture, where a client reports ...
0
votes
1
answer
5k
views
Error 404 when trying to access Kubernetes dashboard from remote laptop using SSH proxy
I have a remote cluster on a remote private Cloud to which I have only SSH access (no GUI). I started the proxy server with:
kubectl proxy --address=0.0.0.0 --accept-hosts=.*
And started a local SSH ...
0
votes
1
answer
459
views
Nagios - check procs and --metric=elapsed on the same service
After many days of working and searching on the net, I'm coming back to you as a last chance for help.
I'm currently working on monitoring Unix processes with Nagios Core 4.4.3 and NRPE.
My goal is ...
0
votes
0
answers
27
views
Easily monitor the CPU % use of a process
I have a small ISP, and we are currently running a customer's application that is creating high CPU spikes. Basically, we have a Chrome tab running and automate a process to stress-test a web ...
1
vote
1
answer
981
views
Tomcat RequestProcessor errorCount - what counts as an error?
We have a Zabbix server that reports on Tomcat's errorCount, from the GlobalRequestProcessor. I'm trying to figure out exactly what gets counted in this errorCount. Is it any request to Tomcat the ...
0
votes
1
answer
40
views
is there something like the stem-and-leaf plot for timeseries?
When wanting to quickly take a look at the distribution of a sequence of values, the stem-and-leaf plot is an incredibly simple, yet powerful tool. It takes a few minutes to teach the computer to draw ...
1
vote
2
answers
121
views
Best practice for alerts if webpage returns white page?
We're trying to set up monitoring (Zabbix) for web apps that return a white page of death. The apps are PHP based.
From what I know, the white screen of death can be caused by a number of issues, memory issues, ...
0
votes
1
answer
102
views
Enforcing monitoring on AWS resources
We have a couple of huge AWS accounts and I've been tasked with implementing guidelines for monitoring resources and ensuring that monitoring is set up for all existing and future resources.
Is there ...
1
vote
1
answer
95
views
Asserting on extended information from Nagios's check_mysql
I'm running the check_mysql plugin using NRPE on a remote DB-server, and while I can get satisfactory data on whether or not the server process is working as needed, I see that the plugin outputs a ...
2
votes
2
answers
1k
views
EC2 instance running nginx crashes, "connection refused" - how do I monitor for this?
Say nginx on an EC2 instance crashes. The instance is healthy and CloudWatch Metrics are great, but all the domains hosted on the server are now "Connection refused".
This seems like a very basic ...