Questions tagged [system-monitoring]
Questions regarding system monitoring - Nagios, Icinga, Spiceworks, Munin, Zabbix and more.
166
questions
44
votes
4
answers
29k
views
Find out which process is changing a file
I'm trying to find a reliable way of finding which process on my machine is changing a configuration file (/etc/hosts to be specific).
I know I can use lsof /etc/hosts to find out what processes ...
28
votes
6
answers
59k
views
How to find out the number of time series stored in Prometheus LevelDB
I'm responsible for maintaining the Prometheus servers in our company. The metrics, however, are provided by the teams.
Is there a way to find out the number of time series stored in the Prometheus ...
20
votes
9
answers
121k
views
script to automatically test if a web site is available
I'm a lone web developer with my own CentOS VPS hosting a few small web sites for my clients. Today I discovered my httpd service had stopped (for no apparent reason - but that's another thread). I ...
12
votes
3
answers
6k
views
Alternative to etsy/statsd
Is there any alternative to etsy's statsd? Maybe even a complete dashboard-like solution? My research only found proprietary SaaS solutions.
For those who do not know: statsd is a daemon which ...
12
votes
2
answers
5k
views
16TB Volumes and SNMP On Windows
As volumes larger than 16TB became more common, it was recognized that the 32 bit value used to report disk size and usage within the standard "HOST-RESOURCES" MIB in SNMP was not large enough to ...
11
votes
4
answers
9k
views
How can you distinguish between a crash and a reboot on RHEL7?
Is there a way to determine whether a RHEL7 server was rebooted via systemctl (or reboot / shutdown aliases), or whether the server crashed? Pre-systemd this was fairly easy to determine with last -x ...
10
votes
1
answer
177
views
Best way to monitor Windows server? [closed]
I'm working at a company that provides our small business clients with IT support. One of my tasks is to perform service checks, which include checking the event viewer for critical errors/warnings as ...
10
votes
4
answers
4k
views
Monitoring Dell/HP Servers Running ESXi (Free)
What are you all doing to monitor ESXi servers that run the free edition? With the lack of SNMP support, it seems fairly limited to me. What I'd like to be able to do is get some type of alert when ...
9
votes
2
answers
19k
views
How to monitor power supply status using ipmitool on Linux/Solaris?
ipmitool differs a lot between Solaris and Linux. How can I use ipmitool on these servers (on Sun, IBM and other hardware) to detect the power supply status?
8
votes
2
answers
6k
views
Load average is greater than the number of EC2 Compute Units
On an EC2 m1.large, with an AVG CPU Utilization graph such as this:
How is it possible that the load average is greater than the number of EC2 Compute Units (4)?
cat /proc/loadavg
5.78 5.57 5.44 1/...
8
votes
4
answers
58k
views
SNMP service security tab is missing - Windows Server 2012 R2 - DC
I have to configure the security settings for the SNMP-Service on a Windows Server. But they are missing!
Here are the facts:
OS: Windows Server 2012 R2
I installed the SNMP feature and I believe, ...
7
votes
2
answers
2k
views
Agentless monitoring: how does it work? Advantages over traditional monitoring?
How does agentless monitoring work?
From what I understood (or not), it seems this is accomplished by logging into the node-being-monitored from a central server and uploading-then-running scripts on ...
7
votes
1
answer
3k
views
How to restart and alert if condition matches in Monit?
How can I do multiple things when a condition is matched? For example, if I want to restart a process and also send an alert email. I know I can do it with two separate lines, but can I combine them?
if ...
7
votes
2
answers
439
views
Green-IT: How do you deal with poweroff systems in your system monitoring?
Many of you probably have completed or are contemplating Green-IT projects with the goal to power off idle or unneeded systems when demand for computer resources is low:
How did you handle this ...
7
votes
1
answer
868
views
Intermittent munin-cron error “There is nothing to do here, since there are no nodes with any plugins”
We've installed munin monitoring on one of our servers. Generally it seems to be working well but occasionally, 4 times in 2 months to be exact, munin-cron has generated the following error:
[...
6
votes
1
answer
8k
views
Full status information in Nagios email notification?
I have set up Nagios to monitor my servers and I have written a few custom checks.
When I get a notification email, I only get the first line of the status information and I have to use the web ...
6
votes
1
answer
418
views
Nagios OK notification at the beginning of the availability period
I'm monitoring an application which starts just before business hours and shuts down at the end of the day using Nagios 4.3. I've configured the notification period for it to start 3 minutes after the ...
5
votes
1
answer
8k
views
GCP VM Disk space alert
How can I configure GCP's monitoring suite to look at % disk utilization (in total space used, not IOPS)?
The only "disk used" metric I see in metrics explorer seems to chart some kind of units per ...
5
votes
4
answers
2k
views
IBM ServeRAID: how to use email alerts?
I just installed a brand old IBM server with a ServeRAID 4Lx card.
I installed the driver, and the ServeRAID manager software v9.30.
Everything works as expected.
My problem is:
Yesterday, when not-so-...
5
votes
1
answer
5k
views
In Icinga (Nagios), how do I configure hosts with multiple IPs?
I'm setting up Icinga (Nagios fork) and I have some machines with multiple interfaces. Some services are only listening on one of them and, to check them correctly, I'd like to know if it's possible to ...
5
votes
2
answers
4k
views
iotop does not show writes
What could be writing on the disk that iotop does not show?
# iotop -a
Total DISK READ: 8.19 M/s | Total DISK WRITE: 3.34 M/s
TID PRIO USER DISK READ DISK WRITE> SWAPIN IO ...
4
votes
7
answers
3k
views
Remote Linux server monitoring
I'm looking for a solution to monitor multiple Linux servers remotely. I don't need a whole lot of granular data, just basic things like server load and critical error notifications. I'm no Linux guru,...
4
votes
2
answers
210
views
I am looking for a tool to measure or detect "unresponsiveness" of a desktop PC
I have a client that provides some server systems to a hospital, and a support ticket was raised that the desktop application was hanging waiting for the server. We did some extensive testing and its ...
4
votes
1
answer
358
views
Find network percentage of NIC [closed]
So I have created a Linux resource monitoring tool that pulls various resource information. One of the fields I am trying to pull is the percent of network throughput on my NIC. So if I have a 1 Gb(...
4
votes
2
answers
758
views
Is Collectd a good choice for gathering system metrics [closed]
I had some experience with collectd a year or so back. I remember being impressed by its speed and flexibility, however it was never adopted as the main source of collecting metrics, cron jobs running ...
4
votes
1
answer
1k
views
How to monitor GlusterFS clients?
We are doing OK (we'd like to think) monitoring our GlusterFS servers via Icinga. We'd like to monitor the clients too.
Other than making sure there is a glusterfs process running for each glusterfs-...
4
votes
1
answer
216
views
logstash-forwarder equivalent for fluentd?
Is there something equivalent to logstash-forwarder that can ship logfiles to fluentd?
I am trying to send log files from an application to a remote fluentd but have not seen whether this is ...
4
votes
3
answers
5k
views
Hiding hosts in Nagios
I would like to monitor a few hundred hosts using Nagios, yet I only want the switch fabric to show up in the statusmap.cgi. Is there a way to prevent a host from showing up in the status map, yet ...
3
votes
4
answers
2k
views
Spawn phone call from EC2 alerts
I have a system set up on AWS/EC2; it is currently using their CloudWatch alert system. The problem is that this only sends email, when ideally I would like it to be making a phone call and/or sending ...
3
votes
1
answer
2k
views
How to use monit to count the number of instances of a process
Is it possible to use monit to count the number of instances of a process (in my case Celery) and take an action accordingly?
For example, if there are 4 instances of the celery daemon, then take action.
3
votes
2
answers
1k
views
Reporting historical system activity in FreeBSD
I'd like to record data about system activity under FreeBSD for future analysis. If I were running a SysV system, I'd just use sar and its related utilities, but that doesn't exist in the BSDs. (And ...
3
votes
2
answers
7k
views
How to watch a service with multiple processes with Monit?
I'm trying to watch the mailing list manager sympa with monit. A running sympa instance consists of multiple processes for the different tasks of list management (e.g. a separate process for archiving ...
3
votes
1
answer
9k
views
Can't get Monit to check status of apache2
I am trying to configure Monit on my local machine to get a taste of how it works, but I have some issues.
What I am trying to do is to get any evidence that Monit is up and running correctly and is ...
3
votes
2
answers
8k
views
Cannot read status of the monit daemon, even with allowed group
I cannot seem to get monit status or other CLI commands to work.
I've built monit v5.8 to run on a Raspberry Pi. I'm able to add services to be monitored, and the web interface can be accessed just ...
3
votes
5
answers
9k
views
Windows Services: How to schedule and monitor them?
We have about a dozen Windows Services, both in-house developed and third-party products, and have the following requirements for managing them:
Start/Stop/Bounce a given service at scheduled times ...
3
votes
1
answer
2k
views
Icinga2 dependencies of devices on HA
I would like to configure a Host-to-Host dependency on Icinga2; however, one of the Hosts has an HA configuration, so I need it to trigger only when both HA devices are down. Suppose this scenario:...
3
votes
0
answers
236
views
Prometheus not monitoring all EC2 instances of a region
I have set up Prometheus to monitor my AWS EC2 instances, but the issue is that Prometheus is showing only 1 instance, even though there are 2 instances running in my AWS account. ...
3
votes
2
answers
116
views
Incident reporting and logging
I am looking for a tool (or advice) that would allow me to track and log all incidents that happen on my infrastructure.
We have a few servers (50+) and that number is going to increase in the future,...
3
votes
1
answer
2k
views
Dynamically setting check_interval parameter based on Service_State in Icinga2
I have a requirement where the check interval is 180 mins while the notification interval is 10 mins. That is, the service owner wants, if he misses any alert that usually comes after 180 mins, if the service is critical ...
3
votes
0
answers
1k
views
How to check what process or application is deleting a file without using Process Monitor? (Windows Server)
Currently I'm having an issue with a piece of software that makes use of specific files (which are basically xml), sometimes stored on a file share and sometimes stored locally.
Every so often one of ...
3
votes
0
answers
1k
views
System Center 2012 Alternatives [closed]
Are there any good alternatives to System Center 2012?
I'm looking for a system platform that gives us a replacement for SCCM, SCOM, Endpoint Protection, DPM and Global Service Monitor and can ...
3
votes
0
answers
497
views
Munin disable dynazoom.html
Doing a quick google search for "Munin dynazoom.html doesn't work" yields many results. There doesn't seem to be a solution that works -- at least not that I have seen. I have munin installed on a ...
2
votes
2
answers
848
views
Delaying a Nagios/Icinga check
When monitoring the health of a server, some faults or warnings are immediately urgent but others only matter if they persist. I'm thinking of things like:
Some software needs to be updated
Time ...
2
votes
2
answers
15k
views
How to reset the admin password for Observium
How can I reset the password for the user admin with MySQL or an Observium script?
MariaDB [observium]> select * from users;
+---------+----------+------------------------------------+----------+---...
2
votes
2
answers
8k
views
Nagios check_ssh returns usage information instead of status
I installed Nagios on an Ubuntu Desktop (Nagios server) and I want to monitor an Ubuntu server instance (monitored client). I can connect via SSH between both machines and SSH is not blocked. The nagios ...
2
votes
1
answer
3k
views
M/Monit how to see current disk space?
In the admin interface of M/Monit under Reports -> Analytics I can choose to show Space %.
How can I make the Monit clients submit this info?
Is there a way to display Disk Space percentage on the ...
2
votes
2
answers
3k
views
IBM x3500 Server management/monitoring tool
I took over monitoring an older IBM x3500 7977 server and I don't have much knowledge of IBM servers.
I'm looking for the equivalent of Dell Server Administrator from IBM, just to monitor and alert on ...
2
votes
2
answers
2k
views
Nagios Basic Configuration (for quick addition of new machines)
I recently started to use Nagios to monitor about 25 servers (mainly virtual, with some standalone). The majority of the servers (including the Nagios host itself) are running Ubuntu 14.04 LTS, with ...
2
votes
2
answers
8k
views
Nagios: turn off service checks/display on down hosts
I want to tweak Nagios in such a way that all checking stops (with services not displayed, or displayed as unknown) for any down node. Said differently, I only want to see one alert for a down host ...
2
votes
2
answers
2k
views
Record SSH / Terminal into video?
If someone accesses the server via PuTTY (SSH) or a terminal, I want to record everything they can see on the screen and everything they have typed as a video.
What is the solution to this, is there ...