Ok. Being new to Linux and all, this took a while for me to figure out, but I finally did. So, I have a Debian box running the Nagios Core monitoring system. My former co-worker also set the system up to use the Nagios API, so that we can create a custom page that shows the status of all our systems.
Briefly, this is how the API is supposed to work:
There is a Python script called nagios-api that you need to run so that the API runs as an application server on a certain port. This API takes advantage of a status file dump called status.dat, configured in /etc/nagios3/nagios.cfg and updated periodically by Nagios. Supervisor (a Linux process manager) starts this script every time the server starts. The configuration file is as follows:
[program:nagapi]
directory = /home/nagapi
user = api
command = /bin/bash -c "source /home/nagapi/.virtualenvs/nagapi/bin/activate; /home/nagapi/nagios-api/nagios-api"
stdout_logfile = /home/nagapi/supervisor_nagios-api_stdout.log
stderr_logfile = /home/nagapi/supervisor_nagios-api_stderr.log
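To give a feel for what the API reads out of status.dat, here is a minimal Python sketch that parses the file's block layout. It assumes the usual format of "blocktype {", then key=value lines, then a closing "}". The host name web01 and the sample values are made up for illustration, and this is not how nagios-api itself is implemented.

```python
def parse_status(text):
    """Parse status.dat-style text into a list of (block_type, fields) tuples."""
    blocks = []
    current_type = None   # name of the block we are inside, if any
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if current_type is None:
            if line.endswith("{"):
                current_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            blocks.append((current_type, fields))
            current_type = None
        else:
            # Split on the first "=" only; values may contain "=" themselves.
            key, sep, value = line.partition("=")
            if sep:
                fields[key] = value
    return blocks


# Hypothetical sample data, shaped like a small piece of status.dat.
SAMPLE = """\
hoststatus {
\thost_name=web01
\tcurrent_state=0
\t}

servicestatus {
\thost_name=web01
\tservice_description=HTTP
\tcurrent_state=2
\t}
"""

for block_type, block_fields in parse_status(SAMPLE):
    print(block_type, block_fields.get("host_name"), block_fields.get("current_state"))
```

The point is only that the API has nothing to serve until Nagios has written this file at least once.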
Problem I was experiencing
Every time I restarted the server, the following would happen:
- The API did not start automatically
- Checking the status of supervisor by running sudo supervisorctl status resulted in:
nagapi EXITED Jun 06 04:39 AM
- The Supervisor error file listed in the conf file above (supervisor_nagios-api_stderr.log) mentioned the following:
nagios-api: error: Status file not found: /var/cache/nagios3/status.dat
So, I went and checked to see if the file existed. AND IT DID EXIST!!! So, maybe it was a permission issue. I checked, and IT DID have the right permissions at the parent folder level, because I know this file gets updated periodically. I then tried to start the process manually, just to make sure that I was not crazy, by running sudo supervisorctl start nagapi.
The service started with no problem!! I was on the verge of slapping my monitor in its face. What was the "file not found" about?????? Why can’t Supervisor start my service???
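It turns out that logs really don’t lie. After a restart, it takes some time for the Nagios daemon to create the dump file. Supervisor was trying to start the API before status.dat had been created by Nagios. This explains why Supervisor was not able to run the API at boot, but I was able to start it manually a while later. All I had to do was be patient. So, I added startsecs = 60 to my Supervisor config file for the API. Strictly speaking, startsecs is not a start delay: it is the number of seconds the program has to stay running before Supervisor considers the start successful. An exit inside that window counts as a failed start, so Supervisor tries again (up to startretries times) instead of leaving the program stopped, which gives Nagios a little more time to write the file. The config file now looks like: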
[program:nagapi]
directory = /home/nagapi
user = api
command = /bin/bash -c "source /home/nagapi/.virtualenvs/nagapi/bin/activate; /home/nagapi/nagios-api/nagios-api"
startsecs = 60
stdout_logfile = /home/nagapi/supervisor_nagios-api_stdout.log
stderr_logfile = /home/nagapi/supervisor_nagios-api_stderr.log
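If relying on retries feels too fragile, another option is to wait for the file explicitly before launching the API. Below is a minimal sketch of that idea in Python, not something I have running in production. The function names, the 120 second timeout, and the 2 second polling interval are my own assumptions; the two paths are the ones from the log and the config above.

```python
import os
import sys
import time

STATUS_FILE = "/var/cache/nagios3/status.dat"        # path from the error log
API_SCRIPT = "/home/nagapi/nagios-api/nagios-api"    # path from the Supervisor config


def file_is_ready(path):
    """True if path is a regular, non-empty file; False if it is missing or unreadable."""
    try:
        return os.path.isfile(path) and os.path.getsize(path) > 0
    except OSError:
        # The file can vanish between the two checks while Nagios rewrites it.
        return False


def wait_for_file(path, timeout=120.0, interval=2.0):
    """Poll until path is ready. Return True on success, False once timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if file_is_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def main(argv):
    """Hypothetical wrapper entry point: wait for status.dat, then become nagios-api."""
    if not wait_for_file(STATUS_FILE):
        sys.exit("gave up waiting for " + STATUS_FILE)
    # Replace this process with the API so Supervisor manages nagios-api directly.
    os.execv(API_SCRIPT, [API_SCRIPT] + argv[1:])
```

To use it, you would call main(sys.argv) at the bottom of the file and point the command line in the Supervisor config at this wrapper instead of at nagios-api. On a timeout the wrapper exits with an error, so the failure still shows up in the Supervisor stderr log.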