RE: Rare runsv logging problem

From: James Powell <james4591_at_hotmail.com>
Date: Fri, 25 Jul 2014 22:32:35 -0700

Another thing could be that the service may not need a log. I've directed a lot of unwanted output to /dev/null.

Can you post one of your run files as an example?

Sent from my Windows Phone
________________________________
From: James Powell<mailto:james4591_at_hotmail.com>
Sent: 7/25/2014 9:35 PM
To: Caleb Spare<mailto:cespare_at_gmail.com>; supervision_at_list.skarnet.org<mailto:supervision_at_list.skarnet.org>
Subject: RE: Rare runsv logging problem

My question is: why are you running Upstart? Runit has its own init, so Upstart is pointless. Runit's binary should maintain runsv. The problem could also be caused by the run script handling something improperly.

Sent from my Windows Phone
________________________________
From: Caleb Spare<mailto:cespare_at_gmail.com>
Sent: 7/25/2014 5:16 PM
To: supervision_at_list.skarnet.org<mailto:supervision_at_list.skarnet.org>
Subject: Rare runsv logging problem

Hi,

I've been using runit for a while now and it has been mostly
wonderful. I'm noticing a persistent issue and I'm not sure how to
debug it.

On the servers we're running Ubuntu and we use runit 2.1.1 via the
default package that comes with the distro. Upstart runs runsvdir and
we use runit to manage all of our application processes. Each
application has a simple ./run and ./log/run; the latter execs svlogd
(this is all a typical configuration, as I understand it).
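
For readers unfamiliar with this layout, here is a minimal sketch of how
such a service directory could be generated. The helper name
make_service_dir, the -tt timestamp flag, and the ./main log directory
are assumptions for illustration; the original message does not show the
actual run files.

```shell
# Hypothetical helper: create DIR with a ./run that execs the given
# command and a ./log/run that execs svlogd, as described above.
make_service_dir() {
    dir=$1; shift
    mkdir -p "$dir/log/main" || return 1

    # ./run: merge stderr into stdout so runsv pipes both to the logger.
    cat > "$dir/run" <<EOF
#!/bin/sh
exec 2>&1
exec $*
EOF

    # ./log/run: svlogd writes timestamped lines into ./main (assumed path).
    cat > "$dir/log/run" <<'EOF'
#!/bin/sh
exec svlogd -tt ./main
EOF

    chmod +x "$dir/run" "$dir/log/run"
}
```

Usage would look something like: make_service_dir ./baz /usr/local/bin/baz
(the binary path is hypothetical).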

The problem I'm seeing is that, very occasionally, runsv will get into
a bad state where svlogd is not running. (I'm not sure if it fails to
start svlogd or if this happens later on after it has been running
properly.) When the problem occurs, pstree shows something like this:

runsvdir-+-runsv-+-foo---5*[{foo}]
         |       `-svlogd
         |-runsv-+-bar---21*[{bar}]
         |       `-svlogd
         `-runsv---baz---250*[{baz}]

Here you can see that the baz process does not have an associated
svlogd process. Further:

$ sudo sv s foo
run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
$ sudo sv s baz
run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok:
file does not exist
; run: log: (pid 2337) 2983s

Two strange things there: the warning about supervise/ok and also that
the pid for 'log' is the same as for 'baz'.
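
The second symptom (identical pids for the service and its log) can be
checked mechanically. The following is only a sketch: same_pid_check is
a hypothetical helper that parses the text of one 'sv s' report on stdin,
and it assumes a grep that supports -o.

```shell
# Hypothetical helper: read one `sv s` report on stdin and print
# "suspect" if the service pid and the log pid are identical, else "ok".
same_pid_check() {
    # Join wrapped lines, extract each "(pid N)" in order, keep the digits.
    pids=$(tr '\n' ' ' | grep -o '(pid [0-9]*)' | tr -dc '0-9\n')
    svc_pid=$(printf '%s\n' "$pids" | sed -n 1p)
    log_pid=$(printf '%s\n' "$pids" | sed -n 2p)
    if [ -n "$svc_pid" ] && [ "$svc_pid" = "$log_pid" ]; then
        echo suspect
    else
        echo ok
    fi
}
```

It could be fed from something like: sudo sv s baz 2>&1 | same_pid_check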

When runsv is in this bad state, the output from baz goes right to
runsvdir and ends up in /var/log/upstart/runsvdir.log.

The fix I've been using is to 'sv d baz' and then kill the offending
runsv process. Runsvdir will quickly restart it and then everything
will be working:

runsvdir-+-runsv-+-foo---5*[{foo}]
         |       `-svlogd
         |-runsv-+-baz---25*[{baz}]
         |       `-svlogd
         `-runsv-+-bar---20*[{bar}]
                 `-svlogd
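
The two-step workaround described above might be scripted roughly as
follows. This is a sketch under assumptions: recover_service is a
hypothetical name, and it assumes runsvdir started the supervisor with
the command line "runsv <name>", so that pkill -f can match it. A service
name containing regex metacharacters would need escaping.

```shell
# Hypothetical helper: apply the workaround to one service.
recover_service() {
    svc=$1
    # Step 1: ask runsv to bring the service down.
    sv d "$svc"
    # Step 2: kill the runsv supervising it; runsvdir should then start
    # a fresh runsv for the service directory.
    pkill -f "^runsv $svc\$"
}
```

Checking pstree before and after, as in the listings above, would show
whether the new runsv has picked up its svlogd.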

I'm unsure what causes this rare problem. We only do simple things
with runit: sv {t,d,u}. When we deploy services, we rsync a
directory from elsewhere on the box into /etc/services/<name> and then
'sv t <name>'. That source dir only has ./run, ./finish, and
./log/run.
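
As a sketch, the deploy step described above amounts to something like
the following. The function name deploy_service, the SERVICES_DIR
variable, and the rsync -a flag are assumptions; the message does not
say which rsync options are actually used.

```shell
# Hypothetical helper: copy a prepared service directory into place,
# then send the service a TERM so runsv restarts it.
deploy_service() {
    src=$1
    name=$2
    dest="${SERVICES_DIR:-/etc/services}/$name"
    # Trailing slashes make rsync copy the contents of src into dest.
    rsync -a "$src/" "$dest/" || return 1
    sv t "$name"
}
```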

Any ideas of what we might be doing wrong, or how to otherwise avoid
this issue? Or if not, what I could do to further debug?

Sorry for the long email; I wanted to be thorough in my description
and avoid making assumptions about what could be causing this problem.

Thanks,
Caleb Spare
Received on Sat Jul 26 2014 - 05:32:35 UTC
