On 07/06/2016 13:23, Vasil Yordanov wrote:
> Unfortunately I'm not able to reproduce the problem.
> I decided to upgrade the Docker from 1.9 to 1.10, because 1.9 does not
> support mount of tmpfs.
> After that upgrade the s6-rc-init does not hang. I even wrote a script to
> continuously test it.
> For now it works for me, but if this happens again in the future I will
> capture a more detailed strace to trace the forks.
Thanks, I would appreciate that.
I've also been unable to reproduce the hang; I have triple-checked the code
path and it definitely looks correct, so I have no idea what's happening.
s6-rc-init, among other things, copies service directories from the service
database to the live working copy (adding ./down files to the copies because
the services start down). It then starts an s6-ftrigrd process to listen for
"s6-supervise has started" messages on all the service directories, then
it links the servicedirs into the scandir and sends a "rescan" message to
s6-svscan.
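The copy-and-link phase described above can be sketched in plain shell. All paths, and the myservice longrun, are temporary stand-ins invented for the demonstration; the real s6-rc-init is a C program operating on the compiled database, the live directory and the scandir.

```shell
#!/bin/sh
# Plain-shell sketch of the copy-and-link phase. Paths are mktemp
# stand-ins, not the real locations.
db=$(mktemp -d)       # stands in for the compiled service database
live=$(mktemp -d)     # stands in for the live working copy
scandir=$(mktemp -d)  # stands in for the s6-svscan scan directory

# Fake one compiled longrun so the loop has something to copy.
mkdir -p "$db/myservice"
printf '#!/bin/sh\nexec sleep 1\n' > "$db/myservice/run"

for svc in "$db"/*; do
  name=${svc##*/}
  cp -R "$svc" "$live/$name"
  : > "$live/$name/down"                # the copy starts in the down state
  ln -s "$live/$name" "$scandir/$name"  # link the copy into the scandir
done
# Here the real program would send the rescan: s6-svscanctl -a "$scandir"
ls "$scandir"   # → myservice
```

The ./down file is what guarantees the services come up in the down state, so s6-rc can bring them up later in dependency order.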
At this point s6-svscan sees the new servicedirs and spawns an s6-supervise
process on each one. When an s6-supervise process starts, it notifies the
listening s6-ftrigrd process.
s6-rc-init waits until a notification has been received from all the service
directories, then exits.
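The wait-for-all-notifications logic, and why a missing notification turns into a hang, can be illustrated with plain FIFOs. This is only an illustration of the protocol's shape; the real implementation uses s6-ftrigrd and the libftrig fifodir mechanism, and the names below are made up.

```shell
#!/bin/sh
# FIFO illustration of "wait until every service has notified".
dir=$(mktemp -d)
services="svc-a svc-b svc-c"

for s in $services; do mkfifo "$dir/$s"; done

# Each "supervisor" announces itself once it is running.
for s in $services; do
  ( printf 'started\n' > "$dir/$s" ) &
done

# The "init" side blocks until every service has announced itself.
# If one announcement never arrives, this loop blocks forever: that
# is exactly the hang being discussed.
count=0
for s in $services; do
  read -r msg < "$dir/$s"
  count=$((count + 1))
done
echo "$count"   # → 3
```

Comment out one of the writer subshells and the reader never returns, which is the observable symptom: s6-rc-init sitting there, doing nothing, waiting.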
What is happening when it hangs is that s6-rc-init is not receiving all the
notifications - one or more are missing, and s6-rc-init will not exit until
it has gotten them all.
Either an s6-supervise process is failing to start, or it had already been
started by the time s6-ftrigrd began listening. But since it's boot time,
the scandir is empty, so the latter can't be it (unless you have given the
same name to one of your early services and one of your s6-rc longruns -
don't do that!)
If you have kept the logs from your catch-all logger from the time when
s6-rc-init hung, it would be a good idea to check them - one s6-supervise
might be dying repeatedly, and the catch-all logs would say so.
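A hedged sketch of that log check. The directory and the warning text below are invented for the demonstration; substitute the directory your catch-all logger actually writes to, and grep for your service names rather than for any particular message, since the exact wording depends on your s6 version.

```shell
#!/bin/sh
# Fake a catch-all log directory so the grep has something to find.
logdir=$(mktemp -d)   # stands in for your catch-all logger's directory
printf '%s\n' \
  's6-supervise myservice: warning: unable to spawn ./run' \
  'some unrelated boot message' > "$logdir/current"

# A repeatedly dying s6-supervise leaves a trail of lines like the one
# faked above; scan the current log (and any rotated ones you kept).
grep 's6-supervise' "$logdir/current"
```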
--
Laurent
Received on Tue Jun 07 2016 - 12:12:45 UTC