PID files are harmful

Warning

This article has not been finished…

I have wanted to write about this for over fifteen years, and the situation is arguably still the same as it was…

The last draft of this article is from 13 January 2015; if I were paid to procrastinate I would dominate the world.

Decades ago I was already puzzled by the (over)use of PID files in “serious” distributions such as Red Hat, and by the problems it caused.

PID files are commonly used by initialization scripts as a convenience for operations on the processes they control. However, they often do a poor job.
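To make the convenience (and its weakness) concrete, here is a minimal sketch of the kind of check an init script typically performs, written in Python rather than shell. The function names `read_pid` and `looks_alive` are made up for illustration, and the sketch assumes a Unix-like system where signal 0 only tests for the existence of a process.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def looks_alive(pid_file):
    """Guess whether the daemon is running, the way many init scripts do."""
    pid = read_pid(pid_file)
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file: no process has this PID
    except PermissionError:
        return True  # a process exists, but it belongs to another user
    # Some process has this PID; nothing proves it is the daemon we meant.
    return True
```

Note what the check cannot tell: a `True` only means that some process currently holds that number, which may be an unrelated program after PID reuse, and a `False` says nothing about instances whose PID was never written to (or was erased from) the file.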

In the same vein, some say that init scripts are harmful; but in my opinion, as soon as a PID file is involved, the solution is doomed to fail.

When do they fail

OK, let’s take a few real-life examples from modern (as of the time of writing [2015]), mainstream Linux distributions.

Syslog-ng on CentOS / RHEL 6

The configuration file was edited and the daemon reloaded, but the change did not seem to take effect.

The daemon was then restarted, as some daemons do not reload everything immediately, but the issue remained.

When listing the processes (ps(1)!), a few instances of the daemon were seen running concurrently.

A few instances? Even when the daemon was stopped?

For some reason, at some point, the init script overwrote the PID file, so the instances started earlier could no longer be found, let alone reloaded or stopped.

And it seems that when more than one instance of syslog-ng is running, the first one keeps hold of the sockets while the others silently fail to bind.
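The overwritten PID file can be illustrated with a small simulation. This is only a sketch of the failure mode, not the actual CentOS / RHEL init script: the helper names are hypothetical, and the PIDs used in the usage note are invented numbers rather than real processes.

```python
def naive_start(pid_file, pid):
    """Record a new instance's PID, blindly overwriting whatever was there."""
    with open(pid_file, "w") as handle:
        handle.write(f"{pid}\n")


def tracked_pid(pid_file):
    """Return the only PID the init script still knows about."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def orphaned_pids(pid_file, running):
    """PIDs that are running but that the init script can no longer reach."""
    return sorted(set(running) - {tracked_pid(pid_file)})
```

Calling `naive_start(path, 1001)` and then `naive_start(path, 1002)` while both instances are still running leaves `tracked_pid(path)` at 1002, and `orphaned_pids(path, [1001, 1002])` returns `[1001]`: a "stop" or "reload" driven by the PID file will never touch the first instance.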

crond

I remember a few occasions when scheduled scripts were executed more than once, sometimes conflicting with each other (especially when locks were involved).

Yes, it was simply caused by a rogue crond instance…

But often a simple problem takes too much time to investigate.

Alternatives

Many programs do not need a freaking PID file to know they are alive and to reload / stop; coming to mind, in no particular order, are Postfix and nginx.

Binding to a socket

When the daemon binds to a (Unix or IP) socket, it should “know” that something is already running there (that’s the main reason the Syslog-ng case on CentOS / RHEL 6 is a WTF).
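As a minimal sketch of that idea, assuming a daemon that listens on a TCP port (the function name `claim_endpoint` is hypothetical), the bind itself can serve as the "am I already running?" test: the kernel refuses a second bind on an endpoint that is already in use.

```python
import errno
import socket


def claim_endpoint(host, port):
    """Bind and listen on (host, port).

    Returns the listening socket, or None when another process (or another
    socket in this process) already holds the endpoint.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None  # someone is already serving here: do not start twice
        raise  # any other failure (permissions, bad address) is a real error
    return sock
```

A daemon built this way would refuse to start loudly when `claim_endpoint` returns `None`, instead of carrying on silently without its sockets. The sketch deliberately leaves `SO_REUSEADDR` unset, since that option changes when a second bind is allowed.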