PID files are harmful

I have wanted to write about this for over fifteen years, and the situation is arguably still the same as it was…

I first ran into the issues with PID files, and found ways around them, around 2000-2002, with Red Hat Linux 6 to 8.

The last draft of this article dates from 13 January 2015 (damn, if I were paid to procrastinate I would rule the world).

PID files are commonly used by initialization scripts as a convenience for operations on the processes they control. However, they often do a poor job.

In the same vein, some say that init scripts are harmful. In my opinion, as soon as a PID file is involved, the solution is doomed to fail.
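To make the weakness concrete, here is a minimal Python sketch of the kind of check a PID-file-based status command performs. It is not taken from any real init script: the function names, the status strings and the `mydaemon.pid` file name are assumptions made for illustration, and it assumes a POSIX system where sending signal 0 only tests whether a PID exists.

```python
import os
import tempfile


def pid_from_file(path):
    """Return the PID stored in path, or None if the file is missing or unparsable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def looks_alive(pid):
    """True if *some* process currently has this PID (not necessarily the daemon)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def status(path):
    """Hypothetical 'status' action that trusts the PID file blindly."""
    pid = pid_from_file(path)
    if pid is None:
        return "stopped"
    return "running" if looks_alive(pid) else "dead but pid file exists"


# Demo: a PID file that points at an unrelated live process (this interpreter).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mydaemon.pid")  # hypothetical daemon name
    with open(path, "w") as f:
        f.write(f"{os.getpid()}\n")
    print(status(path))  # reports "running" although no daemon was ever started
```

The check only proves that a process with that number exists. It cannot tell whether that process is the daemon, nor whether other instances are running that the file knows nothing about.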

When do they fail

OK, let’s take a few real-life examples from modern (as of the time of writing [2015]), mainstream Linux distributions.

Syslog-ng on CentOS / RHEL 6

Note

Note the difference between Red Hat Linux 6 (code name Hedwig) from 1999 and Red Hat Enterprise Linux 6 (code name Santiago) from 2010 (^_^).

The configuration file was edited and the daemon reloaded, but the change did not seem to take effect.

The daemon was then restarted, as some daemons do not reload everything immediately, but the issue remained.

When listing the processes (ps(1)!), a few instances of the daemon were seen running concurrently.

A few instances? Even when the daemon was stopped?

For some reason, at some point, the init script had overwritten the PID file, so it only knew about the most recently started instance.

And it seems that when more than one instance of syslog-ng is running, the first one keeps hold of the sockets, while the others silently fail to bind.
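What exposed the problem was looking at the process table instead of trusting the PID file. A rough Python equivalent of that ps(1) check, assuming a Linux /proc filesystem, could look like the sketch below. The `pids_by_name` helper is hypothetical, and it matches on the kernel's command name, which is truncated to 15 characters.

```python
import os


def pids_by_name(name):
    """Return the PIDs of all processes whose command name equals name (Linux only)."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/status") as f:
                first_line = f.readline()  # looks like "Name:\tsyslog-ng\n"
        except OSError:
            continue  # the process exited while we were scanning
        if first_line.split(":", 1)[-1].strip() == name:
            found.append(int(entry))
    return sorted(found)


instances = pids_by_name("syslog-ng")
if len(instances) > 1:
    print("more than one instance is running:", instances)
```

Any result longer than one entry means that stopping "the" daemon through its PID file leaves the other instances untouched.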

crond

I remember a few occasions when some scheduled scripts were executed more than once, sometimes conflicting with each other (especially when locks were involved).

Yes, it was simply caused by a rogue crond instance…

But a simple problem like this often takes far too much time to investigate.

Alternatives

Many programs do not need a freaking PID file to know whether they are alive, or to reload or stop; Postfix and nginx come to mind, in no particular order.

Binding to a socket

When a daemon binds to a (Unix or IP) socket, it should “know” that something is already running there (that’s the main reason the Syslog-ng case on CentOS / RHEL 6 is a WTF).
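As an illustration, the following Python sketch uses the bind itself as the "is another instance running?" test. It assumes a TCP socket on the loopback interface, and the `bind_or_none` helper is a hypothetical name. A real daemon would bind its own well-known address and exit with an error message when the bind is refused, rather than fail silently.

```python
import errno
import socket


def bind_or_none(host, port):
    """Bind and listen on a TCP socket; return it, or None if the address is in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None  # another instance already owns this address
        raise
    return sock


first = bind_or_none("127.0.0.1", 0)      # port 0: let the kernel pick a free port
port = first.getsockname()[1]
second = bind_or_none("127.0.0.1", port)  # same address: the kernel refuses it

if second is None:
    print(f"port {port} is already taken, refusing to start a second instance")
```

The kernel arbitrates here, so there is no file to go stale or to be overwritten: the address is released as soon as the owning process dies.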