Btw, today I solved a problem that had brought down a complex system for a whole day. The fix was setting a single sysctl value.

I had reinstalled a bunch of systems that all serve the same function so that they would all run on the same distribution, and all hell broke loose.

I couldn’t find what had changed compared to before, and of course the problem only showed up in its full glory after I had set up all the systems.

Apparently I had set that sysctl in the past but never documented it or put it into configuration management. I have no idea why not, because I must have set it on three different distributions (don’t ask… it was a time of experimentation). Now I had moved them all to yet another distribution, but at least to the same one. And it hit hard.

One very stressful day, with me trying to do a rollback at 9 or 10 pm and then getting up at 4 am with fresh but tired eyes. At least I found the solution then.

Of course I have put it into the system configuration now and told several people about it, so if something like this happens again, someone might remember that there is this one sysctl that will save your ass in this situation.

I am getting too old for this stuff. But hey, at least I found a solution after only a day, and I didn’t even have to start reading source code, analyzing a big tcpdump, running straces or whatever. Google and ChatGPT didn’t help, though. It came down to a single sentence in the documentation that I couldn’t remember. Good thing I read it again…

In short: if you do something, write documentation and make sure you will be able to find it again. Put your changes into config management and commit them with a sensible commit message. Your future self will thank you.
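
For what it's worth, here is a rough sketch of something that could have shortened my search. It assumes you saved the output of `sysctl -a` from the old system before reinstalling (which I had not). It just compares two such dumps and lists every key whose value differs or that exists on only one side. The function names and the sample values are made up for illustration; the keys in the example are not the sysctl from my story.

```python
def parse_sysctl_dump(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines and anything that is not "key = value"
        settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(before, after):
    """Return {key: (old, new)} for keys that differ; None means the key is missing."""
    old, new = parse_sysctl_dump(before), parse_sysctl_dump(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    # Hypothetical sample dumps; in practice read them from saved files.
    old_dump = "net.ipv4.ip_forward = 1\nvm.swappiness = 60\n"
    new_dump = "net.ipv4.ip_forward = 0\nvm.swappiness = 60\n"
    for key, (old_value, new_value) in diff_sysctl(old_dump, new_dump).items():
        print(f"{key}: {old_value} -> {new_value}")
```

A freshly installed distribution will differ from the old one in plenty of harmless defaults too, so the output is a list of suspects to go through, not an answer.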