Saturday, March 21, 2020

Pi Zero W Reliability Issue

For my water softener minder project, I had a recurring failure that puzzled me. Every once in a while, the Pi Zero W (running Raspbian Stretch) would lose network communication and never regain it, no matter how long I waited. I had initially assumed it was a momentary processor overload. The neopixels would continue to blink away, and checking the service (systemctl status <service name>) showed it running fine, except that nothing could talk to the network. Even ping didn't work. I am sure I saw this during previous projects, but for most of my silly projects, who cares? Just restart and move on.

In this particular case, the syslogs show the problem starting around 11:49 AM on 16 March - screenshots below. As you can see, it starts going wrong at the application level and then the kernel complains about 120s timeouts. The result is that all IP comms are brought to a halt. It seemed like a bit of an overreaction.

My water softener minder uses very little CPU, as can be seen in top. Eventually I remembered that I had enabled VNC Connect, just because it always amazes me that I can log into a full graphical interface on something as tiny and wimpy as a Pi Zero W. Disabling VNC Connect seems to have resolved the issue, and it has been stable since then.

I would be interested in what people think about this. There is a link to a page that was helpful. I think I will work on reducing I/O, but is there anything else that I should do? I see that you could set the timeout to 0, but is that advisable? Or set it to something much larger? Or just live with what I have, as it seems to work?
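
For anyone who wants to check their own Pi for the same symptom, here is a rough Python sketch. It assumes that the 120s messages are the kernel's standard hung-task warnings ("blocked for more than 120 seconds") and that the timeout in question is /proc/sys/kernel/hung_task_timeout_secs; both are my assumptions, and the function names are made up for illustration. As far as I understand it, writing 0 to that file only switches the warning off and does nothing about whatever is blocking, so this just reads the value and lists the tasks that tripped it.

```python
import re
from pathlib import Path

# Assumed location of the hung-task timeout; the file only exists if the
# kernel was built with hung-task detection.
HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")

# Matches the kernel warning "INFO: task <name>:<pid> blocked for more than <n> seconds."
HUNG_TASK_RE = re.compile(r"task (\S+):(\d+) blocked for more than (\d+) seconds")


def read_timeout(path=HUNG_TASK_TIMEOUT):
    """Return the timeout in seconds, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task warning in lines."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return hits


if __name__ == "__main__":
    print("hung_task_timeout_secs:", read_timeout())
    try:
        # /var/log/syslog is where Raspbian normally keeps these messages.
        with open("/var/log/syslog", errors="replace") as log:
            for name, pid, secs in find_hung_tasks(log):
                print(f"{name} (pid {pid}) blocked for more than {secs}s")
    except OSError as err:
        print("could not read syslog:", err)
```

Run it with python3 (sudo may be needed to read the syslog). If the same task names keep turning up, that is probably a better clue about what to fix than changing the timeout.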
