If you are using the Raspberry Pi as a server you might want to enable the built-in hardware watchdog. This will automatically reboot the machine if user space fails to periodically write to /dev/watchdog within a reasonable time.
There are two major tasks: (1) installing and configuring a program to say "hi" to the heartbeat device periodically, and (2) having the kernel expose the heartbeat device to users.
The hardware is simple enough. It counts down from n to 0, one tick per second. Upon reaching zero the hardware reboots the machine. If the hardware is tickled it resets the countdown timer to n and the countdown begins anew.
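To make the countdown concrete, here is a minimal sketch in shell. It is only a model of the behaviour described above and touches no hardware. The function name simulate_watchdog and the "t"/"." event notation are made up for this illustration.

```shell
#!/bin/sh
# simulate_watchdog N EVENTS  (hypothetical helper, a model only)
#   N      = starting value of the countdown, in seconds
#   EVENTS = one character per elapsed second:
#            "t" means the timer was tickled, anything else means it was not
simulate_watchdog() {
    n=$1
    events=$2
    counter=$n
    second=0
    while [ -n "$events" ]; do
        second=$((second + 1))
        rest=${events#?}           # everything after the first character
        event=${events%"$rest"}    # the first character on its own
        events=$rest
        if [ "$event" = "t" ]; then
            counter=$n             # a tickle resets the countdown to n
        else
            counter=$((counter - 1))
            if [ "$counter" -le 0 ]; then
                echo "reboot at second $second"
                return 0
            fi
        fi
    done
    echo "still running, $counter seconds left"
}

simulate_watchdog 3 "..t.."   # prints: still running, 1 seconds left
simulate_watchdog 3 "..."     # prints: reboot at second 3
```

In the first call the tickle at second three restores the counter to 3, so two more quiet seconds leave it at 1. In the second call nothing tickles the timer and it reaches zero.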
The Linux kernel exposes the hardware countdown timer as /dev/watchdog. The convention is that the countdown takes 60 seconds.
The Raspberry Pi watchdog runs for less time than this, between 1 and 14 seconds. That's understandable: at its heart the Raspberry Pi has a mobile phone CPU, and no one is going to look at a blank screen for a minute wondering what will happen.
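For a feel of what "writing to /dev/watchdog" means, here is a rough sketch of a heartbeat loop in shell. The helper send_heartbeats is hypothetical, and the device path is a parameter so the loop can be tried against an ordinary file. Pointing it at the real /dev/watchdog arms the timer as soon as the device is opened, so only do that on a machine you are happy to see reboot.

```shell
#!/bin/sh
# send_heartbeats DEVICE INTERVAL COUNT  (hypothetical helper)
# Open DEVICE once, then write COUNT heartbeats INTERVAL seconds apart.
send_heartbeats() {
    device=$1
    interval=$2
    count=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3         # any write counts as a heartbeat
            i=$((i + 1))
            if [ "$i" -lt "$count" ]; then
                sleep "$interval"
            fi
        done
    } 3>"$device"                  # fd 3 stays open for the whole loop
}

# Try it on a scratch file rather than the real device.
scratch=$(mktemp)
send_heartbeats "$scratch" 0 3
cat "$scratch"; echo               # prints: ...
rm -f "$scratch"
```

Keeping one file descriptor open for the whole loop matters on the real device, because what happens on close depends on the module's nowayout setting.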
User space heartbeat - Raspbian
The watchdog(8) daemon is the simplest way for Raspbian to periodically tickle /dev/watchdog.
sudo apt-get install watchdog
sudo update-rc.d watchdog enable
The watchdog daemon requires some configuration on the Raspberry Pi. Edit /etc/watchdog.conf to contain only:
watchdog-device = /dev/watchdog
watchdog-timeout = 14
realtime = yes
priority = 1
If you want the daemon to consume less CPU you can extend the interval between heartbeats. Four seconds still gives three chances per fourteen-second interval:
interval = 4
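The "three chances" figure is just integer division of the timeout by the interval. A small sketch, with a made-up function name, for trying other values:

```shell
#!/bin/sh
# heartbeat_chances TIMEOUT INTERVAL  (hypothetical helper)
# Number of whole heartbeat intervals that fit inside one timeout window.
heartbeat_chances() {
    if [ "$2" -le 0 ]; then
        echo "interval must be positive" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

heartbeat_chances 14 4    # prints: 3
heartbeat_chances 14 1    # prints: 14
```

An interval of 7 would leave two chances, and anything above 14 leaves none.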
User space heartbeat - systemd
The great simplification of system utilities by systemd encompasses watchdog timers too. Edit /etc/systemd/system.conf and set:
RuntimeWatchdogSec=14
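To double-check that the setting is present and not commented out, something like the following sketch can read it back. conf_value is a hypothetical helper, not a systemd tool, and it only handles simple KEY=value lines.

```shell
#!/bin/sh
# conf_value FILE KEY  (hypothetical helper)
# Print the value of the last uncommented "KEY=value" line in FILE.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Demonstrate on a scratch file so the real system.conf is left alone.
sample=$(mktemp)
printf '%s\n' '[Manager]' '#RuntimeWatchdogSec=0' 'RuntimeWatchdogSec=14' > "$sample"
conf_value "$sample" RuntimeWatchdogSec    # prints: 14
rm -f "$sample"
```

On a real system the call would be conf_value /etc/systemd/system.conf RuntimeWatchdogSec. An empty result means the key is missing or still commented out.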
Kernel watchdog device
Configure the kernel to expose the watchdog device. Set the parameters to the kernel module by creating a new file /etc/modprobe.d/bcm2708_wdog.conf containing:
alias char-major-10-130 bcm2708_wdog
alias char-major-10-131 bcm2708_wdog
options bcm2708_wdog heartbeat=14 nowayout=1
The periodic writes from user space are called "heartbeats". The heartbeat parameter to the kernel module is the maximum gap between heartbeats seen by the device before the hardware reboots. On the Raspberry Pi this gap can be as large as 14 seconds. That's substantially less than the common value of 60 seconds.
The nowayout parameter determines what happens when the /dev/watchdog device is closed: is a heartbeat still expected or not? A value of 0 says that no further heartbeats are expected, so if the process writing the heartbeats fails then the machine will not reboot, even if that process failing is a sign that the machine is in a poor way. A value of 1 says that the countdown to a reboot keeps running, and if the device is not reopened and a heartbeat written then the machine will reboot. The Raspberry Pi does not remove power to itself when halted, so setting nowayout=1 will reboot the Raspberry Pi about 14 seconds after the completion of shutdown -h now.
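The two cases can be written down as a tiny decision table. This is only a restatement of the paragraph above in shell, with a made-up function name. It does not talk to the kernel module.

```shell
#!/bin/sh
# after_close NOWAYOUT HEARTBEAT  (hypothetical helper, a model only)
# Describe what happens once /dev/watchdog has been closed.
after_close() {
    case $1 in
        0) echo "countdown stops, no reboot" ;;
        1) echo "countdown keeps running, reboot within $2 seconds unless a heartbeat arrives" ;;
        *) echo "nowayout must be 0 or 1" >&2; return 1 ;;
    esac
}

after_close 0 14   # prints: countdown stops, no reboot
after_close 1 14   # prints: countdown keeps running, reboot within 14 seconds unless a heartbeat arrives
```

The second line is the case that turns shutdown -h now into a reboot on the Pi.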
Normally we would put the module name into /etc/modules, but what if starting the system takes longer than the fourteen seconds available? Rather than risk a continual reboot we should let udev load the module the first time something opens /dev/watchdog. Unfortunately I can't figure out how to do that in this case :-(
The second-best option is to install the module just before it is used. The watchdog daemon on Debian allows for this in /etc/default/watchdog:
watchdog_module="bcm2708_wdog"
Start watchdog service
This all takes effect at the next reboot. To kick it off now without interrupting service, run:
sudo modprobe bcm2708_wdog
sudo service watchdog start
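A quick sanity check after loading the module is to confirm that the device node exists. The sketch below uses only a file test, which does not open the device and so does not start the countdown. is_char_device is a made-up name.

```shell
#!/bin/sh
# is_char_device PATH  (hypothetical helper)
# Succeed only if PATH exists and is a character device.
is_char_device() {
    [ -c "$1" ]
}

if is_char_device /dev/watchdog; then
    echo "/dev/watchdog is present"
else
    echo "/dev/watchdog is missing: is the module loaded?"
fi
```

Avoid checking with cat or a shell redirect on the real device, because opening it is enough to arm the timer.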
Check watchdog service
Check operation in the system log. Here is the module activating /dev/watchdog:
bcm2708 watchdog, heartbeat=14 sec (nowayout=1)
Here is the start of the watchdog daemon which writes the heart beats:
watchdog[]: starting daemon (5.12):
watchdog[]: int=4s realtime=yes sync=no soft=no mla=0 mem=0
watchdog[]: ping: no machine to check
watchdog[]: file: no file to check
watchdog[]: pidfile: no server process to check
watchdog[]: interface: no interface to check
watchdog[]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none temp=none to=root no_act=no
watchdog[]: hardware wartchdog identity: BCM2708
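To pull just these lines out of a busy log, a grep along the following lines should do. The function name is made up, the pattern is a guess based on the messages quoted in this post, and /var/log/syslog is assumed to be where your system logs them.

```shell
#!/bin/sh
# watchdog_log_lines FILE  (hypothetical helper)
# Print the kernel module banner and any watchdog or wd_keepalive daemon lines.
watchdog_log_lines() {
    grep -E 'bcm2708 watchdog|(watchdog|wd_keepalive)\[[0-9]*\]:' "$1"
}

if [ -r /var/log/syslog ]; then
    watchdog_log_lines /var/log/syslog || echo "no watchdog lines found"
fi
```

If nothing is printed for the module banner, the module was probably never loaded.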
Date: 2013-06-14 03:54 (UTC)
My watchdog seems to work in that if I kill the watchdog process, it reboots, but if I halt the Pi, it doesn't.
Any ideas?
Jun 14 13:11:17 raspberrypi kernel: [ 29.656998] bcm2708 watchdog, heartbeat=14 sec (nowayout=1)
Jun 14 13:11:18 raspberrypi wd_keepalive[2378]: starting watchdog keepalive daemon (5.12):
Jun 14 13:11:18 raspberrypi wd_keepalive[2378]: int=4 alive=/dev/watchdog realtime=yes
Jun 14 13:11:18 raspberrypi wd_keepalive[2378]: hardware wartchdog identity: BCM2708
Jun 14 13:11:18 raspberrypi wd_keepalive[2378]: unable to disable oom handling!
Jun 14 13:11:22 raspberrypi kernel: [ 34.318369] wdt: WDT device closed unexpectedly. WDT will not stop!
Jun 14 13:11:22 raspberrypi wd_keepalive[2378]: stopping watchdog keepalive daemon (5.12)
Jun 14 13:45:54 raspberrypi watchdog[2551]: starting daemon (5.12):
Jun 14 13:45:54 raspberrypi watchdog[2551]: int=4s realtime=yes sync=no soft=no mla=24 mem=1
Jun 14 13:45:54 raspberrypi watchdog[2551]: ping: no machine to check
Jun 14 13:45:54 raspberrypi watchdog[2551]: file: /var/log/syslog:0
Jun 14 13:45:54 raspberrypi watchdog[2551]: pidfile: no server process to check
Jun 14 13:45:54 raspberrypi watchdog[2551]: interface: no interface to check
Jun 14 13:45:54 raspberrypi watchdog[2551]: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none temp=none to=root no_act=no
Jun 14 13:45:54 raspberrypi watchdog[2551]: hardware wartchdog identity: BCM2708
root@raspberrypi:~# egrep -v '#|^$' /etc/watchdog.conf
file = /var/log/syslog
max-load-1 = 24
max-load-5 = 18
max-load-15 = 12
min-memory = 1
watchdog-device = /dev/watchdog
watchdog-timeout = 14
realtime = yes
priority = 1
interval = 4
root@raspberrypi:~# cat /etc/modprobe.d/bcm2708_wdog.conf
alias char-major-10-130 bcm2708_wdog
alias char-major-10-131 bcm2708_wdog
options bcm2708_wdog heartbeat=14 nowayout=1
Date: 2014-05-28 15:28 (UTC)
Up-to-date software, latest Pi firmware.
Date: 2013-08-31 18:07 (UTC)
Thanks for these excellent instructions! I am constantly grateful when doing development at finding posts like yours!
Date: 2013-10-06 13:55 (UTC)
Thanks for the detailed info.
I found that in order to get mine to work, I had to change permissions on /etc/watchdog