man 7 bootparam tells us that this can be achieved by giving the parameter

    panic=N

(to be put on the ``kernel'' line in /boot/grub/menu.lst on gentoo).
This ensures that the kernel will reboot (after N secs) upon panic.
The same effect can be achieved by

    echo N >/proc/sys/kernel/panic

(I'm not sure whether that is really needed for the watchdog daemon, but it won't hurt either).
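To check whether the parameter actually made it onto the kernel command line, something like the following could be used. This is only a minimal sketch: the function name panic_from_cmdline is made up for illustration, and it assumes the boot parameters are separated by single spaces, as they normally appear in /proc/cmdline.

```shell
#!/bin/sh
# Print the N of a panic=N boot parameter found in the given command line.
# Prints nothing if the parameter is absent.
# (panic_from_cmdline is a hypothetical helper name, not part of any tool.)
panic_from_cmdline() {
    # Put every parameter on its own line, then keep only the value of panic=
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^panic=//p'
}

# Typical use on a running system:
#   panic_from_cmdline "$(cat /proc/cmdline)"
#   cat /proc/sys/kernel/panic     # the timeout currently in effect
```

The second command in the usage comment reads back the same file that the echo above writes to, so it shows the value in effect regardless of how it was set.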
Next, enable the /dev/watchdog device in
the kernel. This can be done in make menuconfig:
select character devices (``watchdog cards''), turn on
the main option and software watchdog (the ``softdog'' driver).
I didn't want to bother with modules, so I built the driver into the
kernel (after testing it as a module).
Note that activating the softdog driver does not by itself force you
to run a watchdog daemon to ``pet the dog'': the timer only starts
once the /dev/watchdog file has been opened.
There are two possible parameters for the softdog driver, as appears from the code snippet below:

    #define TIMER_MARGIN 60 /* (secs) Default is 1 minute */

    static int soft_margin = TIMER_MARGIN; /* in seconds */

    #ifdef ONLY_TESTING
    static int soft_noboot = 1;
    #else
    static int soft_noboot = 0;
    #endif /* ONLY_TESTING */

    MODULE_PARM(soft_margin,"i");
    MODULE_PARM(soft_noboot,"i");

Thus, without any parameters, the machine will reboot if the file /dev/watchdog, when open, has not been written to in 60 secs.
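To make the mechanism concrete, here is a rough sketch of what ``petting the dog'' amounts to: open the device once and write to it more often than the margin allows. The function name pet_watchdog and its three arguments are invented for this illustration. The final 'V' is the ``magic close'' character from the kernel's watchdog API documentation; whether a given softdog version honours it is an assumption, so trying this on the real device may well end in a reboot.

```shell
#!/bin/sh
# Keep a watchdog device alive by writing to it at regular intervals.
#   $1  device file (normally /dev/watchdog)
#   $2  seconds to sleep between writes (must be well below soft_margin)
#   $3  number of writes before stopping (a real daemon would loop forever)
# (pet_watchdog is a hypothetical name used only for this sketch.)
pet_watchdog() {
    dev=$1 interval=$2 count=$3
    # The redirection on the closing brace opens the device once and keeps
    # it open for the whole loop; opening it is what starts the timer.
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            echo .              # any write resets the timer
            sleep "$interval"
            i=$((i + 1))
        done
        # Assumption: the driver supports "magic close", i.e. a 'V' written
        # just before closing asks it to disarm the timer.
        printf 'V'
    } >"$dev"
}
```

Called as pet_watchdog /dev/watchdog 10 6, this would keep the timer from firing for about a minute. It is only meant to show the mechanism; a real watchdog daemon does the same thing with proper checks around it.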
emerge watchdog
does the lot. If the device /dev/watchdog does not appear automatically, you can create it using

    mknod /dev/watchdog c 10 130

Since I'd like to know what was going on when the machine got stuck, I specified a user-defined test program
/usr/sbin/watchdog-user-test in /etc/watchdog/watchdog.conf, which is reproduced below.

    #ping = 172.31.14.1
    #ping = 172.26.1.255
    #interface = eth0
    #file = /var/log/messages
    #change = 1407

    # Uncomment to enable test. Setting one of these values to '0' disables it.
    # These values will hopefully never reboot your machine during normal use
    # (if your machine is really hung, the loadavg will go much higher than 25)
    #max-load-1 = 24
    #max-load-5 = 18
    #max-load-15 = 12

    # Note that this is the number of pages!
    # To get the real size, check how large the pagesize is on your machine.
    #min-memory = 1

    repair-binary = /usr/sbin/repair
    test-binary = /usr/sbin/watchdog-user-test

    #watchdog-device = /dev/watchdog

    # Defaults compiled into the binary
    #temperature-device =
    #max-temperature = 120

    # Defaults compiled into the binary
    #admin = root
    #interval = 10
    #logtick = 1

    # This greatly decreases the chance that watchdog won't be scheduled before
    # your machine is really loaded
    realtime = yes
    priority = 1

    # Check if syslogd is still running by enabling the following line
    #pidfile = /var/run/syslogd.pid

Here is /usr/sbin/watchdog-user-test:

    #!/bin/sh
    #
    { date; ps -efl; } >/var/log/watchdog-user-test-output

Finally, to have the watchdog daemon start at boot time, it suffices (in gentoo) to do

    rc-update add watchdog default
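One thing to keep in mind about the test program: the watchdog daemon assumes an error whenever the test binary returns a non-zero exit status. The script shown earlier exits with the status of its last command, so a failed redirection (say, a full or read-only /var) would count as a failed test. A variant that always reports success might look as follows. The function name snapshot and the decision to ignore logging errors are assumptions of this sketch, not part of the setup described here.

```shell
#!/bin/sh
# Record the date and the process list in the given file, never failing.
# (snapshot is a hypothetical helper name used only for this sketch.)
snapshot() {
    { date; ps -efl; } >"$1" 2>/dev/null
    return 0    # logging problems should not make the watchdog test fail
}

# In /usr/sbin/watchdog-user-test one would then end with:
#   snapshot /var/log/watchdog-user-test-output
#   exit 0
```

Whether a logging failure should be ignored or should itself trigger the watchdog is a matter of taste; this sketch opts for ignoring it.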