I recently found that ksh will deliberately crash if the last command in a script crashes. Other shells just set the appropriate exit code, but ksh sends itself the same signal that its child received. This can confuse the unwary (i.e. me) when using abrt logs to track down what crashed.
While investigating reports of random and unpredictable process crashes, I noticed that /bin/ksh93 was crashing far more often than I’d expect; it’s a shell, and I don’t expect shells ever to crash.
What I found were lots of log entries like this:
Mar 20 17:01:49 myhost abrt[31006]: Saved core dump of pid 30950 (/bin/ksh93) to /var/spool/abrt/ccpp-2017-03-20-17:01:49-30950 (1126400 bytes)
The backtrace in the core file looked like this:
#0 0x000000326e2328c7 in kill () from /lib64/libc.so.6
#1 0x000000000041ab99 in sh_done (ptr=0x76e420, sig=7) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/fault.c:664
#2 0x00000000004079eb in exfile (shp=0x76e420, iop=0x2aaed9ff6ca0, fno=11) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/main.c:602
#3 0x0000000000407e70 in sh_main (ac=<optimized out>, av=0x7fff6701bdc8, userinit=<optimized out>) at /usr/src/debug/ksh-20120801/src/cmd/ksh93/sh/main.c:353
#4 0x000000326e21ed1d in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000406b39 in _start ()
That sig=7 stuck out. I knew, because I’ve been doing this far too long, that signal 7 is SIGBUS (on Linux, at least). Frame #0 was also a call to kill() rather than some faulting instruction, so there was clearly some special handling going on. To cut a long story short: sh_done() gets called when the shell exits, figures out whether its child died from a signal, and if so does this:
signal(sig,SIG_DFL);
sigrelease(sig);
kill(getpid(),sig);
It makes sure that the signal isn’t masked or caught, and then sends that signal to itself. In other words, if the child died of a SIGBUS, those three lines ensure that the shell goes the same way.
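To make the mechanism concrete, here’s a minimal standalone C sketch of the same trick, assuming a POSIX system. exit_like_child() is a hypothetical name and isn’t ksh code. In place of ksh’s own sigrelease(), it calls sigprocmask() directly. I’m assuming sigrelease() amounts to unblocking the signal, which matches the rt_sigprocmask(SIG_UNBLOCK, ...) calls in the strace output further down.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not ksh code): given a status from waitpid(),
 * terminate the calling process the same way the child terminated.
 * Mirrors the three lines in ksh's sh_done(). Does not return. */
void exit_like_child(int status)
{
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        sigset_t set;

        /* Make sure the signal isn't caught or ignored... */
        signal(sig, SIG_DFL);

        /* ...or blocked (assumed to be what sigrelease() does in ksh)... */
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_UNBLOCK, &set, NULL);

        /* ...then send it to ourselves. In a single-threaded process an
         * unblocked signal is delivered before kill() returns, so for a
         * terminating signal we die right here. */
        kill(getpid(), sig);

        /* Fallback in case we somehow survive: the usual 128+N convention. */
        exit(128 + sig);
    }
    exit(WIFEXITED(status) ? WEXITSTATUS(status) : EXIT_FAILURE);
}
```

A caller would fork(), waitpid() for the child, and hand the status to exit_like_child(). If the child died of SIGSEGV or SIGBUS, the caller then dies of it too, core limits permitting, and that dead caller is the process abrt ends up reporting.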
The effect is clearly visible in strace output. First, here’s what other shells do:
26032 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=26032, si_uid=40274} ---
26032 +++ killed by SIGSEGV (core dumped) +++
26031 <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], 0, NULL) = 26032
...
26031 exit_group(139) = ?
26031 +++ exited with 139 +++
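For comparison, bash and dash fold the signal into an ordinary exit status of 128 plus the signal number. SIGSEGV is 11 on Linux, hence the exit_group(139) above. Below is a minimal sketch of that convention, assuming POSIX. shell_exit_code() and run_command() are hypothetical names used for illustration, not code taken from any of those shells.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a waitpid() status to the $? value a
 * bash/dash-style shell reports. A normal exit keeps its code; death by
 * signal N becomes 128 + N (SIGSEGV is 11 on Linux, giving 139). */
int shell_exit_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1; /* stopped or continued: not a final status */
}

/* Hypothetical helper: run argv in a child and return its $?-style code.
 * Unlike ksh, the shell can then simply exit() with this value. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed; shells use 127 for "command not found" */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return shell_exit_code(status);
}
```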
…and here’s ksh’s behaviour:
26870 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_TKILL, si_pid=26870, si_uid=40274} ---
26870 +++ killed by SIGSEGV (core dumped) +++
26869 <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], WSTOPPED|WCONTINUED, NULL) = 26870
...
26869 rt_sigaction(SIGSEGV, {SIG_DFL, [], SA_RESTORER|SA_INTERRUPT, 0x3b70832660}, {SIG_DFL, [], SA_RESTORER|SA_INTERRUPT, 0x3b70832660}, 8) = 0
26869 rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
26869 rt_sigprocmask(SIG_UNBLOCK, [SEGV], NULL, 8) = 0
26869 kill(26869, SIGSEGV) = 0
26869 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_USER, si_pid=26869, si_uid=40274} ---
26869 +++ killed by SIGSEGV (core dumped) +++
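From the outside, the difference shows up in the wait status of the script process itself. Going by the two traces, a script run under ksh whose last command segfaults is reported as killed by SIGSEGV with a core dump. Under bash or dash the same script simply exits with 139. Here’s a small sketch that runs a command and reports how it ended in roughly strace’s terms. describe_command() is a hypothetical name. The sketch assumes glibc or a similar libc for strsignal() and the non-POSIX WCOREDUMP() macro.

```c
#define _DEFAULT_SOURCE /* glibc: expose strsignal() and WCOREDUMP() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv, wait for it, and write a strace-style
 * description of how it ended into buf. */
void describe_command(char *const argv[], char *buf, size_t len)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        snprintf(buf, len, "fork failed");
        return;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0) {
        snprintf(buf, len, "waitpid failed");
        return;
    }

    if (WIFSIGNALED(status)) {
        /* The ksh case: the process itself died of the signal. */
        int sig = WTERMSIG(status);
        const char *core = "";
#ifdef WCOREDUMP
        /* WCOREDUMP() is a widespread extension, not POSIX. */
        if (WCOREDUMP(status))
            core = " (core dumped)";
#endif
        snprintf(buf, len, "killed by signal %d (%s)%s",
                 sig, strsignal(sig), core);
    } else {
        /* The bash/dash case: an ordinary exit, e.g. with 139. */
        snprintf(buf, len, "exited with %d", WEXITSTATUS(status));
    }
}
```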
I tried bash, dash, and tcsh, and none of them exhibit the same behaviour. I couldn’t find any documentation on this quirk of ksh, so I hope this post is helpful.