Thread: BUG #13900: stop standby failed with writer process hang(happen 3 times in 2 days)
BUG #13900: stop standby failed with writer process hang(happen 3 times in 2 days)
From
amutu@amutu.com
Date:
The following bug has been logged on the website:

Bug reference:      13900
Logged by:          Jov
Email address:      amutu@amutu.com
PostgreSQL version: 9.3.7
Operating system:   FreeBSD 10.2 amd64
Description:

I am upgrading my 3 databases from PostgreSQL 9.3 to 9.5, but I may have found a bug in the 9.3 background writer. I can't stop all of the standby processes, even with immediate shutdown mode and kill -9: the writer process remains, with ps state "Ds" (D marks a process in disk (or other short-term, uninterruptible) wait). According to Google, the only way to clear a process in the "Ds" state is to reboot the system. truss reports no information for the process, and procstat shows that the process is in the poll system call in the kernel.

Here are the details:

pg_ctl -D ./slave stop -m fast
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

psql postgres
psql: FATAL: the database system is shutting down

pg_ctl -D ./slave stop -m immediate
waiting for server to shut down.... done
server stopped

ps auxwww | grep postgres
jovz 976 0.0 0.3 28840 5232 - Is 17 116 0:00.04 postgres: logger process (postgres)
jovz 979 0.0 0.7 196940 13552 - Ds 17 116 0:06.03 postgres: writer process (postgres)

log:
2016-01-30 14:23:22.350 CST,,,947,,569b1bc2.3b3,3,,2016-01-17 12:42:42 CST,,0,LOG,00000,"received fast shutdown request",,,,,,,,,""
2016-01-30 14:23:22.350 CST,,,947,,569b1bc2.3b3,4,,2016-01-17 12:42:42 CST,,0,LOG,00000,"aborting any active transactions",,,,,,,,,""
2016-01-30 14:25:35.271 CST,,,64815,"",56ac575f.fd2f,1,"",2016-01-30 14:25:35 CST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,""
2016-01-30 14:25:35.274 CST,"jovz","f",64815,"[local]",56ac575f.fd2f,2,"",2016-01-30 14:25:35 CST,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,""
2016-01-30 14:25:38.324 CST,,,64817,"",56ac5762.fd31,1,"",2016-01-30 14:25:38 CST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,""
2016-01-30 14:25:38.324 CST,"jovz","f",64817,"[local]",56ac5762.fd31,2,"",2016-01-30 14:25:38 CST,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,""
2016-01-30 14:47:36.727 CST,,,65457,"",56ac5c88.ffb1,1,"",2016-01-30 14:47:36 CST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,,""
2016-01-30 14:47:36.727 CST,"jovz","postgres",65457,"[local]",56ac5c88.ffb1,2,"",2016-01-30 14:47:36 CST,,0,FATAL,57P03,"the database system is shutting down",,,,,,,,,""
2016-01-30 14:50:04.564 CST,,,947,,569b1bc2.3b3,5,,2016-01-17 12:42:42 CST,,0,LOG,00000,"received immediate shutdown request",,,,,,,,,""

truss -p 979
^Ctruss: Unexpect stop in waitpid: Interrupted system call

root@fblax:~ # procstat -kk 979
  PID    TID COMM     TDNAME KSTACK
  979 100688 postgres -      mi_switch+0xe1 sleepq_timedwait_sig+0x8b _cv_timedwait_sig_sbt+0x18b seltdwait+0xa4 kern_poll+0x464 sys_poll+0x61 amd64_syscall+0x357 Xfast_syscall+0xfb

root@fb:~ # kill -9 979
root@fb:~ # procstat -kk 979
  PID    TID COMM     TDNAME KSTACK
  979 100688 postgres -      mi_switch+0xe1 sleepq_timedwait_sig+0x8b _cv_timedwait_sig_sbt+0x18b seltdwait+0xa4 kern_poll+0x464 sys_poll+0x61 amd64_syscall+0x357 Xfast_syscall+0xfb
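The diagnosis in this report hinges on the ps STAT column: "Is" for the logger process and "Ds" for the stuck writer. As an illustrative aid only (not something posted in the thread), here is a minimal Python sketch that decodes the leading state letter of a FreeBSD-style STAT field. The state table is an assumption summarised from the FreeBSD ps(1) manual page and covers only the common letters; the helper names are hypothetical, and the parser assumes the default `ps auxwww` column order (USER PID %CPU %MEM VSZ RSS TT STAT ...).

```python
from typing import Tuple

# Hypothetical table summarising the primary state letters from FreeBSD ps(1).
# Trailing flag letters (e.g. "s" = session leader) are ignored here.
FREEBSD_PS_STATES = {
    "D": "uninterruptible wait (disk or other short-term wait)",
    "I": "idle (sleeping longer than about 20 seconds)",
    "L": "waiting to acquire a lock",
    "R": "runnable",
    "S": "sleeping (less than about 20 seconds)",
    "T": "stopped",
    "W": "idle interrupt thread",
    "Z": "zombie (dead)",
}


def describe_stat(stat: str) -> str:
    """Describe the primary (first) state letter of a ps STAT string."""
    if not stat:
        raise ValueError("empty STAT field")
    return FREEBSD_PS_STATES.get(stat[0], "unknown state")


def is_uninterruptible(stat: str) -> bool:
    """True if the process is in uninterruptible wait ("D"); a pending
    SIGKILL is not acted on until that wait ends."""
    return stat.startswith("D")


def parse_ps_line(line: str) -> Tuple[int, str]:
    """Extract (pid, stat) from one `ps auxwww` line.

    Assumes the default column order, so PID is field 1 and STAT is field 7.
    Later columns (STARTED, COMMAND) may contain spaces and are not parsed.
    """
    fields = line.split()
    return int(fields[1]), fields[7]


if __name__ == "__main__":
    # The two ps lines quoted in the report above.
    sample = [
        "jovz 976 0.0 0.3 28840 5232 - Is 17 116 0:00.04 postgres: logger process (postgres)",
        "jovz 979 0.0 0.7 196940 13552 - Ds 17 116 0:06.03 postgres: writer process (postgres)",
    ]
    for line in sample:
        pid, stat = parse_ps_line(line)
        print(pid, stat, describe_stat(stat), is_uninterruptible(stat))
```

Splitting on whitespace and indexing by position keeps the sketch dependency-free, but it only holds for the default column layout; a custom `ps -o pid,state` invocation would be a more robust input in practice.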
Re: BUG #13900: stop standby failed with writer process hang(happen 3 times in 2 days)
From
Kevin Grittner
Date:
On Sat, Jan 30, 2016 at 8:13 AM, <amutu@amutu.com> wrote:

> I am updating my 3 database from pg9.3 to pg9.5,but may find a bug for the
> bgwriter of pg9.3.I can't stop all the stand by process,even for immediate
> stop mode and kill -9,the writer process still there,with ps state "Ds" (D
> Marks a process in disk (or other short term, uninterruptible) wait) .google
> say the only method to clean the "Ds" process is rebooting the system.
> truss say no info for the process,and procstat say the process is calling
> the poll system call in the kernel.

There is no way for PostgreSQL to cause this. In all cases where I have personally seen such behavior there was either failing storage hardware or a driver with a bug (which I was always able to fix with the appropriate firmware or driver upgrade).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: BUG #13900: stop standby failed with writer process hang(happen 3 times in 2 days)
From
Jov
Date:
This happens on 2 machines, and both use ZFS; scrub reports no errors. So far it has affected 2 machines and 3 standby instances. I will try to reproduce it on Linux.

On Jan 30, 2016 3:22 PM, "Kevin Grittner" <kgrittn@gmail.com> wrote:

> On Sat, Jan 30, 2016 at 8:13 AM, <amutu@amutu.com> wrote:
>
> > I am updating my 3 database from pg9.3 to pg9.5,but may find a bug for the
> > bgwriter of pg9.3.I can't stop all the stand by process,even for immediate
> > stop mode and kill -9,the writer process still there,with ps state "Ds" (D
> > Marks a process in disk (or other short term, uninterruptible) wait) .google
> > say the only method to clean the "Ds" process is rebooting the system.
> > truss say no info for the process,and procstat say the process is calling
> > the poll system call in the kernel.
>
> There is no way for PostgreSQL to cause this. In all cases where I have
> personally seen such behavior there was either failing storage hardware or
> a driver with a bug (which I was always able to fix with the appropriate
> firmware or driver upgrade).
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: BUG #13900: stop standby failed with writer process hang(happen 3 times in 2 days)
From
John R Pierce
Date:
On 1/29/2016 11:35 PM, Jov wrote:
> this happens on 2 machines,and both use zfs,scrub say no error.

I've used various versions of Postgres on both Solaris 10 and FreeBSD 9.3 (actually FreeNAS, with PostgreSQL in a jail) using ZFS, without any such problems. Do you have a reproducible case?

--
john r pierce, recycling bits in santa cruz