Thread: BUG #17226: hot standby mode, create database and drop database cause standby restrat failed
BUG #17226: hot standby mode, create database and drop database cause standby restrat failed
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17226 Logged by: Bo Chen Email address: bchen90@163.com PostgreSQL version: 11.8 Operating system: Suse linux 11.3 x86_64 Description: Hi, all Recently, I got a FATAL while restarts standby, which can be reproduced by the following steps, based on pg 11: 1. create a database test 2. create another database test1 using test as template 3. drop database test 4. stop standby using 'pg_ctl stop -mi' 5. start standby, start failed 6. stop primary using 'pg_ctl stop -mi' 7. start primary, start OK The starting log for standby failed is: 2021-10-13 15:57:43.294 CST startup 2420219 1/0 0 %LOG: 00000: redo starts at 0/6000024 xlog.c (7043 StartupXLOG) 2021-10-13 15:57:43.297 CST startup 2420219 1/0 0 %FATAL: 58P01: could not open directory "base/16385": No such file or directory fd.c (2714 ReadDirExtended) 2021-10-13 15:57:43.297 CST startup 2420219 1/0 0 %CONTEXT: WAL redo at 0/7001200 for Database/CREATE: copy dir 16385/1663 to 16386/1663 2021-10-13 15:57:43.298 CST postmaster 2420180 0 %LOG: 00000: startup process (PID 2420219) exited with exit code 1 postmaster.c (3772 LogChildExit) I think the reason for the above error is as follows: For standby, the redo of XLOG_DBASE_DROP and XLOG_DBASE_CREATE just remove and copy the databse directory but not peform checkpoint like create or drop database in primary. So, when we restart, it redo from the log of create database but now the directory of the source database has been deleted, so redo failed. So, do we need a force chechpoint in the dbase_redo as we do for create database or drop database. Hope to get your reply Thanks & Best Regard
Re: BUG #17226: hot standby mode, create database and drop database cause standby restrat failed
From
Masahiko Sawada
Date:
On Wed, Oct 13, 2021 at 6:49 PM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 17226 > Logged by: Bo Chen > Email address: bchen90@163.com > PostgreSQL version: 11.8 > Operating system: Suse linux 11.3 x86_64 > Description: > > Hi, all > > Recently, I got a FATAL while restarts standby, which can be reproduced by > the following steps, based on pg 11: > 1. create a database test > 2. create another database test1 using test as template > 3. drop database test > 4. stop standby using 'pg_ctl stop -mi' > 5. start standby, start failed > 6. stop primary using 'pg_ctl stop -mi' > 7. start primary, start OK > > The starting log for standby failed is: > > 2021-10-13 15:57:43.294 CST startup 2420219 1/0 0 %LOG: 00000: redo starts > at 0/6000024 xlog.c (7043 StartupXLOG) > 2021-10-13 15:57:43.297 CST startup 2420219 1/0 0 %FATAL: 58P01: could not > open directory "base/16385": No such file or directory fd.c (2714 > ReadDirExtended) > 2021-10-13 15:57:43.297 CST startup 2420219 1/0 0 %CONTEXT: WAL redo at > 0/7001200 for Database/CREATE: copy dir 16385/1663 to 16386/1663 > 2021-10-13 15:57:43.298 CST postmaster 2420180 0 %LOG: 00000: startup > process (PID 2420219) exited with exit code 1 postmaster.c (3772 > LogChildExit) > > > I think the reason for the above error is as follows: > For standby, the redo of XLOG_DBASE_DROP and XLOG_DBASE_CREATE just remove > and copy the databse directory but not peform checkpoint like create or drop > database in primary. So, when we restart, it redo from the log of create > database but now the directory of the source database has been deleted, so > redo failed. Thank you for the report! I could reproduce this issue. If WAL records are produced in the following order and the standby restarts the crash recovery from #2 after replaying all records, the replay of #3 fails due to missing the database directory (base/16384) used as a template: 1. CREATEDB copy 1/1663 to 16384/1663 2. CHECKPOINT (and becomes the restart point for the standby) 3. CREATEDB copy 16384/1663 to 16385/1663 4. DROPDB 16384/1663 This issue is related to the discussion on pgsql-hackers[1] and the patch is already proposed[2]. I’ve applied v12-0003 patch and confirmed that it fixes this issue as well (although not looked at how to fix this issue). So probably this issue can be fixed together with that issue. Regards, [1] https://www.postgresql.org/message-id/CAEET0ZGx9AvioViLf7nbR_8tH9-%3D27DN5xWJ2P9-ROH16e4JUA%40mail.gmail.com [2] https://www.postgresql.org/message-id/SA1PR05MB85821DEC48A6841277BDBF57C3F29%40SA1PR05MB8582.namprd05.prod.outlook.com -- Masahiko Sawada EDB: https://www.enterprisedb.com/