Home > mailing lists

PITR on DROP DATABASE, deleting of the database directory despite therecovery_target_time set before. - Mailing list pgsql-hackers

From	Nicolas Lutic
Subject	PITR on DROP DATABASE, deleting of the database directory despite therecovery_target_time set before.
Date	November 18, 2019 13:48:37
Msg-id	aac5c3e6-2805-ba06-143b-070478aa9434@loxodata.com Whole thread Raw
Responses	Re: PITR on DROP DATABASE, deleting of the database directory despitethe recovery_target_time set before.
List	pgsql-hackers

Tree view

Dear Hackers,

A customer reported a strange behaviour on a PITR restoration.

After a drop database, he tried to recover the data on the last inserted transaction by using the recovery_target_time.

The issue is the database is present in the system catalog but the directory was still deleted.

Here the technical information of the database

version 11

default postgresql.conf except for this options

wal_level = replica

archive_mode = on

archive_command = 'cp %p /tmp/wal_archive/%f '

log_statement = 'all'

log_min_messages = debug5

The following method was used

create cluster

create database

create 1 table

create 1 index on 1 column

insert 1 rows

backup with pg_base_backup

insert 2 rows

drop database

stop instance

found the last inserted transaction timestamp ('2019-11-13 11:49:08.413744+01') before drop database

replace $datadir by a pg_base_backup archive

edit recovery.conf

restore_command = 'cp /tmp/wal_archive/%f "%p"'

recovery_target_time = '2019-11-13 11:49:08.413744+01'

recovery_target_inclusive = true

restart cluster

I tried to understand what's happening, when we analyse the postgresql.log (log_min_message = debug5), we can see that

the recovery stopped before transaction 574 (repository database) so at transaction 573 being the last insert, but the database directory was still deleted.

2019-11-13 11:55:12.732 CET [30666] DEBUG: remove KnownAssignedXid 572

2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000178 for Transaction/COMMIT: 2019-11-13 11:49:08.248928+01

2019-11-13 11:55:12.732 CET [30666] DEBUG: record known xact 573 latestObservedXid 572

2019-11-13 11:55:12.732 CET [30666] CONTEXT: WAL redo at 0/30001A0 for Heap/INSERT: off 3

2019-11-13 11:55:12.732 CET [30666] DEBUG: record known xact 573 latestObservedXid 573

2019-11-13 11:55:12.732 CET [30666] CONTEXT: WAL redo at 0/30001F8 for Btree/INSERT_LEAF: off 3

2019-11-13 11:55:12.732 CET [30666] DEBUG: record known xact 573 latestObservedXid 573

2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000238 for Transaction/COMMIT: 2019-11-13 11:49:08.413744+01

2019-11-13 11:55:12.732 CET [30666] DEBUG: record known xact 573 latestObservedXid 573

2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000238 for Transaction/COMMIT: 2019-11-13 11:49:08.413744+01

2019-11-13 11:55:12.732 CET [30666] DEBUG: remove KnownAssignedXid 573

2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000238 for Transaction/COMMIT: 2019-11-13 11:49:08.413744+01

2019-11-13 11:55:12.732 CET [30666] DEBUG: record known xact 574 latestObservedXid 573

2019-11-13 11:55:12.732 CET [30666] CONTEXT: WAL redo at 0/3000260 for Heap/DELETE: off 4 KEYS_UPDATED

2019-11-13 11:55:12.732 CET [30666] DEBUG: prune KnownAssignedXids to 574

2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000730 for Standby/RUNNING_XACTS: nextXid 575 latestCompletedXid 573 oldestRunningXid 574; 1 xacts: 574

2019-11-13 11:55:12.732 CET [30666] DEBUG: record known xact 574 latestObservedXid 574

2019-11-13 11:55:12.732 CET [30666] CONTEXT: WAL redo at 0/30007D8 for Database/DROP: dir 16384/1663

2019-11-13 11:55:12.738 CET [30666] LOG:  recovery stopping before commit of transaction 574, time 2019-11-13 11:49:10.683426+01

2019-11-13 11:55:12.738 CET [30666] LOG: recovery has paused

By analysing the wal file with pg_waldump

rmgr: Heap        len (rec/tot):     54/   198, tx:        572, lsn: 0/03000028, prev 0/020000F8, desc: INSERT off 2, blkref #0: rel 1663/16384/16385 blk 0 FPW

rmgr: Btree       len (rec/tot):     53/   133, tx:        572, lsn: 0/030000F0, prev 0/03000028, desc: INSERT_LEAF off 2, blkref #0: rel 1663/16384/16388 blk 1 FPW

rmgr: Transaction len (rec/tot):     34/    34, tx:        572, lsn: 0/03000178, prev 0/030000F0, desc: COMMIT 2019-11-13 11:49:08.248928 CET

rmgr: Heap        len (rec/tot):     87/    87, tx:        573, lsn: 0/030001A0, prev 0/03000178, desc: INSERT off 3, blkref #0: rel 1663/16384/16385 blk 0

rmgr: Btree       len (rec/tot):     64/    64, tx:        573, lsn: 0/030001F8, prev 0/030001A0, desc: INSERT_LEAF off 3, blkref #0: rel 1663/16384/16388 blk 1

rmgr: Transaction len (rec/tot):     34/    34, tx:        573, lsn: 0/03000238, prev 0/030001F8, desc: COMMIT 2019-11-13 11:49:08.413744 CET

rmgr: Heap        len (rec/tot):     59/  1227, tx:        574, lsn: 0/03000260, prev 0/03000238, desc: DELETE off 4 KEYS_UPDATED , blkref #0: rel 1664/0/1262 blk 0 FPW

rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn: 0/03000730, prev 0/03000260, desc: RUNNING_XACTS nextXid 575 latestCompletedXid 573 oldestRunningXid 574; 1 xacts: 574

rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 0/03000768, prev 0/03000730, desc: CHECKPOINT_ONLINE redo 0/3000730; tli 1; prev tli 1; fpw true; xid 0:575; oid 24576; multi 1; offset 0; oldest xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 574; online

rmgr: Database    len (rec/tot):     34/    34, tx:        574, lsn: 0/030007D8, prev 0/03000768, desc: DROP dir 16384/1663

rmgr: Transaction len (rec/tot):     66/    66, tx:        574, lsn: 0/03000800, prev 0/030007D8, desc: COMMIT 2019-11-13 11:49:10.683426 CET; inval msgs: catcache 21; sync

We notice that the following log

rmgr: Database    len (rec/tot):     34/    34, tx:        574, lsn: 0/030007D8, prev 0/03000768, desc: DROP dir 16384/1663

is executed between the last commit that we are interested in and the next record with a timestamp

rmgr: Transaction len (rec/tot):     66/    66, tx:        574, lsn: 0/03000800, prev 0/030007D8, desc: COMMIT 2019-11-13 11:49:10.683426 CET; inval msgs: catcache 21; sync

We understand that the drop database command is not transactional but the drop dir is attached to the xact whose xid has a commit with timestime out of recovery_target_time bound.

On the other hand, DBA role is to determine which at which xact recovery should stop and define recovery_target_xid rather than recovery_target_time.

Humans are prone to use natural things such as time to define "when" to stop or start things.

We know that this rarely happens in production, because you can't drop a database if users are still connected. But with the new force drop database option, it might be a reasonable choice to improve the situation with that recovery_target_time directive.

It turns out there are two different choices we can make :

Change recovery behaviour in that case to prevent all xact operation to perform until COMMIT timestamp is checked against recovery_time bound (but it seems to be difficult as state https://www.postgresql.org/message-id/flat/20141125160629.GC21475%40msg.df7cb.de which also identifies the problem and tries to give some solutions. Maybe another way, as a trivial guess (all apologises) is to buffer immediate xacts until we have the commit for each and apply the whole buffer xact once the timestamp known (and checked agains recovery_target_time value);

The other way to improve this is to update PostgreSQL documentation by specifying that recovery_target_time cannot be used in this case. There should be multiple places where it can be stated. The best one (if only one) seems to be in https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=f83770350eda5625179526300c652f23ff29c9fe;hb=HEAD#l3400

We are willing to help on this case either with code patching or documentation improvement.

Best regards,

LOXODATA

Nicolas Lutic

pgsql-hackers by date:

From: Amit Langote
Date: 18 November 2019, 11:53:11
Subject: Re: adding partitioned tables to publications

From: Dilip Kumar
Date: 18 November 2019, 14:32:32
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions

PITR on DROP DATABASE, deleting of the database directory despite therecovery_target_time set before. - Mailing list pgsql-hackers

Previous

Next