Re: base backup vs. concurrent truncation - Mailing list pgsql-hackers
From | Aleksander Alekseev |
---|---|
Subject | Re: base backup vs. concurrent truncation |
Date | |
Msg-id | CAJ7c6TMGG=7AYkPtmo7jagwXS4_R=hDoOUejusnDLfB6j_fURQ@mail.gmail.com Whole thread Raw |
In response to | Re: base backup vs. concurrent truncation (Andres Freund <andres@anarazel.de>) |
Responses |
Re: base backup vs. concurrent truncation
|
List | pgsql-hackers |
Hi, > I think that to reproduce the scenario, you want the truncate to happen in > its own checkpoint cycle. OK, let's try this again. In order to effectively disable the checkpointer I added the following lines to postgresql.conf: ``` checkpoint_timeout = 3600 max_wal_size = 100G ``` I'm also keeping an eye on `logfile` in order to make sure the system doesn't do anything unexpected. Then: ``` -- Just double-checking show checkpoint_timeout; checkpoint_timeout -------------------- 1h show max_wal_size; max_wal_size -------------- 100GB create table truncateme(id integer, val varchar(1024)); alter table truncateme set (autovacuum_enabled = off); select relfilenode from pg_class where relname = 'truncateme'; relfilenode ------------- 16385 -- takes ~30 seconds insert into truncateme select id, ( select string_agg(chr((33+random()*(126-33)) :: integer), '') from generate_series(1,1000) ) from generate_series(1,2*1024*1024) as id; delete from truncateme where id > 1024*1024; select count(*) from truncateme; count --------- 1048576 -- Making a checkpoint as pg_basebackup would do. -- Also, making sure truncate will happen in its own checkpoint cycle. checkpoint; ``` Again I see 3 segments: ``` $ ls -lah 16385* -rw------- 1 eax eax 1.0G May 1 19:24 16385 -rw------- 1 eax eax 1.0G May 1 19:27 16385.1 -rw------- 1 eax eax 293M May 1 19:27 16385.2 -rw------- 1 eax eax 608K May 1 19:24 16385_fsm ``` Making a backup of .2 as if I'm pg_basebackup: ``` cp 16385.2 ~/temp/16385.2 ``` Truncating the table: ``` vacuum truncateme; ``` ... and killing postgres: ``` $ pkill -9 postgres ``` Now I see: ``` $ ls -lah 16385* -rw------- 1 eax eax 1.0G May 1 19:30 16385 -rw------- 1 eax eax 147M May 1 19:31 16385.1 -rw------- 1 eax eax 0 May 1 19:31 16385.2 -rw------- 1 eax eax 312K May 1 19:31 16385_fsm -rw------- 1 eax eax 40K May 1 19:31 16385_vm $ cp ~/temp/16385.2 ./16385.2 ``` Starting postgres: ``` LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-11.3.0, 64-bit LOG: listening on IPv4 address "0.0.0.0", port 5432 LOG: listening on IPv6 address "::", port 5432 LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" LOG: database system was interrupted; last known up at 2023-05-01 19:27:22 MSK LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/8AAB36B0 LOG: invalid record length at 0/CE9BDE60: expected at least 24, got 0 LOG: redo done at 0/CE9BDE28 system usage: CPU: user: 6.51 s, system: 2.45 s, elapsed: 8.97 s LOG: checkpoint starting: end-of-recovery immediate wait LOG: checkpoint complete: wrote 10 buffers (0.0%); 0 WAL file(s) added, 0 removed, 68 recycled; write=0.026 s, sync=1.207 s, total=1.769 s; sync files=10, longest=1.188 s, average=0.121 s; distance=1113129 kB, estimate=1113129 kB; lsn=0/CE9BDE60, redo lsn=0/CE9BDE60 LOG: database system is ready to accept connections ``` ``` $ ls -lah 16385* -rw------- 1 eax eax 1.0G May 1 19:33 16385 -rw------- 1 eax eax 147M May 1 19:33 16385.1 -rw------- 1 eax eax 0 May 1 19:33 16385.2 -rw------- 1 eax eax 312K May 1 19:33 16385_fsm -rw------- 1 eax eax 40K May 1 19:33 16385_vm ``` ``` select count(*) from truncateme; count --------- 1048576 ``` So I'm still unable to reproduce the described scenario, at least on PG16. -- Best regards, Aleksander Alekseev
pgsql-hackers by date: