BUG #17744: Fail Assert while recoverying from pg_basebackup - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17744: Fail Assert while recoverying from pg_basebackup
Date
Msg-id 17744-2c95e2b7783d7232@postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17744
Logged by:          bowenshi
Email address:      zxwsbg12138@gmail.com
PostgreSQL version: 15.1
Operating system:   centos
Description:

Dear all,

There may be a problem in recovery. The following steps reproduce it
reliably:

First, run the following on the master to generate at least 20GB of
data:
1. create table t(a int);
2. echo "insert into t select generate_series(1,5000);">script.sql
3. pgbench --no-vacuum --client=25 -U postgres --transactions=10000 --file script.sql
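
As a rough sanity check on the 20GB figure, the heap size produced by this workload can be estimated from the pgbench parameters. This is only a sketch: the client, transaction, and row counts come from the commands above, while the page size, page header size, and per-row cost are assumptions (default 8 kB pages and a single-int row on a 64-bit build), not values taken from the report.

```shell
# Workload parameters taken from the pgbench command above.
clients=25
transactions=10000
rows_per_txn=5000

# Assumptions, not from the report: default 8 kB pages with a 24-byte
# page header, and about 36 bytes per single-int row (24-byte tuple
# header + 4 bytes of data, padded to 32, plus a 4-byte line pointer).
page_size=8192
page_header=24
bytes_per_row=36

total_rows=$((clients * transactions * rows_per_txn))
rows_per_page=$(( (page_size - page_header) / bytes_per_row ))
# Round up to whole pages.
pages=$(( (total_rows + rows_per_page - 1) / rows_per_page ))
est_gib=$(( pages * page_size / 1024 / 1024 / 1024 ))

echo "rows inserted: $total_rows"
echo "estimated heap size: about ${est_gib} GiB"
```

Under these assumptions the estimate lands comfortably above 20GB. The actual size can be confirmed on the server with `select pg_size_pretty(pg_total_relation_size('t'));`.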
 
Second, take a base backup using pg_basebackup in stream mode:
1. pg_basebackup --checkpoint=fast -h localhost -U postgres -p 5432 -Xs -Ft -v -P -D /data2/sqpg/inst/data_b
2. pgbench --no-vacuum --client=25 -U postgres --transactions=3000 --file script.sql
   (run this concurrently, while pg_basebackup is still in progress)
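
The two commands above have to overlap in time. One way to arrange that from a single shell is sketched below; the helper name `run_during` is hypothetical and is not part of PostgreSQL or of the original report. It starts the first command in the background, runs the second in the foreground, and then waits for the first to finish.

```shell
# Hypothetical helper: run bg_cmd in the background while fg_cmd runs in
# the foreground, then wait for bg_cmd. Succeeds only if both succeed.
run_during() {
    bg_cmd=$1
    fg_cmd=$2

    sh -c "$bg_cmd" &
    bg_pid=$!

    fg_status=0
    sh -c "$fg_cmd" || fg_status=$?

    bg_status=0
    wait "$bg_pid" || bg_status=$?

    echo "background exit=$bg_status foreground exit=$fg_status"
    [ "$bg_status" -eq 0 ] && [ "$fg_status" -eq 0 ]
}

# Usage with the commands from the steps above (not executed here):
#   run_during \
#     'pg_basebackup --checkpoint=fast -h localhost -U postgres -p 5432 -Xs -Ft -v -P -D /data2/sqpg/inst/data_b' \
#     'pgbench --no-vacuum --client=25 -U postgres --transactions=3000 --file script.sql'
```

The helper does not guarantee that pgbench finishes before the backup does; it only ensures the load starts while the backup is running.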

Third, start the backup instance:
cd /data2/sqpg/inst/data_b
tar xvf base.tar
mv pg_wal.tar pg_wal/
cd pg_wal
tar xvf pg_wal.tar
cd ../
echo "port=5433">>postgresql.conf
echo "log_min_messages=debug1">>postgresql.conf
echo "checkpoint_timeout=30s">>postgresql.conf
cd /data2/sqpg/inst/bin
./pg_ctl start -D ../data_b -l logfile_b

Then, during crash recovery, the checkpointer process performs a
restartpoint, which fails an assertion. The stack trace is as follows:
TRAP: FailedAssertion("TransactionIdIsValid(initial)", File: "procarray.c", Line: 1750, PID: 2063)
postgres: checkpointer (ExceptionalCondition+0xb9)[0xb378bc]
postgres: checkpointer [0x962195]
postgres: checkpointer (GetOldestTransactionIdConsideredRunning+0x14)[0x9628a3]
postgres: checkpointer (CreateRestartPoint+0x698)[0x5972bf]
postgres: checkpointer (CheckpointerMain+0x5b7)[0x8cae37]
postgres: checkpointer (AuxiliaryProcessMain+0x165)[0x8c8b01]
postgres: checkpointer [0x8d32b2]
postgres: checkpointer (PostmasterMain+0x11dd)[0x8ce559]
postgres: checkpointer [0x7d38e7]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f80352c9555]
postgres: checkpointer [0x48d1b9]
2023-01-10 07:25:42.368 UTC [2062] LOG:  checkpointer process (PID 2063) was terminated by signal 6: Aborted

(gdb) bt
#0  0x00007f80352dd3d7 in raise () from /lib64/libc.so.6
#1  0x00007f80352deac8 in abort () from /lib64/libc.so.6
#2  0x0000000000b378e9 in ExceptionalCondition (
    conditionName=0xd13697 "TransactionIdIsValid(initial)",
    errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
    lineNumber=1750) at assert.c:69
#3  0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
    at procarray.c:1750
#4  0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
    at procarray.c:2050
#5  0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
#6  0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
#7  0x00000000008c8b01 in AuxiliaryProcessMain (auxtype=CheckpointerProcess)
    at auxprocess.c:153
#8  0x00000000008d32b2 in StartChildProcess (type=CheckpointerProcess)
    at postmaster.c:5430
#9  0x00000000008ce559 in PostmasterMain (argc=3, argv=0x18149a0)
    at postmaster.c:1463
#10 0x00000000007d38e7 in main (argc=3, argv=0x18149a0) at main.c:202
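
To check whether a given run hit the same failure, the server log can be searched for the TRAP line. A minimal sketch, assuming the log file is the `logfile_b` passed to pg_ctl above; the helper name `find_assert_failures` is hypothetical.

```shell
# Hypothetical helper: print every assertion failure recorded in a
# server log, prefixed with its line number. Returns non-zero when the
# log contains none.
find_assert_failures() {
    grep -n 'TRAP: FailedAssertion' "$1"
}

# Usage, from the directory where pg_ctl was started (not executed here):
#   find_assert_failures logfile_b
```

Note that the TRAP line only appears on builds configured with assertions enabled.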

