Re: Race conditions in 019_replslot_limit.pl - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Race conditions in 019_replslot_limit.pl |
Date | |
Msg-id | CAD21AoCmphWmVNWHa3uPDF=12z3z8yqQnBMpfSGVb+kGkvux2A@mail.gmail.com Whole thread Raw |
In response to | Re: Race conditions in 019_replslot_limit.pl (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Race conditions in 019_replslot_limit.pl
Re: Race conditions in 019_replslot_limit.pl |
List | pgsql-hackers |
On Wed, Feb 16, 2022 at 2:26 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 15 Feb 2022 15:51:57 -0800, Andres Freund <andres@anarazel.de> wrote in > > I think what happened is that there was no WAL to receive between the start of > > the primary and the $node_primary3->wait_for_catchup($node_standby3); > > > > Because the slot is created without reserving WAL that allows the primary to > > remove the WAL segments without ever creating a slot based conflict. I think > > that should be fixable by reserving the slot at creation time? > > Agreed. Doing this att all slot creation seems fine. > > > > and: > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2022-02-15%2011%3A00%3A08 > > > > > > # Failed test 'have walsender pid 3682154 > > > # 3682136' > > > # at t/019_replslot_limit.pl line 335. > > > # '3682154 > > > # 3682136' > > > # doesn't match '(?^:^[0-9]+$)' > > > > > > The latter looks like there are two walsenders active, which confuses the > > > test. > > > > Too bad there's no plain pid in the log_line_prefix. Looks like that used to be the > > buildfarm default, and I haven't fixed that animals configuration... > > > > %c apparently is hex(process startup time).hex(pid) in hex, so we're looking > > for 382f58... Confirmed by the slot name: > > 2022-02-15 12:10:20.874 CET [620b8a1c.382f58:5] LOG: received replication command: CREATE_REPLICATION_SLOT "pg_basebackup_3682136"TEMPORARY PHYSICAL ( RESERVE_WAL) > > which pg_basebackup builds using the backend pid: > > replication_slot = psprintf("pg_basebackup_%d", (int) PQbackendPID(param->bgconn)); > > > > The logs for that pid are: > > 2022-02-15 12:10:20.873 CET [620b8a1c.382f58:1] LOG: connection received: host=[local] > > 2022-02-15 12:10:20.874 CET [620b8a1c.382f58:2] LOG: replication connection authorized: user=bf application_name=019_replslot_limit.pl > > 2022-02-15 12:10:20.874 CET [620b8a1c.382f58:3] LOG: received replication command: SHOW data_directory_mode > > 2022-02-15 12:10:20.874 CET [620b8a1c.382f58:4] STATEMENT: SHOW data_directory_mode > > 2022-02-15 12:10:20.874 CET [620b8a1c.382f58:5] LOG: received replication command: CREATE_REPLICATION_SLOT "pg_basebackup_3682136"TEMPORARY PHYSICAL ( RESERVE_WAL) > > 2022-02-15 12:10:20.874 CET [620b8a1c.382f58:6] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_3682136" TEMPORARYPHYSICAL ( RESERVE_WAL) > > 2022-02-15 12:10:20.875 CET [620b8a1c.382f58:7] LOG: received replication command: IDENTIFY_SYSTEM > > 2022-02-15 12:10:20.875 CET [620b8a1c.382f58:8] STATEMENT: IDENTIFY_SYSTEM > > 2022-02-15 12:10:20.875 CET [620b8a1c.382f58:9] LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_3682136"0/600000 TIMELINE 1 > > 2022-02-15 12:10:20.875 CET [620b8a1c.382f58:10] STATEMENT: START_REPLICATION SLOT "pg_basebackup_3682136" 0/600000TIMELINE 1 > > > > Even though the node has log_disconnect = true, and other processes indeed log > > their disconnection, there's no disconnect for the above session until the > > server is shut down. Even though pg_basebackup clearly finished? Uh, huh? > > It seems to me so, too. > > > I guess it's conceivable that the backend was still working through process > > shutdown? But it doesn't seem too likely, given that several other connections > > manage to get through entire connect / disconnect cycles? > > Yes, but since postmaster seems thinking that process is gone. Or it's possible that the process took a time to clean up the temporary replication slot? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: