BUG #17389: pg_repack creates race conditions on streaming replicas - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17389: pg_repack creates race conditions on streaming replicas
Msg-id 17389-c866083cf152593e@postgresql.org
Responses Re: BUG #17389: pg_repack creates race conditions on streaming replicas  (Andres Freund <andres@anarazel.de>)
Re: BUG #17389: pg_repack creates race conditions on streaming replicas  (Nick Cleaton <nick@cleaton.net>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17389
Logged by:          Ben Chobot
Email address:      bench@silentmedia.com
PostgreSQL version: 12.9
Operating system:   Linux (Ubuntu)
Description:

We've noticed that, since at least 9.5, running pg_repack causes a race
condition on our streaming replicas, but _not_ on the primary where
pg_repack is running. It manifests as a client being briefly unable to
open the relation being repacked, but, in our testing and experience,
only on the replica. I would blame pg_repack (its whole reason for being
is to transparently rebuild tables, and it may well have gotten some of the
details wrong), except that if its behavior appears atomic to clients on the
primary, shouldn't it appear atomic on the replicas too?

Using the steps below, I can reliably make the client on the replica hit
an OID error within 30 minutes. The same steps do not produce an error
when I run the query loop against the primary instead.

1. On the primary, create a table with a single btree index (its primary key) and load some data:
create table public.simple_test (id int primary key);
insert into public.simple_test(id) (select generate_series(1,1000));

2. Set up streaming replication to a secondary db.

3. Install the pg_repack extension on the primary:
create extension pg_repack;

4. In a loop on the primary, log the current relfilenode of the table's
index, then have pg_repack rebuild only the indexes of that table:
while true; do
    psql -d canvas -tAc "select now(), relfilenode from pg_class where relname='simple_test_pkey'" >> log
    /usr/lib/postgresql/12/bin/pg_repack -d canvas -x -t public.simple_test
done
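To line up failures on the replica with index swaps on the primary, the `log` file written in step 4 can be scanned for the points where the logged relfilenode changed. This is a minimal sketch, assuming each log line has the form `timestamp|relfilenode` (`|` is the default field separator of `psql -tA`); the function name `relfilenode_changes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first logged observation of each new
# relfilenode, i.e. the log lines whose relfilenode differs from the line
# before. Assumes the step 4 log format "timestamp|relfilenode" and that the
# timestamp itself contains no "|". Reads ./log unless a path is given.
relfilenode_changes() {
  awk -F'|' 'NR > 1 && $2 != prev { print } { prev = $2 }' "${1:-log}"
}
```

For example, `relfilenode_changes log` prints only the lines where a pg_repack run in the preceding iteration left the index with a different relfilenode, which gives approximate swap times to compare against the time the replica loop failed.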

5. In a loop on the secondary, query the table for an indexed value,
stopping at the first error and then printing the time:
while true; do
    psql -tAc "select count(*), now() from simple_test where id='3'" || break
done; date
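If it helps to know how long the replica survived before failing, the loop in step 5 can be wrapped in a small helper that also counts successful runs. This is a hypothetical sketch (the name `loop_until_failure` is made up); it runs any command repeatedly until that command exits non-zero.

```shell
#!/bin/sh
# Hypothetical helper: run the given command until it exits non-zero, then
# report how many runs succeeded and the time the loop stopped.
loop_until_failure() {
  n=0
  while "$@"; do
    n=$((n + 1))
  done
  echo "failed after $n successful runs at $(date)"
}
```

Usage on the secondary would look like `loop_until_failure psql -tAc "select count(*), now() from simple_test where id='3'"`; the query output and the final summary line both go to stdout.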

