Thread: BUG #17846: pg_dump doesn't properly dump with paused WAL replay

BUG #17846: pg_dump doesn't properly dump with paused WAL replay

From

PG Bug reporting form

Date:

16 March 2023, 14:01:42

The following bug has been logged on the website:

Bug reference:      17846
Logged by:          Francisco Reinolds
Email address:      francisco.reinolds@channable.com
PostgreSQL version: 13.8
Operating system:   Ubuntu 22.04.2 LTS
Description:

In our setup, we have primary and secondary databases, taking advantage of
physical streaming replication to keep the secondary up to date, so that we
can perform backups from it, lessening the load on the primary.

For backups, we use pg_dump to perform a full database dump. Before we start
a backup, we pause the WAL replay on the secondary, unpausing it after it is
concluded. This was done since we previously encountered problems with
pg_dump failing when an AccessExclusiveLock was held on a table that pg_dump
was going to dump.

For some time we faced no problems with this setup, but starting some months
ago, we started witnessing sporadic failures when we attempted to restore
the dumps of one of our databases, to verify the dump's integrity. These
restore failures would occur due to a key not being present in a table:


```
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3279; 2606 203326243 FK CONSTRAINT table_2
table_2_table_1_id_fkey realtime_alembic
pg_restore: error: could not execute query: ERROR:  insert or update on
table "table_2" violates foreign key constraint "table_2_table_1_id_fkey"
DETAIL:  Key (table_1_id)=(34553066) is not present in table "table_1".
Command was: ALTER TABLE ONLY public.table_2
    ADD CONSTRAINT table_2_table_1_id_fkey FOREIGN KEY (table_1_id) REFERENCES
public.table_1(id);
```


This has been puzzling us for some time since:
- this is the only database of our infrastructure where this occurs, despite
this setup being replicated across it;
- the failures started occurring sporadically, but have become more
consistent;

We have managed, with some help from the Postgres IRC channel (special
thanks to user nickb), to work around the problem. The solution was to begin
a transaction, and extract a snapshot that'd be passed as a pg_dump
argument, and only then pause WAL replay. From our understanding, pg_dump
should already implicitly pick a suitable point to start the dump but it
apparently is not the case, hence the bug report.

Thanks for the attention,

Francisco

Re: BUG #17846: pg_dump doesn't properly dump with paused WAL replay

From

Tom Lane

Date:

16 March 2023, 18:10:57

PG Bug reporting form <noreply@postgresql.org> writes:
> For backups, we use pg_dump to perform a full database dump. Before we start
> a backup, we pause the WAL replay on the secondary, unpausing it after it is
> concluded. This was done since we previously encountered problems with
> pg_dump failing when an AccessExclusiveLock was held on a table that pg_dump
> was going to dump.

> For some time we faced no problems with this setup, but starting some months
> ago, we started witnessing sporadic failures when we attempted to restore
> the dumps of one of our databases, to verify the dump's integrity. These
> restore failures would occur due to a key not being present in a table:

I really have no idea what's going on there, but can you show the exact
pg_dump command(s) being issued?  I'm particularly curious whether you
are using parallel dump.  The same for the failing pg_restore.

Also, are all the moving parts (primary server, secondary server,
pg_dump, pg_restore) exactly the same PG version?

> We have managed, with some help from the Postgres IRC channel (special
> thanks to user nickb), to work around the problem. The solution was to begin
> a transaction, and extract a snapshot that'd be passed as a pg_dump
> argument, and only then pause WAL replay. From our understanding, pg_dump
> should already implicitly pick a suitable point to start the dump but it
> apparently is not the case, hence the bug report.

It's the other way around: the replay mechanism should not damage
any data that's visible to an open snapshot.  So I agree this smells
like a bug, but we don't have enough info here to reproduce it.

            regards, tom lane

Re: BUG #17846: pg_dump doesn't properly dump with paused WAL replay

From

Peter Geoghegan

Date:

16 March 2023, 23:32:46

On Thu, Mar 16, 2023 at 8:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I really have no idea what's going on there, but can you show the exact
> pg_dump command(s) being issued?  I'm particularly curious whether you
> are using parallel dump.  The same for the failing pg_restore.
>
> Also, are all the moving parts (primary server, secondary server,
> pg_dump, pg_restore) exactly the same PG version?

I have heard multiple internal reports of incorrect hint bits being
set on standbys where exported snapshots are used (a capability first
added in 2017, by commit 6c2003f8). These were cases that didn't
involve pg_dump at all, though; they involved a third party utility
that happens to use exported snapshots to parallelize a process that
synchronizes a remote database system (not a Postgres database) with
the user's Postgres database. This was also a 13 database, though I
believe we've seen it on an 11 database too. Both systems had
suspiciously similar symptoms, and both used this utility that exports
snapshots.

I never got to the bottom of the problem despite spending some time on
it. I never personally had the opportunity to directly examine the
incorrectly set hint bits on the standby. However, I am quite
confident that spuriously set hint bits were involved. A coworker had
the opportunity to examine affected pages forensically at one point.
They clearly demonstrated incorrectly set hint bits on affected
standbys. The original user visible symptom was duplicate entries in
unique indexes on affected standbys, that came and went sporadically.

This is quite difficult to debug, since all it takes is an FPI on the
primary to "fix" the issue on the affected standby (actually there are
a couple of other things that could do it, like freezing, but FPIs
seem to be most likely). As you can imagine, there are various
practical constraints on accessing affected systems. I rate the
chances of this being due to some undiscovered bug in this area as
high.

--
Peter Geoghegan

Re: BUG #17846: pg_dump doesn't properly dump with paused WAL replay

From

Tom Lane

Date:

20 March 2023, 18:39:44

[ please keep the mailing list cc'd ]

Francisco Reinolds <francisco.reinolds@channable.com> writes:
> On 16-03-2023 16:10, Tom Lane wrote:
>> I really have no idea what's going on there, but can you show the exact
>> pg_dump command(s) being issued?  I'm particularly curious whether you
>> are using parallel dump.  The same for the failing pg_restore.

> Of course:

> - pg_dump: pg_dump --port 5432 --host localhost --verbose
> --format=directory --jobs=8 --file=<random_directory> --dbname=<dbname>
> - pg_restore: pg_restore --exit-on-error --cluster 13/<cluster_name>
> --dbname=<dbname> --port <port> --format=directory --jobs=8
> --use-list=/tmp/tmpsote5wvm --clean --if-exists <random directory>

Hmm, so the fact that the dump is being done in parallel is very likely
relevant.  Perhaps parallelism on the restore is also relevant, not
sure.  Can you try running each of those steps not-parallel to see
if the problem goes away?

I'm also slightly troubled by the --use-list option, and am wondering
if faulty creation of the restore list could be a contributing
factor.  The error looks like missing data row(s) not missing schema
objects; but perhaps if the problematic table(s) are partitioned
then one could lead to the other?  Could we see the DDL definition
for the problematic table(s)?

>> Also, are all the moving parts (primary server, secondary server,
>> pg_dump, pg_restore) exactly the same PG version?

> So, the version of both the primary and the secondary servers match, 13.8,
> but the server of the instance where we run the backup verifications does
> not, it's currently sitting at 13.6

Hmm.  With some unsupported assumptions about your schema, I could
believe that some of the 13.9 bug fixes are relevant, particularly

    * Fix construction of per-partition foreign key constraints while doing
    ALTER TABLE ATTACH PARTITION (Jehan-Guillaume de Rorthais, Álvaro
    Herrera)

    Previously, incorrect or duplicate constraints could be constructed
    for the newly-added partition.

            regards, tom lane