[WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL - Mailing list pgsql-hackers

From Dimitrios Apostolou
Subject [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL
Date
Msg-id c61263f2-7472-5dd8-703d-01e683421f61@gmx.net
Whole thread Raw
In response to [WIP] Implement "pg_restore --data-only --clean" as a way to skip WAL  (Dimitrios Apostolou <jimis@gmx.net>)
List pgsql-hackers
On Mon, 14 Apr 2025, Dimitrios Apostolou wrote:

> Hello list,
>
> I implemented --clean support for --data-only, in order to avoid logging to
> the WAL while populating the database. The attached patch issues a TRUNCATE
> before COPY on each worker process, and provides a significant speed
> advantage if the cluster is configure with wal_level=minimal.
>
> It also provides a safer way to load the database, as avoiding WAL logging
> also avoids potential and painful ENOSPACE on the WAL partition as I
> experienced in [1]. In other words it makes things much better for my use
> case.
>
> [1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net
>

Rebased and attached v2 of the patch.

It needed some adjustments for the new flags --with-schema and
--with-data.

I have used this patch several times to pg_restore terabytes of tables
without logging through the WAL, and it performs great.

> But it has some rough edges. I would appreciate guidance and feedback.

The rough edges remain: TRUNCATE fails if there are foreign keys. So if
you try pg_restore --data-only --clean to a table referenced via foreign
keys, the patch will not work, as mentioned below.

>
> * When the table-to-be-TRUNCATEd is referenced as foreign key from other
>   table, the whole transaction fails with:
>
>    ERROR: cannot truncate a table referenced in a foreign key constraint
>
>   1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
>      instead, which has more chances of succeeding, and continuing with
>      the COPY. How to detect the failure of ahprintf("TRUNCATE") and do
>      the alternative without failing the whole transaction?
>
>   2. Why doesn't --disable-triggers help?
>      To test this, I have manually issued
>
>       ALTER TABLE x DISABLE TRIGGER ALL
>
>      to every table and issued manual TRUNCATE still fails. Shouldn't
>      postgres skip the referential integrity checks?
>
>   3. In my tests, all my tables start empty since I have just created the
>      schema. Then pg_restore --data-only --clean first populates
>      the /referencing/ tables, which is allowed because of disabled
>      triggers, and then it tries to load the /referenced/ table.
>
>      At this point the referential integrity is already broken. Getting an
>      error when TRUNCATing the empty /referenced/ table doesn't make
>      sense.

So is there a way to turn off the referential checks for a TRUNCATE?
Do you have any other feedback for this patch?

Thanks,
Dimitris

Attachment

pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: add function for creating/attaching hash table in DSM registry
Next
From: Shinya Kato
Date:
Subject: Re: Extend COPY FROM with HEADER to skip multiple lines