Thread: [WIP] Implement "pg_restore --data-only --clean" as a way to skip WAL
[WIP] Implement "pg_restore --data-only --clean" as a way to skip WAL
From
Dimitrios Apostolou
Date:
Hello list, I implemented --clean support for --data-only, in order to avoid logging to the WAL while populating the database. The attached patch issues a TRUNCATE before COPY on each worker process, and provides a significant speed advantage if the cluster is configure with wal_level=minimal. It also provides a safer way to load the database, as avoiding WAL logging also avoids potential and painful ENOSPACE on the WAL partition as I experienced in [1]. In other words it makes things much better for my use case. [1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net But it has some rough edges. I would appreciate guidance and feedback. * When the table-to-be-TRUNCATEd is referenced as foreign key from other table, the whole transaction fails with: ERROR: cannot truncate a table referenced in a foreign key constraint 1. As a first step, when TRUNCATE fails I want to try a DELETE FROM instead, which has more chances of succeeding, and continuing with the COPY. How to detect the failure of ahprintf("TRUNCATE") and do the alternative without failing the whole transaction? 2. Why doesn't --disable-triggers help? To test this, I have manually issued ALTER TABLE x DISABLE TRIGGER ALL to every table and issued manual TRUNCATE still fails. Shouldn't postgres skip the referential integrity checks? 3. In my tests, all my tables start empty since I have just created the schema. Then pg_restore --data-only --clean first populates the /referencing/ tables, which is allowed because of disabled triggers, and then it tries to load the /referenced/ table. At this point the referential integrity is already broken. Getting an error when TRUNCATing the empty /referenced/ table doesn't make sense. What do you think? Thank you in advance, Dimitris
Attachment
[WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL
From
Dimitrios Apostolou
Date:
On Mon, 14 Apr 2025, Dimitrios Apostolou wrote: > Hello list, > > I implemented --clean support for --data-only, in order to avoid logging to > the WAL while populating the database. The attached patch issues a TRUNCATE > before COPY on each worker process, and provides a significant speed > advantage if the cluster is configure with wal_level=minimal. > > It also provides a safer way to load the database, as avoiding WAL logging > also avoids potential and painful ENOSPACE on the WAL partition as I > experienced in [1]. In other words it makes things much better for my use > case. > > [1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net > Rebased and attached v2 of the patch. It needed some adjustments for the new flags --with-schema and --with-data. I have used this patch several times to pg_restore terabytes of tables without logging through the WAL, and it performs great. > But it has some rough edges. I would appreciate guidance and feedback. The rough edges remain: TRUNCATE fails if there are foreign keys. So if you try pg_restore --data-only --clean to a table referenced via foreign keys, the patch will not work, as mentioned below. > > * When the table-to-be-TRUNCATEd is referenced as foreign key from other > table, the whole transaction fails with: > > ERROR: cannot truncate a table referenced in a foreign key constraint > > 1. As a first step, when TRUNCATE fails I want to try a DELETE FROM > instead, which has more chances of succeeding, and continuing with > the COPY. How to detect the failure of ahprintf("TRUNCATE") and do > the alternative without failing the whole transaction? > > 2. Why doesn't --disable-triggers help? > To test this, I have manually issued > > ALTER TABLE x DISABLE TRIGGER ALL > > to every table and issued manual TRUNCATE still fails. Shouldn't > postgres skip the referential integrity checks? > > 3. In my tests, all my tables start empty since I have just created the > schema. Then pg_restore --data-only --clean first populates > the /referencing/ tables, which is allowed because of disabled > triggers, and then it tries to load the /referenced/ table. > > At this point the referential integrity is already broken. Getting an > error when TRUNCATing the empty /referenced/ table doesn't make > sense. So is there a way to turn off the referential checks for a TRUNCATE? Do you have any other feedback for this patch? Thanks, Dimitris
Attachment
Re: [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL
From
Dimitrios Apostolou
Date:
I wonder about the following in pg_restore.c. Right now my implementation covers only parallel restore. In the case of non-parallel restore, I want to make the behaviour similar, i.e. each worker to issue a TRUNCATE before COPY starts. But then the StartTransaction() doesn't make sense, as everything might already be in a transaction because of --single-transaction. Should I completely skip StartTransaction() if !is_parallel? - use_truncate = is_parallel && te->created && + use_truncate = is_parallel && + (te->created || (ropt->dumpData && ropt->clean)) && !is_load_via_partition_root(te); if (use_truncate) { + pg_log_debug("BEGIN transaction and TRUNCATE table \"%s.%s\"", + te->namespace, te->tag); + /* * Parallel restore is always talking directly to a * server, so no need to see if we should issue BEGIN. */ StartTransaction(&AH->public); /* * Issue TRUNCATE with ONLY so that child tables are * not wiped. */ ahprintf(AH, "TRUNCATE TABLE ONLY %s;\n\n", fmtQualifiedId(te->namespace, te->tag));
Re: [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL
From
Greg Sabino Mullane
Date:
I think the overall idea is sound. But we need a better solution for the truncate fk failure. Can we introspect somehow and do a truncate or do a delete as necessary? I don't like the idea of simply ignoring the constraint, or of throwing an error.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support