Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized - Mailing list pgsql-performance

From Dimitrios Apostolou
Subject Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
Date
Msg-id q479op22-n5pp-n434-n25n-44or777p5r80@tzk.arg
Whole thread Raw
In response to Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized  (Stepan Neretin <slpmcf@gmail.com>)
List pgsql-performance
Hello Stepan!

I can see you tested my patch by itself, and with the following command:

On Monday 2025-07-21 07:58, Stepan Neretin wrote:

>      pg_restore -d "$DB_NAME" -j "$JOBS" --clean --if-exists --freeze "$DUMP_FILE" > /dev/null

On the other hand, I have tested combining --freeze with --data-only
--clean, as implemented by another patch of mine at:


https://www.postgresql.org/message-id/flat/413c1cd8-1d6d-90ba-ac7b-b226a4dad5ed%40gmx.net#44f06ce85a14a6238125a0ad8c9767db

In fact I was considering that patch as a requirement to this one, but I
see that you are using my patch independently to issue --clean --freeze
without --data-only. As far as I understand, this will also issue a
TRUNCATE before each COPY, and the benefits of FREEZE will be the same.
Thank you for revealing this to me.


>The main area of concern is the implementation method in `pg_backup_archiver.c`. The patch introduces a `do_copy`
functionthat modifies the `COPY` statement string to inject the `WITH (FREEZE)` option. 
>
>```c
>if (cp_end - 10 > cp &&
>    strncmp(cp_end - 10, "FROM stdin", 10) == 0
>    )
>    cp_is_well_formed = true;
>```
>
>This approach of string parsing is quite fragile. It makes a strong assumption that the `COPY` statement generated by
`pg_dump`will always end with `... FROM stdin;` (preceded by optional whitespace). While this is true now, it creates a
tightand brittle coupling between `pg_dump`'s output format and 
>`pg_restore`'s parsing logic. Any future changes to `pg_dump` that might add options or slightly alter the `COPY`
syntaxcould break this feature silently or with a non-obvious warning. 

It troubled me too. At least I have a pg_log_warning() message in this
case, and nothing detrimental will happen, it's just that the FREEZE
option will not be used and the message will be logged.

>
>A more robust solution would be to avoid string manipulation entirely. `pg_restore` should assemble the `COPY` command
fromits constituent parts (table name, column list, etc.) and then conditionally append the `WITH (FREEZE)` clause
beforethe final `FROM stdin;`. This would decouple the feature from 
>the exact string representation in the dump archive.

Totally agree, but this sounds impossible to implement. Unfortunately
pg_dump generates SQL commands, and pg_restore edits them and executes
them. I wouldn't know where to start to change this whole architecture.

>
>An alternative—and arguably cleaner—approach might be to shift this logic to pg_dump.

Thinking it more thoroughly, pg_dump is not the place to compose the
restoration SQL commands. The proper place is pg_restore.
I guess it's being done in pg_dump for historical reasons, but I don't
think pg_dump should complicate its commands further (e.g. by adding
WITH FREEZE), as this is a choice to make during restore.
I think I've seen more places in pg_restore that edit the
commands before executing them.


>One important consideration that needs to be highlighted in the documentation for this feature is the impact on WAL
generation.`COPY ... WITH (FREEZE)` is known to generate significantly more WAL traffic because it must log the
freezingof tuples, which can be a surprise for users. Maybe we can insert 
>already frozen pages?

There should be no WAL traffic at all. If the transaction starts with a
TRUNCATE TABLE then COPY skips the wal completely. This gives a big
performance and stability gain when bulk-loading data, and this is what
my 2 patches try to leverage in more cases.

And as far as I can tell, if there is no TRUNCATE in the same
transaction, then pg_restore will output error like the following:

ERROR:  cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction

I hope such an erroris acceptable, since the sysadmin can just remove
--freeze then and re-run the command.

>Additionally, it should be noted that the freeze option only works correctly when performing a fresh load of data into
emptytables. 

Do you think this patch has chances of going in by itself? If so then
yes, I should definitely update the docs and submit it again
individually.


Thank you for the rebase and review.

Dimitris

pgsql-performance by date:

Previous
From: Stepan Neretin
Date:
Subject: Re: [PATCH] ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized