Thread: Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized


On 6/3/25 17:34, Dimitrios Apostolou wrote:
> The backend process for each of the above ALTER TABLE commands, does not
>    parallelize the foreign key checks for the different partitions. I
>    know, because in the logs I see gigabytes of temporary files being
>    written, with the CONTEXT showing queries issued incrementally on
>    all the different partitions:
> 
>    :LOG:      temporary file: path "pg_tblspc/16390/PG_17_202406281/ 
> pgsql_tmp/pgsql_tmp3363462.579", size 1073741824
>    :CONTEXT:  SQL statement "SELECT fk."columnX" FROM ONLY 
> "public"."table_partition_214" fk
>               LEFT OUTER JOIN ONLY "public"."another_table" pk
>                   ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX")
>               WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)"
> 
>    Why can't the backend issue these queries in parallel workers?

This has been discussed here: 
https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com

Perhaps we should exhume this patch, but I believe the optimal strategy 
is to perform a VACUUM between the data and post-data to build the 
visibility map. The anti-join can then use an efficient index-only scan.

Best regards,
Frédéric




Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized

From
Dimitrios Apostolou
Date:
On Wed, 4 Jun 2025, Frédéric Yhuel wrote:

> On 6/3/25 17:34, Dimitrios Apostolou wrote:
>>  The backend process for each of the above ALTER TABLE commands, does not
>>     parallelize the foreign key checks for the different partitions. I
>>     know, because in the logs I see gigabytes of temporary files being
>>     written, with the CONTEXT showing queries issued incrementally on
>>     all the different partitions:
>>
>>    :LOG:      temporary file: path "pg_tblspc/16390/PG_17_202406281/
>>  pgsql_tmp/pgsql_tmp3363462.579", size 1073741824
>>    :CONTEXT:  SQL statement "SELECT fk."columnX" FROM ONLY
>>  "public"."table_partition_214" fk
>>                LEFT OUTER JOIN ONLY "public"."another_table" pk
>>                    ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX")
>>                WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)"
>>
>>     Why can't the backend issue these queries in parallel workers?
>
> This has been discussed here:
> https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com
>
> Perhaps we should exhume this patch, but I believe the optimal strategy is to
> perform a VACUUM between the data and post-data to build the visibility map.
> The anti-join can then use an efficient index-only scan.

Thanks for pointing to this patch.
Since I run each of the pg_restore sections separately, I will try to
manually do a VACUUM after the "data" and before the "post-data" section.

In general I have noticed most operations are slower after a succesful
pg_restore until VACUUM is complete, which is unfortunate as the database
is huge and it takes days to run. Something I have on my list to try, is
whether a COPY FREEZE would alleviate all this trouble, since all tuples
are immediately visible then. Maybe a patch for a new pg_restore
option --freeze is a better solution. Are my assumptions right?

Thanks,
Dimitris


On 6/4/25 16:12, Dimitrios Apostolou wrote:
> In general I have noticed most operations are slower after a succesful 
> pg_restore until VACUUM is complete, which is unfortunate as the 
> database is huge and it takes days to run. Something I have on my list 
> to try, is whether a COPY FREEZE would alleviate all this trouble, since 
> all tuples are immediately visible then. Maybe a patch for a new 
> pg_restore option --freeze is a better solution. Are my assumptions right?

It seems that the idea has already been discussed: 

https://www.postgresql.org/message-id/flat/CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr-xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b

I've CCed Bruce Mojman, in the hope that he can tell us more about it.






On 6/5/25 16:13, Frédéric Yhuel wrote:
> 
> 
> On 6/4/25 16:12, Dimitrios Apostolou wrote:
>> In general I have noticed most operations are slower after a succesful 
>> pg_restore until VACUUM is complete, which is unfortunate as the 
>> database is huge and it takes days to run. Something I have on my list 
>> to try, is whether a COPY FREEZE would alleviate all this trouble, 
>> since all tuples are immediately visible then. Maybe a patch for a new 
>> pg_restore option --freeze is a better solution. Are my assumptions 
>> right?
> 
> It seems that the idea has already been discussed: https:// 
> www.postgresql.org/message-id/flat/ 
> CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr- 
> xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b
> 
> I've CCed Bruce Mojman, in the hope that he can tell us more about it.
> 
> 

(It might be more interesting now than 12 years ago thanks to this 
patch: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7db0cd2145f2bce84cac92402e205e4d2b045bf2)