Thread: Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
On 6/3/25 17:34, Dimitrios Apostolou wrote: > The backend process for each of the above ALTER TABLE commands, does not > parallelize the foreign key checks for the different partitions. I > know, because in the logs I see gigabytes of temporary files being > written, with the CONTEXT showing queries issued incrementally on > all the different partitions: > > :LOG: temporary file: path "pg_tblspc/16390/PG_17_202406281/ > pgsql_tmp/pgsql_tmp3363462.579", size 1073741824 > :CONTEXT: SQL statement "SELECT fk."columnX" FROM ONLY > "public"."table_partition_214" fk > LEFT OUTER JOIN ONLY "public"."another_table" pk > ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX") > WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)" > > Why can't the backend issue these queries in parallel workers? This has been discussed here: https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com Perhaps we should exhume this patch, but I believe the optimal strategy is to perform a VACUUM between the data and post-data to build the visibility map. The anti-join can then use an efficient index-only scan. Best regards, Frédéric
Re: ALTER TABLE ADD FOREIGN KEY to partitioned table, is not parallelized
From
Dimitrios Apostolou
Date:
On Wed, 4 Jun 2025, Frédéric Yhuel wrote: > On 6/3/25 17:34, Dimitrios Apostolou wrote: >> The backend process for each of the above ALTER TABLE commands, does not >> parallelize the foreign key checks for the different partitions. I >> know, because in the logs I see gigabytes of temporary files being >> written, with the CONTEXT showing queries issued incrementally on >> all the different partitions: >> >> :LOG: temporary file: path "pg_tblspc/16390/PG_17_202406281/ >> pgsql_tmp/pgsql_tmp3363462.579", size 1073741824 >> :CONTEXT: SQL statement "SELECT fk."columnX" FROM ONLY >> "public"."table_partition_214" fk >> LEFT OUTER JOIN ONLY "public"."another_table" pk >> ON ( pk."columnX" OPERATOR(pg_catalog.=) fk."columnX") >> WHERE pk."columnX" IS NULL AND (fk."columnX" IS NOT NULL)" >> >> Why can't the backend issue these queries in parallel workers? > > This has been discussed here: > https://www.postgresql.org/message-id/flat/0d21e3b4-dcde-290c-875e-6ed5013e8e52%40dalibo.com > > Perhaps we should exhume this patch, but I believe the optimal strategy is to > perform a VACUUM between the data and post-data to build the visibility map. > The anti-join can then use an efficient index-only scan. Thanks for pointing to this patch. Since I run each of the pg_restore sections separately, I will try to manually do a VACUUM after the "data" and before the "post-data" section. In general I have noticed most operations are slower after a succesful pg_restore until VACUUM is complete, which is unfortunate as the database is huge and it takes days to run. Something I have on my list to try, is whether a COPY FREEZE would alleviate all this trouble, since all tuples are immediately visible then. Maybe a patch for a new pg_restore option --freeze is a better solution. Are my assumptions right? Thanks, Dimitris
On 6/4/25 16:12, Dimitrios Apostolou wrote: > In general I have noticed most operations are slower after a succesful > pg_restore until VACUUM is complete, which is unfortunate as the > database is huge and it takes days to run. Something I have on my list > to try, is whether a COPY FREEZE would alleviate all this trouble, since > all tuples are immediately visible then. Maybe a patch for a new > pg_restore option --freeze is a better solution. Are my assumptions right? It seems that the idea has already been discussed: https://www.postgresql.org/message-id/flat/CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr-xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b I've CCed Bruce Mojman, in the hope that he can tell us more about it.
On 6/5/25 16:13, Frédéric Yhuel wrote: > > > On 6/4/25 16:12, Dimitrios Apostolou wrote: >> In general I have noticed most operations are slower after a succesful >> pg_restore until VACUUM is complete, which is unfortunate as the >> database is huge and it takes days to run. Something I have on my list >> to try, is whether a COPY FREEZE would alleviate all this trouble, >> since all tuples are immediately visible then. Maybe a patch for a new >> pg_restore option --freeze is a better solution. Are my assumptions >> right? > > It seems that the idea has already been discussed: https:// > www.postgresql.org/message-id/flat/ > CA%2BU5nM%2BXvkUu9ran%2B5cY%3DTWQquLTpvzte4KVMK%3DaDfbr- > xfNXA%40mail.gmail.com#b61a7fee06e10e61afa68712bc0b3c5b > > I've CCed Bruce Mojman, in the hope that he can tell us more about it. > > (It might be more interesting now than 12 years ago thanks to this patch: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7db0cd2145f2bce84cac92402e205e4d2b045bf2)