Re: pg_upgrade failing for 200+ million Large Objects - Mailing list pgsql-hackers
| From | PP L |
|---|---|
| Subject | Re: pg_upgrade failing for 200+ million Large Objects |
| Date | |
| Msg-id | CAGzayZug6y2+3fz=5G6Vs2wr5hCnyc+i2tysgTXc70E52aoCfg@mail.gmail.com |
| In response to | Re: pg_upgrade failing for 200+ million Large Objects (Alexander Korotkov <aekorotkov@gmail.com>) |
| List | pgsql-hackers |
Hello hackers,
I wanted to revive this thread, specifically around the attislocal optimization discussion. As part of https://github.com/postgres/postgres/commit/b3f0e0503f3, we now batch all the attislocal UPDATEs together, which makes them noticeably faster.
I think we might be able to go one step further and completely skip the attislocal UPDATE for partition tables. This is because attislocal is already set to false immediately after 'CREATE TABLE', during the 'ATTACH PARTITION' step (see attislocal being set to false in MergeAttributesIntoExisting). The UPDATEs emitted by pg_dump are therefore redundant. Even with batching, the single UPDATE still modifies N rows (N = number of columns), causing N relcache invalidations. The same work is then repeated by ATTACH PARTITION, causing another N relcache invalidations.
Skipping the attislocal UPDATE definitely speeds up the runtime if there are a lot of partition tables because we will avoid quite a lot of relcache invalidations and rebuild calls. Since this optimization removes the attislocal UPDATE completely, the effect will be even more pronounced for wider partition tables.
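To illustrate the redundancy, here is a hand-written sketch (table names and the exact dump output are illustrative, not copied from real pg_dump output):

```sql
-- Illustrative only: a two-column partition being restored.
CREATE TABLE parent (a int, b text) PARTITION BY RANGE (a);
CREATE TABLE leaf (a int, b text);

-- In binary-upgrade mode, pg_dump emits an UPDATE along these lines,
-- touching one pg_attribute row per column:
--   UPDATE pg_catalog.pg_attribute SET attislocal = false
--   WHERE attrelid = 'leaf'::pg_catalog.regclass
--     AND attname IN ('a', 'b');

-- ...but the subsequent ATTACH PARTITION clears attislocal anyway
-- (in MergeAttributesIntoExisting), so the UPDATE above is redundant:
ALTER TABLE parent ATTACH PARTITION leaf FOR VALUES FROM (0) TO (100);

SELECT attname, attislocal
FROM pg_catalog.pg_attribute
WHERE attrelid = 'leaf'::regclass AND attnum > 0;
```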
There's already precedent for this:
* attinhcount is never explicitly set by pg_dump. It is only modified by MergeAttributesIntoExisting during ATTACH PARTITION.
* conislocal for CHECK constraints is explicitly not fixed for partitions. See the comment "No need to fix conislocal: ATTACH PARTITION does that" in dumpTableSchema.
The only risk I can foresee is the window between CREATE TABLE and ATTACH PARTITION, during which attislocal will incorrectly be true. But I think this window is small enough not to worry about, since ATTACH PARTITION immediately follows CREATE TABLE (barring some other minor updates).
Here is a simple patch:
```
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 137161aa5e0..d3d7403228a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -17734,7 +17734,8 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
for (j = 0; j < tbinfo->numatts; j++)
{
if (!tbinfo->attisdropped[j] &&
- !tbinfo->attislocal[j])
+ !tbinfo->attislocal[j] &&
+ !tbinfo->ispartition)
{
if (firstitem)
{
```
I tried a few experiments on my local MacBook and saw an improvement of around 33-36% (the exact percentage varies mostly with the number of columns).
Setup (master branch): 300 partitioned root tables with 200 leaves each = 60000 partition tables

300 columns per partition:
* baseline: ~30 mins
* with patch: ~20 mins (~33% faster)

700 columns per partition:
* baseline: ~66 mins
* with patch: ~42 mins (~36% faster)
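For anyone who wants to reproduce this, the thread does not include my original scripts, so here is a sketch (names and structure are my own) that generates equivalent DDL for the 300-column case, ready to feed to psql:

```shell
#!/bin/sh
# Sketch of the benchmark setup described above: emit DDL for 300
# partitioned root tables with 200 list-partition leaves each
# (60000 partitions total), 300 int columns per table.
ROOTS=300; LEAVES=200; COLS=300

# Column list "c1 int, c2 int, ..., c300 int", built once and reused.
cols=$(seq -s ', ' -f 'c%g int' 1 "$COLS")

{
    r=1
    while [ "$r" -le "$ROOTS" ]; do
        printf 'CREATE TABLE root_%d (%s) PARTITION BY LIST (c1);\n' \
            "$r" "$cols"
        l=1
        while [ "$l" -le "$LEAVES" ]; do
            printf 'CREATE TABLE root_%d_leaf_%d PARTITION OF root_%d FOR VALUES IN (%d);\n' \
                "$r" "$l" "$r" "$l"
            l=$((l + 1))
        done
        r=$((r + 1))
    done
} > bench_setup.sql
```

Then `psql -f bench_setup.sql` against the old cluster, followed by pg_upgrade as usual.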
Thanks
Nikhil
Broadcom Inc.
On Mon, 23 Mar 2026 at 11:50, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jul 29, 2024 at 12:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I'm forced to the conclusion that we'd better make the transaction
> size adaptive as per Alexander's suggestion.
>
> In addition to the patches attached, I experimented with making
> dumpTableSchema fold all the ALTER TABLE commands for a single table
> into one command. That's do-able without too much effort, but I'm now
> convinced that we shouldn't. It would break the semicolon-counting
> hack for detecting that tables like these involve extra work.
> I'm also not very confident that the backend won't have trouble with
> ALTER TABLE commands containing hundreds of subcommands. That's
> something we ought to work on probably, but it's not a project that
> I want to condition v17 pg_upgrade's stability on.
>
> Anyway, proposed patches attached. 0001 is some trivial cleanup
> that I noticed while working on the failed single-ALTER-TABLE idea.
> 0002 merges the catalog-UPDATE commands that dumpTableSchema issues,
> and 0003 is Alexander's suggestion.
Nice to see you picked up my idea. I took a look over the patchset.
Looks good to me.
------
Regards,
Alexander Korotkov
Supabase