Re: Proposal: Global Index for PostgreSQL - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Proposal: Global Index for PostgreSQL
Date
Msg-id CAD21AoCHenor-AXY5QsEdiVBPPTJm+dResmRvNzZLPeUotWyaw@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: Global Index for PostgreSQL  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Proposal: Global Index for PostgreSQL
List pgsql-hackers
On Sat, Jun 14, 2025 at 2:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jun 9, 2025 at 3:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> > > 4) Update-heavy partitioned tables that should run vacuum frequently. Significant
> > > vacuum slowdown would result in going beyond SLAs without corresponding
> > > significant improvements.
> > >
> > You've got it. I'm on board with prioritizing a VACUUM optimization
> > solution for partitioned tables with global indexes. My initial
> > proposal touched on a proof-of-concept experiment, which indicated no
> > significant performance hit with global index after the optimization.
> > I'll share the detailed VACUUM optimization proposal in this thread
> > within the next couple of days.
>
> As discussed earlier the one of main problems is that the global
> indexes are vacuumed along with each partition whereas logically it
> should be vacuumed only once when all the partitions are vacuum in an
> ideal world.
>
> So my proposal is that we make some infrastructure change in vacuum
> api's such that there is option to tell heap_vacuum_rel() to skip the
> global index vacuum and also return back the deadtid_store, vacuum
> layer will store these deadtid_store, hash it by the reloid until it
> vacuum all the partitions which are covered by a global index.  Once
> that is done it will vacuum the global index, we also need to modify
> the vac_tid_reaped() so that it can take additional input of reloid so
> that it can find the appropriate deadtid_store by looking into the
> hash and then look up the dead tid in respective deadtid_store.

Does it need to keep holding dead TIDs for each partition until it
completes vacuuming all partitions that are covered by the global
index? If so, it would end up holding a huge amount of memory in cases
where there are many partitions. How does maintanence_work_mem (or
autovacuum_work_mem) work in this context? Also, what if the
autovacuum worker who is processing the partitioned table with global
indexes gets cancelled? I guess that we would need to scan the
partitions again in order to collect dead TIDs to vacuum the global
index but it would be very expensive.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: POC: Parallel processing of indexes in autovacuum
Next
From: Fujii Masao
Date:
Subject: pg_dump misses comments on NOT NULL constraints