Home > mailing lists

Re: Proposal: Global Index for PostgreSQL - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: Proposal: Global Index for PostgreSQL
Date	July 2 16:48:22
Msg-id	CA+HiwqFMZER3H8SK599pjOmwCN2rxBDPMNZF2iTnYXyCDCd1cQ@mail.gmail.com Whole thread Raw
In response to	Re: Proposal: Global Index for PostgreSQL (Dilip Kumar <dilipbalaut@gmail.com>)
List	pgsql-hackers

Tree view

On Wed, Jul 2, 2025 at 1:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Tue, Jul 1, 2025 at 7:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > I’ve been working on improving how we handle partition locking during
> > execution of generic plans. Specifically, I committed a patch to defer
> > locking of partitions until after pruning during ExecutorStart(), so
> > we avoid taking locks on partitions that aren’t actually needed --
> > even when the plan contains scans on all partitions. That patch was
> > later reverted, as Tom pointed out that the plan invalidation logic
> > wasn't cleanly handled.
>
> Yes I was following that thread, and at times when I was working on
> locking for global index I had in mind that I would have to do
> something to the locking after that patch is in.  Unfortunately that
> got reverted and then I didn't put any effort in reconsidering how
> locking is handled for global index.
...
>  But the goal remains: to avoid locking
> > unnecessary partitions, particularly in high-partition-count OLTP
> > setups that use PREPARE/EXECUTE.
>
> That makes sense.

Ok, good to know you were keeping a tab on it.

> > The proposed global index design, IIUC, requires locking all leaf
> > partitions up front during planning, and I guess during
> > AcquireExecutorLocks() when using a cached plan, because the index
> > scan could return tuples from any partition. This seems to directly
> > undercut that effort: we'd be back to generic plans causing broad
> > locking regardless of actual runtime needs.
>
> Just trying to understand the locking difference more with/without
> global index, let's assume we have normal partitioned index on non
> partition key column, and if we issue a scan on the partitioned table
> then internally from expand_partitioned_rtentry() we will take lock on
> all the partitions, because we can not prune any partition.
> Similarly, with the global index also all the child partitions under
> the top partition on which we have global index will be locked.  So in
> this case we do not have a difference.

Just to clarify -- I was hoping that, at least for SELECTs, we
wouldn’t need to lock all leaf partitions up front.

One of the potential selling points of global indexes (compared to
partitioned indexes) is that we can avoid expand_partitioned_rtentry()
and the full scan path setup per partition, though that's admittedly
quite an undertaking.  So I was imagining we could just lock the
parent and the global index during planning, and only lock individual
heap relations at execution time -- once we know which partition the
returned tuple belongs to.

Locking isn’t cheap -- and in workloads with thousands of partitions,
it becomes a major source of overhead, especially when the query
doesn't contain pruning quals and ends up touching only a few
partitions in practice. So I think it’s worth seeing if we can avoid
planning-time locking of all partitions in at least the SELECT case,
even if INSERTs may require broader locking due to uniqueness checks,
but see below...

> > I understand that this is currently necessary, given that a global
> > index scan is a single node without per-partition awareness. But it
> > might be worth considering whether the scan could opportunistically
> > defer heap relation locking until it returns a tuple that actually
> > belongs to a particular partition -- similar to how inserts into
> > partitioned tables only lock the target partition at execution time.
> > Or did I miss that inserts also need to lock all partitions up front
> > when global indexes are present, due to cross-partition uniqueness
> > checks?
> >
> > Let me know if I’ve misunderstood the design.
>
> So there difference is in the cases, where we are directly operating
> on the leaf table, e.g. if you inserting directly on the leaf
> relation, currently we just need to lock that partition, but if there
> is global index we need to lock other siblings as well (in short all
> the leaf under the parent which has global index) because if the
> global index is unique we might need to check unique conflict in other
> leafs as well.  I believe when the table is partitioned, it might not
> be the most preferred way to operate directly on the leaf, and with
> global index only this case will be impacted where we are doing DML
> directly on the leaf.  I am not sure in this case how much delay we
> can do in locking, because e.g. for insert we will only identify which
> partition has a duplicate key while inserting in the btree.

Hmm, it’s my understanding that with a true global index, meaning a
single btree structure spanning all partitions, uniqueness conflicts
are detected by probing the index after inserting the tuple into the
heap. So unless we find a matching key in the index, there is no need
to consult any other partitions.

Even if a match is found, we only need to access the heap page for
that specific TID to check visibility, and that would involve just one
partition.

Why then do we need to lock all leaf partitions in advance? It seems
like we could defer locking until the uniqueness check identifies a
partition that actually contains a conflicting tuple, and only then
lock that one heap.

I understand that in some earlier floated ideas for enforcing global
uniqueness (perhaps only briefly mentioned in past discussions), the
approach was to scan all per-partition indexes, which would have
required locking everything up front. But with a unified global index,
that overhead seems avoidable.

Is there something subtle that I am missing that still requires
locking all heaps in advance?

--
Thanks, Amit Langote

pgsql-hackers by date:

From: Fujii Masao
Date: 02 July, 16:24:49
Subject: Re: Don't keep closed WAL segment in page cache after replay

From: Fujii Masao
Date: 02 July, 16:55:16
Subject: Re: Add progressive backoff to XactLockTableWait functions

Re: Proposal: Global Index for PostgreSQL - Mailing list pgsql-hackers

Previous

Next