Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation
Date
Msg-id CAH2-Wz=DMXNWN4ew+-KR30dLW9viC9GTD4ExFf+2YjhA7c6KZg@mail.gmail.com
In response to Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation (Andres Freund <andres@anarazel.de>)
Responses Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation
List pgsql-hackers
On Wed, Oct 30, 2019 at 12:03 PM Andres Freund <andres@anarazel.de> wrote:
> I'd much rather not entrench this further, even leaving global indexes
> aside. The 4 byte block number is a significant limitation for heap
> tables too, and we should lift that at some point not too far away.
> Then there's also other AMs that could really use a wider tid space.

I agree that this limitation is a problem that should be fixed before
too long. But the solution probably shouldn't be a radical departure
from what we have today. The vast majority of tables are not affected
by the TID space limitation. The tables that are affected can tolerate
fixed width "long" TIDs (perhaps 8 bytes long?) that are used only for
the higher portion of the heap TID space.

The idea here is that TID is varwidth, but actually uses the existing
heap TID format most of the time. For larger tables it uses a wider
fixed width struct that largely works the same as the old 6 byte
struct.
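
To make the two-width idea concrete, here is a rough sketch in C. The
6 byte struct mirrors the shape of today's heap TID (a block number
stored as two 16 bit halves, plus a 16 bit offset). Everything else is
an assumption made up for illustration: the names ShortTid, LongTid,
SHORT_TID_MAX_BLOCK, tid_width_for_block and the 48 bit block number
in the long form are hypothetical, not a proposed on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the existing 6 byte heap TID: 32 bit block + 16 bit offset. */
typedef struct ShortTid
{
    uint16_t bi_hi;     /* high 16 bits of block number */
    uint16_t bi_lo;     /* low 16 bits of block number */
    uint16_t offset;    /* offset number within the block */
} ShortTid;

/* Hypothetical 8 byte "long" TID: 48 bit block + 16 bit offset. */
typedef struct LongTid
{
    uint16_t blk[3];    /* block number, most significant half first */
    uint16_t offset;
} LongTid;

/* Assumed cutoff: the largest block number the short form can address. */
#define SHORT_TID_MAX_BLOCK UINT64_C(0xFFFFFFFE)

/* Pick the TID width from the block number alone. */
size_t
tid_width_for_block(uint64_t block)
{
    return block <= SHORT_TID_MAX_BLOCK ? sizeof(ShortTid) : sizeof(LongTid);
}

uint64_t
short_tid_block(const ShortTid *t)
{
    return ((uint64_t) t->bi_hi << 16) | t->bi_lo;
}

void
long_tid_set(LongTid *t, uint64_t block, uint16_t offset)
{
    assert(block < (UINT64_C(1) << 48));    /* must fit in 48 bits */
    t->blk[0] = (uint16_t) (block >> 32);
    t->blk[1] = (uint16_t) (block >> 16);
    t->blk[2] = (uint16_t) block;
    t->offset = offset;
}

uint64_t
long_tid_block(const LongTid *t)
{
    return ((uint64_t) t->blk[0] << 32) |
           ((uint64_t) t->blk[1] << 16) |
           t->blk[2];
}
```

Because both structs contain only 16 bit members, neither needs
padding, so the sizes stay at exactly 6 and 8 bytes on common
platforms.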

> > Though I suppose a posting list almost has to have fixed width TIDs to
> > perform acceptably.
>
> Hm. It's not clear to me why that is?

Random access matters for things like determining the correct offset
to split a posting list at. This is needed in the event of an
overlapping insertion of a new duplicate tuple whose heap TID falls
within the range of the posting list. Also, I want to be able to scan
posting lists backwards for backwards scans. In general, fixed-width
TIDs make the page space accounting fairly simple, which matters a lot
in nbtree.
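
As an illustration of why random access helps, finding the split point
for an incoming heap TID is a plain binary search when every entry has
the same width. This is only a sketch: the Tid struct below is a
simplified in-memory stand-in (not the on-disk layout), and
tid_compare and posting_split_index are hypothetical names, not
existing nbtree functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a heap TID; not the on-disk representation. */
typedef struct Tid
{
    uint32_t block;
    uint16_t offset;
} Tid;

/* Order TIDs by block number, then by offset number. */
int
tid_compare(const Tid *a, const Tid *b)
{
    if (a->block != b->block)
        return a->block < b->block ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * Return how many entries of the sorted array tids[0..ntids) sort
 * strictly before newtid.  That count is the index at which the
 * posting list would be split to make room for newtid.
 */
size_t
posting_split_index(const Tid *tids, size_t ntids, const Tid *newtid)
{
    size_t lo = 0;
    size_t hi = ntids;

    assert(tids != NULL || ntids == 0);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        /* tids[mid] is reachable in O(1) because entries are fixed width */
        if (tid_compare(&tids[mid], newtid) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

With truly variable-width entries, the tids[mid] step would require
walking the list from the start, and a backwards scan would have the
same problem.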

I can support varwidth TIDs in the future pretty well if the TID
doesn't have to be *arbitrarily* wide. Individual posting lists can
themselves either use 6 byte or 8 byte TIDs, preserving the ability to
access a posting list entry at random using simple pointer arithmetic.
This makes converting over index AMs a lot less painful -- it'll be
pretty easy to avoid mixing together the 6 byte and 8 byte structs.
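
A sketch of what per-posting-list widths could look like: each list
records a single entry width (6 or 8), and the n'th entry is found by
multiplying. The PostingList struct, the function names, and the byte
layout assumed here (big-endian block number followed by a 2 byte
offset) are all hypothetical, chosen only to keep the example
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum
{
    SHORT_TID_WIDTH = 6,    /* 4 byte block + 2 byte offset */
    LONG_TID_WIDTH = 8      /* 6 byte block + 2 byte offset (assumed) */
};

/* Hypothetical posting list: every entry shares one width. */
typedef struct PostingList
{
    const unsigned char *data;  /* packed entries, back to back */
    size_t nentries;
    size_t width;               /* SHORT_TID_WIDTH or LONG_TID_WIDTH */
} PostingList;

/* Random access by pointer arithmetic: no need to walk earlier entries. */
const unsigned char *
posting_entry(const PostingList *pl, size_t n)
{
    assert(pl->width == SHORT_TID_WIDTH || pl->width == LONG_TID_WIDTH);
    assert(n < pl->nentries);
    return pl->data + n * pl->width;
}

/* Decode the block number: all bytes of the entry except the last two. */
uint64_t
posting_entry_block(const PostingList *pl, size_t n)
{
    const unsigned char *p = posting_entry(pl, n);
    uint64_t block = 0;

    for (size_t i = 0; i < pl->width - 2; i++)
        block = (block << 8) | p[i];
    return block;
}

/* Decode the offset number: the last two bytes of the entry. */
uint16_t
posting_entry_offset(const PostingList *pl, size_t n)
{
    const unsigned char *p = posting_entry(pl, n) + (pl->width - 2);

    return (uint16_t) ((p[0] << 8) | p[1]);
}
```

The same accessors serve both widths, and a given list never mixes
them, so callers only have to check the width once per posting list.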

> > Can we steal some bits that are currently used for offset number
> > instead? 16 bits is far more than we ever need to use for heap offset
> > numbers in practice.
>
> I think that's a terrible idea. For one, some AMs will have significantly
> higher limits, especially taking compression and larger block sizes into
> account. Also not all AMs need identifiers tied so closely to a disk
> position, e.g. zedstore does not.  We shouldn't hack ever more
> information into the offset, given that background.

Fair enough, but somebody needs to cut some scope here.

> Having to walk through the index tuple might be acceptable - in all
> likelihood we'll have to do so anyway.  It does not, however, *really*
> resolve the issue that we still need to pass something TID-like back from
> the indexam, so we can fetch the associated tuple from the heap, or add
> the tid to a bitmap. But that could be done separately from the index
> internal data structures.

I agree.

> > Generalizing the nbtree AM to be able to work with an arbitrary type
> > of table row identifier that isn't at all like a TID raises tricky
> > definitional questions.

> Hm. I don't see why a different type of TID would imply it being
> stable?

It is unclear what it means. I would like to see a sketch of a design
for varwidth TIDs that balances everybody's concerns. I don't think
"indirect" indexes are a realistic goal for Postgres. VACUUM is just
too messy there (as is any other garbage collection mechanism).
Zedstore and Zheap don't change this.

> > Frankly I am not very enthusiastic about working on a project that has
> > unclear scope and unclear benefits for users.
>
> Why would properly supporting AMs like zedstore, global indexes,
> "indirect" indexes etc not benefit users?

Global indexes seem doable.

I don't see how "indirect" indexes can ever work in Postgres. I don't
know exactly what zedstore needs here, but maybe it can work well with
a less ambitious design for varwidth TIDs along the lines I've
sketched.

-- 
Peter Geoghegan
