Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation |
Date | |
Msg-id | CAH2-Wz=DMXNWN4ew+-KR30dLW9viC9GTD4ExFf+2YjhA7c6KZg@mail.gmail.com |
In response to | Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation
|
List | pgsql-hackers |
On Wed, Oct 30, 2019 at 12:03 PM Andres Freund <andres@anarazel.de> wrote:
> I'd much rather not entrench this further, even leaving global indexes
> aside. The 4 byte block number is a significant limitation for heap
> tables too, and we should lift that at some point not too far away.
> Then there's also other AMs that could really use a wider tid space.

I agree that that limitation is a problem that should be fixed before too long. But the solution probably shouldn't be a radical departure from what we have today. The vast majority of tables are not affected by the TID space limitation. Those tables that are can tolerate supporting fixed width "long" TIDs (perhaps 8 bytes long?) that are used for the higher portion of the heap TID space alone.

The idea here is that TID is varwidth, but actually uses the existing heap TID format most of the time. For larger tables it uses a wider fixed width struct that largely works the same as the old 6 byte struct.

> > Though I suppose a posting list almost has to have fixed width TIDs to
> > perform acceptably.
>
> Hm. It's not clear to me why that is?

Random access matters for things like determining the correct offset to split a posting list at. This is needed in the event of an overlapping insertion of a new duplicate tuple whose heap TID falls within the range of the posting list. Also, I want to be able to scan posting lists backwards for backwards scans. In general, fixed-width TIDs make the page space accounting fairly simple, which matters a lot in nbtree.

I can support varwidth TIDs in the future pretty well if the TID doesn't have to be *arbitrarily* wide. Individual posting lists can themselves either use 6 byte or 8 byte TIDs, preserving the ability to access a posting list entry at random using simple pointer arithmetic. This makes converting over index AMs a lot less painful -- it'll be pretty easy to avoid mixing together the 6 byte and 8 byte structs.
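To make the random access point concrete, here is a minimal sketch of a posting list whose entries are all 6 bytes or all 8 bytes wide. Everything in it is hypothetical: the `PostingList` struct, the big-endian byte layout (block number followed by a 2 byte offset number), and the function names are assumptions made for illustration, not the actual nbtree or `ItemPointerData` definitions. It only shows that a per-list fixed width keeps entry lookup down to pointer arithmetic, so finding a split offset can be a plain binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical posting list: every TID in one list has the same width. */
typedef struct PostingList
{
    uint8_t        tid_width;  /* 6 or 8 bytes, never mixed within a list */
    uint16_t       ntids;      /* number of TIDs stored */
    const uint8_t *tids;       /* ntids * tid_width bytes, sorted ascending */
} PostingList;

/* Combine block and offset numbers into one comparable integer key. */
static uint64_t
tid_make_key(uint64_t block, uint16_t offset)
{
    return (block << 16) | offset;
}

/* Store a TID big-endian (assumed layout): block number, then offset. */
static void
tid_encode(uint8_t *dst, uint8_t width, uint64_t block, uint16_t offset)
{
    uint64_t key = tid_make_key(block, offset);

    assert(width == 6 || width == 8);
    for (int b = width - 1; b >= 0; b--)
    {
        dst[b] = (uint8_t) (key & 0xFF);
        key >>= 8;
    }
}

/* Random access: entry i starts at tids + i * tid_width. */
static uint64_t
tid_key(const PostingList *pl, size_t i)
{
    const uint8_t *p = pl->tids + i * pl->tid_width;
    uint64_t       key = 0;

    for (uint8_t b = 0; b < pl->tid_width; b++)
        key = (key << 8) | p[b];
    return key;
}

/*
 * Binary search for the index of the first entry whose key is >= newkey.
 * A new TID that falls inside the list's range would split the list there.
 */
static size_t
posting_split_offset(const PostingList *pl, uint64_t newkey)
{
    size_t lo = 0;
    size_t hi = pl->ntids;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (tid_key(pl, mid) < newkey)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

A backwards scan works the same way, by calling `tid_key()` with indexes running from `ntids - 1` down to 0. With arbitrarily wide TIDs neither operation would be possible without walking the list from the start.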
> > Can we steal some bits that are currently used for offset number
> > instead? 16 bits is far more than we ever need to use for heap offset
> > numbers in practice.
>
> I think that's a terrible idea. For one, some AMs will have significant
> higher limits, especially taking compression and larger block sizes into
> account. Also not all AMs need identifiers tied so closely to a disk
> position, e.g. zedstore does not. We shouldn't hack evermore
> information into the offset, given that background.

Fair enough, but somebody needs to cut some scope here.

> Having to walk through the index tuple might be acceptable - in all
> likelihood we'll have to do so anyway. It does however not *really*
> resolve the issue that we still need to pass something tid back from the
> indexam, so we can fetch the associated tuple from the heap, or add the
> tid to a bitmap. But that could be done separately from the index
> internal data structures.

I agree.

> > Generalizing the nbtree AM to be able to work with an arbitrary type
> > of table row identifier that isn't at all like a TID raises tricky
> > definitional questions.
>
> Hm. I don't see why a different types of TID would imply them being
> stable?

It is unclear what it means. I would like to see a sketch of a design for varwidth TIDs that balances everybody's concerns. I don't think "indirect" indexes are a realistic goal for Postgres. VACUUM is just too messy there (as is any other garbage collection mechanism). Zedstore and Zheap don't change this.

> > Frankly I am not very enthusiastic about working on a project that has
> > unclear scope and unclear benefits for users.
>
> Why would properly supporting AMs like zedstore, global indexes,
> "indirect" indexes etc benefit users?

Global indexes seem doable. I don't see how "indirect" indexes can ever work in Postgres. I don't know exactly what zedstore needs here, but maybe it can work well with a less ambitious design for varwidth TIDs along the lines I've sketched.
-- Peter Geoghegan