Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: MaxOffsetNumber for Table AMs |
Date | |
Msg-id | 20210504050142.bhpoff7rsdpacnrq@alap3.anarazel.de |
In response to | Re: MaxOffsetNumber for Table AMs (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: MaxOffsetNumber for Table AMs |
List | pgsql-hackers |
Hi,

On 2021-04-30 11:51:07 -0700, Peter Geoghegan wrote:
> I think that it's reasonable to impose some cost on index AMs here,
> but that needs to be bounded sensibly and unambiguously. For example,
> it would probably be okay if you had either 6 byte or 8 byte TIDs, but
> no other variations. You could require index AMs (the subset of index
> AMs that are ever able to store 8 byte TIDs) to directly encode which
> width they're dealing with at the level of each IndexTuple. That would
> create some problems for nbtree deduplication, especially in boundary
> cases, but ISTM that you can manage the complexity by sensibly
> restricting how the TIDs work across the board.

> For example, the TIDs should always work like unsigned integers -- the
> table AM must be willing to work with that restriction.

Isn't that more a question of the encoding than the concrete
representation?

> You'd then have posting lists tuples in nbtree whose TIDs were all
> either 6 bytes or 8 bytes wide, with a mix of each possible (though
> not particularly likely) on the same leaf page. Say when you have a
> table that exceeds the current MaxBlockNumber restrictions. It would
> be relatively straightforward for nbtree deduplication to simply
> refuse to mix 6 byte and 8 byte datums together to avoid complexity in
> boundary cases. The deduplication pass logic has the flexibility that
> this requires already.

Which nbtree cases do you think would have an easier time supporting
switching between 6 or 8 byte tids than supporting fully variable width
tids? Given that IndexTupleData already is variable-width, it's not
clear to me why supporting two distinct sizes would be harder than a
fully variable size? I assume it's things like BTDedupState->htids?

> > What's wrong with varlena headers? It would end up being a 1-byte
> > header in practically every case, and no variable-width representation
> > can do without a length word of some sort. I'm not saying varlena is
> > as efficient as some new design could hypothetically be, but it
> > doesn't seem like it'd be a big enough problem to stress about. If you
> > used a variable-width representation for integers, you might actually
> > save bytes in a lot of cases. An awful lot of the TIDs people store in
> > practice probably contain several zero bytes, and if we make them
> > wider, that's going to be even more true.
>
> Maybe all of this is true, and maybe it works out to be the best path
> forward in the long term, all things considered. But whether or not
> that's true is crucially dependent on what real practical table AMs
> (of which there will only ever be a tiny number) actually need to do.
> Why should we assume that the table AM cannot accept some
> restrictions? What good does it do to legalistically define the
> problem as a problem for index AMs to solve?

I don't think anybody is arguing that AMs cannot accept any
restrictions? I do think it's pretty clear that it's not entirely
obvious what the concrete set of proper restrictions would be, where we
won't end up needing to re-evaluate limits in a few years.

If you add to that the fact that variable-width tids will often end up
considerably smaller than our current tids, it's not obvious why we
should use bitspace somewhere to indicate an 8 byte tid instead of a
variable-width tid?

Greetings,

Andres Freund
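
As an illustration of the size argument above (variable-width tids often being smaller than the current fixed 6 bytes), here is a minimal sketch in C. It is not PostgreSQL code and not a proposal from this thread: the `sketch_` names, the choice to pack the block number above a 16-bit offset, and the base-128 varint encoding are all assumptions made for the example. It treats a TID as a single unsigned integer and stores it with 7 payload bits per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not PostgreSQL code.  A TID is packed into one
 * unsigned integer (block number in the high bits, 16-bit offset in the
 * low bits) and stored as a base-128 varint: 7 payload bits per byte,
 * high bit set on every byte except the last.
 */
#define SKETCH_MAX_TID_BYTES 10		/* worst case for a 64-bit value */

static uint64_t
sketch_tid_to_uint(uint64_t block, uint16_t offset)
{
	/* the block number must fit in the 48 bits above the offset */
	assert(block < (UINT64_C(1) << 48));
	return (block << 16) | offset;
}

/* Encode tid into buf (at least SKETCH_MAX_TID_BYTES); returns bytes used. */
static size_t
sketch_tid_encode(uint64_t tid, uint8_t *buf)
{
	size_t		n = 0;

	while (tid >= 0x80)
	{
		buf[n++] = (uint8_t) ((tid & 0x7F) | 0x80);	/* more bytes follow */
		tid >>= 7;
	}
	buf[n++] = (uint8_t) tid;	/* final byte, high bit clear */
	return n;
}

/* Decode one TID from buf; *len receives the number of bytes consumed. */
static uint64_t
sketch_tid_decode(const uint8_t *buf, size_t *len)
{
	uint64_t	tid = 0;
	size_t		n = 0;
	int			shift;

	/* reads at most SKETCH_MAX_TID_BYTES bytes, even on malformed input */
	for (shift = 0; shift < 64; shift += 7)
	{
		uint8_t		b = buf[n++];

		tid |= (uint64_t) (b & 0x7F) << shift;
		if ((b & 0x80) == 0)
			break;				/* last byte of this TID */
	}
	*len = n;
	return tid;
}
```

Under these assumptions a TID in block 0 needs at most 3 bytes and one in block 1000 needs 4, while a TID with a full 32-bit block number needs 7, one more than the fixed-width format. This particular byte order does not sort correctly under `memcmp`, so an index AM that wants to compare encoded TIDs directly would need a different encoding.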