Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: MaxOffsetNumber for Table AMs |
Date | |
Msg-id | CAH2-WzmKfo58B1Jg-NOucmDZvPbFayArWrQeVd=_=wKwZvJeTw@mail.gmail.com Whole thread Raw |
In response to | Re: MaxOffsetNumber for Table AMs (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: MaxOffsetNumber for Table AMs
|
List | pgsql-hackers |
On Fri, Apr 30, 2021 at 11:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Apr 30, 2021 at 2:05 PM Peter Geoghegan <pg@bowt.ie> wrote: > > I agree in principle, but making that work well is very hard in > > practice because of the format of IndexTuple -- which bleeds into > > everything. That TID is special is probably a natural consequence of > > the fact that we don't have an offset-based format of the kind you see > > in other DB systems -- systems that don't emphasize extensibility. We > > cannot jump to a hypothetical TID attribute inexpensively inside code > > like _bt_compare() because we don't have a cheap way to jump straight > > to the datum for any attribute. So we just store TID in IndexTuple > > directly instead. Imagine how much more expensive VACUUM would be if > > it had to grovel through the IndexTuple format. > > I can't imagine that, so maybe you want to enlighten me? I see that > there's a potential problem there, and I'm glad you pointed it out > because I hadn't thought about it previously ... but if you always put > the column or columns that VACUUM would need first, it's not obvious > to me that it would be all that expensive. Maybe. The point is that it is a problem that needs to be solved. > Deforming the tuple to a > sufficient degree to extract the first column, which would even be > fixed-width, shouldn't take much work. I think that it's reasonable to impose some cost on index AMs here, but that needs to be bounded sensibly and unambiguously. For example, it would probably be okay if you had either 6 byte or 8 byte TIDs, but no other variations. You could require index AMs (the subset of index AMs that are ever able to store 8 byte TIDs) to directly encode which width they're dealing with at the level of each IndexTuple. That would create some problems for nbtree deduplication, especially in boundary cases, but ISTM that you can manage the complexity by sensibly restricting how the TIDs work across the board. For example, the TIDs should always work like unsigned integers -- the table AM must be willing to work with that restriction. You'd then have posting lists tuples in nbtree whose TIDs were all either 6 bytes or 8 bytes wide, with a mix of each possible (though not particularly likely) on the same leaf page. Say when you have a table that exceeds the current MaxBlockNumber restrictions. It would be relatively straightforward for nbtree deduplication to simply refuse to mix 6 byte and 8 byte datums together to avoid complexity in boundary cases. The deduplication pass logic has the flexibility that this requires already. > > I wonder how the same useful performance characteristics can be > > maintained with a variable-width TID design. If you solve the problem > > by changing IndexTuple, then you are kind of obligated to not use > > varlena headers to keep the on-disk size manageable. Who knows where > > it all ends? > > What's wrong with varlena headers? It would end up being a 1-byte > header in practically every case, and no variable-width representation > can do without a length word of some sort. I'm not saying varlena is > as efficient as some new design could hypothetically be, but it > doesn't seem like it'd be a big enough problem to stress about. If you > used a variable-width representation for integers, you might actually > save bytes in a lot of cases. An awful lot of the TIDs people store in > practice probably contain several zero bytes, and if we make them > wider, that's going to be even more true. Maybe all of this is true, and maybe it works out to be the best path forward in the long term, all things considered. But whether or not that's true is crucially dependent on what real practical table AMs (of which there will only ever be a tiny number) actually need to do. Why should we assume that the table AM cannot accept some restrictions? What good does it do to legalistically define the problem as a problem for index AMs to solve? -- Peter Geoghegan
pgsql-hackers by date: