Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
Date:
Msg-id: CAH2-WzmqQPoRXwMP=ko8wDj49evFqU+kYYu_vB4S=6BiLcbnDg@mail.gmail.com
In response to: Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses: Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
List: pgsql-hackers
On Fri, Apr 16, 2021 at 2:20 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > Interesting approach. I think that in an ideal world we would have a
> > tuple format with attribute lengths/offsets right in the header.
>
> I believe that that would indeed be ideal w.r.t. access speed, but
> also quite expensive w.r.t. amount of data stored. This would add 2
> bytes per attribute in the current infrastructure (11 bits at least
> for each attribute to store offsets), on the current 12 bytes of
> overhead per indextuple (= 8 for IndexTuple + 4 for ItemId). That is
> probably always going to be a non-starter, seeing that we can
> relatively easily optimize our current attribute access patterns.

I don't think that that's why it's a non-starter. This design assumes a
world in which everything has already been optimized for this layout.
You no longer get to store the varlena header inline, which would break
a lot of things in Postgres were it ever to be attempted. The space
efficiency issues don't really apply because you have an offset for
fixed-length types -- their presence is always implied. I think that
you need to encode NULLs differently, which is a lot less space
efficient when there are a lot of NULLs. But on the whole this design
seems more efficient than what we have currently. This representation
of index tuples would be a totally reasonable design were we in a green
field situation. (Which is pretty far from the situation we're actually
in, of course.)

> I understand and appreciate that the "-INF" truncation that is
> currently in place is being relied upon in quite some places. Part of
> the effort for "+INF" truncation would be determining where and how to
> keep the benefits of the "-INF" truncation. I also believe that for
> internal pages truncating to "+INF" would be perfectly fine; the
> optimizations that I know of only rely on it at the leaf level.

I don't doubt that there is nothing special about -inf from a key space
point of view.
Actually...you could say that -inf is special to the limited extent
that we know it only appears in pivot tuples and exploit that property
when the !pivotsearch case/optimization is used. But that isn't much of
an exception at a high level, so whatever. Anyway, it follows that +inf
could in principle be used instead in some or all cases -- all that is
truly essential for correctness is that the invariants always be
respected. We're still in agreement up until here.

> Yes, I also read and appreciate your comments on +inf vs -inf when
> this came up in [0].

I'm impressed that you've done your homework on this.

> However, if we could choose, I think that having
> both options could be quite beneficial, especially when dealing with
> many duplicates or duplicate prefixes.

This is where things are much less clear -- maybe we're not in
agreement here. Who knows, though -- maybe you're right. But you
haven't presented any kind of argument. I understand that it's hard to
articulate what effects might be in play with stuff like this, so I
won't force the issue now. Strong evidence is of course the only way
that you'll reliably convince me of this.

I should point out that I am a little confused about how this +inf
business could be both independently useful and pivotal to implementing
[dynamic] prefix truncation/compression. Seems...weird to discuss them
together, except maybe to mention in passing that this +inf thing is
notable for particularly helping dynamic prefix stuff -- which is it?
It is my strong preference that nbtsplitloc.c continue to know
approximately nothing about compression or deduplication.
While it is true that nbtsplitloc.c's _bt_recsplitloc() is aware of
posting lists, this is strictly an optimization that is only justified
by the fact that posting lists are sometimes very large, and therefore
worth considering directly -- just to get a more accurate idea of how a
relevant split point choice affects the balance of free space (we don't
bother to do the same thing with non-key INCLUDE columns because
they're generally small and equi-sized). And so this _bt_recsplitloc()
thing is no exception to the general rule, which is:
deduplication/posting list maintenance should be *totally* orthogonal
to the page split choice logic (the design of posting list splits helps
a lot with that). We can afford to have complicated split point choice
logic because the question of which split point is optimal is totally
decoupled from the question of which split points are correct -- in
particular, from the correctness of the space accounting used to
generate candidate split points.

It may interest you to know that I once thought that it would be nice
to have the *option* of +inf too, so that we could use it in very rare
cases like the pathological SPLIT_MANY_DUPLICATES case that
_bt_bestsplitloc() has some defenses against. It would perhaps be nice
if we could use +inf selectively in that case. I never said anything
about this publicly before now, mostly because it wasn't that important
-- pathological behaviors like this have never been reported by users a
full 18 months after the release of 12.0, so it's unlikely to be a real
concern.

> > If you're going to pursue full prefix compression anyway, maybe you
> > should use a low key on the leaf level in cases where the
> > optimization is in use. This creates complexity during page
> > deletion, because the low key in the subtree to the right of the
> > deletion target subtree may need to be updated. Perhaps you can find
> > a way to make that work that isn't too complicated.
> That would be an interesting research path as well, the cost/benefit
> analysis would be much trickier when comparing to the status quo.

I'd say that's unclear right now.

> > You should probably account for index size here. I have lots of my
> > own tests for space utilization, using data from a variety of
> > sources.
>
> I'd like to mention that the current (and measured) patchset only does
> _logical_ dynamic prefix truncation, not the physical prefix
> truncation that is described on the wiki page.

If you change how _bt_truncate() behaves in any way (e.g. sometimes
it's lastleft/+inf based now), and nothing else, you're still bound to
change the space utilization with the tests that I maintain -- though
perhaps only at the level of noise. I sometimes call these tests "wind
tunnel tests". It turns out that you can simulate rather a lot about a
real complicated workload with simple, deterministic, serial test cases
-- provided you're only interested in the space utilization. This
helped a lot for both the Postgres 12 and Postgres 13 stuff (though not
the Postgres 14 stuff).

> These patches are
> useful on their own for multi-key-column btree performance (and some
> GIST), regardless of later patches implementing physical dynamic
> prefix truncation in the btree AM.

Have you isolated the performance impact of the first patch at all? Can
you characterize how well it works on its own, perhaps just informally?
It would be convenient if the first patch could be treated as an
independent thing.

--
Peter Geoghegan