Building infrastructure for B-Tree deduplication that recognizes whenopclass equality is also equivalence - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Building infrastructure for B-Tree deduplication that recognizes whenopclass equality is also equivalence |
Date | |
Msg-id | CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence |
List | pgsql-hackers |
Anastasia's nbtree deduplication patch [1] has an open problem that I would like to find a solution for: it currently assumes that there is no difference between binary equality and opclass equality. That won't work for opclasses such as btree/numeric, because compressing equal numeric datums could destroy display scale if equal numeric datums were naively lumped together (actually, the deduplication patch doesn't work like that, but it has other subtle problems due to not having worked out fundamental definitional issues). We don't need to be able to assume that binary equality is exactly the same thing as opclass equality at the level of individual tuples. We only need to be able to assume that the user cannot observe any differences when they are shown output for two datums that are opclass-equal for any opclass that supports deduplication (i.e. cases like the numeric_ops case just won't work, so we shouldn't event try). I believe that it would be okay if we treated two IndexTuples as equivalent and therefore targets to store together in the same posting list when they happen to have distinct binary representations due to the original datums having different TOAST input state. In short, the deduplication patch cannot tolerate being unable to store opclass-equal IndexTuples in the same posting list when the opclass (or the underlying type being indexed) somehow allows that equality isn't equivalence -- that's simply unsupportable. The opclass gets one chance to say whether or not it vetoes the use of deduplication: at CREATE INDEX time. Consumers of this new infrastructure probably won't be limited to the deduplication feature; the same infrastructure will be needed for a B-Tree prefix compression patch (I'm thinking of a configurable CREATE INDEX prefix compression feature). GIN never had to solve this problem because its indexes are always lossy, and cannot support index-only scans. It seems likely that a scheme like the one I have in mind can work for the vast majority of Postgres B-Tree indexes in practice, so I don't think that the user-visible restrictions I'm considering will make the patch significantly less useful (it's already very useful). The most notable restriction for users will almost certainly be not supporting deduplication within indexes that use nondeterministic collations. They were already paying a performance penalty during hashing, though. I would like to: * Get some buy-in on whether or not the precise distinctions I would like to make are correct for deduplication in particular, and as useful as possible for other cases that we may need to add later on. * Figure out the exact interface through which opclass/opfamily authors can represent that their notion of equality is compatible with deduplication/compression. (I think that the use of nondeterministic collations should disable deduplication without explicit action from the operator class -- that should just be baked in.) * Mark most existing btree operator classes as being compatible with deduplication as part of making the patch committable. As I said, I believe that their semantics are already compatible with what we need for deduplication to work sensibly, aside from a handful of specific exceptions. In any case, I'm certain that problems like the btree/numeric display scale problem are simply not worth solving directly. That would add a huge amount of complexity for very little benefit. [1] https://commitfest.postgresql.org/24/2202/ -- Peter Geoghegan
pgsql-hackers by date: