Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date
Msg-id CAH2-Wzn_Zx6=iFbbow9xO85M=Av4qaT8DcHW5oM-QMd0_ttCsQ@mail.gmail.com
In response to Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
List pgsql-hackers
On Sun, Aug 25, 2019 at 1:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that teaching opclasses to say whether this is okay is a
> reasonable approach.

I've begun working on this, with help from Anastasia.

My working assumption is that I only need to care about
opclass-declared input data types (pg_opclass.opcintype), plus the
corresponding collations -- the former can be used to look up an
appropriate pg_amproc entry (i.e. B-Tree support function 4), while
the latter are passed to the support function to get an answer about
whether or not it's okay to use deduplication. This approach seems to
be good enough as far as the deduplication project's needs are
concerned. However, I think that I probably need to take a broader
view of the problem than that. Any guidance would be much appreciated.
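
A toy model of that lookup, written in Python rather than as PostgreSQL
code, may make the flow easier to picture. The dictionary stands in for
pg_amproc, and every name in it (AMPROC, dedup_is_safe, the per-type
callbacks) is hypothetical. The per-type answers are assumptions for
illustration; only the shape comes from the description above: the
opclass input type selects the support function 4 entry, and the
collation is passed to it.

```python
# Toy model only: a dict stands in for pg_amproc; all names are hypothetical.
BTREE_DEDUP_PROC = 4  # support function number discussed in this thread


def int4_dedup_ok(collation):
    # Assumed: equal integers are always interchangeable.
    return True


def numeric_dedup_ok(collation):
    # Equal numerics can differ in display scale, so they are not interchangeable.
    return False


def text_dedup_ok(collation):
    # Assumed for illustration: the answer depends on the collation passed in.
    return collation.get("deterministic", True)


# (opclass input type, support function number) -> callback
AMPROC = {
    ("int4", BTREE_DEDUP_PROC): int4_dedup_ok,
    ("numeric", BTREE_DEDUP_PROC): numeric_dedup_ok,
    ("text", BTREE_DEDUP_PROC): text_dedup_ok,
}


def dedup_is_safe(opcintype, collation):
    """Look up the support function for the input type and ask it."""
    proc = AMPROC.get((opcintype, BTREE_DEDUP_PROC))
    if proc is None:
        return False  # no entry registered: take the conservative answer
    return proc(collation)
```

Note that a type with no entry, such as 'anyarray' here, falls through to
the conservative answer.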

> > Consumers of this new infrastructure probably won't be limited to the
> > deduplication feature;
>
> Indeed, we run up against this sort of thing all the time in, eg, planner
> optimizations.  I think some sort of "equality is precise" indicator
> would be really useful for a lot of things.

Suppose I wanted to add support for deduplication of a B-Tree index on
an array of integers. This probably wouldn't be very compelling, but
just suppose. It's not clear how this could work within the confines
of the type and operator class systems.

I can hardly determine that it's safe or unsafe to do so at CREATE
INDEX time, since the opclass-declared input data type is always the
pg_type.oid corresponding to 'anyarray' -- I am forced to make a
generic assumption that deduplication is not safe. I must make this
conservative assumption since, in general, the indexed column could
turn out to be an array of numeric datums -- a "transitively unsafe"
anyarray (numeric's display scale issue could leak into anyarray). I'm
not actually worried about any practical downside that this may create
for users of the B-Tree deduplication feature; a B-Tree index on an
array *is* a pretty niche thing. Does seem like I should make sure
that I get this right, though.
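
The display scale issue can be illustrated by analogy with Python's
decimal module. This is not PostgreSQL's numeric type, only a stand-in
that behaves similarly in this one respect: two values compare as equal
while their text forms differ, and the same difference survives when the
values are wrapped in a container.

```python
from decimal import Decimal

a = Decimal("1.0")
b = Decimal("1.00")

# The values compare as equal...
equal = (a == b)
# ...but their text forms keep the trailing zeros, so they differ.
same_text = (str(a) == str(b))

# The same property leaks through a container: the lists compare equal
# element by element, yet their printed elements are not identical.
arrays_equal = ([a] == [b])
arrays_same_text = ([str(x) for x in [a]] == [str(x) for x in [b]])
```

Equality without interchangeability is exactly the case where merging
duplicates would lose information.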

Code like the 'anyarray' B-Tree support function 1 (i.e.
btarraycmp()/array_cmp()) doesn't hint at a solution -- it merely does
a lookup of the underlying type's comparator using the typcache. That
depends on having actual anyarray datums to do something with, which
isn't something that this new infrastructure can rely on in any way.

I suppose that the only thing that would work here would be to somehow
look through the pg_attribute entry for the index column, which will
have the details required to distinguish (say) an array of
integers (which is safe, I think) from an array of numerics (which is
unsafe). From there, the information about the element type could
(say) be passed to the anyarray default opclass' support function 4,
which could do its own internal lookup. That seems like it might be a
solution in search of a problem, though.
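
As a rough sketch of that idea, again as a Python toy model and not
PostgreSQL code: the function below plays the role of an anyarray
support function 4 that is handed the index column's metadata and does
its own lookup on the element type. The names ELEMENT_DEDUP_OK and
array_dedup_ok, and the dict used in place of a pg_attribute entry, are
hypothetical.

```python
# Hypothetical per-element-type answers, keyed by type name.
ELEMENT_DEDUP_OK = {"int4": True, "numeric": False}


def array_dedup_ok(index_column):
    """Stand-in for an anyarray support function 4.

    index_column is a dict playing the role of the pg_attribute entry
    for the index column; only its element type is consulted.
    """
    elem = index_column.get("element_type")
    # Unknown or missing element types get the conservative answer.
    return ELEMENT_DEDUP_OK.get(elem, False)
```

With this shape, an array of integers is reported as safe and an array
of numerics as unsafe, without needing any actual array datums.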

BTW, I currently forbid cross-type support function 4 entries for an
opclass, on the grounds that that isn't sensible for deduplication. Do
you think that that restriction is appropriate in general, given the
likelihood that this support function will be used in several other
areas?

Thanks
-- 
Peter Geoghegan


