Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date
Msg-id daca43e3-3857-b933-4194-64d4c8ff261f@postgrespro.ru
In response to Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence  (Antonin Houska <ah@cybertec.at>)
Responses Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
List pgsql-hackers
26.08.2019 14:15, Antonin Houska wrote:
> Peter Geoghegan <pg@bowt.ie> wrote:
>
>> Consumers of this new infrastructure probably won't be limited to the
>> deduplication feature;
> It'd also solve an open problem of the aggregate push-down patch [1], in
> particular see the mention of pg_opclass in [2]: the partial aggregate
> node below the final join must not put multiple opclass-equal values
> that are not byte-wise equal into the same group, because some
> information needed by the WHERE or JOIN/ON condition may be lost this
> way. The scale of the numeric type is the most obvious example.
>
>> I would like to:
>>
>> * Get some buy-in on whether or not the precise distinctions I would
>> like to make are correct for deduplication in particular, and as
>> useful as possible for other cases that we may need to add later on.
>>
>> * Figure out the exact interface through which opclass/opfamily
>> authors can represent that their notion of equality is compatible with
>> deduplication/compression.
> It's not entirely clear to me whether opclass or opfamily should carry
> this information. opclass probably makes more sense for index-related
> problems, and the aggregate push-down patch can live with that. I don't
> see any particular reason to add a flag to opfamily. (The planner uses
> both the pg_opclass and pg_opfamily catalogs.)
>
> I think the fact that the aggregate push-down would benefit from this
> enhancement should affect the choice of the new catalog attribute name,
> i.e. it should not mention words as concrete as "deduplication" or
> "compression".


The patch implementing the new opclass option is attached.

It adds a new attribute, pg_opclass.opcisbitwise, which is true if the
opclass's equality is the same as binary (bitwise) equality.
It defaults to true and is set to false for the numeric, float4 and float8 opclasses.
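
To illustrate why these types cannot be marked bitwise, here is a minimal sketch in Python (not PostgreSQL code). It assumes IEEE 754 doubles as a stand-in for float8, and uses Python's Decimal only as an analogy for numeric's display scale. The helper name `bits` is hypothetical.

```python
import struct
from decimal import Decimal


def bits(x: float) -> bytes:
    """Return the raw 8-byte IEEE 754 representation of a double."""
    return struct.pack("<d", x)


# float case: 0.0 and -0.0 compare equal, but their sign bit differs.
pos_zero, neg_zero = 0.0, -0.0
float_equal = pos_zero == neg_zero
float_bitwise_equal = bits(pos_zero) == bits(neg_zero)

# numeric-like case: 1.0 and 1.00 compare equal, but carry a different
# scale, so their stored and printed forms differ.
a, b = Decimal("1.0"), Decimal("1.00")
decimal_equal = a == b
decimal_same_repr = str(a) == str(b)

print(float_equal, float_bitwise_equal)
print(decimal_equal, decimal_same_repr)
```

In both cases the values are equal but not interchangeable, so keeping only one of them (as deduplication or grouping would) loses information.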

Do anyarray opclasses need special treatment?

The new syntax for creating an opclass is "CREATE OPERATOR CLASS NOT BITWISE ..."

Any ideas on better names?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment
