Thread: pgsql: Generalize hash and ordering support in amapi

pgsql: Generalize hash and ordering support in amapi

From
Peter Eisentraut
Date:
Generalize hash and ordering support in amapi

Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality.  A field amcanorder
already existed, this uses it more widely.  Fields amcanhash and
amcancrosscompare are added for the other purposes.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ce62f2f2a0a48d021f250ba84dfcab5d45ddc914

Modified Files
--------------
contrib/bloom/blutils.c                          |  2 ++
doc/src/sgml/indexam.sgml                        |  4 +++
src/backend/access/brin/brin.c                   |  2 ++
src/backend/access/gin/ginutil.c                 |  2 ++
src/backend/access/gist/gist.c                   |  2 ++
src/backend/access/hash/hash.c                   |  2 ++
src/backend/access/nbtree/nbtree.c               |  2 ++
src/backend/access/spgist/spgutils.c             |  2 ++
src/backend/commands/opclasscmds.c               | 34 ++++++++++++------------
src/backend/executor/nodeIndexscan.c             |  4 +--
src/backend/utils/cache/lsyscache.c              |  8 +++---
src/include/access/amapi.h                       |  4 +++
src/test/modules/dummy_index_am/dummy_index_am.c |  2 ++
src/test/regress/expected/alter_generic.out      |  6 ++---
14 files changed, 50 insertions(+), 26 deletions(-)


Re: pgsql: Generalize hash and ordering support in amapi

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> Generalize hash and ordering support in amapi
> Stop comparing access method OID values against HASH_AM_OID and
> BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
> if it advertises its ability to perform the necessary ordering,
> hashing, or cross-type comparing functionality.  A field amcanorder
> already existed, this uses it more widely.  Fields amcanhash and
> amcancrosscompare are added for the other purposes.

AFAICS, this patch sets amcancrosscompare true only for btree,
which means this change to equality_ops_are_compatible is surely wrong:

-       /* must be btree or hash */
-       if (op_form->amopmethod == BTREE_AM_OID ||
-           op_form->amopmethod == HASH_AM_OID)
+       if (amroutine->amcancrosscompare)

More generally, I think that "amcancrosscompare" is a horribly
underspecified and misleadingly-named concept.  Most of our
AMs are capable of supporting cross-type operators, though for
some it's more about what the per-opclass infrastructure can do
than what the AM does.  So what does that flag *really* mean?

I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:

 * This is trivially true if they are the same operator.  Otherwise,
 * we look to see if they can be found in the same btree or hash opfamily.

 * This is trivially true if they are the same operator.  Otherwise,
 * we look to see if they can be found in the same btree opfamily.

I do not think this was in shape to be committed.  For stuff
like this, clarity of thought and precision of specification
are critical, and this patch has neither.

            regards, tom lane



Re: pgsql: Generalize hash and ordering support in amapi

From
Mark Dilger
Date:

On Thu, Feb 27, 2025 at 8:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
> Generalize hash and ordering support in amapi
> Stop comparing access method OID values against HASH_AM_OID and
> BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
> if it advertises its ability to perform the necessary ordering,
> hashing, or cross-type comparing functionality.  A field amcanorder
> already existed, this uses it more widely.  Fields amcanhash and
> amcancrosscompare are added for the other purposes.

AFAICS, this patch sets amcancrosscompare true only for btree,
which means this change to equality_ops_are_compatible is surely wrong:

-       /* must be btree or hash */
-       if (op_form->amopmethod == BTREE_AM_OID ||
-           op_form->amopmethod == HASH_AM_OID)
+       if (amroutine->amcancrosscompare)

It seems you are right.  hashhandler()'s amroutine should have this true, also.
 

More generally, I think that "amcancrosscompare" is a horribly
underspecified and misleadingly-named concept.  Most of our
AMs are capable of supporting cross-type operators, though for
some it's more about what the per-opclass infrastructure can do
than what the AM does.  So what does that flag *really* mean?

The comments in amapi.h seem to have gotten shortened since v19 of the patch which had them as:

+   /*
+    * Does AM support hashing the indexed column's value, including providing
+    * all hash semantics functions for HASHSTANDARD_PROC and HASHEXTENDED_PROC
+    * conforming to the same calling conventions as those of the hash AM?
+    */
+   bool        amcanhash;
+   /*
+    * Does AM have equality semantics that are compatible across all equality
+    * operators within an operator family?
+    */
+   bool        amcancrosscompare;

The logic in equality_ops_are_compatible() was trusting that equality operators found in an opfamily for btree or hash were ok, but not trusting operators found in opfamilies of other AMs.  Now, after the patch, other AMs can be marked as suitable.  That's really the core of what the flag means:  "Can the system trust that equality operators found in opfamilies of the AM are well-behaved", or something like that.  I agree that this should really be more a feature of an opfamily than an AM, but the system already does it on AM-level granularity, and I don't think it's the responsibility of this patch to rearchitect all that.


I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:

 * This is trivially true if they are the same operator.  Otherwise,
 * we look to see if they can be found in the same btree or hash opfamily.

 * This is trivially true if they are the same operator.  Otherwise,
 * we look to see if they can be found in the same btree opfamily.

I agree these comments need updating.

Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pgsql: Generalize hash and ordering support in amapi

From
Peter Eisentraut
Date:
On 27.02.25 23:17, Mark Dilger wrote:
> 
> On Thu, Feb 27, 2025 at 8:27 AM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>     Peter Eisentraut <peter@eisentraut.org
>     <mailto:peter@eisentraut.org>> writes:
>      > Generalize hash and ordering support in amapi
>      > Stop comparing access method OID values against HASH_AM_OID and
>      > BTREE_AM_OID, and instead check the IndexAmRoutine for an index
>     to see
>      > if it advertises its ability to perform the necessary ordering,
>      > hashing, or cross-type comparing functionality.  A field amcanorder
>      > already existed, this uses it more widely.  Fields amcanhash and
>      > amcancrosscompare are added for the other purposes.
> 
>     AFAICS, this patch sets amcancrosscompare true only for btree,
>     which means this change to equality_ops_are_compatible is surely wrong:
> 
>     -       /* must be btree or hash */
>     -       if (op_form->amopmethod == BTREE_AM_OID ||
>     -           op_form->amopmethod == HASH_AM_OID)
>     +       if (amroutine->amcancrosscompare)
> 
> 
> It seems you are right.  hashhandler()'s amroutine should have this 
> true, also.

I have fixed that.  I will come back to the rest of the discussion in a bit.




Re: pgsql: Generalize hash and ordering support in amapi

From
Peter Eisentraut
Date:
On 27.02.25 23:17, Mark Dilger wrote:
> The logic in equality_ops_are_compatible() was trusting that equality 
> operators found in an opfamily for btree or hash were ok, but not 
> trusting operators found in opfamilies of other AMs.  Now, after the 
> patch, other AMs can be marked as suitable.  That's really the core of 
> what the flag means:  "Can the system trust that equality operators 
> found in opfamilies of the AM are well-behaved", or something like 
> that.

Yeah, what might be a good English identifier for that?

>     I also object strongly to the fact that the comments for
>     equality_ops_are_compatible and comparison_ops_are_compatible
>     were not modified:
> 
>       * This is trivially true if they are the same operator.  Otherwise,
>       * we look to see if they can be found in the same btree or hash
>     opfamily.
> 
>       * This is trivially true if they are the same operator.  Otherwise,
>       * we look to see if they can be found in the same btree opfamily.
> 
> I agree these comments need updating.

Mark, can you suggest updated wording for those?




Re: pgsql: Generalize hash and ordering support in amapi

From
Mark Dilger
Date:


On Tue, Mar 4, 2025 at 8:46 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 27.02.25 23:17, Mark Dilger wrote:
> The logic in equality_ops_are_compatible() was trusting that equality
> operators found in an opfamily for btree or hash were ok, but not
> trusting operators found in opfamilies of other AMs.  Now, after the
> patch, other AMs can be marked as suitable.  That's really the core of
> what the flag means:  "Can the system trust that equality operators
> found in opfamilies of the AM are well-behaved", or something like
> that.

Yeah, what might be a good English identifier for that?

>     I also object strongly to the fact that the comments for
>     equality_ops_are_compatible and comparison_ops_are_compatible
>     were not modified:
>
>       * This is trivially true if they are the same operator.  Otherwise,
>       * we look to see if they can be found in the same btree or hash
>     opfamily.
>
>       * This is trivially true if they are the same operator.  Otherwise,
>       * we look to see if they can be found in the same btree opfamily.
>
> I agree these comments need updating.

Mark, can you suggest updated wording for those?


Yes, happily, though I think I already did, in the v21 patch set.  Here is the meat of that patch:

- * This is trivially true if they are the same operator.  Otherwise,
- * we look to see if they can be found in the same btree or hash opfamily.
- * Either finding allows us to assume that they have compatible notions
- * of equality.  (The reason we need to do these pushups is that one might
- * be a cross-type operator; for instance int24eq vs int4eq.)
+ * This is trivially true if they are the same operator.  Otherwise, we look to
+ * see if they can be found in the same opfamily of an index AM which
+ * advertises amcancrosscompare.  Either finding allows us to assume that they
+ * have compatible notions of equality.  (The reason we need to do these
+ * pushups is that one might be a cross-type operator; for instance int24eq vs
+ * int4eq.)

and

- * This is trivially true if they are the same operator.  Otherwise,
- * we look to see if they can be found in the same btree opfamily.
- * For example, '<' and '>=' ops match if they belong to the same family.
+ * This is trivially true if they are the same operator.  Otherwise, we look to
+ * see if they can be found in the same opfamily of an index AM that advertises
+ * both amcancrosscompare and amcanorder.  For example, '<' and '>=' ops match
+ * if they belong to the same family.
  *
- * (This is identical to equality_ops_are_compatible(), except that we
- * don't bother to examine hash opclasses.)
+ * (This is identical to equality_ops_are_compatible(), except that we don't
+ * bother to examine opclasses for index AMs which cannot do ordering, such as
+ * the hash index AM.)

See v21-0003-Update-syscache-code-comments.patch for the whole thing, including a commit message about why this is needed.

--

Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pgsql: Generalize hash and ordering support in amapi

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 27.02.25 23:17, Mark Dilger wrote:
>> The logic in equality_ops_are_compatible() was trusting that equality 
>> operators found in an opfamily for btree or hash were ok, but not 
>> trusting operators found in opfamilies of other AMs.  Now, after the 
>> patch, other AMs can be marked as suitable.  That's really the core of 
>> what the flag means:  "Can the system trust that equality operators 
>> found in opfamilies of the AM are well-behaved", or something like 
>> that.

> Yeah, what might be a good English identifier for that?

Looking at the call sites, the callers of equality_ops_are_compatible
basically are interested in whether the two operators agree on which
values are distinct.  That can be rephrased as "equality satisfies
the transitive law": if A = B according to one operator, while B = C
according to the other operator, then A = C according to some relevant
member of the opfamily (which might be yet a third operator).

The single caller of comparison_ops_are_compatible is
ineq_histogram_selectivity, which knows it is dealing with
inequality operators (<, >, <=, or >=), and what it wants to know
is whether the two operators use the same sort ordering.

So really the properties we want the AM to promise are "all operators
within one of my opfamilies have the same notion of equality" or "all
operators within one of my opfamilies use the same sort ordering".
You could reduce this to one property that means slightly different
things depending on whether amcanorder, perhaps, since "same sort
ordering" implies "same notion of equality".  Maybe call it
"amconsistentsemantics"?  OTOH, requiring amcanorder might be
unpleasantly much, since an AM might have btree-like operator
semantics and yet not support ordered retrieval.  So perhaps two
separate flags "amconsistentequality" and "amconsistentordering"
would be better.

In any case, my gripe was less about the name of the flag and more
about the lack of a clear specification of what it means.  "does AM
support cross-type comparisons" doesn't get the job done.  Maybe
we can fit

    /* do operators within an opfamily have consistent equality semantics? */
    bool        amconsistentequality;
    /* do operators within an opfamily have consistent ordering semantics? */
    bool        amconsistentordering;

>>  * This is trivially true if they are the same operator.  Otherwise,
>>  * we look to see if they can be found in the same btree or hash
>> opfamily.
>> 
>>  * This is trivially true if they are the same operator.  Otherwise,
>>  * we look to see if they can be found in the same btree opfamily.
>> 
>> I agree these comments need updating.

> Mark, can you suggest updated wording for those?

Maybe "Otherwise, we look to see if they both belong to an opfamily
that guarantees compatible semantics for equality" (or "for ordering"
in the second case).

BTW, the header comment for query_is_distinct_for also needs updated:

 * would use itself.  We use equality_ops_are_compatible() to check
 * compatibility.  That looks at btree or hash opfamily membership, and so
 * should give trustworthy answers for all operators that we might need
 * to deal with here.)

Also, I'm thinking that equality_ops_are_compatible and
comparison_ops_are_compatible now have a bit of a performance problem.
The original coding was intended to provide a cheap check before
expending the cycles to test op_in_opfamily.  This patch has
completely blown that up, since GetIndexAmRoutineByAmId is *more*
expensive than op_in_opfamily.  I suggest reversing things so that we
test op_in_opfamily first and only bother to look up the AM details
when we've verified that both operators belong to the same family.

            regards, tom lane



Re: pgsql: Generalize hash and ordering support in amapi

From
Peter Eisentraut
Date:
On 04.03.25 18:35, Tom Lane wrote:
> In any case, my gripe was less about the name of the flag and more
> about the lack of a clear specification of what it means.  "does AM
> support cross-type comparisons" doesn't get the job done.  Maybe
> we can fit
> 
>      /* do operators within an opfamily have consistent equality semantics? */
>      bool        amconsistentequality;
>      /* do operators within an opfamily have consistent ordering semantics? */
>      bool        amconsistentordering;

> Also, I'm thinking that equality_ops_are_compatible and
> comparison_ops_are_compatible now have a bit of a performance problem.
> The original coding was intended to provide a cheap check before
> expending the cycles to test op_in_opfamily.  This patch has
> completely blown that up, since GetIndexAmRoutineByAmId is *more*
> expensive than op_in_opfamily.  I suggest reversing things so that we
> test op_in_opfamily first and only bother to look up the AM details
> when we've verified that both operators belong to the same family.

I have committed fixes for these issues along the lines you suggested.




Re: pgsql: Generalize hash and ordering support in amapi

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> I have committed fixes for these issues along the lines you suggested.

Thanks.  There is a typo in hashhandler:

-   amroutine->amcancrosscompare = true;
+   amroutine->amconsistentequality = true;
+   amroutine->amconsistentequality = false;

The second line should be setting amconsistentordering = false.

Also, may I suggest one more thing?  I think the test in
comparison_ops_are_compatible should be just

-           if (amroutine->amcanorder && amroutine->amconsistentordering)
+           if (amroutine->amconsistentordering)

(and the comment for it needs adjustment too).  To my mind,
amconsistentordering is a static declaration that operators
within one of the AM's opfamilies are expected to have this
property.  That could be true whether or not the AM is capable
of returning tuples in order.  So although these flags might
commonly be set together, I think they are independent
properties.

            regards, tom lane