Thread: pgsql: Generalize hash and ordering support in amapi
Generalize hash and ordering support in amapi Stop comparing access method OID values against HASH_AM_OID and BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see if it advertises its ability to perform the necessary ordering, hashing, or cross-type comparing functionality. A field amcanorder already existed, this uses it more widely. Fields amcanhash and amcancrosscompare are added for the other purposes. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/ce62f2f2a0a48d021f250ba84dfcab5d45ddc914 Modified Files -------------- contrib/bloom/blutils.c | 2 ++ doc/src/sgml/indexam.sgml | 4 +++ src/backend/access/brin/brin.c | 2 ++ src/backend/access/gin/ginutil.c | 2 ++ src/backend/access/gist/gist.c | 2 ++ src/backend/access/hash/hash.c | 2 ++ src/backend/access/nbtree/nbtree.c | 2 ++ src/backend/access/spgist/spgutils.c | 2 ++ src/backend/commands/opclasscmds.c | 34 ++++++++++++------------ src/backend/executor/nodeIndexscan.c | 4 +-- src/backend/utils/cache/lsyscache.c | 8 +++--- src/include/access/amapi.h | 4 +++ src/test/modules/dummy_index_am/dummy_index_am.c | 2 ++ src/test/regress/expected/alter_generic.out | 6 ++--- 14 files changed, 50 insertions(+), 26 deletions(-)
Peter Eisentraut <peter@eisentraut.org> writes: > Generalize hash and ordering support in amapi > Stop comparing access method OID values against HASH_AM_OID and > BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see > if it advertises its ability to perform the necessary ordering, > hashing, or cross-type comparing functionality. A field amcanorder > already existed, this uses it more widely. Fields amcanhash and > amcancrosscompare are added for the other purposes. AFAICS, this patch sets amcancrosscompare true only for btree, which means this change to equality_ops_are_compatible is surely wrong: - /* must be btree or hash */ - if (op_form->amopmethod == BTREE_AM_OID || - op_form->amopmethod == HASH_AM_OID) + if (amroutine->amcancrosscompare) More generally, I think that "amcancrosscompare" is a horribly underspecified and misleadingly-named concept. Most of our AMs are capable of supporting cross-type operators, though for some it's more about what the per-opclass infrastructure can do than what the AM does. So what does that flag *really* mean? I also object strongly to the fact that the comments for equality_ops_are_compatible and comparison_ops_are_compatible were not modified: * This is trivially true if they are the same operator. Otherwise, * we look to see if they can be found in the same btree or hash opfamily. * This is trivially true if they are the same operator. Otherwise, * we look to see if they can be found in the same btree opfamily. I do not think this was in shape to be committed. For stuff like this, clarity of thought and precision of specification are critical, and this patch has neither. regards, tom lane
On Thu, Feb 27, 2025 at 8:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
> Generalize hash and ordering support in amapi
> Stop comparing access method OID values against HASH_AM_OID and
> BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
> if it advertises its ability to perform the necessary ordering,
> hashing, or cross-type comparing functionality. A field amcanorder
> already existed, this uses it more widely. Fields amcanhash and
> amcancrosscompare are added for the other purposes.
AFAICS, this patch sets amcancrosscompare true only for btree,
which means this change to equality_ops_are_compatible is surely wrong:
- /* must be btree or hash */
- if (op_form->amopmethod == BTREE_AM_OID ||
- op_form->amopmethod == HASH_AM_OID)
+ if (amroutine->amcancrosscompare)
It seems you are right. hashhandler()'s amroutine should have this true, also.
More generally, I think that "amcancrosscompare" is a horribly
underspecified and misleadingly-named concept. Most of our
AMs are capable of supporting cross-type operators, though for
some it's more about what the per-opclass infrastructure can do
than what the AM does. So what does that flag *really* mean?
The comments in amapi.h seem to have gotten shortened since v19 of the patch which had them as:
+ /*
+ * Does AM support hashing the indexed column's value, including providing
+ * all hash semantics functions for HASHSTANDARD_PROC and HASHEXTENDED_PROC
+ * conforming to the same calling conventions as those of the hash AM?
+ */
+ bool amcanhash;
+ /*
+ * Does AM have equality semantics that are compatible across all equality
+ * operators within an operator family?
+ */
+ bool amcancrosscompare;
+ * Does AM support hashing the indexed column's value, including providing
+ * all hash semantics functions for HASHSTANDARD_PROC and HASHEXTENDED_PROC
+ * conforming to the same calling conventions as those of the hash AM?
+ */
+ bool amcanhash;
+ /*
+ * Does AM have equality semantics that are compatible across all equality
+ * operators within an operator family?
+ */
+ bool amcancrosscompare;
The logic in equality_ops_are_compatible() was trusting that equality operators found in an opfamily for btree or hash were ok, but not trusting operators found in opfamilies of other AMs. Now, after the patch, other AMs can be marked as suitable. That's really the core of what the flag means: "Can the system trust that equality operators found in opfamilies of the AM are well-behaved", or something like that. I agree that this should really be more a feature of an opfamily than an AM, but the system already does it on AM-level granularity, and I don't think it's the responsibility of this patch to rearchitect all that.
I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:
* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree or hash opfamily.
* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree opfamily.
I agree these comments need updating.
On 27.02.25 23:17, Mark Dilger wrote: > > On Thu, Feb 27, 2025 at 8:27 AM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Peter Eisentraut <peter@eisentraut.org > <mailto:peter@eisentraut.org>> writes: > > Generalize hash and ordering support in amapi > > Stop comparing access method OID values against HASH_AM_OID and > > BTREE_AM_OID, and instead check the IndexAmRoutine for an index > to see > > if it advertises its ability to perform the necessary ordering, > > hashing, or cross-type comparing functionality. A field amcanorder > > already existed, this uses it more widely. Fields amcanhash and > > amcancrosscompare are added for the other purposes. > > AFAICS, this patch sets amcancrosscompare true only for btree, > which means this change to equality_ops_are_compatible is surely wrong: > > - /* must be btree or hash */ > - if (op_form->amopmethod == BTREE_AM_OID || > - op_form->amopmethod == HASH_AM_OID) > + if (amroutine->amcancrosscompare) > > > It seems you are right. hashhandler()'s amroutine should have this > true, also. I have fixed that. I will come back to the rest of the discussion in a bit.
On 27.02.25 23:17, Mark Dilger wrote: > The logic in equality_ops_are_compatible() was trusting that equality > operators found in an opfamily for btree or hash were ok, but not > trusting operators found in opfamilies of other AMs. Now, after the > patch, other AMs can be marked as suitable. That's really the core of > what the flag means: "Can the system trust that equality operators > found in opfamilies of the AM are well-behaved", or something like > that. Yeah, what might be a good English identifier for that? > I also object strongly to the fact that the comments for > equality_ops_are_compatible and comparison_ops_are_compatible > were not modified: > > * This is trivially true if they are the same operator. Otherwise, > * we look to see if they can be found in the same btree or hash > opfamily. > > * This is trivially true if they are the same operator. Otherwise, > * we look to see if they can be found in the same btree opfamily. > > I agree these comments need updating. Mark, can you suggest updated wording for those?
On Tue, Mar 4, 2025 at 8:46 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 27.02.25 23:17, Mark Dilger wrote:
> The logic in equality_ops_are_compatible() was trusting that equality
> operators found in an opfamily for btree or hash were ok, but not
> trusting operators found in opfamilies of other AMs. Now, after the
> patch, other AMs can be marked as suitable. That's really the core of
> what the flag means: "Can the system trust that equality operators
> found in opfamilies of the AM are well-behaved", or something like
> that.
Yeah, what might be a good English identifier for that?
> I also object strongly to the fact that the comments for
> equality_ops_are_compatible and comparison_ops_are_compatible
> were not modified:
>
> * This is trivially true if they are the same operator. Otherwise,
> * we look to see if they can be found in the same btree or hash
> opfamily.
>
> * This is trivially true if they are the same operator. Otherwise,
> * we look to see if they can be found in the same btree opfamily.
>
> I agree these comments need updating.
Mark, can you suggest updated wording for those?
Yes, happily, though I think I already did, in the v21 patch set. Here is the meat of that patch:
- * This is trivially true if they are the same operator. Otherwise,
- * we look to see if they can be found in the same btree or hash opfamily.
- * Either finding allows us to assume that they have compatible notions
- * of equality. (The reason we need to do these pushups is that one might
- * be a cross-type operator; for instance int24eq vs int4eq.)
+ * This is trivially true if they are the same operator. Otherwise, we look to
+ * see if they can be found in the same opfamily of an index AM which
+ * advertises amcancrosscompare. Either finding allows us to assume that they
+ * have compatible notions of equality. (The reason we need to do these
+ * pushups is that one might be a cross-type operator; for instance int24eq vs
+ * int4eq.)
- * we look to see if they can be found in the same btree or hash opfamily.
- * Either finding allows us to assume that they have compatible notions
- * of equality. (The reason we need to do these pushups is that one might
- * be a cross-type operator; for instance int24eq vs int4eq.)
+ * This is trivially true if they are the same operator. Otherwise, we look to
+ * see if they can be found in the same opfamily of an index AM which
+ * advertises amcancrosscompare. Either finding allows us to assume that they
+ * have compatible notions of equality. (The reason we need to do these
+ * pushups is that one might be a cross-type operator; for instance int24eq vs
+ * int4eq.)
and
- * This is trivially true if they are the same operator. Otherwise,
- * we look to see if they can be found in the same btree opfamily.
- * For example, '<' and '>=' ops match if they belong to the same family.
+ * This is trivially true if they are the same operator. Otherwise, we look to
+ * see if they can be found in the same opfamily of an index AM that advertises
+ * both amcancrosscompare and amcanorder. For example, '<' and '>=' ops match
+ * if they belong to the same family.
*
- * (This is identical to equality_ops_are_compatible(), except that we
- * don't bother to examine hash opclasses.)
+ * (This is identical to equality_ops_are_compatible(), except that we don't
+ * bother to examine opclasses for index AMs which cannot do ordering, such as
+ * the hash index AM.)
- * we look to see if they can be found in the same btree opfamily.
- * For example, '<' and '>=' ops match if they belong to the same family.
+ * This is trivially true if they are the same operator. Otherwise, we look to
+ * see if they can be found in the same opfamily of an index AM that advertises
+ * both amcancrosscompare and amcanorder. For example, '<' and '>=' ops match
+ * if they belong to the same family.
*
- * (This is identical to equality_ops_are_compatible(), except that we
- * don't bother to examine hash opclasses.)
+ * (This is identical to equality_ops_are_compatible(), except that we don't
+ * bother to examine opclasses for index AMs which cannot do ordering, such as
+ * the hash index AM.)
See v21-0003-Update-syscache-code-comments.patch for the whole thing, including a commit message about why this is needed.
Peter Eisentraut <peter@eisentraut.org> writes: > On 27.02.25 23:17, Mark Dilger wrote: >> The logic in equality_ops_are_compatible() was trusting that equality >> operators found in an opfamily for btree or hash were ok, but not >> trusting operators found in opfamilies of other AMs. Now, after the >> patch, other AMs can be marked as suitable. That's really the core of >> what the flag means: "Can the system trust that equality operators >> found in opfamilies of the AM are well-behaved", or something like >> that. > Yeah, what might be a good English identifier for that? Looking at the call sites, the callers of equality_ops_are_compatible basically are interested in whether the two operators agree on which values are distinct. That can be rephrased as "equality satisfies the transitive law": if A = B according to one operator, while B = C according to the other operator, then A = C according to some relevant member of the opfamily (which might be yet a third operator). The single caller of comparison_ops_are_compatible is ineq_histogram_selectivity, which knows it is dealing with inequality operators (<, >, <=, or >=), and what it wants to know is whether the two operators use the same sort ordering. So really the properties we want the AM to promise are "all operators within one of my opfamilies have the same notion of equality" or "all operators within one of my opfamilies use the same sort ordering". You could reduce this to one property that means slightly different things depending on whether amcanorder, perhaps, since "same sort ordering" implies "same notion of equality". Maybe call it "amconsistentsemantics"? OTOH, requiring amcanorder might be unpleasantly much, since an AM might have btree-like operator semantics and yet not support ordered retrieval. So perhaps two separate flags "amconsistentequality" and "amconsistentordering" would be better. In any case, my gripe was less about the name of the flag and more about the lack of a clear specification of what it means. "does AM support cross-type comparisons" doesn't get the job done. Maybe we can fit /* do operators within an opfamily have consistent equality semantics? */ bool amconsistentequality; /* do operators within an opfamily have consistent ordering semantics? */ bool amconsistentordering; >> * This is trivially true if they are the same operator. Otherwise, >> * we look to see if they can be found in the same btree or hash >> opfamily. >> >> * This is trivially true if they are the same operator. Otherwise, >> * we look to see if they can be found in the same btree opfamily. >> >> I agree these comments need updating. > Mark, can you suggest updated wording for those? Maybe "Otherwise, we look to see if they both belong to an opfamily that guarantees compatible semantics for equality" (or "for ordering" in the second case). BTW, the header comment for query_is_distinct_for also needs updated: * would use itself. We use equality_ops_are_compatible() to check * compatibility. That looks at btree or hash opfamily membership, and so * should give trustworthy answers for all operators that we might need * to deal with here.) Also, I'm thinking that equality_ops_are_compatible and comparison_ops_are_compatible now have a bit of a performance problem. The original coding was intended to provide a cheap check before expending the cycles to test op_in_opfamily. This patch has completely blown that up, since GetIndexAmRoutineByAmId is *more* expensive than op_in_opfamily. I suggest reversing things so that we test op_in_opfamily first and only bother to look up the AM details when we've verified that both operators belong to the same family. regards, tom lane
On 04.03.25 18:35, Tom Lane wrote: > In any case, my gripe was less about the name of the flag and more > about the lack of a clear specification of what it means. "does AM > support cross-type comparisons" doesn't get the job done. Maybe > we can fit > > /* do operators within an opfamily have consistent equality semantics? */ > bool amconsistentequality; > /* do operators within an opfamily have consistent ordering semantics? */ > bool amconsistentordering; > Also, I'm thinking that equality_ops_are_compatible and > comparison_ops_are_compatible now have a bit of a performance problem. > The original coding was intended to provide a cheap check before > expending the cycles to test op_in_opfamily. This patch has > completely blown that up, since GetIndexAmRoutineByAmId is *more* > expensive than op_in_opfamily. I suggest reversing things so that we > test op_in_opfamily first and only bother to look up the AM details > when we've verified that both operators belong to the same family. I have committed fixes for these issues along the lines you suggested.
Peter Eisentraut <peter@eisentraut.org> writes: > I have committed fixes for these issues along the lines you suggested. Thanks. There is a typo in hashhandler: - amroutine->amcancrosscompare = true; + amroutine->amconsistentequality = true; + amroutine->amconsistentequality = false; The second line should be setting amconsistentordering = false. Also, may I suggest one more thing? I think the test in comparison_ops_are_compatible should be just - if (amroutine->amcanorder && amroutine->amconsistentordering) + if (amroutine->amconsistentordering) (and the comment for it needs adjustment too). To my mind, amconsistentordering is a static declaration that operators within one of the AM's opfamilies are expected to have this property. That could be true whether or not the AM is capable of returning tuples in order. So although these flags might commonly be set together, I think they are independent properties. regards, tom lane