Re: Hash join not finding which collation to use for string hashing - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Hash join not finding which collation to use for string hashing
Date
Msg-id CA+HiwqFjz+SyxU2cGLWz6nnkw5Q2xdjDuOCg4_ryH5eNCw9PEw@mail.gmail.com
Whole thread Raw
In response to Re: Hash join not finding which collation to use for string hashing  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Fri, Jan 31, 2020 at 6:15 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> > On Jan 30, 2020, at 12:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Mark Dilger <mark.dilger@enterprisedb.com> writes:
> >> Would it make sense to catch a qual with unassigned collation
> >> somewhere in the planner, where the qual's operator family is
> >> estatblished, by checking if the operator family behavior is sensitive
> >> to collations?
> >
> >> I’m not sure how to do that.  pg_opfamily doesn’t seem to have a field for that.  Can you recommend how I would
proceedthere? 
> >
> > There's no such information attached to opfamilies, which is more or less
> > forced by the fact that individual operators don't expose it either.
> > There's not much hope of making that better without incompatible changes
> > in the requirements for extensions to define operators and/or operator
> > families.
>
> Thanks, Tom, for confirming this.

Just for the record, I will explain why I brought up doing anything
with operator families at all.  What I was really imagining is putting
a hard-coded check somewhere in the middle of equivalence processing
to see if a given qual's operator would be sensitive to collation
based *only* on whether it belongs to a text operator family, such as
TEXT_BTREE_FAM_OID, whereas the qual expression's inputcollid is 0
(parser failed to resolve collation conflict among its arguments) and
erroring out if so.  If we do that, maybe we won't need
check_collation_set() that's used in various text operators.  Also,
erroring out sooner might allow us to produce more specific error
message, which as I understand it, would help with one of the Mark's
complaints that the error message is too ambiguous due to emitted as
such a low layer.

I thought of the idea after running into a recent commit relevant to
non-deterministic collations:

commit 2810396312664bdb941e549df7dfa75218d73a1c
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Sep 21 16:29:17 2019 -0400

    Fix up handling of nondeterministic collations with pattern_ops opclasses.

    text_pattern_ops and its siblings can't be used with nondeterministic
    collations, because they use the text_eq operator which will not behave
    as bitwise equality if applied with a nondeterministic collation.  The
    initial implementation of that restriction was to insert a run-time test
    in the related comparison functions, but that is inefficient, may throw
    misleading errors, and will throw errors in some cases that would work.
    It seems sufficient to just prevent the combination during CREATE INDEX,
    so do that instead.

    Lacking any better way to identify the opclasses involved, we need to
    hard-wire tests for them, which requires hand-assigned values for their
    OIDs, which forces a catversion bump because they previously had OIDs
    that would be assigned automatically.  That's slightly annoying in the
    v12 branch, but fortunately we're not at rc1 yet, so just do it.

    Discussion: https://postgr.es/m/22566.1568675619@sss.pgh.pa.us

IIUC, the above commit removes a check_collation_set() call from a
operator class comparison function in favor of ensuring that an index
using that operator class can only be defined with a deterministic
collation in the first place.  But as the above commit is purportedly
only a stop-gap solution due to lacking operator class infrastructure
to consider collation in operator semantics, maybe we shouldn't spread
such a hack in other places.

Thanks,
Amit



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: SimpleLruTruncate() mutual exclusion
Next
From: Dean Rasheed
Date:
Subject: Re: Marking some contrib modules as trusted extensions