Re: Hash join not finding which collation to use for string hashing - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hash join not finding which collation to use for string hashing
Date
Msg-id 14129.1580417421@sss.pgh.pa.us
In response to Re: Hash join not finding which collation to use for string hashing  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> I assume that what would have to happen to implement this is that an
> SQL-callable function would be passed more than one collation OID,
> perhaps one per argument or something like that. Notice, however, that
> this would require changing the way that functions get called. See the
> DirectFunctionCall{1,2,3,...}Coll() and
> FunctionCall{0,1,2,3,...}Coll() and the definition of
> FunctionCallInfoBaseData -- there's only one spot for an OID available
> right now. Allowing for more would likely have a noticeable impact on
> the cost of calling SQL-callable functions, and that's already
> expensive enough that people have been unhappy about it. It seems
> unlikely that it would be worth incurring more overhead here for every
> query all the time just to make this case work.

The implementation I was visualizing was replacing, e.g.,
FuncExpr.inputcollid with an OID List, and then teaching PG_GET_COLLATION
to throw an error if the list is longer than one element.  I agree that
the performance implications of that would be pretty troublesome, though.
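
As a rough illustration of the list-based idea above, here is a minimal
standalone sketch. It does not use PostgreSQL's headers: CollationList,
CollationLookup and get_single_collation() are hypothetical names invented
for this example, and a status code stands in for the error that a real
PG_GET_COLLATION would have to raise.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;           /* stand-in for PostgreSQL's Oid type */
#define InvalidOid ((Oid) 0)

/* Hypothetical: the candidate input collations that would replace a
 * single inputcollid field. */
typedef struct CollationList
{
    size_t      ncollations;        /* number of candidate collations */
    const Oid  *collations;         /* may be NULL when ncollations == 0 */
} CollationList;

typedef enum CollationLookup
{
    COLLATION_OK,                   /* zero or one candidate: unambiguous */
    COLLATION_AMBIGUOUS             /* more than one candidate: caller errors */
} CollationLookup;

/*
 * Hypothetical analogue of PG_GET_COLLATION for the list representation:
 * succeeds only when the list holds at most one collation.  With an empty
 * list, *result is set to InvalidOid.
 */
CollationLookup
get_single_collation(const CollationList *list, Oid *result)
{
    assert(list != NULL && result != NULL);

    if (list->ncollations > 1)
    {
        *result = InvalidOid;
        return COLLATION_AMBIGUOUS;
    }

    *result = (list->ncollations == 1) ? list->collations[0] : InvalidOid;
    return COLLATION_OK;
}
```

Even in this toy form, every caller pays for a length check and an extra
pointer dereference, which hints at the per-call overhead in question.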

In the end, it seems like the only solution that would be remotely
practical from a performance standpoint is to redefine things so that
collation-sensitive functions have to be labeled as such in pg_proc,
and then we can have the parser throw the appropriate error if it
can't resolve an input collation for such a function.  Perhaps the
backwards-compatibility hit wouldn't be as bad as it first seems,
since the whole thing can be ignored for functions that haven't got at
least one collatable input, and most of those would likely be all right
with a default assumption that they are collation sensitive.  Or maybe
better, we could make the default assumption be that they aren't
sensitive, with the same error still being thrown at runtime if they are,
so that extensions have to take positive action to get the better error
behavior but if they don't then things are no worse than today.
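
The parse-time rule described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: pg_proc has no such
column today, and ProcCollationSensitivity and parser_must_reject() are
hypothetical names, with "not sensitive" modeled as the default marking.

```c
#include <stdbool.h>

typedef unsigned int Oid;           /* stand-in for PostgreSQL's Oid type */
#define InvalidOid ((Oid) 0)

/* Hypothetical pg_proc marking; not an existing catalog column. */
typedef enum ProcCollationSensitivity
{
    PROC_COLLATION_INSENSITIVE,     /* assumed default: no parse-time check */
    PROC_COLLATION_SENSITIVE        /* function declared collation-sensitive */
} ProcCollationSensitivity;

/*
 * Returns true when the parser should throw the "could not determine
 * which collation to use" style of error for a function call.
 *
 * resolved_collation is InvalidOid when the parser could not resolve an
 * input collation for the call.
 */
bool
parser_must_reject(ProcCollationSensitivity marking,
                   bool has_collatable_input,
                   Oid resolved_collation)
{
    /* Functions without any collatable input are never affected. */
    if (!has_collatable_input)
        return false;

    /* Unmarked functions keep today's behavior: any failure is left to
     * the function itself at run time. */
    if (marking != PROC_COLLATION_SENSITIVE)
        return false;

    /* Marked functions fail early if no collation could be resolved. */
    return resolved_collation == InvalidOid;
}
```

Under this sketch an extension that never sets the marking sees no change,
while one that opts in gets the error at parse time instead of at run time.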

Mark, obviously, would then lobby for the pg_proc marking to
include one state that identifies functions that only care about
collation when it's nondeterministic.  But I'm still not very
sure how that would work as soon as you look anyplace except at
what texteq() itself would do.  The questions of whether such a
query matches a given index, or could be implemented via mergejoin,
etc, remain.

            regards, tom lane


