Thread: Explicit deterministic COLLATE fails with pattern matching operationson column with non-deterministic collation

Explicit deterministic COLLATE fails with pattern matching operationson column with non-deterministic collation

From

James Lucas

Date:

27 May 2020, 15:23:06

Hi all,

Wanted to call out what seems like a possible bug in non-deterministic
collation handling with pattern matching operators.  Per the
documentation, non-deterministic collations are not supported with
pattern matching operators.  Section 9.7 of the PG12 manual recommends
"The pattern matching operators of all three kinds do not support
nondeterministic collations. If required, apply a different collation
to the expression to work around this limitation."  However, I'm
finding that pattern matching operations fail when a column is
declared with a non-deterministic collation, *even if* a different,
deterministic collation is explicitly applied to the pattern matching
operation.  This doesn't seem to be the expected behavior.

Example.  This is tested on Postgres 12.3, on Centos 8.1.1911 with libicu 60.3.

Create a non-deterministic collation.
create collation mycollation (provider = icu, locale =
'en-US-ks-level2.utf8', deterministic = false);

Create a couple of sample tables:
create table ctest (id numeric, t text);
create table ctestnd (id numeric, t text collate mycollation);

Populate them with some data:
insert into ctest values (1,'aAa');
insert into ctest select generate_series(2,100000),'bbb';
insert into ctestnd select id, t from ctest;
analyze ctest, ctestnd;

Add a few indexes:
create index ctest_idx01 on ctest (t);
create index ctest_idx02 on ctest (t collate "C");
create index ctestnd_idx01 on ctestnd (t);
create index ctestnd_idx02 on ctestnd (t collate "C");

Test on ctest:
explain select * from ctest where t = 'aAa' collate "C";
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using ctest_idx02 on ctest  (cost=0.42..4.44 rows=1 width=10)
   Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

explain select * from ctest where t like 'a%';
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using ctest_idx02 on ctest  (cost=0.42..8.44 rows=1 width=10)
   Index Cond: ((t >= 'a'::text) AND (t < 'b'::text))
   Filter: (t ~~ 'a%'::text)
COMMENT: Actually this is very interesting, because even without an
explicit COLLATE clause, LIKE still uses the "C" collation index.  Not
sure if that's intended behavior either?

explain select * from ctest where t like 'a%' collate "C";
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using ctest_idx02 on ctest  (cost=0.42..8.44 rows=1 width=10)
   Index Cond: ((t >= 'a'::text) AND (t < 'b'::text))
   Filter: (t ~~ 'a%'::text COLLATE "C")
COMMENT: Uses explicit collation and index as expected.


Test on ctestnd:
explain select * from ctestnd where t = 'aAa' collate "C";
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using ctestnd_idx02 on ctestnd  (cost=0.42..4.44 rows=1 width=10)
   Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

explain select * from ctestnd where t like 'a%';
ERROR:  nondeterministic collations are not supported for LIKE
COMMENT: Fails as expected.

explain select * from ctestnd where t like 'a%' collate "C";
ERROR:  nondeterministic collations are not supported for LIKE
COMMENT: Not expected.  It seems like the explicit COLLATE clause is
ignored in this case.  I've tried different placements for the COLLATE
clause, and none seem to work.


Is this a bug, or have I missed something?

Thanks,
James Lucas

Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation

From

"David G. Johnston"

Date:

27 May 2020, 15:53:04

On Wed, May 27, 2020 at 8:23 AM James Lucas <jlucasdba@gmail.com> wrote:

create table ctestnd (id numeric, t text collate mycollation);

create index ctestnd_idx02 on ctestnd (t collate "C");

Test on ctestnd:
explain select * from ctestnd where t = 'aAa' collate "C";
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using ctestnd_idx02 on ctestnd (cost=0.42..4.44 rows=1 width=10)
Index Cond: (t = 'aAa'::text COLLATE "C")
COMMENT: Works as expected.

Uses an index scan which is where the deterministic collation exists

explain select * from ctestnd where t like 'a%';
ERROR: nondeterministic collations are not supported for LIKE
COMMENT: Fails as expected.

explain select * from ctestnd where t like 'a%' collate "C";
ERROR: nondeterministic collations are not supported for LIKE

Your schema is inherently unstable in this respect because the planner has to be allowed to choose a sequential scan and as soon as it does it attempts to perform like comparisons with table data that is stored using a non-deterministic collation.

I don't know what kinds of promises we make about implicit collation manipulation here but absent such a transformation the sequential scan plan with LIKE generates an invalid plan choice. That it doesn't go find the index that happens to have a workable collation for the query is unsurprising - whether that is even a possibility is beyond me.

David J.

Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation

From

James Lucas

Date:

27 May 2020, 16:19:31

Hi David,

Thanks for the response.  One possibly relevant thing I forgot to
mention.  The collation for the database is "en_US.UTF-8", which is
thus also the collation for the t column of ctest.

Per the documentation, it seems putting an implicit collation on the
operation should work.  Although the documentation is admittedly a
little vague in this respect.  I also found a mail thread in the list
where Peter Eisentraut recommended syntax exactly like this (collate
"C") to work around the inability to use pattern matching on
non-deterministic collation columns.  Unfortunately that thread
trailed out without a response if it actually worked.

Noticed something else a bit interesting.  Perhaps removing indexes
from the equation would also help:

drop index ctestnd_idx01, ctestnd_idx02, ctest_idx01, ctest_idx02;

explain select * from ctest where t like 'a%' collate "C";
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on ctest  (cost=0.00..1791.00 rows=1 width=10)
   Filter: (t ~~ 'a%'::text COLLATE "C")
COMMENT: Okay

explain select * from ctest where t like 'a%' collate mycollation;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on ctest  (cost=0.00..1791.00 rows=1 width=10)
   Filter: (t ~~ 'a%'::text COLLATE mycollation)
COMMENT: Wait, that doesn't seem right.

select * from ctest where t like 'a%' collate mycollation;
ERROR:  nondeterministic collations are not supported for LIKE
COMMENT: So in this case, specifying an explicit non-deterministic
collation with EXPLAIN, we get a plan.  But when we actually go to
execute, it fails.

explain select * from ctestnd where t like 'a%' collate "C";
ERROR:  nondeterministic collations are not supported for LIKE
COMMENT: But in the inverse case, running explain on a column with a
non-deterministic collation, but an explicit deterministic collation,
we don't even get a plan with EXPLAIN.  That seems inconsistent.  Only
conclusion I can reach is that it's failing a check at an earlier
point in the process than in the other case.

Thanks,
James

On Wed, May 27, 2020 at 10:53 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Wed, May 27, 2020 at 8:23 AM James Lucas <jlucasdba@gmail.com> wrote:
>>
>>
>> create table ctestnd (id numeric, t text collate mycollation);
>>
>> create index ctestnd_idx02 on ctestnd (t collate "C");
>
>
>>
>> Test on ctestnd:
>> explain select * from ctestnd where t = 'aAa' collate "C";
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Index Scan using ctestnd_idx02 on ctestnd  (cost=0.42..4.44 rows=1 width=10)
>>    Index Cond: (t = 'aAa'::text COLLATE "C")
>> COMMENT: Works as expected.
>
>
> Uses an index scan which is where the deterministic collation exists
>
>>
>>
>> explain select * from ctestnd where t like 'a%';
>> ERROR:  nondeterministic collations are not supported for LIKE
>> COMMENT: Fails as expected.
>>
>> explain select * from ctestnd where t like 'a%' collate "C";
>> ERROR:  nondeterministic collations are not supported for LIKE
>>
>
> Your schema is inherently unstable in this respect because the planner has to be allowed to choose a sequential scan
andas soon as it does it attempts to perform like comparisons with table data that is stored using a non-deterministic
collation.
>
> I don't know what kinds of promises we make about implicit collation manipulation here but absent such a
transformationthe sequential scan plan with LIKE generates an invalid plan choice.  That it doesn't go find the index
thathappens to have a workable collation for the query is unsurprising - whether that is even a possibility is beyond
me.
>
> David J.
>

Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From

Tom Lane

Date:

28 May 2020, 00:21:26

James Lucas <jlucasdba@gmail.com> writes:
> explain select * from ctestnd where t like 'a%' collate "C";
> ERROR:  nondeterministic collations are not supported for LIKE

Yeah.  I traced through this, and the place where it's failing is where
the planner tries to apply the LIKE operator to the stored MCV values
(to see how many of them pass the condition, which gives us a big clue
about the selectivity).  Unfortunately, per the comments in selfuncs.c,

 * For both oprrest and oprjoin functions, the operator's input collation OID
 * (if any) is passed using the standard fmgr mechanism, so that the estimator
 * function can fetch it with PG_GET_COLLATION().  Note, however, that all
 * statistics in pg_statistic are currently built using the relevant column's
 * collation.  Thus, in most cases where we are looking at statistics, we
 * should ignore the operator collation and use the stats entry's collation.
 * We expect that the error induced by doing this is usually not large enough
 * to justify complicating matters.  In any case, doing otherwise would yield
 * entirely garbage results for ordered stats data such as histograms.

mcv_selectivity is following this advice and applying LIKE with the
ctestnd.t column's declared collation ... and then the operator throws
an error.

The idea that using the "wrong" collation might actually cause an error
was not factored into this design, obviously.  I'm not sure offhand what
to do about it.  If we go over to using the query's collation then we
avoid that issue, but instead we have the problem noted in this comment
about the histogram sort order not matching what the operator expects.
(In the case of mcv_selectivity the sort order isn't really an issue,
but it is an issue for sibling functions such as
ineq_histogram_selectivity.)

This issue only dates back to commit 5e0928005; before that, we just
blindly passed DEFAULT_COLLATION_OID to operators being evaluated for
estimation purposes.  (I suppose if you made the database's default
collation nondeterministic, you could still get into trouble; but that
case may not be reachable right now.)  On the other hand, the actual
breakage is even newer, because nondeterministic collations weren't
added until 5e1963fb7, several months later.  Both of those are v12
cycle, so it's academic from a user's standpoint which one we blame;
but the upshot is that this case doesn't work.

Ideally, no operator would ever throw an error about unsupported
collations, but I suppose that day is far away.

I guess the path of least resistance is to change the selectivity
functions to use the query's collation; then, if you get an error
here you would have done so at runtime anyway.  The problem of
inconsistency with the histogram collation will be real for
ineq_histogram_selectivity; but we had a variant of that before,
in that always using DEFAULT_COLLATION_OID would give answers
that were wrong for a query using a different collation.

Peter, any other thoughts?

            regards, tom lane

Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation

From

James Lucas

Date:

28 May 2020, 18:04:06

Thanks Tom,

This is too much into the guts of the planner for me to contribute
much.  This does raise an interesting point that I had not considered
though - it sounds like column statistics depend on the default
collation on the column.  That could impact the plans chosen for
future queries, even if those queries are performed using a different
collation.  For most deterministic collations I expect that probably
doesn't make much of a difference, but for non-deterministic
collations it seems like the difference in stats could be significant.
N_distinct, in particular, seems like it might be very different for a
non-deterministic collation.

I tried setting up a pathological test case for this, and it seems
like at least currently, even with a non-deterministic collation
statistics still count values as distinct, even if the default
collation would consider them equivalent.  Not sure if that's as
intended or not?

create table stest (id numeric, t text);
create table stestnd (id numeric, t text collate mycollation);
insert into stest select generate_series(1,50000),'aaa';
insert into stest select generate_series(50001,100000),'aAa';
insert into stest select generate_series(100001,150000),'bbb';
insert into stest select generate_series(150001,200000),'bBb';
insert into stest select generate_series(200001,250000),'ccc';
insert into stest select generate_series(250001,300000),'cCc';
insert into stestnd select * from stest;
analyze stest, stestnd;

select schemaname, tablename, attname, n_distinct, most_common_vals
from pg_stats where attname='t' and tablename like 'stest%' order by
tablename;
 schemaname | tablename | attname | n_distinct |     most_common_vals
------------+-----------+---------+------------+---------------------------
 public     | stest     | t       |          6 | {aAa,cCc,bbb,aaa,ccc,bBb}
 public     | stestnd   | t       |          6 | {bBb,ccc,bbb,aAa,cCc,aaa}


Actually it turns out the DISTINCT clause doesn't either:

select count(*) from (select distinct t from stest) s;
 count
-------
     6
select count(*) from (select distinct t from stestnd) s;
 count
-------
     6

Sorry - don't want to derail the question at hand too much.  It seems
like it might be relevant if the discussion is around stats and
collation handling.

Thanks,
James

On Wed, May 27, 2020 at 7:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Lucas <jlucasdba@gmail.com> writes:
> > explain select * from ctestnd where t like 'a%' collate "C";
> > ERROR:  nondeterministic collations are not supported for LIKE
>
> Yeah.  I traced through this, and the place where it's failing is where
> the planner tries to apply the LIKE operator to the stored MCV values
> (to see how many of them pass the condition, which gives us a big clue
> about the selectivity).  Unfortunately, per the comments in selfuncs.c,
>
>  * For both oprrest and oprjoin functions, the operator's input collation OID
>  * (if any) is passed using the standard fmgr mechanism, so that the estimator
>  * function can fetch it with PG_GET_COLLATION().  Note, however, that all
>  * statistics in pg_statistic are currently built using the relevant column's
>  * collation.  Thus, in most cases where we are looking at statistics, we
>  * should ignore the operator collation and use the stats entry's collation.
>  * We expect that the error induced by doing this is usually not large enough
>  * to justify complicating matters.  In any case, doing otherwise would yield
>  * entirely garbage results for ordered stats data such as histograms.
>
> mcv_selectivity is following this advice and applying LIKE with the
> ctestnd.t column's declared collation ... and then the operator throws
> an error.
>
> The idea that using the "wrong" collation might actually cause an error
> was not factored into this design, obviously.  I'm not sure offhand what
> to do about it.  If we go over to using the query's collation then we
> avoid that issue, but instead we have the problem noted in this comment
> about the histogram sort order not matching what the operator expects.
> (In the case of mcv_selectivity the sort order isn't really an issue,
> but it is an issue for sibling functions such as
> ineq_histogram_selectivity.)
>
> This issue only dates back to commit 5e0928005; before that, we just
> blindly passed DEFAULT_COLLATION_OID to operators being evaluated for
> estimation purposes.  (I suppose if you made the database's default
> collation nondeterministic, you could still get into trouble; but that
> case may not be reachable right now.)  On the other hand, the actual
> breakage is even newer, because nondeterministic collations weren't
> added until 5e1963fb7, several months later.  Both of those are v12
> cycle, so it's academic from a user's standpoint which one we blame;
> but the upshot is that this case doesn't work.
>
> Ideally, no operator would ever throw an error about unsupported
> collations, but I suppose that day is far away.
>
> I guess the path of least resistance is to change the selectivity
> functions to use the query's collation; then, if you get an error
> here you would have done so at runtime anyway.  The problem of
> inconsistency with the histogram collation will be real for
> ineq_histogram_selectivity; but we had a variant of that before,
> in that always using DEFAULT_COLLATION_OID would give answers
> that were wrong for a query using a different collation.
>
> Peter, any other thoughts?
>
>                         regards, tom lane

Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From

Tom Lane

Date:

28 May 2020, 18:29:28

James Lucas <jlucasdba@gmail.com> writes:
> I tried setting up a pathological test case for this, and it seems
> like at least currently, even with a non-deterministic collation
> statistics still count values as distinct, even if the default
> collation would consider them equivalent.  Not sure if that's as
> intended or not?

I experimented with this, and what I'm seeing is that ucol_strcollUTF8()
reports that 'aaa' is different from 'aAa'.  So the behavior on the
Postgres side is as-expected.  I suspect that the 'en-US-ks-level2'
ICU locale doesn't act as you think it does.  (That is, just saying
that a collation is nondeterministic doesn't make it so; it only forces
Postgres through slower code paths that allow for the possibility of
bitwise-unequal strings being reported as equal by ICU.)  Not knowing
anything about ICU, I can't say more than that.

[ Tested on libicu-60.3-2.el8_1 ]

            regards, tom lane

Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation

From

"Daniel Verite"

Date:

28 May 2020, 18:48:38

    Tom Lane wrote:

> I suspect that the 'en-US-ks-level2' ICU locale doesn't act as you
> think it does.

Indeed, because the syntax is tricky. The OP wants 'en-US-u-ks-level2'.
With 'en-US-ks-level2', the ks-level2 component is ignored and you
get a tertiary colstrength.

Or use 'en-US@colStrength=secondary' which is possibly more
readable and works with older versions of ICU.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Explicit deterministic COLLATE fails with pattern matchingoperations on column with non-deterministic collation

From

James Lucas

Date:

28 May 2020, 20:42:07

Apologies if anyone gets this twice.  I got a rejected mail notice
back the first time I sent.

You are correct.  I was playing around with collation naming sometime
back and when I started looking at this, I just used one I had left in
the database assuming it was correct.  That's my bad.

I dropped the tables and redefined the collation as
create collation mycollation (provider = icu, locale =
'en-US-u-ks-level2', deterministic = false);

Now the results are more what I expected.

select schemaname, tablename, attname, n_distinct, most_common_vals
from pg_stats where attname='t' and tablename like 'stest%' order by
tablename;
 schemaname | tablename | attname | n_distinct |     most_common_vals
------------+-----------+---------+------------+---------------------------
 public     | stest     | t       |          6 | {aaa,cCc,bBb,bbb,ccc,aAa}
 public     | stestnd   | t       |          3 | {ccc,bbb,aaa}

So that is something to be aware of - the collation defined on the
column can impact stats values, which could in turn impact plans
chosen for queries that use alternative collations.

Sorry for the distraction.  That still leaves us with the original
issue regarding LIKE and COLLATE.

Thanks,
James

On Thu, May 28, 2020 at 1:48 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>
>         Tom Lane wrote:
>
> > I suspect that the 'en-US-ks-level2' ICU locale doesn't act as you
> > think it does.
>
> Indeed, because the syntax is tricky. The OP wants 'en-US-u-ks-level2'.
> With 'en-US-ks-level2', the ks-level2 component is ignored and you
> get a tertiary colstrength.
>
> Or use 'en-US@colStrength=secondary' which is possibly more
> readable and works with older versions of ICU.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite

Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From

Tom Lane

Date:

28 May 2020, 20:48:00

James Lucas <jlucasdba@gmail.com> writes:
> So that is something to be aware of - the collation defined on the
> column can impact stats values, which could in turn impact plans
> chosen for queries that use alternative collations.

Yeah.  At some point we might try to collect stats with respect to
multiple collations, but that's a long way off probably.

(I have suggested that CREATE STATISTICS could be extended to
control this type of thing, but I don't think anyone's worked
on making it happen.)

            regards, tom lane

Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From

Tom Lane

Date:

02 June 2020, 22:53:48

I wrote:
> I guess the path of least resistance is to change the selectivity
> functions to use the query's collation; then, if you get an error
> here you would have done so at runtime anyway.  The problem of
> inconsistency with the histogram collation will be real for
> ineq_histogram_selectivity; but we had a variant of that before,
> in that always using DEFAULT_COLLATION_OID would give answers
> that were wrong for a query using a different collation.

I worked on this for awhile and came up with the attached patchset.

0001 does about the minimum required to avoid this failure, by
passing the query's collation not stacoll to operators and selectivity
functions invoked during selectivity estimation.  Unfortunately, it
doesn't seem like we could sanely back-patch this, because it requires
adding parameters to several globally-visible functions.  The odds
that some external code is calling those functions seem too high to
risk an ABI break.  So, while I'd like to squeeze this into v13,
we still need to think about what to do for v12.

0002 addresses the mentioned problem with ineq_histogram_selectivity
by having that function actually verify that the query operator and
collation match what the pg_statistic histogram was generated with.
If they don't match, all is not lost.  What we can do is just
sequentially apply the query's operator and comparison constant to
each histogram entry, and take the fraction of matches as our
selectivity estimate.  This is more or less the same insight we have
used in generic_restriction_selectivity: the histogram is a pretty
decent sample of the column, even if its ordering is not quite what
you want.

0002 also deletes a hack I had put in get_attstatsslot() to insert a
dummy value into sslot->stacoll.  That hack isn't necessary any longer
(because indeed we aren't using sslot->stacoll's value anywhere as of
0001), and it breaks the verification check that 0002 wants to add to
ineq_histogram_selectivity, which depends on stacoll being truthful.
I also adjusted get_variable_range() to deal with collations more
honestly.

When I went to test 0002, I found out that it broke some test cases
in privileges.sql, and the reason was rather interesting.  What those
cases are relying on is getting a highly accurate selectivity
estimate for a user-defined operator, for which the only thing the
planner knows for sure is that it uses scalarltsel as the restriction
estimator.  Despite this lack of knowledge, the existing code just
blithely uses the histogram as though it is *precisely* applicable
to the user-defined operator.  (Which it is, since that operator is
just a wrapper around regular "<" ... but the system has no business
assuming that.)  So with the patch, the case exercises the new code
path that just counts matches, and that gives us only
1/default_statistics_target resolution in the selectivity estimate;
which is not enough to get the expected plan to be selected.  I worked
around this for the moment by cranking up default_statistics_target
while running the ANALYZE in that test script, but I wonder if we
should instead tweak those test cases to be more robust.

I think the combination of 0001+0002 really moves the goalposts a
long way in terms of having honest stats estimation for non-default
collations, so I'd like to sneak it into v13.  As for v12, about
the only alternatives I can think of are:

1. Do nothing, reasoning that if nobody noticed for a year, this
situation is enough of a corner case that we can leave it unfixed.
Obviously that's pretty unsatisfying.

2. Change all the stats functions to pass DEFAULT_COLLATION_OID
when invoking operator functions.  This is not too attractive
either because it essentially reverts 5e0928005; in fact, to avoid
breaking things completely we'd likely have to revert the part
of that commit that taught ANALYZE to collect stats using column
collations instead of DEFAULT_COLLATION_OID.  Then we get into
questions like what about 6b0faf723 --- it's going to be a mess.

3. Hack things up so that the core code renames all these exposed
functions to, say, ineq_histogram_selectivity_ext() and so on,
allowing the additional arguments to exist, but the old names would
still be there as ABI compatibility wrappers.  This might produce
slightly funny results for external code calling the wrappers, since
the wrappers would have to assume DEFAULT_COLLATION_OID, but it'd
avoid an ABI break at least.  I don't want to propagate such a thing
into HEAD, so this would leave us with unsightly differences between
v12 and earlier/later branches -- but there aren't *that* many places
involved.  (I'd envision this approach as back-porting 0001 but not
0002.  For one reason, there's noplace for a wrapper to get the
additional operator OID needed for ineq_histogram_selectivity_ext.
For another, the results for the privilege test suggest that 0002
might have surprising effects on user-defined operators, so back
patching it might draw more complaints.)

Alternatives #2 and #3 would result in (different) changes in the
selectivity estimates v12 produces when considering columns with
non-default collations and/or queries using collations that don't
match the relevant columns.  So that might be an argument for
doing nothing in v12; people tend not to like it when minor
releases cause unexpected plan changes.  Also, #2 is probably
strictly worse than #3 on this score, since it'd move such
estimates away from reality not towards it.

Thoughts?

            regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 4ac2ed5e54..778dbf1e98 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -582,7 +582,7 @@ ltreeparentsel(PG_FUNCTION_ARGS)
     double        selec;

     /* Use generic restriction selectivity logic, with default 0.001. */
-    selec = generic_restriction_selectivity(root, operator,
+    selec = generic_restriction_selectivity(root, operator, InvalidOid,
                                             args, varRelid,
                                             0.001);

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 286e000d4e..ae5c8f084e 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -92,6 +92,7 @@ static Pattern_Prefix_Status pattern_fixed_prefix(Const *patt,
 static Selectivity prefix_selectivity(PlannerInfo *root,
                                       VariableStatData *vardata,
                                       Oid eqopr, Oid ltopr, Oid geopr,
+                                      Oid collation,
                                       Const *prefixcon);
 static Selectivity like_selectivity(const char *patt, int pattlen,
                                     bool case_insensitive);
@@ -534,12 +535,6 @@ patternsel_common(PlannerInfo *root,
      * something binary-compatible but different.)    We can use it to identify
      * the comparison operators and the required type of the comparison
      * constant, much as in match_pattern_prefix().
-     *
-     * NOTE: this logic does not consider collations.  Ideally we'd force use
-     * of "C" collation, but since ANALYZE only generates statistics for the
-     * column's specified collation, we have little choice but to use those.
-     * But our results are so approximate anyway that it probably hardly
-     * matters.
      */
     vartype = vardata.vartype;

@@ -622,7 +617,7 @@ patternsel_common(PlannerInfo *root,
         /*
          * Pattern specifies an exact match, so estimate as for '='
          */
-        result = var_eq_const(&vardata, eqopr, prefix->constvalue,
+        result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue,
                               false, true, false);
     }
     else
@@ -654,7 +649,8 @@ patternsel_common(PlannerInfo *root,
             opfuncid = get_opcode(oprid);
         fmgr_info(opfuncid, &opproc);

-        selec = histogram_selectivity(&vardata, &opproc, constval, true,
+        selec = histogram_selectivity(&vardata, &opproc, collation,
+                                      constval, true,
                                       10, 1, &hist_size);

         /* If not at least 100 entries, use the heuristic method */
@@ -666,6 +662,7 @@ patternsel_common(PlannerInfo *root,
             if (pstatus == Pattern_Prefix_Partial)
                 prefixsel = prefix_selectivity(root, &vardata,
                                                eqopr, ltopr, geopr,
+                                               collation,
                                                prefix);
             else
                 prefixsel = 1.0;
@@ -698,7 +695,8 @@ patternsel_common(PlannerInfo *root,
          * directly to the result selectivity.  Also add up the total fraction
          * represented by MCV entries.
          */
-        mcv_selec = mcv_selectivity(&vardata, &opproc, constval, true,
+        mcv_selec = mcv_selectivity(&vardata, &opproc, collation,
+                                    constval, true,
                                     &sumcommon);

         /*
@@ -1196,7 +1194,7 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
  * population represented by the histogram --- the caller must fold this
  * together with info about MCVs and NULLs.
  *
- * We use the specified btree comparison operators to do the estimation.
+ * We use the given comparison operators and collation to do the estimation.
  * The given variable and Const must be of the associated datatype(s).
  *
  * XXX Note: we make use of the upper bound to estimate operator selectivity
@@ -1207,11 +1205,11 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
 static Selectivity
 prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
                    Oid eqopr, Oid ltopr, Oid geopr,
+                   Oid collation,
                    Const *prefixcon)
 {
     Selectivity prefixsel;
     FmgrInfo    opproc;
-    AttStatsSlot sslot;
     Const       *greaterstrcon;
     Selectivity eq_sel;

@@ -1220,6 +1218,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,

     prefixsel = ineq_histogram_selectivity(root, vardata,
                                            &opproc, true, true,
+                                           collation,
                                            prefixcon->constvalue,
                                            prefixcon->consttype);

@@ -1229,27 +1228,18 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
         return DEFAULT_MATCH_SEL;
     }

-    /*-------
-     * If we can create a string larger than the prefix, say
-     * "x < greaterstr".  We try to generate the string referencing the
-     * collation of the var's statistics, but if that's not available,
-     * use DEFAULT_COLLATION_OID.
-     *-------
+    /*
+     * If we can create a string larger than the prefix, say "x < greaterstr".
      */
-    if (HeapTupleIsValid(vardata->statsTuple) &&
-        get_attstatsslot(&sslot, vardata->statsTuple,
-                         STATISTIC_KIND_HISTOGRAM, InvalidOid, 0))
-         /* sslot.stacoll is set up */ ;
-    else
-        sslot.stacoll = DEFAULT_COLLATION_OID;
     fmgr_info(get_opcode(ltopr), &opproc);
-    greaterstrcon = make_greater_string(prefixcon, &opproc, sslot.stacoll);
+    greaterstrcon = make_greater_string(prefixcon, &opproc, collation);
     if (greaterstrcon)
     {
         Selectivity topsel;

         topsel = ineq_histogram_selectivity(root, vardata,
                                             &opproc, false, false,
+                                            collation,
                                             greaterstrcon->constvalue,
                                             greaterstrcon->consttype);

@@ -1278,7 +1268,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
      * probably off the end of the histogram, and thus we probably got a very
      * small estimate from the >= condition; so we still need to clamp.
      */
-    eq_sel = var_eq_const(vardata, eqopr, prefixcon->constvalue,
+    eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue,
                           false, true, false);

     prefixsel = Max(prefixsel, eq_sel);
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 863efd3d76..955e0ee87f 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -137,7 +137,8 @@ networksel(PG_FUNCTION_ARGS)
      * by MCV entries.
      */
     fmgr_info(get_opcode(operator), &proc);
-    mcv_selec = mcv_selectivity(&vardata, &proc, constvalue, varonleft,
+    mcv_selec = mcv_selectivity(&vardata, &proc, InvalidOid,
+                                constvalue, varonleft,
                                 &sumcommon);

     /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cfb05682bc..2332277307 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -88,11 +88,7 @@
  * (if any) is passed using the standard fmgr mechanism, so that the estimator
  * function can fetch it with PG_GET_COLLATION().  Note, however, that all
  * statistics in pg_statistic are currently built using the relevant column's
- * collation.  Thus, in most cases where we are looking at statistics, we
- * should ignore the operator collation and use the stats entry's collation.
- * We expect that the error induced by doing this is usually not large enough
- * to justify complicating matters.  In any case, doing otherwise would yield
- * entirely garbage results for ordered stats data such as histograms.
+ * collation.
  *----------
  */

@@ -149,14 +145,14 @@ get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;

 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid,
+static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
                               VariableStatData *vardata1, VariableStatData *vardata2,
                               double nd1, double nd2,
                               bool isdefault1, bool isdefault2,
                               AttStatsSlot *sslot1, AttStatsSlot *sslot2,
                               Form_pg_statistic stats1, Form_pg_statistic stats2,
                               bool have_mcvs1, bool have_mcvs2);
-static double eqjoinsel_semi(Oid opfuncoid,
+static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
                              VariableStatData *vardata1, VariableStatData *vardata2,
                              double nd1, double nd2,
                              bool isdefault1, bool isdefault2,
@@ -194,10 +190,11 @@ static double convert_timevalue_to_scalar(Datum value, Oid typid,
 static void examine_simple_variable(PlannerInfo *root, Var *var,
                                     VariableStatData *vardata);
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
-                               Oid sortop, Datum *min, Datum *max);
+                               Oid sortop, Oid collation,
+                               Datum *min, Datum *max);
 static bool get_actual_variable_range(PlannerInfo *root,
                                       VariableStatData *vardata,
-                                      Oid sortop,
+                                      Oid sortop, Oid collation,
                                       Datum *min, Datum *max);
 static bool get_actual_variable_endpoint(Relation heapRel,
                                          Relation indexRel,
@@ -235,6 +232,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -268,12 +266,12 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
      * in the query.)
      */
     if (IsA(other, Const))
-        selec = var_eq_const(&vardata, operator,
+        selec = var_eq_const(&vardata, operator, collation,
                              ((Const *) other)->constvalue,
                              ((Const *) other)->constisnull,
                              varonleft, negate);
     else
-        selec = var_eq_non_const(&vardata, operator, other,
+        selec = var_eq_non_const(&vardata, operator, collation, other,
                                  varonleft, negate);

     ReleaseVariableStats(vardata);
@@ -287,7 +285,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
  * This is exported so that some other estimation functions can use it.
  */
 double
-var_eq_const(VariableStatData *vardata, Oid operator,
+var_eq_const(VariableStatData *vardata, Oid operator, Oid collation,
              Datum constval, bool constisnull,
              bool varonleft, bool negate)
 {
@@ -356,7 +354,7 @@ var_eq_const(VariableStatData *vardata, Oid operator,
              * eqproc returns NULL, though really equality functions should
              * never do that.
              */
-            InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot.stacoll,
+            InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
                                      NULL, NULL);
             fcinfo->args[0].isnull = false;
             fcinfo->args[1].isnull = false;
@@ -458,7 +456,7 @@ var_eq_const(VariableStatData *vardata, Oid operator,
  * This is exported so that some other estimation functions can use it.
  */
 double
-var_eq_non_const(VariableStatData *vardata, Oid operator,
+var_eq_non_const(VariableStatData *vardata, Oid operator, Oid collation,
                  Node *other,
                  bool varonleft, bool negate)
 {
@@ -573,6 +571,7 @@ neqsel(PG_FUNCTION_ARGS)
  */
 static double
 scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
+              Oid collation,
               VariableStatData *vardata, Datum constval, Oid consttype)
 {
     Form_pg_statistic stats;
@@ -672,7 +671,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      * to the result selectivity.  Also add up the total fraction represented
      * by MCV entries.
      */
-    mcv_selec = mcv_selectivity(vardata, &opproc, constval, true,
+    mcv_selec = mcv_selectivity(vardata, &opproc, collation, constval, true,
                                 &sumcommon);

     /*
@@ -681,6 +680,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      */
     hist_selec = ineq_histogram_selectivity(root, vardata,
                                             &opproc, isgt, iseq,
+                                            collation,
                                             constval, consttype);

     /*
@@ -722,7 +722,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
  * if there is no MCV list.
  */
 double
-mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc, Oid collation,
                 Datum constval, bool varonleft,
                 double *sumcommonp)
 {
@@ -749,7 +749,7 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
          * operators that can return NULL.  A small side benefit is to not
          * need to re-initialize the fcinfo struct from scratch each time.
          */
-        InitFunctionCallInfoData(*fcinfo, opproc, 2, sslot.stacoll,
+        InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
                                  NULL, NULL);
         fcinfo->args[0].isnull = false;
         fcinfo->args[1].isnull = false;
@@ -813,7 +813,8 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * prudent to clamp the result range, ie, disbelieve exact 0 or 1 outputs.
  */
 double
-histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+histogram_selectivity(VariableStatData *vardata,
+                      FmgrInfo *opproc, Oid collation,
                       Datum constval, bool varonleft,
                       int min_hist_size, int n_skip,
                       int *hist_size)
@@ -846,7 +847,7 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
              * is to not need to re-initialize the fcinfo struct from scratch
              * each time.
              */
-            InitFunctionCallInfoData(*fcinfo, opproc, 2, sslot.stacoll,
+            InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
                                      NULL, NULL);
             fcinfo->args[0].isnull = false;
             fcinfo->args[1].isnull = false;
@@ -903,7 +904,7 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * Otherwise, fall back to the default selectivity provided by the caller.
  */
 double
-generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
+generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
                                 List *args, int varRelid,
                                 double default_selectivity)
 {
@@ -946,7 +947,8 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
         /*
          * Calculate the selectivity for the column's most common values.
          */
-        mcvsel = mcv_selectivity(&vardata, &opproc, constval, varonleft,
+        mcvsel = mcv_selectivity(&vardata, &opproc, collation,
+                                 constval, varonleft,
                                  &mcvsum);

         /*
@@ -955,7 +957,7 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
          * population.  Otherwise use the default selectivity for the non-MCV
          * population.
          */
-        selec = histogram_selectivity(&vardata, &opproc,
+        selec = histogram_selectivity(&vardata, &opproc, collation,
                                       constval, varonleft,
                                       10, 1, &hist_size);
         if (selec < 0)
@@ -1029,6 +1031,7 @@ double
 ineq_histogram_selectivity(PlannerInfo *root,
                            VariableStatData *vardata,
                            FmgrInfo *opproc, bool isgt, bool iseq,
+                           Oid collation,
                            Datum constval, Oid consttype)
 {
     double        hist_selec;
@@ -1042,9 +1045,11 @@ ineq_histogram_selectivity(PlannerInfo *root,
      * column type.  However, to make that work we will need to figure out
      * which staop to search for --- it's not necessarily the one we have at
      * hand!  (For example, we might have a '<=' operator rather than the '<'
-     * operator that will appear in staop.)  For now, assume that whatever
-     * appears in pg_statistic is sorted the same way our operator sorts, or
-     * the reverse way if isgt is true.
+     * operator that will appear in staop.)  The collation might not agree
+     * either.  For now, just assume that whatever appears in pg_statistic is
+     * sorted the same way our operator sorts, or the reverse way if isgt is
+     * true.  This could result in a bogus estimate, but it still seems better
+     * than falling back to the default estimate.
      */
     if (HeapTupleIsValid(vardata->statsTuple) &&
         statistic_proc_security_check(vardata, opproc->fn_oid) &&
@@ -1090,6 +1095,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
                 have_end = get_actual_variable_range(root,
                                                      vardata,
                                                      sslot.staop,
+                                                     collation,
                                                      &sslot.values[0],
                                                      &sslot.values[1]);

@@ -1107,17 +1113,19 @@ ineq_histogram_selectivity(PlannerInfo *root,
                     have_end = get_actual_variable_range(root,
                                                          vardata,
                                                          sslot.staop,
+                                                         collation,
                                                          &sslot.values[0],
                                                          NULL);
                 else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
                     have_end = get_actual_variable_range(root,
                                                          vardata,
                                                          sslot.staop,
+                                                         collation,
                                                          NULL,
                                                          &sslot.values[probe]);

                 ltcmp = DatumGetBool(FunctionCall2Coll(opproc,
-                                                       sslot.stacoll,
+                                                       collation,
                                                        sslot.values[probe],
                                                        constval));
                 if (isgt)
@@ -1202,7 +1210,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
                  * values to a uniform comparison scale, and do a linear
                  * interpolation within this bin.
                  */
-                if (convert_to_scalar(constval, consttype, sslot.stacoll,
+                if (convert_to_scalar(constval, consttype, collation,
                                       &val,
                                       sslot.values[i - 1], sslot.values[i],
                                       vardata->vartype,
@@ -1342,6 +1350,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -1394,7 +1403,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
     }

     /* The rest of the work is done by scalarineqsel(). */
-    selec = scalarineqsel(root, operator, isgt, iseq,
+    selec = scalarineqsel(root, operator, isgt, iseq, collation,
                           &vardata, constval, consttype);

     ReleaseVariableStats(vardata);
@@ -1459,7 +1468,7 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid)
          * A boolean variable V is equivalent to the clause V = 't', so we
          * compute the selectivity as if that is what we have.
          */
-        selec = var_eq_const(&vardata, BooleanEqualOperator,
+        selec = var_eq_const(&vardata, BooleanEqualOperator, InvalidOid,
                              BoolGetDatum(true), false, true, false);
     }
     else
@@ -2185,6 +2194,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
     JoinType    jointype = (JoinType) PG_GETARG_INT16(3);
 #endif
     SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+    Oid            collation = PG_GET_COLLATION();
     double        selec;
     double        selec_inner;
     VariableStatData vardata1;
@@ -2235,7 +2245,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
     }

     /* We need to compute the inner-join selectivity in all cases */
-    selec_inner = eqjoinsel_inner(opfuncoid,
+    selec_inner = eqjoinsel_inner(opfuncoid, collation,
                                   &vardata1, &vardata2,
                                   nd1, nd2,
                                   isdefault1, isdefault2,
@@ -2262,7 +2272,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
             inner_rel = find_join_input_rel(root, sjinfo->min_righthand);

             if (!join_is_reversed)
-                selec = eqjoinsel_semi(opfuncoid,
+                selec = eqjoinsel_semi(opfuncoid, collation,
                                        &vardata1, &vardata2,
                                        nd1, nd2,
                                        isdefault1, isdefault2,
@@ -2275,7 +2285,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
                 Oid            commop = get_commutator(operator);
                 Oid            commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;

-                selec = eqjoinsel_semi(commopfuncoid,
+                selec = eqjoinsel_semi(commopfuncoid, collation,
                                        &vardata2, &vardata1,
                                        nd2, nd1,
                                        isdefault2, isdefault1,
@@ -2323,7 +2333,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
  * that it's worth trying to distinguish them here.
  */
 static double
-eqjoinsel_inner(Oid opfuncoid,
+eqjoinsel_inner(Oid opfuncoid, Oid collation,
                 VariableStatData *vardata1, VariableStatData *vardata2,
                 double nd1, double nd2,
                 bool isdefault1, bool isdefault2,
@@ -2373,7 +2383,7 @@ eqjoinsel_inner(Oid opfuncoid,
          * returns NULL, though really equality functions should never do
          * that.
          */
-        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot1->stacoll,
+        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
                                  NULL, NULL);
         fcinfo->args[0].isnull = false;
         fcinfo->args[1].isnull = false;
@@ -2520,7 +2530,7 @@ eqjoinsel_inner(Oid opfuncoid,
  * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
  */
 static double
-eqjoinsel_semi(Oid opfuncoid,
+eqjoinsel_semi(Oid opfuncoid, Oid collation,
                VariableStatData *vardata1, VariableStatData *vardata2,
                double nd1, double nd2,
                bool isdefault1, bool isdefault2,
@@ -2603,7 +2613,7 @@ eqjoinsel_semi(Oid opfuncoid,
          * returns NULL, though really equality functions should never do
          * that.
          */
-        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot1->stacoll,
+        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
                                  NULL, NULL);
         fcinfo->args[0].isnull = false;
         fcinfo->args[1].isnull = false;
@@ -2851,6 +2861,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     Oid            op_lefttype;
     Oid            op_righttype;
     Oid            opno,
+                collation,
                 lsortop,
                 rsortop,
                 lstatop,
@@ -2875,6 +2886,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     if (!is_opclause(clause))
         return;                    /* shouldn't happen */
     opno = ((OpExpr *) clause)->opno;
+    collation = ((OpExpr *) clause)->inputcollid;
     left = get_leftop((Expr *) clause);
     right = get_rightop((Expr *) clause);
     if (!right)
@@ -3008,20 +3020,20 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     /* Try to get ranges of both inputs */
     if (!isgt)
     {
-        if (!get_variable_range(root, &leftvar, lstatop,
+        if (!get_variable_range(root, &leftvar, lstatop, collation,
                                 &leftmin, &leftmax))
             goto fail;            /* no range available from stats */
-        if (!get_variable_range(root, &rightvar, rstatop,
+        if (!get_variable_range(root, &rightvar, rstatop, collation,
                                 &rightmin, &rightmax))
             goto fail;            /* no range available from stats */
     }
     else
     {
         /* need to swap the max and min */
-        if (!get_variable_range(root, &leftvar, lstatop,
+        if (!get_variable_range(root, &leftvar, lstatop, collation,
                                 &leftmax, &leftmin))
             goto fail;            /* no range available from stats */
-        if (!get_variable_range(root, &rightvar, rstatop,
+        if (!get_variable_range(root, &rightvar, rstatop, collation,
                                 &rightmax, &rightmin))
             goto fail;            /* no range available from stats */
     }
@@ -3031,13 +3043,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
      * fraction that's <= the right-side maximum value.  But only believe
      * non-default estimates, else stick with our 1.0.
      */
-    selec = scalarineqsel(root, leop, isgt, true, &leftvar,
+    selec = scalarineqsel(root, leop, isgt, true, collation, &leftvar,
                           rightmax, op_righttype);
     if (selec != DEFAULT_INEQ_SEL)
         *leftend = selec;

     /* And similarly for the right variable. */
-    selec = scalarineqsel(root, revleop, isgt, true, &rightvar,
+    selec = scalarineqsel(root, revleop, isgt, true, collation, &rightvar,
                           leftmax, op_lefttype);
     if (selec != DEFAULT_INEQ_SEL)
         *rightend = selec;
@@ -3061,13 +3073,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
      * minimum value.  But only believe non-default estimates, else stick with
      * our own default.
      */
-    selec = scalarineqsel(root, ltop, isgt, false, &leftvar,
+    selec = scalarineqsel(root, ltop, isgt, false, collation, &leftvar,
                           rightmin, op_righttype);
     if (selec != DEFAULT_INEQ_SEL)
         *leftstart = selec;

     /* And similarly for the right variable. */
-    selec = scalarineqsel(root, revltop, isgt, false, &rightvar,
+    selec = scalarineqsel(root, revltop, isgt, false, collation, &rightvar,
                           leftmin, op_lefttype);
     if (selec != DEFAULT_INEQ_SEL)
         *rightstart = selec;
@@ -3147,10 +3159,11 @@ matchingsel(PG_FUNCTION_ARGS)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     double        selec;

     /* Use generic restriction selectivity logic. */
-    selec = generic_restriction_selectivity(root, operator,
+    selec = generic_restriction_selectivity(root, operator, collation,
                                             args, varRelid,
                                             DEFAULT_MATCHING_SEL);

@@ -5337,9 +5350,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
  *
  * sortop is the "<" comparison operator to use.  This should generally
  * be "<" not ">", as only the former is likely to be found in pg_statistic.
+ * The collation must be specified too.
  */
 static bool
-get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
+get_variable_range(PlannerInfo *root, VariableStatData *vardata,
+                   Oid sortop, Oid collation,
                    Datum *min, Datum *max)
 {
     Datum        tmin = 0;
@@ -5359,7 +5374,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
      * before enabling this.
      */
 #ifdef NOT_USED
-    if (get_actual_variable_range(root, vardata, sortop, min, max))
+    if (get_actual_variable_range(root, vardata, sortop, collation, min, max))
         return true;
 #endif

@@ -5387,7 +5402,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
      *
      * If there is a histogram that is sorted with some other operator than
      * the one we want, fail --- this suggests that there is data we can't
-     * use.
+     * use.  XXX consider collation too.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_HISTOGRAM, sortop,
@@ -5434,14 +5449,14 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
                 continue;
             }
             if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                sslot.values[i], tmin)))
             {
                 tmin = sslot.values[i];
                 tmin_is_mcv = true;
             }
             if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                tmax, sslot.values[i])))
             {
                 tmax = sslot.values[i];
@@ -5471,10 +5486,11 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
  *        If no data available, return false.
  *
  * sortop is the "<" comparison operator to use.
+ * collation is the required collation.
  */
 static bool
 get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
-                          Oid sortop,
+                          Oid sortop, Oid collation,
                           Datum *min, Datum *max)
 {
     bool        have_data = false;
@@ -5514,9 +5530,11 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
             continue;

         /*
-         * The first index column must match the desired variable and sort
-         * operator --- but we can use a descending-order index.
+         * The first index column must match the desired variable, sortop, and
+         * collation --- but we can use a descending-order index.
          */
+        if (collation != index->indexcollations[0])
+            continue;            /* test first 'cause it's cheapest */
         if (!match_index_to_operand(vardata->var, 0, index))
             continue;
         switch (get_op_opfamily_strategy(sortop, index->sortopfamily[0]))
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 9690b4e486..15d2289024 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -144,24 +144,30 @@ extern void get_join_variables(PlannerInfo *root, List *args,
                                bool *join_is_reversed);
 extern double get_variable_numdistinct(VariableStatData *vardata,
                                        bool *isdefault);
-extern double mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+extern double mcv_selectivity(VariableStatData *vardata,
+                              FmgrInfo *opproc, Oid collation,
                               Datum constval, bool varonleft,
                               double *sumcommonp);
-extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+extern double histogram_selectivity(VariableStatData *vardata,
+                                    FmgrInfo *opproc, Oid collation,
                                     Datum constval, bool varonleft,
                                     int min_hist_size, int n_skip,
                                     int *hist_size);
-extern double generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
+extern double generic_restriction_selectivity(PlannerInfo *root,
+                                              Oid oproid, Oid collation,
                                               List *args, int varRelid,
                                               double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
                                          FmgrInfo *opproc, bool isgt, bool iseq,
+                                         Oid collation,
                                          Datum constval, Oid consttype);
-extern double var_eq_const(VariableStatData *vardata, Oid oproid,
+extern double var_eq_const(VariableStatData *vardata,
+                           Oid oproid, Oid collation,
                            Datum constval, bool constisnull,
                            bool varonleft, bool negate);
-extern double var_eq_non_const(VariableStatData *vardata, Oid oproid,
+extern double var_eq_non_const(VariableStatData *vardata,
+                               Oid oproid, Oid collation,
                                Node *other,
                                bool varonleft, bool negate);

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index ae5c8f084e..bcfbaa1c3d 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1217,7 +1217,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
     fmgr_info(get_opcode(geopr), &opproc);

     prefixsel = ineq_histogram_selectivity(root, vardata,
-                                           &opproc, true, true,
+                                           geopr, &opproc, true, true,
                                            collation,
                                            prefixcon->constvalue,
                                            prefixcon->consttype);
@@ -1238,7 +1238,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
         Selectivity topsel;

         topsel = ineq_histogram_selectivity(root, vardata,
-                                            &opproc, false, false,
+                                            ltopr, &opproc, false, false,
                                             collation,
                                             greaterstrcon->constvalue,
                                             greaterstrcon->consttype);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 2332277307..208744cd3a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -192,6 +192,10 @@ static void examine_simple_variable(PlannerInfo *root, Var *var,
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
                                Oid sortop, Oid collation,
                                Datum *min, Datum *max);
+static void get_stats_slot_range(AttStatsSlot *sslot,
+                                 Oid opfuncoid, FmgrInfo *opproc,
+                                 Oid collation, int16 typLen, bool typByVal,
+                                 Datum *min, Datum *max, bool *p_have_data);
 static bool get_actual_variable_range(PlannerInfo *root,
                                       VariableStatData *vardata,
                                       Oid sortop, Oid collation,
@@ -679,7 +683,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      * compute the resulting contribution to selectivity.
      */
     hist_selec = ineq_histogram_selectivity(root, vardata,
-                                            &opproc, isgt, iseq,
+                                            operator, &opproc, isgt, iseq,
                                             collation,
                                             constval, consttype);

@@ -1019,6 +1023,9 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
  * satisfies the inequality condition, ie, VAR < (or <=, >, >=) CONST.
  * The isgt and iseq flags distinguish which of the four cases apply.
  *
+ * While opproc could be looked up from the operator OID, common callers
+ * also need to call it separately, so we make the caller pass both.
+ *
  * Returns -1 if there is no histogram (valid results will always be >= 0).
  *
  * Note that the result disregards both the most-common-values (if any) and
@@ -1030,7 +1037,7 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
 double
 ineq_histogram_selectivity(PlannerInfo *root,
                            VariableStatData *vardata,
-                           FmgrInfo *opproc, bool isgt, bool iseq,
+                           Oid opoid, FmgrInfo *opproc, bool isgt, bool iseq,
                            Oid collation,
                            Datum constval, Oid consttype)
 {
@@ -1057,7 +1064,9 @@ ineq_histogram_selectivity(PlannerInfo *root,
                          STATISTIC_KIND_HISTOGRAM, InvalidOid,
                          ATTSTATSSLOT_VALUES))
     {
-        if (sslot.nvalues > 1)
+        if (sslot.nvalues > 1 &&
+            sslot.stacoll == collation &&
+            comparison_ops_are_compatible(sslot.staop, opoid))
         {
             /*
              * Use binary search to find the desired location, namely the
@@ -1332,6 +1341,49 @@ ineq_histogram_selectivity(PlannerInfo *root,
                     hist_selec = 1.0 - cutoff;
             }
         }
+        else if (sslot.nvalues > 1)
+        {
+            /*
+             * If we get here, we have a histogram but it's not sorted the way
+             * we want.  Do a brute-force search to see how many of the
+             * entries satisfy the comparison condition, and take that
+             * fraction as our estimate.  (This is identical to the inner loop
+             * of histogram_selectivity; maybe share code?)
+             */
+            LOCAL_FCINFO(fcinfo, 2);
+            int            nmatch = 0;
+
+            InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
+                                     NULL, NULL);
+            fcinfo->args[0].isnull = false;
+            fcinfo->args[1].isnull = false;
+            fcinfo->args[1].value = constval;
+            for (int i = 0; i < sslot.nvalues; i++)
+            {
+                Datum        fresult;
+
+                fcinfo->args[0].value = sslot.values[i];
+                fcinfo->isnull = false;
+                fresult = FunctionCallInvoke(fcinfo);
+                if (!fcinfo->isnull && DatumGetBool(fresult))
+                    nmatch++;
+            }
+            hist_selec = ((double) nmatch) / ((double) sslot.nvalues);
+
+            /*
+             * As above, clamp to a hundredth of the histogram resolution.
+             * This case is surely even less trustworthy than the normal one,
+             * so we shouldn't believe exact 0 or 1 selectivity.
+             */
+            {
+                double        cutoff = 0.01 / (double) (sslot.nvalues - 1);
+
+                if (hist_selec < cutoff)
+                    hist_selec = cutoff;
+                else if (hist_selec > 1.0 - cutoff)
+                    hist_selec = 1.0 - cutoff;
+            }
+        }

         free_attstatsslot(&sslot);
     }
@@ -5363,8 +5415,8 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
     int16        typLen;
     bool        typByVal;
     Oid            opfuncoid;
+    FmgrInfo    opproc;
     AttStatsSlot sslot;
-    int            i;

     /*
      * XXX It's very tempting to try to use the actual column min and max, if
@@ -5395,20 +5447,19 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
                                        (opfuncoid = get_opcode(sortop))))
         return false;

+    opproc.fn_oid = InvalidOid; /* mark this as not looked up yet */
+
     get_typlenbyval(vardata->atttype, &typLen, &typByVal);

     /*
-     * If there is a histogram, grab the first and last values.
-     *
-     * If there is a histogram that is sorted with some other operator than
-     * the one we want, fail --- this suggests that there is data we can't
-     * use.  XXX consider collation too.
+     * If there is a histogram with the ordering we want, grab the first and
+     * last values.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_HISTOGRAM, sortop,
                          ATTSTATSSLOT_VALUES))
     {
-        if (sslot.nvalues > 0)
+        if (sslot.stacoll == collation && sslot.nvalues > 0)
         {
             tmin = datumCopy(sslot.values[0], typByVal, typLen);
             tmax = datumCopy(sslot.values[sslot.nvalues - 1], typByVal, typLen);
@@ -5416,57 +5467,36 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
         }
         free_attstatsslot(&sslot);
     }
-    else if (get_attstatsslot(&sslot, vardata->statsTuple,
-                              STATISTIC_KIND_HISTOGRAM, InvalidOid,
-                              0))
+
+    /*
+     * Otherwise, if there is a histogram with some other ordering, scan it
+     * and get the min and max values according to the ordering we want.  This
+     * of course may not find values that are really extremal according to our
+     * ordering, but it beats ignoring available data.
+     */
+    if (!have_data &&
+        get_attstatsslot(&sslot, vardata->statsTuple,
+                         STATISTIC_KIND_HISTOGRAM, InvalidOid,
+                         ATTSTATSSLOT_VALUES))
     {
+        get_stats_slot_range(&sslot, opfuncoid, &opproc,
+                             collation, typLen, typByVal,
+                             &tmin, &tmax, &have_data);
         free_attstatsslot(&sslot);
-        return false;
     }

     /*
      * If we have most-common-values info, look for extreme MCVs.  This is
      * needed even if we also have a histogram, since the histogram excludes
-     * the MCVs.  However, usually the MCVs will not be the extreme values, so
-     * avoid unnecessary data copying.
+     * the MCVs.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_MCV, InvalidOid,
                          ATTSTATSSLOT_VALUES))
     {
-        bool        tmin_is_mcv = false;
-        bool        tmax_is_mcv = false;
-        FmgrInfo    opproc;
-
-        fmgr_info(opfuncoid, &opproc);
-
-        for (i = 0; i < sslot.nvalues; i++)
-        {
-            if (!have_data)
-            {
-                tmin = tmax = sslot.values[i];
-                tmin_is_mcv = tmax_is_mcv = have_data = true;
-                continue;
-            }
-            if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               collation,
-                                               sslot.values[i], tmin)))
-            {
-                tmin = sslot.values[i];
-                tmin_is_mcv = true;
-            }
-            if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               collation,
-                                               tmax, sslot.values[i])))
-            {
-                tmax = sslot.values[i];
-                tmax_is_mcv = true;
-            }
-        }
-        if (tmin_is_mcv)
-            tmin = datumCopy(tmin, typByVal, typLen);
-        if (tmax_is_mcv)
-            tmax = datumCopy(tmax, typByVal, typLen);
+        get_stats_slot_range(&sslot, opfuncoid, &opproc,
+                             collation, typLen, typByVal,
+                             &tmin, &tmax, &have_data);
         free_attstatsslot(&sslot);
     }

@@ -5475,6 +5505,61 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
     return have_data;
 }

+/*
+ * get_stats_slot_range: scan sslot for min/max values
+ *
+ * Subroutine for get_variable_range.
+ */
+static void
+get_stats_slot_range(AttStatsSlot *sslot, Oid opfuncoid, FmgrInfo *opproc,
+                     Oid collation, int16 typLen, bool typByVal,
+                     Datum *min, Datum *max, bool *p_have_data)
+{
+    Datum        tmin = *min;
+    Datum        tmax = *max;
+    bool        have_data = *p_have_data;
+    bool        found_tmin = false;
+    bool        found_tmax = false;
+
+    /* Look up the comparison function, if we didn't already do so */
+    if (opproc->fn_oid != opfuncoid)
+        fmgr_info(opfuncoid, opproc);
+
+    /* Scan all the slot's values */
+    for (int i = 0; i < sslot->nvalues; i++)
+    {
+        if (!have_data)
+        {
+            tmin = tmax = sslot->values[i];
+            found_tmin = found_tmax = true;
+            *p_have_data = have_data = true;
+            continue;
+        }
+        if (DatumGetBool(FunctionCall2Coll(opproc,
+                                           collation,
+                                           sslot->values[i], tmin)))
+        {
+            tmin = sslot->values[i];
+            found_tmin = true;
+        }
+        if (DatumGetBool(FunctionCall2Coll(opproc,
+                                           collation,
+                                           tmax, sslot->values[i])))
+        {
+            tmax = sslot->values[i];
+            found_tmax = true;
+        }
+    }
+
+    /*
+     * Copy the slot's values, if we found new extreme values.
+     */
+    if (found_tmin)
+        *min = datumCopy(tmin, typByVal, typLen);
+    if (found_tmax)
+        *max = datumCopy(tmax, typByVal, typLen);
+}
+

 /*
  * get_actual_variable_range
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 63d1263502..f3bf413829 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -731,6 +731,55 @@ equality_ops_are_compatible(Oid opno1, Oid opno2)
     return result;
 }

+/*
+ * comparison_ops_are_compatible
+ *        Return true if the two given comparison operators have compatible
+ *        semantics.
+ *
+ * This is trivially true if they are the same operator.  Otherwise,
+ * we look to see if they can be found in the same btree opfamily.
+ * For example, '<' and '>=' ops match if they belong to the same family.
+ *
+ * (This is identical to equality_ops_are_compatible(), except that we
+ * don't bother to examine hash opclasses.)
+ */
+bool
+comparison_ops_are_compatible(Oid opno1, Oid opno2)
+{
+    bool        result;
+    CatCList   *catlist;
+    int            i;
+
+    /* Easy if they're the same operator */
+    if (opno1 == opno2)
+        return true;
+
+    /*
+     * We search through all the pg_amop entries for opno1.
+     */
+    catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno1));
+
+    result = false;
+    for (i = 0; i < catlist->n_members; i++)
+    {
+        HeapTuple    op_tuple = &catlist->members[i]->tuple;
+        Form_pg_amop op_form = (Form_pg_amop) GETSTRUCT(op_tuple);
+
+        if (op_form->amopmethod == BTREE_AM_OID)
+        {
+            if (op_in_opfamily(opno2, op_form->amopfamily))
+            {
+                result = true;
+                break;
+            }
+        }
+    }
+
+    ReleaseSysCacheList(catlist);
+
+    return result;
+}
+

 /*                ---------- AMPROC CACHES ----------                         */

@@ -3028,19 +3077,6 @@ get_attstatsslot(AttStatsSlot *sslot, HeapTuple statstuple,
     sslot->staop = (&stats->staop1)[i];
     sslot->stacoll = (&stats->stacoll1)[i];

-    /*
-     * XXX Hopefully-temporary hack: if stacoll isn't set, inject the default
-     * collation.  This won't matter for non-collation-aware datatypes.  For
-     * those that are, this covers cases where stacoll has not been set.  In
-     * the short term we need this because some code paths involving type NAME
-     * do not pass any collation to prefix_selectivity and related functions.
-     * Even when that's been fixed, it's likely that some add-on typanalyze
-     * functions won't get the word right away about filling stacoll during
-     * ANALYZE, so we'll probably need this for awhile.
-     */
-    if (sslot->stacoll == InvalidOid)
-        sslot->stacoll = DEFAULT_COLLATION_OID;
-
     if (flags & ATTSTATSSLOT_VALUES)
     {
         val = SysCacheGetAttr(STATRELATTINH, statstuple,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 91aed1f5a5..fecfe1f4f6 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -82,6 +82,7 @@ extern bool get_op_hash_functions(Oid opno,
                                   RegProcedure *lhs_procno, RegProcedure *rhs_procno);
 extern List *get_op_btree_interpretation(Oid opno);
 extern bool equality_ops_are_compatible(Oid opno1, Oid opno2);
+extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2);
 extern Oid    get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype,
                               int16 procnum);
 extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok);
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 15d2289024..7ac4a06391 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -159,7 +159,8 @@ extern double generic_restriction_selectivity(PlannerInfo *root,
                                               double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
-                                         FmgrInfo *opproc, bool isgt, bool iseq,
+                                         Oid opoid, FmgrInfo *opproc,
+                                         bool isgt, bool iseq,
                                          Oid collation,
                                          Datum constval, Oid consttype);
 extern double var_eq_const(VariableStatData *vardata,
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index c2d037b614..7caf0c9b6b 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -191,7 +191,10 @@ CREATE TABLE atest12 as
   SELECT x AS a, 10001 - x AS b FROM generate_series(1,10000) x;
 CREATE INDEX ON atest12 (a);
 CREATE INDEX ON atest12 (abs(a));
+-- results below depend on having quite accurate stats for atest12
+SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
+RESET default_statistics_target;
 CREATE FUNCTION leak(integer,integer) RETURNS boolean
   AS $$begin return $1 < $2; end$$
   LANGUAGE plpgsql immutable;
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 2ba69617dc..0ab5245b1e 100644
--- a/src/test/regress/sql/privileges.sql
+++ b/src/test/regress/sql/privileges.sql
@@ -136,7 +136,10 @@ CREATE TABLE atest12 as
   SELECT x AS a, 10001 - x AS b FROM generate_series(1,10000) x;
 CREATE INDEX ON atest12 (a);
 CREATE INDEX ON atest12 (abs(a));
+-- results below depend on having quite accurate stats for atest12
+SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
+RESET default_statistics_target;

 CREATE FUNCTION leak(integer,integer) RETURNS boolean
   AS $$begin return $1 < $2; end$$

Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From

Tom Lane

Date:

05 June 2020, 01:22:01

I wrote:
> 3. Hack things up so that the core code renames all these exposed
> functions to, say, ineq_histogram_selectivity_ext() and so on,
> allowing the additional arguments to exist, but the old names would
> still be there as ABI compatibility wrappers.

Here's a proposed v12 patch along those lines.

            regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 51545e0ef4..ec8e40626f 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -578,6 +578,7 @@ ltreeparentsel(PG_FUNCTION_ARGS)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -617,8 +618,9 @@ ltreeparentsel(PG_FUNCTION_ARGS)
         /*
          * Is the constant "<@" to any of the column's most common values?
          */
-        mcvsel = mcv_selectivity(&vardata, &contproc, constval, varonleft,
-                                 &mcvsum);
+        mcvsel = mcv_selectivity_ext(&vardata, &contproc, collation,
+                                     constval, varonleft,
+                                     &mcvsum);

         /*
          * If the histogram is large enough, see what fraction of it the
@@ -626,9 +628,9 @@ ltreeparentsel(PG_FUNCTION_ARGS)
          * non-MCV population.  Otherwise use the default selectivity for the
          * non-MCV population.
          */
-        selec = histogram_selectivity(&vardata, &contproc,
-                                      constval, varonleft,
-                                      10, 1, &hist_size);
+        selec = histogram_selectivity_ext(&vardata, &contproc, collation,
+                                          constval, varonleft,
+                                          10, 1, &hist_size);
         if (selec < 0)
         {
             /* Nope, fall back on default */
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 77cc378196..6465c9edcc 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -90,7 +90,9 @@ static Pattern_Prefix_Status pattern_fixed_prefix(Const *patt,
                                                   Selectivity *rest_selec);
 static Selectivity prefix_selectivity(PlannerInfo *root,
                                       VariableStatData *vardata,
-                                      Oid vartype, Oid opfamily, Const *prefixcon);
+                                      Oid vartype, Oid opfamily,
+                                      Oid collation,
+                                      Const *prefixcon);
 static Selectivity like_selectivity(const char *patt, int pattlen,
                                     bool case_insensitive);
 static Selectivity regex_selectivity(const char *patt, int pattlen,
@@ -586,8 +588,8 @@ patternsel_common(PlannerInfo *root,

         if (eqopr == InvalidOid)
             elog(ERROR, "no = operator for opfamily %u", opfamily);
-        result = var_eq_const(&vardata, eqopr, prefix->constvalue,
-                              false, true, false);
+        result = var_eq_const_ext(&vardata, eqopr, collation,
+                                  prefix->constvalue, false, true, false);
     }
     else
     {
@@ -618,8 +620,9 @@ patternsel_common(PlannerInfo *root,
             opfuncid = get_opcode(oprid);
         fmgr_info(opfuncid, &opproc);

-        selec = histogram_selectivity(&vardata, &opproc, constval, true,
-                                      10, 1, &hist_size);
+        selec = histogram_selectivity_ext(&vardata, &opproc, collation,
+                                          constval, true,
+                                          10, 1, &hist_size);

         /* If not at least 100 entries, use the heuristic method */
         if (hist_size < 100)
@@ -629,7 +632,7 @@ patternsel_common(PlannerInfo *root,

             if (pstatus == Pattern_Prefix_Partial)
                 prefixsel = prefix_selectivity(root, &vardata, vartype,
-                                               opfamily, prefix);
+                                               opfamily, collation, prefix);
             else
                 prefixsel = 1.0;
             heursel = prefixsel * rest_selec;
@@ -661,8 +664,9 @@ patternsel_common(PlannerInfo *root,
          * directly to the result selectivity.  Also add up the total fraction
          * represented by MCV entries.
          */
-        mcv_selec = mcv_selectivity(&vardata, &opproc, constval, true,
-                                    &sumcommon);
+        mcv_selec = mcv_selectivity_ext(&vardata, &opproc, collation,
+                                        constval, true,
+                                        &sumcommon);

         /*
          * Now merge the results from the MCV and histogram calculations,
@@ -1170,12 +1174,13 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
  */
 static Selectivity
 prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
-                   Oid vartype, Oid opfamily, Const *prefixcon)
+                   Oid vartype, Oid opfamily,
+                   Oid collation,
+                   Const *prefixcon)
 {
     Selectivity prefixsel;
     Oid            cmpopr;
     FmgrInfo    opproc;
-    AttStatsSlot sslot;
     Const       *greaterstrcon;
     Selectivity eq_sel;

@@ -1185,10 +1190,11 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
         elog(ERROR, "no >= operator for opfamily %u", opfamily);
     fmgr_info(get_opcode(cmpopr), &opproc);

-    prefixsel = ineq_histogram_selectivity(root, vardata,
-                                           &opproc, true, true,
-                                           prefixcon->constvalue,
-                                           prefixcon->consttype);
+    prefixsel = ineq_histogram_selectivity_ext(root, vardata,
+                                               &opproc, true, true,
+                                               collation,
+                                               prefixcon->constvalue,
+                                               prefixcon->consttype);

     if (prefixsel < 0.0)
     {
@@ -1196,33 +1202,24 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
         return DEFAULT_MATCH_SEL;
     }

-    /*-------
-     * If we can create a string larger than the prefix, say
-     * "x < greaterstr".  We try to generate the string referencing the
-     * collation of the var's statistics, but if that's not available,
-     * use DEFAULT_COLLATION_OID.
-     *-------
+    /*
+     * If we can create a string larger than the prefix, say "x < greaterstr".
      */
-    if (HeapTupleIsValid(vardata->statsTuple) &&
-        get_attstatsslot(&sslot, vardata->statsTuple,
-                         STATISTIC_KIND_HISTOGRAM, InvalidOid, 0))
-         /* sslot.stacoll is set up */ ;
-    else
-        sslot.stacoll = DEFAULT_COLLATION_OID;
     cmpopr = get_opfamily_member(opfamily, vartype, vartype,
                                  BTLessStrategyNumber);
     if (cmpopr == InvalidOid)
         elog(ERROR, "no < operator for opfamily %u", opfamily);
     fmgr_info(get_opcode(cmpopr), &opproc);
-    greaterstrcon = make_greater_string(prefixcon, &opproc, sslot.stacoll);
+    greaterstrcon = make_greater_string(prefixcon, &opproc, collation);
     if (greaterstrcon)
     {
         Selectivity topsel;

-        topsel = ineq_histogram_selectivity(root, vardata,
-                                            &opproc, false, false,
-                                            greaterstrcon->constvalue,
-                                            greaterstrcon->consttype);
+        topsel = ineq_histogram_selectivity_ext(root, vardata,
+                                                &opproc, false, false,
+                                                collation,
+                                                greaterstrcon->constvalue,
+                                                greaterstrcon->consttype);

         /* ineq_histogram_selectivity worked before, it shouldn't fail now */
         Assert(topsel >= 0.0);
@@ -1253,8 +1250,8 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
                                  BTEqualStrategyNumber);
     if (cmpopr == InvalidOid)
         elog(ERROR, "no = operator for opfamily %u", opfamily);
-    eq_sel = var_eq_const(vardata, cmpopr, prefixcon->constvalue,
-                          false, true, false);
+    eq_sel = var_eq_const_ext(vardata, cmpopr, collation, prefixcon->constvalue,
+                              false, true, false);

     prefixsel = Max(prefixsel, eq_sel);

diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 5e0f0614ee..89131ebab7 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -137,8 +137,9 @@ networksel(PG_FUNCTION_ARGS)
      * by MCV entries.
      */
     fmgr_info(get_opcode(operator), &proc);
-    mcv_selec = mcv_selectivity(&vardata, &proc, constvalue, varonleft,
-                                &sumcommon);
+    mcv_selec = mcv_selectivity_ext(&vardata, &proc, InvalidOid,
+                                    constvalue, varonleft,
+                                    &sumcommon);

     /*
      * If we have a histogram, use it to estimate the proportion of the
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b67897da88..f9a2c96b0e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -88,11 +88,7 @@
  * (if any) is passed using the standard fmgr mechanism, so that the estimator
  * function can fetch it with PG_GET_COLLATION().  Note, however, that all
  * statistics in pg_statistic are currently built using the relevant column's
- * collation.  Thus, in most cases where we are looking at statistics, we
- * should ignore the operator collation and use the stats entry's collation.
- * We expect that the error induced by doing this is usually not large enough
- * to justify complicating matters.  In any case, doing otherwise would yield
- * entirely garbage results for ordered stats data such as histograms.
+ * collation.
  *----------
  */

@@ -148,14 +144,14 @@ get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;

 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid,
+static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
                               VariableStatData *vardata1, VariableStatData *vardata2,
                               double nd1, double nd2,
                               bool isdefault1, bool isdefault2,
                               AttStatsSlot *sslot1, AttStatsSlot *sslot2,
                               Form_pg_statistic stats1, Form_pg_statistic stats2,
                               bool have_mcvs1, bool have_mcvs2);
-static double eqjoinsel_semi(Oid opfuncoid,
+static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
                              VariableStatData *vardata1, VariableStatData *vardata2,
                              double nd1, double nd2,
                              bool isdefault1, bool isdefault2,
@@ -193,10 +189,11 @@ static double convert_timevalue_to_scalar(Datum value, Oid typid,
 static void examine_simple_variable(PlannerInfo *root, Var *var,
                                     VariableStatData *vardata);
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
-                               Oid sortop, Datum *min, Datum *max);
+                               Oid sortop, Oid collation,
+                               Datum *min, Datum *max);
 static bool get_actual_variable_range(PlannerInfo *root,
                                       VariableStatData *vardata,
-                                      Oid sortop,
+                                      Oid sortop, Oid collation,
                                       Datum *min, Datum *max);
 static bool get_actual_variable_endpoint(Relation heapRel,
                                          Relation indexRel,
@@ -234,6 +231,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -267,10 +265,10 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
      * in the query.)
      */
     if (IsA(other, Const))
-        selec = var_eq_const(&vardata, operator,
-                             ((Const *) other)->constvalue,
-                             ((Const *) other)->constisnull,
-                             varonleft, negate);
+        selec = var_eq_const_ext(&vardata, operator, collation,
+                                 ((Const *) other)->constvalue,
+                                 ((Const *) other)->constisnull,
+                                 varonleft, negate);
     else
         selec = var_eq_non_const(&vardata, operator, other,
                                  varonleft, negate);
@@ -289,6 +287,16 @@ double
 var_eq_const(VariableStatData *vardata, Oid operator,
              Datum constval, bool constisnull,
              bool varonleft, bool negate)
+{
+    return var_eq_const_ext(vardata, operator, DEFAULT_COLLATION_OID,
+                            constval, constisnull,
+                            varonleft, negate);
+}
+
+double
+var_eq_const_ext(VariableStatData *vardata, Oid operator, Oid collation,
+                 Datum constval, bool constisnull,
+                 bool varonleft, bool negate)
 {
     double        selec;
     double        nullfrac = 0.0;
@@ -353,12 +361,12 @@ var_eq_const(VariableStatData *vardata, Oid operator,
                 /* be careful to apply operator right way 'round */
                 if (varonleft)
                     match = DatumGetBool(FunctionCall2Coll(&eqproc,
-                                                           sslot.stacoll,
+                                                           collation,
                                                            sslot.values[i],
                                                            constval));
                 else
                     match = DatumGetBool(FunctionCall2Coll(&eqproc,
-                                                           sslot.stacoll,
+                                                           collation,
                                                            constval,
                                                            sslot.values[i]));
                 if (match)
@@ -555,6 +563,7 @@ neqsel(PG_FUNCTION_ARGS)
  */
 static double
 scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
+              Oid collation,
               VariableStatData *vardata, Datum constval, Oid consttype)
 {
     Form_pg_statistic stats;
@@ -654,16 +663,17 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      * to the result selectivity.  Also add up the total fraction represented
      * by MCV entries.
      */
-    mcv_selec = mcv_selectivity(vardata, &opproc, constval, true,
-                                &sumcommon);
+    mcv_selec = mcv_selectivity_ext(vardata, &opproc, collation, constval, true,
+                                    &sumcommon);

     /*
      * If there is a histogram, determine which bin the constant falls in, and
      * compute the resulting contribution to selectivity.
      */
-    hist_selec = ineq_histogram_selectivity(root, vardata,
-                                            &opproc, isgt, iseq,
-                                            constval, consttype);
+    hist_selec = ineq_histogram_selectivity_ext(root, vardata,
+                                                &opproc, isgt, iseq,
+                                                collation,
+                                                constval, consttype);

     /*
      * Now merge the results from the MCV and histogram calculations,
@@ -707,6 +717,15 @@ double
 mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                 Datum constval, bool varonleft,
                 double *sumcommonp)
+{
+    return mcv_selectivity_ext(vardata, opproc, DEFAULT_COLLATION_OID,
+                               constval, varonleft, sumcommonp);
+}
+
+double
+mcv_selectivity_ext(VariableStatData *vardata, FmgrInfo *opproc, Oid collation,
+                    Datum constval, bool varonleft,
+                    double *sumcommonp)
 {
     double        mcv_selec,
                 sumcommon;
@@ -726,11 +745,11 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
         {
             if (varonleft ?
                 DatumGetBool(FunctionCall2Coll(opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                sslot.values[i],
                                                constval)) :
                 DatumGetBool(FunctionCall2Coll(opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                constval,
                                                sslot.values[i])))
                 mcv_selec += sslot.numbers[i];
@@ -780,6 +799,20 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                       Datum constval, bool varonleft,
                       int min_hist_size, int n_skip,
                       int *hist_size)
+{
+    return histogram_selectivity_ext(vardata,
+                                     opproc, DEFAULT_COLLATION_OID,
+                                     constval, varonleft,
+                                     min_hist_size, n_skip,
+                                     hist_size);
+}
+
+double
+histogram_selectivity_ext(VariableStatData *vardata,
+                          FmgrInfo *opproc, Oid collation,
+                          Datum constval, bool varonleft,
+                          int min_hist_size, int n_skip,
+                          int *hist_size)
 {
     double        result;
     AttStatsSlot sslot;
@@ -804,11 +837,11 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
             {
                 if (varonleft ?
                     DatumGetBool(FunctionCall2Coll(opproc,
-                                                   sslot.stacoll,
+                                                   collation,
                                                    sslot.values[i],
                                                    constval)) :
                     DatumGetBool(FunctionCall2Coll(opproc,
-                                                   sslot.stacoll,
+                                                   collation,
                                                    constval,
                                                    sslot.values[i])))
                     nmatch++;
@@ -848,6 +881,19 @@ ineq_histogram_selectivity(PlannerInfo *root,
                            VariableStatData *vardata,
                            FmgrInfo *opproc, bool isgt, bool iseq,
                            Datum constval, Oid consttype)
+{
+    return ineq_histogram_selectivity_ext(root, vardata,
+                                          opproc, isgt, iseq,
+                                          DEFAULT_COLLATION_OID,
+                                          constval, consttype);
+}
+
+double
+ineq_histogram_selectivity_ext(PlannerInfo *root,
+                               VariableStatData *vardata,
+                               FmgrInfo *opproc, bool isgt, bool iseq,
+                               Oid collation,
+                               Datum constval, Oid consttype)
 {
     double        hist_selec;
     AttStatsSlot sslot;
@@ -860,9 +906,11 @@ ineq_histogram_selectivity(PlannerInfo *root,
      * column type.  However, to make that work we will need to figure out
      * which staop to search for --- it's not necessarily the one we have at
      * hand!  (For example, we might have a '<=' operator rather than the '<'
-     * operator that will appear in staop.)  For now, assume that whatever
-     * appears in pg_statistic is sorted the same way our operator sorts, or
-     * the reverse way if isgt is true.
+     * operator that will appear in staop.)  The collation might not agree
+     * either.  For now, just assume that whatever appears in pg_statistic is
+     * sorted the same way our operator sorts, or the reverse way if isgt is
+     * true.  This could result in a bogus estimate, but it still seems better
+     * than falling back to the default estimate.
      */
     if (HeapTupleIsValid(vardata->statsTuple) &&
         statistic_proc_security_check(vardata, opproc->fn_oid) &&
@@ -908,6 +956,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
                 have_end = get_actual_variable_range(root,
                                                      vardata,
                                                      sslot.staop,
+                                                     collation,
                                                      &sslot.values[0],
                                                      &sslot.values[1]);

@@ -925,17 +974,19 @@ ineq_histogram_selectivity(PlannerInfo *root,
                     have_end = get_actual_variable_range(root,
                                                          vardata,
                                                          sslot.staop,
+                                                         collation,
                                                          &sslot.values[0],
                                                          NULL);
                 else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
                     have_end = get_actual_variable_range(root,
                                                          vardata,
                                                          sslot.staop,
+                                                         collation,
                                                          NULL,
                                                          &sslot.values[probe]);

                 ltcmp = DatumGetBool(FunctionCall2Coll(opproc,
-                                                       sslot.stacoll,
+                                                       collation,
                                                        sslot.values[probe],
                                                        constval));
                 if (isgt)
@@ -1020,7 +1071,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
                  * values to a uniform comparison scale, and do a linear
                  * interpolation within this bin.
                  */
-                if (convert_to_scalar(constval, consttype, sslot.stacoll,
+                if (convert_to_scalar(constval, consttype, collation,
                                       &val,
                                       sslot.values[i - 1], sslot.values[i],
                                       vardata->vartype,
@@ -1160,6 +1211,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -1212,7 +1264,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
     }

     /* The rest of the work is done by scalarineqsel(). */
-    selec = scalarineqsel(root, operator, isgt, iseq,
+    selec = scalarineqsel(root, operator, isgt, iseq, collation,
                           &vardata, constval, consttype);

     ReleaseVariableStats(vardata);
@@ -1277,8 +1329,8 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid)
          * A boolean variable V is equivalent to the clause V = 't', so we
          * compute the selectivity as if that is what we have.
          */
-        selec = var_eq_const(&vardata, BooleanEqualOperator,
-                             BoolGetDatum(true), false, true, false);
+        selec = var_eq_const_ext(&vardata, BooleanEqualOperator, InvalidOid,
+                                 BoolGetDatum(true), false, true, false);
     }
     else
     {
@@ -2003,6 +2055,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
     JoinType    jointype = (JoinType) PG_GETARG_INT16(3);
 #endif
     SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+    Oid            collation = PG_GET_COLLATION();
     double        selec;
     double        selec_inner;
     VariableStatData vardata1;
@@ -2053,7 +2106,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
     }

     /* We need to compute the inner-join selectivity in all cases */
-    selec_inner = eqjoinsel_inner(opfuncoid,
+    selec_inner = eqjoinsel_inner(opfuncoid, collation,
                                   &vardata1, &vardata2,
                                   nd1, nd2,
                                   isdefault1, isdefault2,
@@ -2080,7 +2133,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
             inner_rel = find_join_input_rel(root, sjinfo->min_righthand);

             if (!join_is_reversed)
-                selec = eqjoinsel_semi(opfuncoid,
+                selec = eqjoinsel_semi(opfuncoid, collation,
                                        &vardata1, &vardata2,
                                        nd1, nd2,
                                        isdefault1, isdefault2,
@@ -2093,7 +2146,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
                 Oid            commop = get_commutator(operator);
                 Oid            commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;

-                selec = eqjoinsel_semi(commopfuncoid,
+                selec = eqjoinsel_semi(commopfuncoid, collation,
                                        &vardata2, &vardata1,
                                        nd2, nd1,
                                        isdefault2, isdefault1,
@@ -2141,7 +2194,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
  * that it's worth trying to distinguish them here.
  */
 static double
-eqjoinsel_inner(Oid opfuncoid,
+eqjoinsel_inner(Oid opfuncoid, Oid collation,
                 VariableStatData *vardata1, VariableStatData *vardata2,
                 double nd1, double nd2,
                 bool isdefault1, bool isdefault2,
@@ -2203,7 +2256,7 @@ eqjoinsel_inner(Oid opfuncoid,
                 if (hasmatch2[j])
                     continue;
                 if (DatumGetBool(FunctionCall2Coll(&eqproc,
-                                                   sslot1->stacoll,
+                                                   collation,
                                                    sslot1->values[i],
                                                    sslot2->values[j])))
                 {
@@ -2321,7 +2374,7 @@ eqjoinsel_inner(Oid opfuncoid,
  * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
  */
 static double
-eqjoinsel_semi(Oid opfuncoid,
+eqjoinsel_semi(Oid opfuncoid, Oid collation,
                VariableStatData *vardata1, VariableStatData *vardata2,
                double nd1, double nd2,
                bool isdefault1, bool isdefault2,
@@ -2415,7 +2468,7 @@ eqjoinsel_semi(Oid opfuncoid,
                 if (hasmatch2[j])
                     continue;
                 if (DatumGetBool(FunctionCall2Coll(&eqproc,
-                                                   sslot1->stacoll,
+                                                   collation,
                                                    sslot1->values[i],
                                                    sslot2->values[j])))
                 {
@@ -2635,6 +2688,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     Oid            op_lefttype;
     Oid            op_righttype;
     Oid            opno,
+                collation,
                 lsortop,
                 rsortop,
                 lstatop,
@@ -2659,6 +2713,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     if (!is_opclause(clause))
         return;                    /* shouldn't happen */
     opno = ((OpExpr *) clause)->opno;
+    collation = ((OpExpr *) clause)->inputcollid;
     left = get_leftop((Expr *) clause);
     right = get_rightop((Expr *) clause);
     if (!right)
@@ -2792,20 +2847,20 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     /* Try to get ranges of both inputs */
     if (!isgt)
     {
-        if (!get_variable_range(root, &leftvar, lstatop,
+        if (!get_variable_range(root, &leftvar, lstatop, collation,
                                 &leftmin, &leftmax))
             goto fail;            /* no range available from stats */
-        if (!get_variable_range(root, &rightvar, rstatop,
+        if (!get_variable_range(root, &rightvar, rstatop, collation,
                                 &rightmin, &rightmax))
             goto fail;            /* no range available from stats */
     }
     else
     {
         /* need to swap the max and min */
-        if (!get_variable_range(root, &leftvar, lstatop,
+        if (!get_variable_range(root, &leftvar, lstatop, collation,
                                 &leftmax, &leftmin))
             goto fail;            /* no range available from stats */
-        if (!get_variable_range(root, &rightvar, rstatop,
+        if (!get_variable_range(root, &rightvar, rstatop, collation,
                                 &rightmax, &rightmin))
             goto fail;            /* no range available from stats */
     }
@@ -2815,13 +2870,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
      * fraction that's <= the right-side maximum value.  But only believe
      * non-default estimates, else stick with our 1.0.
      */
-    selec = scalarineqsel(root, leop, isgt, true, &leftvar,
+    selec = scalarineqsel(root, leop, isgt, true, collation, &leftvar,
                           rightmax, op_righttype);
     if (selec != DEFAULT_INEQ_SEL)
         *leftend = selec;

     /* And similarly for the right variable. */
-    selec = scalarineqsel(root, revleop, isgt, true, &rightvar,
+    selec = scalarineqsel(root, revleop, isgt, true, collation, &rightvar,
                           leftmax, op_lefttype);
     if (selec != DEFAULT_INEQ_SEL)
         *rightend = selec;
@@ -2845,13 +2900,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
      * minimum value.  But only believe non-default estimates, else stick with
      * our own default.
      */
-    selec = scalarineqsel(root, ltop, isgt, false, &leftvar,
+    selec = scalarineqsel(root, ltop, isgt, false, collation, &leftvar,
                           rightmin, op_righttype);
     if (selec != DEFAULT_INEQ_SEL)
         *leftstart = selec;

     /* And similarly for the right variable. */
-    selec = scalarineqsel(root, revltop, isgt, false, &rightvar,
+    selec = scalarineqsel(root, revltop, isgt, false, collation, &rightvar,
                           leftmin, op_lefttype);
     if (selec != DEFAULT_INEQ_SEL)
         *rightstart = selec;
@@ -5124,9 +5179,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
  *
  * sortop is the "<" comparison operator to use.  This should generally
  * be "<" not ">", as only the former is likely to be found in pg_statistic.
+ * The collation must be specified too.
  */
 static bool
-get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
+get_variable_range(PlannerInfo *root, VariableStatData *vardata,
+                   Oid sortop, Oid collation,
                    Datum *min, Datum *max)
 {
     Datum        tmin = 0;
@@ -5146,7 +5203,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
      * before enabling this.
      */
 #ifdef NOT_USED
-    if (get_actual_variable_range(root, vardata, sortop, min, max))
+    if (get_actual_variable_range(root, vardata, sortop, collation, min, max))
         return true;
 #endif

@@ -5174,7 +5231,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
      *
      * If there is a histogram that is sorted with some other operator than
      * the one we want, fail --- this suggests that there is data we can't
-     * use.
+     * use.  XXX consider collation too.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_HISTOGRAM, sortop,
@@ -5221,14 +5278,14 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
                 continue;
             }
             if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                sslot.values[i], tmin)))
             {
                 tmin = sslot.values[i];
                 tmin_is_mcv = true;
             }
             if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                tmax, sslot.values[i])))
             {
                 tmax = sslot.values[i];
@@ -5258,10 +5315,11 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
  *        If no data available, return false.
  *
  * sortop is the "<" comparison operator to use.
+ * collation is the required collation.
  */
 static bool
 get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
-                          Oid sortop,
+                          Oid sortop, Oid collation,
                           Datum *min, Datum *max)
 {
     bool        have_data = false;
@@ -5301,9 +5359,11 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
             continue;

         /*
-         * The first index column must match the desired variable and sort
-         * operator --- but we can use a descending-order index.
+         * The first index column must match the desired variable, sortop, and
+         * collation --- but we can use a descending-order index.
          */
+        if (collation != index->indexcollations[0])
+            continue;            /* test first 'cause it's cheapest */
         if (!match_index_to_operand(vardata->var, 0, index))
             continue;
         switch (get_op_opfamily_strategy(sortop, index->sortopfamily[0]))
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 85d9ecbfc6..521cd84130 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -143,17 +143,36 @@ extern double get_variable_numdistinct(VariableStatData *vardata,
 extern double mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                               Datum constval, bool varonleft,
                               double *sumcommonp);
+extern double mcv_selectivity_ext(VariableStatData *vardata,
+                                  FmgrInfo *opproc, Oid collation,
+                                  Datum constval, bool varonleft,
+                                  double *sumcommonp);
 extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                                     Datum constval, bool varonleft,
                                     int min_hist_size, int n_skip,
                                     int *hist_size);
+extern double histogram_selectivity_ext(VariableStatData *vardata,
+                                        FmgrInfo *opproc, Oid collation,
+                                        Datum constval, bool varonleft,
+                                        int min_hist_size, int n_skip,
+                                        int *hist_size);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
                                          FmgrInfo *opproc, bool isgt, bool iseq,
                                          Datum constval, Oid consttype);
+extern double ineq_histogram_selectivity_ext(PlannerInfo *root,
+                                             VariableStatData *vardata,
+                                             FmgrInfo *opproc,
+                                             bool isgt, bool iseq,
+                                             Oid collation,
+                                             Datum constval, Oid consttype);
 extern double var_eq_const(VariableStatData *vardata, Oid oproid,
                            Datum constval, bool constisnull,
                            bool varonleft, bool negate);
+extern double var_eq_const_ext(VariableStatData *vardata,
+                               Oid oproid, Oid collation,
+                               Datum constval, bool constisnull,
+                               bool varonleft, bool negate);
 extern double var_eq_non_const(VariableStatData *vardata, Oid oproid,
                                Node *other,
                                bool varonleft, bool negate);