Re: long analyze, libc bug and libicu - Mailing list pgsql-bugs

From Tom Lane
Subject Re: long analyze, libc bug and libicu
Date
Msg-id 70603.1530974130@sss.pgh.pa.us
In response to Re: long analyze, libc bug and libicu  (Grigory Smolkin <g.smolkin@postgrespro.ru>)
List pgsql-bugs
Grigory Smolkin <g.smolkin@postgrespro.ru> writes:
> On 07/07/2018 10:10 AM, Peter Eisentraut wrote:
>> On 05.07.18 17:05, Grigory Smolkin wrote:
>>> Why ANALYZE ignores column COLLATE?

>> I think the statistics would be mostly the same independent of which
>> collation you use.  This could possibly be refined, but I don't think
>> it's a major problem right now.

I don't actually believe that the stats would be mostly the same.
Yes, we ought to arrive at the same MCV list, ndistinct, etc, but the
histogram depends critically on the sort order.  In particular its
endpoints, and estimates for comparison values near the endpoints,
might be very much different.
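
To make that concrete, here is a toy sketch (not PostgreSQL code) that
computes equi-depth histogram bounds for one sample under two orderings.
The helper histogram_bounds and the two sort keys are stand-ins invented
for illustration: byte order approximates the "C" collation, and a
case-insensitive key approximates a linguistic one.  Unlike ANALYZE, the
sketch does not remove MCVs before building the histogram.

```python
from collections import Counter


def histogram_bounds(values, sort_key, buckets):
    """Return equi-depth bucket boundaries under the ordering given by sort_key."""
    ordered = sorted(values, key=sort_key)
    last = len(ordered) - 1
    return [ordered[(i * last) // buckets] for i in range(buckets + 1)]


# Hypothetical stand-ins for two collations (assumptions, not real locales).
def c_order(s):
    return s.encode("utf-8")   # byte order: uppercase ASCII sorts before lowercase


def ci_order(s):
    return (s.lower(), s)      # case-insensitive, original string as tie-breaker


sample = ["apple", "Banana", "cherry", "Date", "elder", "Fig", "grape", "apple"]

# These do not depend on sort order at all.
mcv = Counter(sample).most_common(1)
ndistinct = len(set(sample))

# These do.
bounds_c = histogram_bounds(sample, c_order, 2)
bounds_ci = histogram_bounds(sample, ci_order, 2)

print("MCV:", mcv, "ndistinct:", ndistinct)
print("byte order bounds:       ", bounds_c)
print("case-insensitive bounds: ", bounds_ci)
```

The MCV entry and ndistinct come out identical either way, but the bounds
are Banana/apple/grape under byte order and apple/cherry/grape under the
case-insensitive order, so the lower endpoint and the median both move.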

However, this is something that was left for future research when
we added collations, and nobody's really followed up on that.
Should ANALYZE/the planner care about collation (perhaps only for
specific stats types)?  Does that go as far as ignoring stats that don't
match the query operator's collation? Should we consider recording stats
for more than one collation, and if so which ones?  What are the
backwards-compatibility issues involved in changing something like this?

Grigory's proposal amounts to assuming that the column's assigned
collation is the only one of interest, which might be true but it
needs some defense.  In any case it wouldn't end up being a three-line
patch; there's a whole lot of downstream work to consider.

But besides that, I've got no sympathy for forcing through a change
in this area just on the grounds that some platform's strcoll_l is
ridiculously slow with certain collations.  The right answer for that
is to lobby the libc maintainers to fix strcoll_l, especially since
the odds of us changing this in released branches are nil.

            regards, tom lane

