Thread: long analyze, libc bug and libicu

long analyze, libc bug and libicu

From
Grigory Smolkin
Date:
Hello, everyone!

One of your clients with PostgreSQL 10.4 recently noticed that ANALYZE 
started to take unproportionally long amount of time. Backtrace:

#0  0x00007f244b68ca06 in strcoll_l () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f244e37802d in varstrfastcmp_locale (x=139794027137284, 
y=139794027967812, ssup=<optimized out>) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/utils/adt/varlena.c:2185
#2  0x00007f244e0d6b15 in ApplySortComparator (isNull1=0 '\000', 
isNull2=0 '\000', ssup=0x7fffa958de80, datum2=<optimized out>, 
datum1=<optimized out>) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/include/utils/sortsupport.h:225
#3  compare_scalars (a=<optimized out>, b=<optimized out>, 
arg=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/commands/analyze.c:2821
#4  0x00007f244e3e1023 in med3 (a=0x7f24578f6bd8 "\224__U$\177", 
b=0x7f24578f6c78 "\004ITU$\177", c=0x7f24578f6d18 "D\365`U$\177", 
cmp=0x7f244e0d6af0 <compare_scalars>, arg=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:108
#5  0x00007f244e3e13ec in qsort_arg (a=a@entry=0x7f24578f6bd8, 
n=<optimized out>, n@entry=82, es=es@entry=16, 
cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, 
arg=arg@entry=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:157
#6  0x00007f244e3e1475 in qsort_arg (a=a@entry=0x7f24578f6248, 
n=<optimized out>, n@entry=235, es=es@entry=16, 
cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, 
arg=arg@entry=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:217
#7  0x00007f244e3e1475 in qsort_arg (a=a@entry=0x7f24578f5348, 
n=<optimized out>, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 
<compare_scalars>, arg=arg@entry=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:217
#8  0x00007f244e3e13ac in qsort_arg (a=a@entry=0x7f24578f5348, 
n=<optimized out>, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 
<compare_scalars>, arg=arg@entry=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:203
#9  0x00007f244e3e13ac in qsort_arg (a=0x7f24578f5348, 
a@entry=0x7f2457816928, n=<optimized out>, n@entry=109083, 
es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, 
arg=arg@entry=0x7fffa958de70) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:203
#10 0x00007f244e0d604f in compute_scalar_stats (stats=0x7f244fd168f8, 
fetchfunc=<optimized out>, samplerows=110194, totalrows=110194) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/commands/analyze.c:2378
#11 0x00007f244e0d76f9 in do_analyze_rel 
(onerel=onerel@entry=0x7f244fc339d8, options=options@entry=99, 
params=params@entry=0x7f244f98b9f0, va_cols=va_cols@entry=0x0, 
acquirefunc=<optimized out>, relpages=11515, inh=0 '\000', 
in_outer_xact=0 '\000', elevel=13) at 
/build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/commands/analyze.c:545


After a bit of digging it became clear that they hit the libc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=18441

We decided to switch to libicu collation, but it became clear that 
ANALYZE ignore column level COLLATE and stubbornly use default collation.

It happens thanks to this piece of code:

static void
compute_scalar_stats(VacAttrStatsP stats,
                      AnalyzeAttrFetchFunc fetchfunc,
                      int samplerows,
                      double totalrows)
....
     /* We always use the default collation for statistics */
     ssup.ssup_collation = DEFAULT_COLLATION_OID;
....

Ok, we decided to set libicu collation as default for database, although 
it is a major pain.
But turns out it`s impossible to set libicu collation as default for 
database:

postgres=# create database icu LC_COLLATE = 'en-x-icu';
ERROR:  invalid locale name: "en-x-icu"

Why ANALYZE igrones column COLLATE? It there workaround to force ANALYZE 
to use libicu?

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: long analyze, libc bug and libicu

From
Peter Eisentraut
Date:
On 05.07.18 17:05, Grigory Smolkin wrote:
> Why ANALYZE igrones column COLLATE?

I think the statistics would be mostly the same independent of which
collation you use.  This could possibly be refined, but I don't think
it's a major problem right now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: long analyze, libc bug and libicu

From
Grigory Smolkin
Date:

On 07/07/2018 10:10 AM, Peter Eisentraut wrote:

On 05.07.18 17:05, Grigory Smolkin wrote:
Why ANALYZE igrones column COLLATE?
I think the statistics would be mostly the same independent of which
collation you use.  This could possibly be refined, but I don't think
it's a major problem right now.


Thank you for your interest in this problem!
I think the statistics would be mostly the same independent of which
collation you use.

I assumed that one of the goals of using libicu is to be independent from libc collation and it`s bugs and inconsistencies, but current ANALYZE forced to use libc anyway, which undermines that goal.
 This could possibly be refined, but I don't think
it's a major problem right now.

It`s a major problem to people, who use Thai alphabet.
In attachment there is a data sample(33MB on my machine). ANALYZE`ing it comes up with following results:

postgres=# ANALYZE t_icu_coll;
ANALYZE
Time: 2252086.648 ms

37minutes on 33MB table is painful. On big tables autovacuum ANALYZE goes for hours, starving autovacuum VACUUM for worker slots(autovacuum_max_workers).
Another major problem is that in strol_l() backend process ignores pg_terminate_backend()/pg_cancel_backend() functions.

With attached patch this problem goes away:

postgres=# analyze t_icu_coll;
ANALYZE
Time: 161.419 ms



-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: long analyze, libc bug and libicu

From
Stephen Frost
Date:
Greetings,

* Grigory Smolkin (g.smolkin@postgrespro.ru) wrote:
> On 07/07/2018 10:10 AM, Peter Eisentraut wrote:
> >On 05.07.18 17:05, Grigory Smolkin wrote:
> >>Why ANALYZE igrones column COLLATE?
> >I think the statistics would be mostly the same independent of which
> >collation you use.  This could possibly be refined, but I don't think
> >it's a major problem right now.
>
> Thank you for your interest in this problem!
>
> >I think the statistics would be mostly the same independent of which
> >collation you use.
>
> I assumed that one of the goals of using libicu is to be independent from
> libc collation and it`s bugs and inconsistencies, but current ANALYZE forced
> to use libc anyway, which undermines that goal.

I would have thought so too, especially in a case like you describe
below...

> >  This could possibly be refined, but I don't think
> >it's a major problem right now.
>
> It`s a major problem to people, who use Thai alphabet.
> In attachment there is a data sample(33MB on my machine). ANALYZE`ing it
> comes up with following results:
>
> postgres=# ANALYZE t_icu_coll;
> ANALYZE
> Time: 2252086.648 ms
>
> 37minutes on 33MB table is painful. On big tables autovacuum ANALYZE goes
> for hours, starving autovacuum VACUUM for worker
> slots(autovacuum_max_workers).
> Another major problem is that in strol_l() backend process ignores
> pg_terminate_backend()/pg_cancel_backend() functions.
>
> With attached patch this problem goes away:
>
> postgres=# analyze t_icu_coll;
> ANALYZE
> Time: 161.419 ms

Wow, that's definitely an issue.

I haven't looked at the patch in any depth, but definitely a +1 from me
for figuring out how to fix this issue..

Thanks!

Stephen

Attachment

Re: long analyze, libc bug and libicu

From
Tom Lane
Date:
Grigory Smolkin <g.smolkin@postgrespro.ru> writes:
> On 07/07/2018 10:10 AM, Peter Eisentraut wrote:
>> On 05.07.18 17:05, Grigory Smolkin wrote:
>>> Why ANALYZE igrones column COLLATE?

>> I think the statistics would be mostly the same independent of which
>> collation you use.  This could possibly be refined, but I don't think
>> it's a major problem right now.

I don't actually believe that the stats would be mostly the same.
Yes, we ought to arrive at the same MCV list, ndistinct, etc, but the
histogram depends critically on the sort order.  In particular its
endpoints, and estimates for comparison values near the endpoints,
might be very much different.

However, this is something that was left for future research when
we added collations, and nobody's really followed up on that.
Should ANALYZE/the planner care about collation (perhaps only for
specific stats types)?  Does that go as far as ignoring stats that don't
match the query operator's collation? Should we consider recording stats
for more than one collation, and if so which ones?  What are the
backwards-compatibility issues involved in changing something like this?

Grigory's proposal amounts to assuming that the column's assigned
collation is the only one of interest, which might be true but it
needs some defense.  In any case it wouldn't end up being a three-line
patch; there's a whole lot of downstream work to consider.

But besides that, I've got no sympathy for forcing through a change
in this area just on the grounds that some platform's strcoll_l is
ridiculously slow with certain collations.  The right answer for that
is to lobby the libc maintainers to fix strcoll_l, especially since
the odds of us changing this in released branches are nil.

            regards, tom lane