Thread: long analyze, libc bug and libicu
Hello, everyone! One of your clients with PostgreSQL 10.4 recently noticed that ANALYZE started to take unproportionally long amount of time. Backtrace: #0 0x00007f244b68ca06 in strcoll_l () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f244e37802d in varstrfastcmp_locale (x=139794027137284, y=139794027967812, ssup=<optimized out>) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/utils/adt/varlena.c:2185 #2 0x00007f244e0d6b15 in ApplySortComparator (isNull1=0 '\000', isNull2=0 '\000', ssup=0x7fffa958de80, datum2=<optimized out>, datum1=<optimized out>) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/include/utils/sortsupport.h:225 #3 compare_scalars (a=<optimized out>, b=<optimized out>, arg=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/commands/analyze.c:2821 #4 0x00007f244e3e1023 in med3 (a=0x7f24578f6bd8 "\224__U$\177", b=0x7f24578f6c78 "\004ITU$\177", c=0x7f24578f6d18 "D\365`U$\177", cmp=0x7f244e0d6af0 <compare_scalars>, arg=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:108 #5 0x00007f244e3e13ec in qsort_arg (a=a@entry=0x7f24578f6bd8, n=<optimized out>, n@entry=82, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, arg=arg@entry=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:157 #6 0x00007f244e3e1475 in qsort_arg (a=a@entry=0x7f24578f6248, n=<optimized out>, n@entry=235, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, arg=arg@entry=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:217 #7 0x00007f244e3e1475 in qsort_arg (a=a@entry=0x7f24578f5348, n=<optimized out>, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, arg=arg@entry=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:217 #8 0x00007f244e3e13ac in qsort_arg (a=a@entry=0x7f24578f5348, n=<optimized out>, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, arg=arg@entry=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:203 #9 0x00007f244e3e13ac in qsort_arg (a=0x7f24578f5348, a@entry=0x7f2457816928, n=<optimized out>, n@entry=109083, es=es@entry=16, cmp=cmp@entry=0x7f244e0d6af0 <compare_scalars>, arg=arg@entry=0x7fffa958de70) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/port/qsort_arg.c:203 #10 0x00007f244e0d604f in compute_scalar_stats (stats=0x7f244fd168f8, fetchfunc=<optimized out>, samplerows=110194, totalrows=110194) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/commands/analyze.c:2378 #11 0x00007f244e0d76f9 in do_analyze_rel (onerel=onerel@entry=0x7f244fc339d8, options=options@entry=99, params=params@entry=0x7f244f98b9f0, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=11515, inh=0 '\000', in_outer_xact=0 '\000', elevel=13) at /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/commands/analyze.c:545 After a bit of digging it became clear that they hit the libc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=18441 We decided to switch to libicu collation, but it became clear that ANALYZE ignore column level COLLATE and stubbornly use default collation. It happens thanks to this piece of code: static void compute_scalar_stats(VacAttrStatsP stats, AnalyzeAttrFetchFunc fetchfunc, int samplerows, double totalrows) .... /* We always use the default collation for statistics */ ssup.ssup_collation = DEFAULT_COLLATION_OID; .... Ok, we decided to set libicu collation as default for database, although it is a major pain. But turns out it`s impossible to set libicu collation as default for database: postgres=# create database icu LC_COLLATE = 'en-x-icu'; ERROR: invalid locale name: "en-x-icu" Why ANALYZE igrones column COLLATE? It there workaround to force ANALYZE to use libicu? -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 05.07.18 17:05, Grigory Smolkin wrote: > Why ANALYZE igrones column COLLATE? I think the statistics would be mostly the same independent of which collation you use. This could possibly be refined, but I don't think it's a major problem right now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/07/2018 10:10 AM, Peter Eisentraut wrote:
On 05.07.18 17:05, Grigory Smolkin wrote:Why ANALYZE igrones column COLLATE?I think the statistics would be mostly the same independent of which collation you use. This could possibly be refined, but I don't think it's a major problem right now.
Thank you for your interest in this problem!
I think the statistics would be mostly the same independent of which collation you use.
I assumed that one of the goals of using libicu is to be independent from libc collation and it`s bugs and inconsistencies, but current ANALYZE forced to use libc anyway, which undermines that goal.
This could possibly be refined, but I don't think it's a major problem right now.
It`s a major problem to people, who use Thai alphabet.
In attachment there is a data sample(33MB on my machine). ANALYZE`ing it comes up with following results:
postgres=# ANALYZE t_icu_coll;
ANALYZE
Time: 2252086.648 ms
37minutes on 33MB table is painful. On big tables autovacuum ANALYZE goes for hours, starving autovacuum VACUUM for worker slots(autovacuum_max_workers).
Another major problem is that in strol_l() backend process ignores pg_terminate_backend()/pg_cancel_backend() functions.
With attached patch this problem goes away:
postgres=# analyze t_icu_coll;
ANALYZE
Time: 161.419 ms
-- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Greetings, * Grigory Smolkin (g.smolkin@postgrespro.ru) wrote: > On 07/07/2018 10:10 AM, Peter Eisentraut wrote: > >On 05.07.18 17:05, Grigory Smolkin wrote: > >>Why ANALYZE igrones column COLLATE? > >I think the statistics would be mostly the same independent of which > >collation you use. This could possibly be refined, but I don't think > >it's a major problem right now. > > Thank you for your interest in this problem! > > >I think the statistics would be mostly the same independent of which > >collation you use. > > I assumed that one of the goals of using libicu is to be independent from > libc collation and it`s bugs and inconsistencies, but current ANALYZE forced > to use libc anyway, which undermines that goal. I would have thought so too, especially in a case like you describe below... > > This could possibly be refined, but I don't think > >it's a major problem right now. > > It`s a major problem to people, who use Thai alphabet. > In attachment there is a data sample(33MB on my machine). ANALYZE`ing it > comes up with following results: > > postgres=# ANALYZE t_icu_coll; > ANALYZE > Time: 2252086.648 ms > > 37minutes on 33MB table is painful. On big tables autovacuum ANALYZE goes > for hours, starving autovacuum VACUUM for worker > slots(autovacuum_max_workers). > Another major problem is that in strol_l() backend process ignores > pg_terminate_backend()/pg_cancel_backend() functions. > > With attached patch this problem goes away: > > postgres=# analyze t_icu_coll; > ANALYZE > Time: 161.419 ms Wow, that's definitely an issue. I haven't looked at the patch in any depth, but definitely a +1 from me for figuring out how to fix this issue.. Thanks! Stephen
Attachment
Grigory Smolkin <g.smolkin@postgrespro.ru> writes: > On 07/07/2018 10:10 AM, Peter Eisentraut wrote: >> On 05.07.18 17:05, Grigory Smolkin wrote: >>> Why ANALYZE igrones column COLLATE? >> I think the statistics would be mostly the same independent of which >> collation you use. This could possibly be refined, but I don't think >> it's a major problem right now. I don't actually believe that the stats would be mostly the same. Yes, we ought to arrive at the same MCV list, ndistinct, etc, but the histogram depends critically on the sort order. In particular its endpoints, and estimates for comparison values near the endpoints, might be very much different. However, this is something that was left for future research when we added collations, and nobody's really followed up on that. Should ANALYZE/the planner care about collation (perhaps only for specific stats types)? Does that go as far as ignoring stats that don't match the query operator's collation? Should we consider recording stats for more than one collation, and if so which ones? What are the backwards-compatibility issues involved in changing something like this? Grigory's proposal amounts to assuming that the column's assigned collation is the only one of interest, which might be true but it needs some defense. In any case it wouldn't end up being a three-line patch; there's a whole lot of downstream work to consider. But besides that, I've got no sympathy for forcing through a change in this area just on the grounds that some platform's strcoll_l is ridiculously slow with certain collations. The right answer for that is to lobby the libc maintainers to fix strcoll_l, especially since the odds of us changing this in released branches are nil. regards, tom lane