Daniel Verite <daniel@manitou-mail.org> wrote:
> SELECT count(distinct wordtext COLLATE :"collname") FROM words_test;
>
> Some of the collations that crash:
> az-Latn-AZ-u-co-search-x-icu
> bs-Latn-BA-u-co-search-x-icu
> bs-x-icu
> cs-CZ-u-co-search-x-icu
> de-BE-u-co-phonebk-x-icu
> sr-Latn-XK-x-icu
> zh-Hans-CN-u-co-big5han-x-icu
>
> Trying all of them I had 146 crashes out of the 1741 ICU
> entries in pg_collation created by initdb.
>
> The size of the table is 291MB, and work_mem is set to 128MB.
>
> Reducing the dataset tends to make the problem disappear: if I split
> the table in halves based on row_number() to bisect on the data,
> the queries on both parts pass without crashing.

I think that this sensitivity to work_mem exists because abbreviated
keys are used for quicksort operations that sort individual runs.
As work_mem is increased, and less merging is required, affected
codepaths are reached less frequently. You would probably find that the
problem appears more consistently if varstr_sortsupport() is modified so
that even ICU collations never use abbreviated keys; that only requires
setting "abbreviate" to false unconditionally within that function.
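
To illustrate why turning abbreviation off would route every comparison
through the full comparator, here is a toy sketch in C. It is not
PostgreSQL code: SortItem, make_abbrev(), item_compare() and sort_words()
are hypothetical names, and the abbreviated key here is simply the first
8 bytes of the string in byte order, whereas the real code derives it
from the collation provider's sort key. The sketch only shows the
general shape: compare the cheap keys first, and fall back to the full
comparison on a tie or when abbreviation is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy type; nothing here exists in PostgreSQL. */
typedef struct
{
    const char *full;    /* the original string */
    uint64_t    abbrev;  /* first 8 bytes, packed most significant first */
} SortItem;

/* qsort() has no context argument, so the toy keeps state in globals. */
static bool use_abbrev = true;      /* loosely analogous to "abbreviate" */
static long full_comparisons = 0;   /* how often the full comparator ran */

/* Pack up to 8 leading bytes so integer order matches strcmp() order. */
static uint64_t
make_abbrev(const char *s)
{
    uint64_t key = 0;
    size_t   len = strlen(s);

    for (size_t i = 0; i < 8; i++)
    {
        key <<= 8;
        if (i < len)
            key |= (unsigned char) s[i];
    }
    return key;
}

/* Stand-in for the expensive, authoritative comparison. */
static int
full_compare(const char *a, const char *b)
{
    full_comparisons++;
    return strcmp(a, b);
}

/* Try the abbreviated keys first; fall back to the full comparison. */
static int
item_compare(const void *pa, const void *pb)
{
    const SortItem *a = pa;
    const SortItem *b = pb;

    if (use_abbrev)
    {
        if (a->abbrev < b->abbrev)
            return -1;
        if (a->abbrev > b->abbrev)
            return 1;
        /* abbreviated keys tie: not conclusive, fall through */
    }
    return full_compare(a->full, b->full);
}

/*
 * Sort n strings into out[] and return the number of full comparisons
 * that were needed.
 */
long
sort_words(const char **words, size_t n, bool abbreviate, const char **out)
{
    SortItem *items = malloc((n ? n : 1) * sizeof *items);

    assert(items != NULL);
    for (size_t i = 0; i < n; i++)
    {
        items[i].full = words[i];
        items[i].abbrev = make_abbrev(words[i]);
    }

    use_abbrev = abbreviate;
    full_comparisons = 0;
    qsort(items, n, sizeof *items, item_compare);

    for (size_t i = 0; i < n; i++)
        out[i] = items[i].full;
    free(items);
    return full_comparisons;
}
```

Calling sort_words() on the same input with abbreviate set to true and
then to false should give the same order; only the number of full
comparisons differs. With abbreviation off, every comparison goes
through full_compare(), which is the toy counterpart of the codepath
suspected here.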

I suggest using the new amcheck contrib module as part of this testing
(you'll need to use CREATE INDEX to have an index to perform
verification against). This can surface inconsistencies that are far
more subtle than a hard crash. I wouldn't assume that abbreviated
key comparisons are correct here just because there is no hard crash.

Does the crash always have ucol_strcollUseLatin1UTF8() in its backtrace?

--
Peter Geoghegan