Thread: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

[HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Geoghegan
Date:
varstr_sortsupport() only allows Windows to use SortSupport with a
non-C-locale (when the server encoding happens to be UTF-8, which I
assume is the common case). This is because we (quite reasonably)
don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion
hack from varstr_cmp() for the SortSupport authoritative comparator
(varstrfastcmp_locale() shouldn't get its own copy of this kludge,
because it's supposed to be "fast"). This broad restriction made sense
when Windows + UTF-8 + non-C-locale necessarily required the
aforementioned UTF-16 conversion kludge. However, iff an ICU locale is
in use on Windows (or any other platform), then we can always use
SortSupport, regardless of anything else (we should not have the core
code install a fmgr comparison shim that just calls varstr_cmp(),
though we still do). We don't actually need the UTF-16 kludge at all,
so we can use SortSupport without any special care.

The current state of affairs doesn't make any sense, AFAICT, and so
the restriction should be removed on general principle: we *already*
expect ICU to have no restrictions that are peculiar to Windows, as we
see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on
to this restriction. This restriction also seems worth fixing because
Windows users are generally more likely to want to use ICU locales;
most of them would otherwise end up actually paying the overhead for
the UTF-16 kludge. (Presumably the UTF-16 conversion makes text
sorting *even slower* than it would be if we merely didn't do
SortSupport, which is to say: very slow indeed.)

In summary, we're currently attaching the use of SortSupport to the
wrong thing. We're treating this UTF-16 business as something that
implies a broad OS/platform restriction, when in fact it should be
treated as implying a restriction for one particular collation
provider only (a collation provider that happens to be built into
Windows, but isn't really special to us).

Attached patch shows what I'm getting at. This is untested, since I
don't use Windows. Proceed with caution.

On a related note, am I the only one that finds it questionable that
str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself
contains an "#ifdef USE_ICU" block? It seems like those two things
might get conflated on some platforms. We don't want lower() to ever
not use the ICU infrastructure when an ICU collation is used, and yet
it's not obvious that that's impossible. I understand that the code in
regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER
facilities, and that that was always accepted as legacy that ICU had
to live with. Maybe a static assertion is all that we need here (ICU
builds must also be USE_WIDE_UPPER_LOWER builds).

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

[HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Noah Misch
Date:
On Sat, Sep 16, 2017 at 03:33:53PM -0700, Peter Geoghegan wrote:
> On a related note, am I the only one that finds it questionable that
> str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself
> contains an "#ifdef USE_ICU" block? It seems like those two things
> might get conflated on some platforms. We don't want lower() to ever
> not use the ICU infrastructure when an ICU collation is used, and yet
> it's not obvious that that's impossible. I understand that the code in
> regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER
> facilities, and that that was always accepted as legacy that ICU had
> to live with. Maybe a static assertion is all that we need here (ICU
> builds must also be USE_WIDE_UPPER_LOWER builds).

I checked !USE_WIDE_UPPER_LOWER by configuring v10 as follows:
 ./configure -C --prefix=$HOME/sw/nopath/pg10 --enable-debug \     --enable-cassert --enable-depend --enable-tap-tests
--with-libxml\     --with-gssapi --with-openssl ac_cv_func_towlower=no
 

The result fails to compile:

$ make
formatting.c: In function ‘str_tolower’:
formatting.c:1623:10: error: ‘mylocale’ undeclared (first use in this function)     if (mylocale)         ^
... snipped other errors ...

A --with-icu build fails similarly.  PG9.6 builds and passes "make check".

Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.  Solaris 2.5.1 (out of
support for 12 years) might be the most interesting OS affected:

https://www.gnu.org/software/gnulib/manual/html_node/towlower.html
https://www.gnu.org/software/gnulib/manual/html_node/wcstombs.html


The above-described topic is currently a PostgreSQL 10 open item.  Peter
(Eisentraut), since you committed the patch believed to have created it, you
own this open item.  If some other commit is more relevant or if this does not
belong as a v10 open item, please let us know.  Otherwise, please observe the
policy on open item ownership[1] and send a status update within three
calendar days of this message.  Include a date for your subsequent status
update.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Noah Misch
Date:
On Sat, Sep 16, 2017 at 03:33:53PM -0700, Peter Geoghegan wrote:
> In summary, we're currently attaching the use of SortSupport to the
> wrong thing. We're treating this UTF-16 business as something that
> implies a broad OS/platform restriction, when in fact it should be
> treated as implying a restriction for one particular collation
> provider only (a collation provider that happens to be built into
> Windows, but isn't really special to us).
> 
> Attached patch shows what I'm getting at. This is untested, since I
> don't use Windows. Proceed with caution.

This is currently a v10 open item, but I think it doesn't qualify for that
treatment.  It's merely an opportunity for optimization, albeit an
attractively-simple one.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
> USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.

+1 ... if nothing else, there's the problem that untested code is likely
to be broken.  You just proved it *is* broken, of course, but my point
is that even if we repaired the immediate damage we could have little
confidence in it staying fixed.

I think the USE_WIDE_UPPER_LOWER split was originally my code, so I'm
willing to take care of removing it if there's consensus that that's
what to do.

I'm not sure that we need to treat this as a v10 open item, though.
The premise of removing !USE_WIDE_UPPER_LOWER is that nobody cares
anymore, therefore it shouldn't matter to users whether we remove it in
v10.  There's an argument that having only two states of the relevant
code, not three, in the live back branches is worth something for
maintenance --- but should that outweigh the risk of breaking something
post-rc1?
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Geoghegan
Date:
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch <noah@leadboat.com> wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

Fair enough.

This is clearly an omission that was made in 41839b7ab, the commit
that added/fixed Windows support for ICU. Hopefully the omission can
be corrected for v10. I see this as a surprising behavior for users,
since ICU locales on all other platforms *will* have much faster
sorting than libc locales (often more than 3x faster). This is mostly
because ICU brings back abbreviated keys. 3x - 5x faster is more of a
qualitative difference than a quantitative one, IMHO.

That having been said, I'm certainly not going to insist on a v10 fix
for this issue.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Peter Eisentraut
Date:
On 9/21/17 01:29, Noah Misch wrote:
> I checked !USE_WIDE_UPPER_LOWER by configuring v10 as follows:
> 
>   ./configure -C --prefix=$HOME/sw/nopath/pg10 --enable-debug \
>       --enable-cassert --enable-depend --enable-tap-tests --with-libxml \
>       --with-gssapi --with-openssl ac_cv_func_towlower=no
> 
> The result fails to compile:

Yeah, the placement of the ifdef blocks was pretty bogus.  This patch
fixes it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Eisentraut
Date:
On 9/16/17 18:33, Peter Geoghegan wrote:
> In summary, we're currently attaching the use of SortSupport to the
> wrong thing. We're treating this UTF-16 business as something that
> implies a broad OS/platform restriction, when in fact it should be
> treated as implying a restriction for one particular collation
> provider only (a collation provider that happens to be built into
> Windows, but isn't really special to us).
> 
> Attached patch shows what I'm getting at. This is untested, since I
> don't use Windows. Proceed with caution.

Your analysis makes sense, but the patch doesn't work, because "locale"
is never set before reading it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Geoghegan
Date:
On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>> Attached patch shows what I'm getting at. This is untested, since I
>> don't use Windows. Proceed with caution.
>
> Your analysis makes sense, but the patch doesn't work, because "locale"
> is never set before reading it.

It was just for illustrative purposes. Seems fine to actually move the
WIN32 block down to just before the existing TRUST_STRXFRM test, since
the locale is going to be cached and then immediately reused where we
return immediately (Windows libc) anyway. This would also be a win for
clarity, IMV.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Geoghegan
Date:
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch <noah@leadboat.com> wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

I have withdrawn this as an open item. I'm still hopeful that it can
be worked through for v10 at Peter's discretion.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Tom Lane
Date:
I wrote:
> Noah Misch <noah@leadboat.com> writes:
>> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
>> USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.

> +1 ... if nothing else, there's the problem that untested code is likely
> to be broken.  You just proved it *is* broken, of course, but my point
> is that even if we repaired the immediate damage we could have little
> confidence in it staying fixed.

Further notes about that:

* The Single Unix Spec v2 (a/k/a POSIX 1997), which has been our minimum
portability spec for quite awhile, requires wcstombs() and towlower(),
and further requires the latter to be declared in <wctype.h>.

* Surveying the buildfarm, I agree with your conclusion that every active
member has wcstombs() and towlower().  gaur/pademelon is the lone member
that lacks <wctype.h>; it declares towlower() in <wchar.h> instead.  It's
not so surprising that that system adheres to a pre-1997 idea of where to
put that, because its /usr/include files mostly date from 1996 ...

Meanwhile, I see that Peter has posted a fix for the immediate problem.
I propose that Peter should apply his fix in HEAD and v10, and then
I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Noah Misch
Date:
On Thu, Sep 21, 2017 at 05:38:13PM -0400, Tom Lane wrote:
> I wrote:
> > Noah Misch <noah@leadboat.com> writes:
> >> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing
> >> USE_WIDE_UPPER_LOWER?  Every buildfarm fossil has both.
> 
> > +1 ... if nothing else, there's the problem that untested code is likely
> > to be broken.  You just proved it *is* broken, of course, but my point
> > is that even if we repaired the immediate damage we could have little
> > confidence in it staying fixed.

That would be easy enough to ensure by adding ac_cv_func_towlower=no to some
buildfarm animal.  But the real-world need for said code is clearly dead and
gone.  Good riddance.

> Meanwhile, I see that Peter has posted a fix for the immediate problem.
> I propose that Peter should apply his fix in HEAD and v10, and then
> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

That works for me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Peter Eisentraut
Date:
On 9/21/17 17:38, Tom Lane wrote:
> Meanwhile, I see that Peter has posted a fix for the immediate problem.
> I propose that Peter should apply his fix in HEAD and v10, and then

done

> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Eisentraut
Date:
On 9/21/17 15:21, Peter Geoghegan wrote:
> On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>> Attached patch shows what I'm getting at. This is untested, since I
>>> don't use Windows. Proceed with caution.
>>
>> Your analysis makes sense, but the patch doesn't work, because "locale"
>> is never set before reading it.
> 
> It was just for illustrative purposes. Seems fine to actually move the
> WIN32 block down to just before the existing TRUST_STRXFRM test, since
> the locale is going to be cached and then immediately reused where we
> return immediately (Windows libc) anyway. This would also be a win for
> clarity, IMV.

I agree.  The attached patch should do it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 9/21/17 17:38, Tom Lane wrote:
>> Meanwhile, I see that Peter has posted a fix for the immediate problem.
>> I propose that Peter should apply his fix in HEAD and v10, and then

> done

>> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

And done.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Geoghegan
Date:
On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I agree.  The attached patch should do it.

I see one small issue here: You'll now need to set ssup->comparator
back to NULL before you return early in the Windows' libc case. That
way, a shim comparator (that goes through bttextcmp(), in the case of
text) will be installed within FinishSortSupportFunction(). Other than
that, looks good to me.

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Eisentraut
Date:
On 9/22/17 12:25, Peter Geoghegan wrote:
> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> I agree.  The attached patch should do it.
> 
> I see one small issue here: You'll now need to set ssup->comparator
> back to NULL before you return early in the Windows' libc case. That
> way, a shim comparator (that goes through bttextcmp(), in the case of
> text) will be installed within FinishSortSupportFunction(). Other than
> that, looks good to me.

committed accordingly

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

From
Peter Geoghegan
Date:
On Sun, Sep 24, 2017 at 5:04 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/22/17 12:25, Peter Geoghegan wrote:
>> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> I agree.  The attached patch should do it.
>>
>> I see one small issue here: You'll now need to set ssup->comparator
>> back to NULL before you return early in the Windows' libc case. That
>> way, a shim comparator (that goes through bttextcmp(), in the case of
>> text) will be installed within FinishSortSupportFunction(). Other than
>> that, looks good to me.
>
> committed accordingly

Thanks.

I am currently working on a patch for the issues with ICU colcollate
stability as I see them. I should be able to post something later
today or tomorrow. I would appreciate it if you held off on committing
anything there until you've considered what I'll propose.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers