Thread: Windows buildfarm animals are still not happy with abbreviated keys patch

Windows buildfarm animals are still not happy with abbreviated keys patch

From

Peter Geoghegan

Date:

21 January 2015, 22:30:46

Even following Robert's disabling of abbreviated keys on Windows,
buildfarm animals hamerkop, brolga, currawong and bowerbird remain
unhappy, with failing regression tests for "collate" and sometimes
(but not always) "aggregates". Some of these only use the C locale.

I think that "aggregates" does not fail for brolga because it's built
without "locale_t" support (not sure how to interpret MSVC "configure"
output, but I think that the other animals do have such support).  So
maybe this code being executed within btsortsupport_worker(), for the
C locale on Windows is at issue (note that abbreviation is still
disabled):

tss->locale = pg_newlocale_from_collation(collid);

That doesn't explain the other problems, though.
-- 
Peter Geoghegan

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

22 January 2015, 18:57:47

On Wed, Jan 21, 2015 at 2:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Even following Robert's disabling of abbreviated keys on Windows,
> buildfarm animals hamerkop, brolga, currawong and bowerbird remain
> unhappy, with failing regression tests for "collate" and sometimes
> (but not always) "aggregates". Some of these only use the C locale.
>
> I think that "aggregates" does not fail for brolga because it's built
> without "locale_t" support (not sure how to interpret MSVC "configure"
> output, but I think that the other animals do have such support).  So
> maybe this code being executed within btsortsupport_worker(), for the
> C locale on Windows is at issue (note that abbreviation is still
> disabled):
>
> tss->locale = pg_newlocale_from_collation(collid);

Yes, you seem to have rather unwisely changed around the order of the
checks in btsortsupport_worker().  I've just rewritten it heavily to
try to more clearly separate the decision about whether to do
fmgr-elision, and if so which comparator to use, from the decision
about whether to use abbreviation.  Naturally I can't promise that
this is completely right, but I hope that, if it's not, it will at
least be easier to fix.  There's no sanity in removing the
lc_collate_is_c() check from the top of the function and then trying
to remember to exclude that case individually from the multiple places
further down that count on the fact that they'll never be reached in
that case.  This function may even have a third decision to make
someday, and they can't all be intertwined.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

22 January 2015, 19:28:22

On Thu, Jan 22, 2015 at 10:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 21, 2015 at 2:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Even following Robert's disabling of abbreviated keys on Windows,
>> buildfarm animals hamerkop, brolga, currawong and bowerbird remain
>> unhappy, with failing regression tests for "collate" and sometimes
>> (but not always) "aggregates". Some of these only use the C locale.
>>
>> I think that "aggregates" does not fail for brolga because it's built
>> without "locale_t" support (not sure how to interpret MSVC "configure"
>> output, but I think that the other animals do have such support).  So
>> maybe this code being executed within btsortsupport_worker(), for the
>> C locale on Windows is at issue (note that abbreviation is still
>> disabled):
>>
>> tss->locale = pg_newlocale_from_collation(collid);
>
> Yes, you seem to have rather unwisely changed around the order of the
> checks in btsortsupport_worker().  I've just rewritten it heavily to
> try to more clearly separate the decision about whether to do
> fmgr-elision, and if so which comparator to use, from the decision
> about whether to use abbreviation.  Naturally I can't promise that
> this is completely right, but I hope that, if it's not, it will at
> least be easier to fix.  There's no sanity in removing the
> lc_collate_is_c() check from the top of the function and then trying
> to remember to exclude that case individually from the multiple places
> further down that count on the fact that they'll never be reached in
> that case.  This function may even have a third decision to make
> someday, and they can't all be intertwined.

This seems to have broken more stuff.  My working hypothesis is that
the culprit is here:
       /*        * There is no special handling of the C locale here, unlike with        * varstr_cmp().  strxfrm() is
usedindifferently.        */
 

As far as I can see, that's just hoping that when the locale is C,
strxfrm() is memcpy().  If it isn't, then you will potentially get
different results when the abbreviated keys stuff is used vs. when it
isn't, because when it isn't, we're going to memcmp(), and when it is,
we're going to memcmp() the results of strxfrm().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

22 January 2015, 20:17:46

On Thu, Jan 22, 2015 at 11:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> This seems to have broken more stuff.  My working hypothesis is that
> the culprit is here:
>
>         /*
>          * There is no special handling of the C locale here, unlike with
>          * varstr_cmp().  strxfrm() is used indifferently.
>          */
>
> As far as I can see, that's just hoping that when the locale is C,
> strxfrm() is memcpy().  If it isn't, then you will potentially get
> different results when the abbreviated keys stuff is used vs. when it
> isn't, because when it isn't, we're going to memcmp(), and when it is,
> we're going to memcmp() the results of strxfrm().

As noted also on the thread Kevin started, I've now pushed a fix for
this and am eagerly awaiting new buildfarm returns.  I wonder if this
might account for the ordering failures we were seeing on Windows
before.  We thought it was a Windows-specific issue, but I bet the
real problem was here:
       if (collid != DEFAULT_COLLATION_OID)       {               if (!OidIsValid(collid))               {
        /*                        * This typically means that the parser could

not resolve a                        * conflict of implicit collations, so report
it that way.                        */                       ereport(ERROR,

(errcode(ERRCODE_INDETERMINATE_COLLATION),                                        errmsg("could not determine
which collation to use for string comparison"),                                        errhint("Use the COLLATE
clause to set the collation explicitly.")));               }
#ifdef HAVE_LOCALE_T               tss->locale = pg_newlocale_from_collation(collid);
#endif       }

Before the abbreviated keys commit, this code only ran if we'd already
determined that lc_collate_is_c(collid) was false.  But that commit
made this also run when that value was true.  So if HAVE_LOCALE_T and
collid != DEFAULT_COLLATION_ID, then tss->locale was getting set.  If
it got set to something that made strxfrm() behave like memcmp(), then
all was well.  If not, then life was bad.  Now you'd hope that would
work out, but maybe not.

Also, suppose HAVE_LOCALE_T was defined but collid ==
DEFAULT_COLLATION_ID.  Then tss->locale didn't get initialized at all,
and bttext_abbrev_convert() just passed whatever stupid value it found
there to strxfrm_l().

Finally, suppose we *didn't* HAVE_LOCALE_T.   Then, the
non-abbreviated comparator knew it should use memcmp(), but the
abbreviated comparator was happy to use strxfrm() even though that
would use the current locale, but the C locale that the user
requested.

Long story short: this was broken.  It may be that when the dust
settles we can try re-enabling this on Windows.   It might work now
that this issue is (hopefully) fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

22 January 2015, 20:55:44

On Thu, Jan 22, 2015 at 12:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Long story short: this was broken.  It may be that when the dust
> settles we can try re-enabling this on Windows.   It might work now
> that this issue is (hopefully) fixed.

Uggh.  That still wasn't right; I've pushed commit
d060e07fa919e0eb681e2fa2cfbe63d6c40eb2cf to try to fix it.

Stay tuned for more exciting dispatches from the department of abbreviated keys!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Peter Geoghegan

Date:

22 January 2015, 21:00:40

On Thu, Jan 22, 2015 at 9:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Stay tuned for more exciting dispatches from the department of abbreviated keys!

I certainly suspected that we'd have problems with Windows, but
nothing this bad.

-- 
Peter Geoghegan

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

22 January 2015, 23:34:35

On Thu, Jan 22, 2015 at 1:00 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Jan 22, 2015 at 9:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Stay tuned for more exciting dispatches from the department of abbreviated keys!
>
> I certainly suspected that we'd have problems with Windows, but
> nothing this bad.

This isn't really Windows-specific.  The root of the problem is that
when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
key even though memcmp() is the authoritative comparator in that case.
Exactly which platforms happened to blow up as a result of that is
kind of beside the point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Peter Geoghegan

Date:

23 January 2015, 01:51:58

On Thu, Jan 22, 2015 at 12:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> This isn't really Windows-specific.  The root of the problem is that
> when LC_COLLATE=C you were trying to use strxfrm() for the abbreviated
> key even though memcmp() is the authoritative comparator in that case.
> Exactly which platforms happened to blow up as a result of that is
> kind of beside the point.

I don't see how that could be a problem. Even if the strxfrm() blob
wasn't identical to the original string (I think it should always be,
accept maybe on MacOSX), it's still reasonable to assume that there is
equivalent behavior across the C locale and locales with collations
like en_US.UTF-8. It wasn't as if I was using strxfrm() within
bttextfastcmp_c() -- just within bttext_abbrev_convert(), to form an
abbreviated key for text that uses the C locale.

The C locale is just another locale - AFAICT, the only reason we have
treated it differently in PostgreSQL is because that tended to avoid
needing to copy and NUL-terminate, which strcoll() requires. It might
be that that doesn't work out for some reason (but not because
strxfrm() will not have behavior equivalent to memcpy() with the C
locale -- I was *not* relying on that).

That having been said, it's clearer to continue to handle each case (C
locale vs other locales) separately within the new
bttext_abbrev_convert() function, just to be consistent, but also to
avoid NUL-terminating the text strings to pass to strxfrm(), which as
you point out is an avoidable cost. So, I'm not asking you to restore
the terser uniform use of strxfrm() we had before, but, for what it's
worth, that was not the real issue. The real issue was that strxfrm()
spuriously used the wrong locale, as you said. Provided strxfrm() had
the correct locale (the C locale), then AFAICT it ought to have been
fine, regardless of whether or not it then behave exactly equivalently
to memcpy().

-- 
Peter Geoghegan

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

23 January 2015, 20:34:21

On Thu, Jan 22, 2015 at 5:51 PM, Peter Geoghegan <pg@heroku.com> wrote:
> That having been said, it's clearer to continue to handle each case (C
> locale vs other locales) separately within the new
> bttext_abbrev_convert() function, just to be consistent, but also to
> avoid NUL-terminating the text strings to pass to strxfrm(), which as
> you point out is an avoidable cost. So, I'm not asking you to restore
> the terser uniform use of strxfrm() we had before, but, for what it's
> worth, that was not the real issue. The real issue was that strxfrm()
> spuriously used the wrong locale, as you said. Provided strxfrm() had
> the correct locale (the C locale), then AFAICT it ought to have been
> fine, regardless of whether or not it then behave exactly equivalently
> to memcpy().

I don't agree.  On a system where HAVE_LOCALE_T is not defined, there
is *no way* to get strxfrm() to behave like memcpy(), because we're
not passing a locale to it.  Clearly, strxfrm() is going to do a
locale-aware transformation in that case whether the user wrote
collate "C" or not.  The comments in regc_pg_locale.c explain:
* 1. In C/POSIX collations, we use hard-wired code.  We can't depend on* the <ctype.h> functions since those will obey
LC_CTYPE. Note that these* collations don't give a fig about multibyte characters.** 2. In the "default" collation
(whichis supposed to obey LC_CTYPE):** 2a. When working in UTF8 encoding, we use the <wctype.h> functions if*
available. This assumes that every platform uses Unicode codepoints* directly as the wchar_t representation of Unicode.
On some platforms* wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.** 2b. In all other
encodings,or on machines that lack <wctype.h>, we use* the <ctype.h> functions for pg_wchar values up to 255, and punt
forvalues* above that.  This is only 100% correct in single-byte encodings such as* LATINn.  However, non-Unicode
multibyteencodings are mostly Far Eastern* character sets for which the properties being tested here aren't very*
relevantfor higher code values anyway.  The difficulty with using the* <wctype.h> functions with non-Unicode multibyte
encodingsis that we can* have no certainty that the platform's wchar_t representation matches* what we do in pg_wchar
conversions.**3. Other collations are only supported on platforms that HAVE_LOCALE_T.* Here, we use the
locale_t-extendedforms of the <wctype.h> and <ctype.h>* functions, under exactly the same cases as #2.

In other words, even on systems that don't HAVE_LOCALE_T, we still
have to support the default collation and the C collation, and they
have to behave differently.  There's no way to make that work using
only strxfrm(), because nothing gets passed to that function to tell
it which of those two things it is supposed to do.

Now even if the above were not an issue, for example because we
dropped support for systems where !HAVE_LOCALE_T, I think it would be
poor form to depend on strxfrm_l() to behave like memcpy() where we
could just as easily call memcpy() and be *sure* that it was going to
do what we wanted.  Much of writing good code is figuring out what
could go wrong and then figuring out how to prevent it, and relying on
strxfrm_l() would be an unnecessary risk regardless.  Given the
existence of !HAVE_LOCALE_T systems, it's just plain broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

26 January 2015, 22:05:54

On Fri, Jan 23, 2015 at 12:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> In other words, even on systems that don't HAVE_LOCALE_T, we still
> have to support the default collation and the C collation, and they
> have to behave differently.  There's no way to make that work using
> only strxfrm(), because nothing gets passed to that function to tell
> it which of those two things it is supposed to do.
>
> Now even if the above were not an issue, for example because we
> dropped support for systems where !HAVE_LOCALE_T, I think it would be
> poor form to depend on strxfrm_l() to behave like memcpy() where we
> could just as easily call memcpy() and be *sure* that it was going to
> do what we wanted.  Much of writing good code is figuring out what
> could go wrong and then figuring out how to prevent it, and relying on
> strxfrm_l() would be an unnecessary risk regardless.  Given the
> existence of !HAVE_LOCALE_T systems, it's just plain broken.

Now that these issues are fixed and the buildfarm is green again, I'm
going to try re-enabling this optimization on Windows.  My working
theory is that disabling that categorically was a mis-diagnosis of the
real problem, and that now that the issues mentioned above are cleaned
up, it'll just work.  That might not be right, but I think it's worth
a try.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Peter Geoghegan

Date:

26 January 2015, 22:10:24

On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Now that these issues are fixed and the buildfarm is green again, I'm
> going to try re-enabling this optimization on Windows.  My working
> theory is that disabling that categorically was a mis-diagnosis of the
> real problem, and that now that the issues mentioned above are cleaned
> up, it'll just work.  That might not be right, but I think it's worth
> a try.

Right. We're only supporting the C locale on Windows. How could
Windows possibly be broken if locale-related functions like strxfrm()
and strcoll() are not even called?

-- 
Peter Geoghegan

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

26 January 2015, 22:21:19

On Mon, Jan 26, 2015 at 2:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Now that these issues are fixed and the buildfarm is green again, I'm
>> going to try re-enabling this optimization on Windows.  My working
>> theory is that disabling that categorically was a mis-diagnosis of the
>> real problem, and that now that the issues mentioned above are cleaned
>> up, it'll just work.  That might not be right, but I think it's worth
>> a try.
>
> Right. We're only supporting the C locale on Windows. How could
> Windows possibly be broken if locale-related functions like strxfrm()
> and strcoll() are not even called?

That's what I hope to find out.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Michael Paquier

Date:

27 January 2015, 05:53:08

On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> That's what I hope to find out.  :-)
Buildfarm seems happy now. I just gave a try to that on one of my
small Windows VMs and compared the performance with 9.4 for this
simple test case when building with MSVC 2010:
create table aa as select random()::text as a, 'filler filler filler'
as b a from generate_series(1,1000000);
create index aai in aa(a):
On 9.4, the index creation took 26.5s, while on master it took 18s.
That's nice, particularly for things like a restore from a dump.
-- 
Michael

Re: Windows buildfarm animals are still not happy with abbreviated keys patch

From

Robert Haas

Date:

27 January 2015, 15:40:45

On Mon, Jan 26, 2015 at 9:52 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> That's what I hope to find out.  :-)
> Buildfarm seems happy now. I just gave a try to that on one of my
> small Windows VMs and compared the performance with 9.4 for this
> simple test case when building with MSVC 2010:
> create table aa as select random()::text as a, 'filler filler filler'
> as b a from generate_series(1,1000000);
> create index aai in aa(a):
> On 9.4, the index creation took 26.5s, while on master it took 18s.
> That's nice, particularly for things like a restore from a dump.

Cool.  That's a little bit smaller win than I would have expected
given my Linux results, but it's certainly respectable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company