Thread: pg_dump/pg_restore seem broken on hamerkop

pg_dump/pg_restore seem broken on hamerkop

From
Tom Lane
Date:
Buildfarm member hamerkop has been failing in the pg_upgrade regression
test for the last several days.  The problem looks like this:

command: "C:/buildfarm/build_root/HEAD/pgsql.build/contrib/pg_upgrade/tmp_check/install/bin/pg_restore" --port 50432
--username"Administrator" --exit-on-error --verbose --dbname "postgres" "pg_upgrade_dump_12145.custom" >>
"pg_upgrade_dump_12145.log"2>&1
 
pg_restore: connecting to database for restore
pg_restore: [archiver (db)] Error while INITIALIZING:
pg_restore: [archiver (db)] could not execute query: ERROR:  invalid byte sequence for encoding "UTF8": 0x93

I can't help noticing that this started immediately after commit
0eea804 "pg_dump: Reduce use of global variables".  No idea why
the issue is only showing up on this one animal.
        regards, tom lane



Re: pg_dump/pg_restore seem broken on hamerkop

From
Tatsuo Ishii
Date:
> Buildfarm member hamerkop has been failing in the pg_upgrade regression
> test for the last several days.  The problem looks like this:
> 
> command: "C:/buildfarm/build_root/HEAD/pgsql.build/contrib/pg_upgrade/tmp_check/install/bin/pg_restore" --port 50432
--username"Administrator" --exit-on-error --verbose --dbname "postgres" "pg_upgrade_dump_12145.custom" >>
"pg_upgrade_dump_12145.log"2>&1
 
> pg_restore: connecting to database for restore
> pg_restore: [archiver (db)] Error while INITIALIZING:
> pg_restore: [archiver (db)] could not execute query: ERROR:  invalid byte sequence for encoding "UTF8": 0x93
> 
> I can't help noticing that this started immediately after commit
> 0eea804 "pg_dump: Reduce use of global variables".  No idea why
> the issue is only showing up on this one animal.

I guess one of possibilities is there's garbage in memory which is
related to restore the process. If so, coverity may be able to find
something. The last coverity analysis was Oct 12. The commit 0eea804
was made on Oct 14. So let's see what new coverity scan reports...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: pg_dump/pg_restore seem broken on hamerkop

From
Tatsuo Ishii
Date:
>> Buildfarm member hamerkop has been failing in the pg_upgrade regression
>> test for the last several days.  The problem looks like this:
>> 
>> command: "C:/buildfarm/build_root/HEAD/pgsql.build/contrib/pg_upgrade/tmp_check/install/bin/pg_restore" --port 50432
--username"Administrator" --exit-on-error --verbose --dbname "postgres" "pg_upgrade_dump_12145.custom" >>
"pg_upgrade_dump_12145.log"2>&1
 
>> pg_restore: connecting to database for restore
>> pg_restore: [archiver (db)] Error while INITIALIZING:
>> pg_restore: [archiver (db)] could not execute query: ERROR:  invalid byte sequence for encoding "UTF8": 0x93
>> 
>> I can't help noticing that this started immediately after commit
>> 0eea804 "pg_dump: Reduce use of global variables".  No idea why
>> the issue is only showing up on this one animal.
> 
> I guess one of possibilities is there's garbage in memory which is
> related to restore the process. If so, coverity may be able to find
> something. The last coverity analysis was Oct 12. The commit 0eea804
> was made on Oct 14. So let's see what new coverity scan reports...

The latest coverity scan report did not include paticular defects
related to pg_dump/pg_restore...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: pg_dump/pg_restore seem broken on hamerkop

From
Alvaro Herrera
Date:
Tatsuo Ishii wrote:
> > Buildfarm member hamerkop has been failing in the pg_upgrade regression
> > test for the last several days.  The problem looks like this:
> > 
> > command: "C:/buildfarm/build_root/HEAD/pgsql.build/contrib/pg_upgrade/tmp_check/install/bin/pg_restore" --port
50432--username "Administrator" --exit-on-error --verbose --dbname "postgres" "pg_upgrade_dump_12145.custom" >>
"pg_upgrade_dump_12145.log"2>&1
 
> > pg_restore: connecting to database for restore
> > pg_restore: [archiver (db)] Error while INITIALIZING:
> > pg_restore: [archiver (db)] could not execute query: ERROR:  invalid byte sequence for encoding "UTF8": 0x93
> > 
> > I can't help noticing that this started immediately after commit
> > 0eea804 "pg_dump: Reduce use of global variables".  No idea why
> > the issue is only showing up on this one animal.
> 
> I guess one of possibilities is there's garbage in memory which is
> related to restore the process.

That sounds most likely.  The complete error in hamerkop's log is:

pg_restore: connecting to database for restore
pg_restore: [archiver (db)] Error while INITIALIZING:
pg_restore: [archiver (db)] could not execute query: ERROR:  invalid byte sequence for encoding "UTF8": 0x93   Command
was:-- Started on 2014-10-26 03:06:00 “Œ‹ž (•W€Žž)
 

This "Started on" business comes from pg_backup_archiver.c, which has
   if (AH->public.verbose)       dumpTimestamp(AH, "Started on", AH->createDate);

where dumpTimestamp is

/** dumpTimestamp*/
static void
dumpTimestamp(ArchiveHandle *AH, const char *msg, time_t tim)
{   char        buf[64];
   if (strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S %z", localtime(&tim)) != 0)       ahprintf(AH, "-- %s %s\n\n",
msg,buf);
 
}

So this seems related to the %z part of the strftime() call.  I have no
explanation for this failure ATM; maybe pg_restore is failing to set the
locale properly?  I also notice pg_restore.c previously included
pg_backup_archiver.h (which in turn includes <time.h>); strftime
requires <time.h> so maybe this is causing a problem, but since
pg_restore.c itself is not calling strftime, I don't see how this would
be related.

[Some more code and git-log reading later]  I see that the %z is a very
recent addition: it only got there as of commit ad5d46a449, of September
5th ... and now I also see that hamerkop's last green run before the
failure, on Oct 13rd, did *not* include the pg_upgrade check.  So I'm
thinking this was broken much earlier than 0eea804.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_dump/pg_restore seem broken on hamerkop

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So this seems related to the %z part of the strftime() call.  I have no
> explanation for this failure ATM; maybe pg_restore is failing to set the
> locale properly?  I also notice pg_restore.c previously included
> pg_backup_archiver.h (which in turn includes <time.h>); strftime
> requires <time.h> so maybe this is causing a problem, but since
> pg_restore.c itself is not calling strftime, I don't see how this would
> be related.

Hm.  %z ought not be locale-dependent ... however, it has a bigger
problem, which is that it's a C99-ism.  It's not there in SUSv2,
which is our normal baseline for what's portable.  I think we need
to get rid of that.  %Z should be portable.

(Is it possible that Windows' strftime() reads %z as doing something
other than what C99 says?)

> [Some more code and git-log reading later]  I see that the %z is a very
> recent addition: it only got there as of commit ad5d46a449, of September
> 5th ... and now I also see that hamerkop's last green run before the
> failure, on Oct 13rd, did *not* include the pg_upgrade check.  So I'm
> thinking this was broken much earlier than 0eea804.

Ooohh ... you are right, the first failing build involved not only
the pg_dump refactoring commit, but an upgrade in the buildfarm script
that hamerkop was using (from 4.4 to 4.14).  So it's entirely possible
this issue was already there and we just weren't seeing it tested.
        regards, tom lane



Re: pg_dump/pg_restore seem broken on hamerkop

From
Tom Lane
Date:
I wrote:
> Hm.  %z ought not be locale-dependent ... however, it has a bigger
> problem, which is that it's a C99-ism.  It's not there in SUSv2,
> which is our normal baseline for what's portable.  I think we need
> to get rid of that.  %Z should be portable.

> (Is it possible that Windows' strftime() reads %z as doing something
> other than what C99 says?)

A bit of googling leads me to Microsoft reference material saying that
their strftime treats %z and %Z alike.  So in point of fact, the
assumption underlying commit ad5d46a4494b0b48 was flat out wrong.
Switching to %z doesn't get you out of the problem noted in the
comments it removed:
   /*    * We don't print the timezone on Win32, because the names are long and    * localized, which means they may
containcharacters in various random    * encodings; this has been seen to cause encoding errors when reading the    *
dumpscript.    */
 

I'm going to go revert most of that commit and make the code like it
was before:
   if (strftime(buf, sizeof(buf),
#ifndef WIN32                "%Y-%m-%d %H:%M:%S %Z",
#else                "%Y-%m-%d %H:%M:%S",
#endif                localtime(&now)) != 0)

If somebody wants to have timezones printed on Windows, they can
try again, but they're gonna have to work harder than this.

It's possible that fixing this will not fix whatever's bothering
hamerkop, but given the lack of backward-portability of %z, this
code has got to go anyway.
        regards, tom lane



Re: pg_dump/pg_restore seem broken on hamerkop

From
Tom Lane
Date:
I wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> [Some more code and git-log reading later]  I see that the %z is a very
>> recent addition: it only got there as of commit ad5d46a449, of September
>> 5th ... and now I also see that hamerkop's last green run before the
>> failure, on Oct 13rd, did *not* include the pg_upgrade check.  So I'm
>> thinking this was broken much earlier than 0eea804.

> Ooohh ... you are right, the first failing build involved not only
> the pg_dump refactoring commit, but an upgrade in the buildfarm script
> that hamerkop was using (from 4.4 to 4.14).  So it's entirely possible
> this issue was already there and we just weren't seeing it tested.

hamerkop is still failing in its runs done today.  However, I'm not sure
that that proves anything about our hoped-for fix, because the commit SHAs
it's showing on the buildfarm status page are still from Oct 13.  It looks
like hamerkop has failed to pull any git updates since it was migrated to
the new buildfarm script version.
        regards, tom lane



Re: pg_dump/pg_restore seem broken on hamerkop

From
Andrew Dunstan
Date:
On 10/29/2014 12:26 AM, Tom Lane wrote:
> I wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> [Some more code and git-log reading later]  I see that the %z is a very
>>> recent addition: it only got there as of commit ad5d46a449, of September
>>> 5th ... and now I also see that hamerkop's last green run before the
>>> failure, on Oct 13rd, did *not* include the pg_upgrade check.  So I'm
>>> thinking this was broken much earlier than 0eea804.
>> Ooohh ... you are right, the first failing build involved not only
>> the pg_dump refactoring commit, but an upgrade in the buildfarm script
>> that hamerkop was using (from 4.4 to 4.14).  So it's entirely possible
>> this issue was already there and we just weren't seeing it tested.
> hamerkop is still failing in its runs done today.  However, I'm not sure
> that that proves anything about our hoped-for fix, because the commit SHAs
> it's showing on the buildfarm status page are still from Oct 13.  It looks
> like hamerkop has failed to pull any git updates since it was migrated to
> the new buildfarm script version.
>
>             


This machine appears to be misconfigured for git. I will email the owner 
about fixing it.

cheers

andrew




Re: pg_dump/pg_restore seem broken on hamerkop

From
Bruce Momjian
Date:
On Sun, Oct 26, 2014 at 08:29:19PM -0400, Tom Lane wrote:
> I wrote:
> > Hm.  %z ought not be locale-dependent ... however, it has a bigger
> > problem, which is that it's a C99-ism.  It's not there in SUSv2,
> > which is our normal baseline for what's portable.  I think we need
> > to get rid of that.  %Z should be portable.
> 
> > (Is it possible that Windows' strftime() reads %z as doing something
> > other than what C99 says?)
> 
> A bit of googling leads me to Microsoft reference material saying that
> their strftime treats %z and %Z alike.  So in point of fact, the
> assumption underlying commit ad5d46a4494b0b48 was flat out wrong.
> Switching to %z doesn't get you out of the problem noted in the
> comments it removed:
> 
>     /*
>      * We don't print the timezone on Win32, because the names are long and
>      * localized, which means they may contain characters in various random
>      * encodings; this has been seen to cause encoding errors when reading the
>      * dump script.
>      */
> 
> I'm going to go revert most of that commit and make the code like it
> was before:

Thanks.  The major goal of the patch was to get a timezone designation
in there, and you have done that were possible, which is non-Windows. 
Your C comment clearly explained why Windows is a problem, and
centralized the code.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +