Re: [COMMITTERS] pgsql: Upgrade to Autoconf 2.69 - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
Msg-id 20131227185743.GQ22570@eldon.alvh.no-ip.org
In response to Re: [COMMITTERS] pgsql: Upgrade to Autoconf 2.69  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Andres Freund wrote:
> Hi,
> 
> On 2013-12-24 12:58:04 -0300, Alvaro Herrera wrote:
> > > Shortly after this patch was committed, buildfarm member locust (running
> > > Mac OS X 10.5 apparently) started failing the pg_upgrade check:
> > > 
> > > command: "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/install//Users/pgbuildfarm/Documents/workdir//HEAD/inst/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/data" -o "-p 57632 -b -c synchronous_commit=off -c fsync=off -c full_page_writes=off  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade'" start >> "pg_upgrade_server.log" 2>&1
> > > 
> > > waiting for server to start....LOG:  database system was shut down at 2013-12-19 12:51:16 CET
> > > LOG:  invalid primary checkpoint record
> > > LOG:  invalid secondary checkpoint link in control file
> > > PANIC:  could not locate a valid checkpoint record
> > 
> > Any comment on this problem?  Somehow ReadRecord is unable to find a
> > checkpoint, yet there's no error message to be seen anywhere, whereas
> > pg_resetxlog does report it:
> > 
> > > command: "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/install//Users/pgbuildfarm/Documents/workdir//HEAD/inst/bin/pg_resetxlog" -l000000010000000000000009 "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/data" >> "pg_upgrade_utility.log" 2>&1
> > > 
> > > pg_resetxlog: could not read from directory "pg_xlog": Invalid argument
> > 
> > I cannot but think xlogreader is at fault.
> > 
> > Regardless of the solution to the Mac OS X problem, ISTM this should be
> > fixed.
> 
> I didn't look at any code, and I won't today, but it doesn't look
> surprising - the report when starting the server above is presumably the
> one in ReadCheckpoint() (or similar) and it probably just reports that
> ReadRecord() didn't return a record.

How is this not surprising?  Surely failing to find a checkpoint record
is not a problem to be taken lightly.

> pg_resetxlog (which doesn't use xlogreader!) reports that it couldn't
> read from directory "pg_xlog", so there's something wonky independently
> from xlogreader.

Yes, most likely there is.  My point is that the LOG messages above
should have logged the system error that caused the checkpoint record to
be unfindable.
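
To illustrate the kind of reporting I mean, here is a minimal standalone
sketch.  It is not PostgreSQL code: count_dir_entries() and its error
buffer are hypothetical names made up for this example.  The only point
is that the errno value gets carried into the message, and that errno is
cleared before each readdir() call so a real failure can be told apart
from end-of-directory.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: count the entries in a
 * directory.  On failure, return -1 and write a message that includes
 * the system error text into errbuf.
 */
static int
count_dir_entries(const char *path, char *errbuf, size_t errlen)
{
    DIR           *dir;
    struct dirent *de;
    int            count = 0;

    dir = opendir(path);
    if (dir == NULL)
    {
        snprintf(errbuf, errlen, "could not open directory \"%s\": %s",
                 path, strerror(errno));
        return -1;
    }

    for (;;)
    {
        /*
         * readdir() returns NULL both at end-of-directory and on error,
         * so clear errno first to be able to tell the two apart.
         */
        errno = 0;
        de = readdir(dir);
        if (de == NULL)
            break;
        count++;
    }

    if (errno != 0)
    {
        int save_errno = errno;     /* closedir() may clobber errno */

        closedir(dir);
        snprintf(errbuf, errlen, "could not read from directory \"%s\": %s",
                 path, strerror(save_errno));
        return -1;
    }

    closedir(dir);
    return count;
}
```

A caller that gets -1 back can log errbuf as-is, so whoever reads the log
sees the underlying cause rather than only the high-level symptom.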

> I'd guess that xlog.c read_page callback errors out without reporting
> an error. IIRC we're logging some failures as DEBUG there, because
> they really aren't unexpected, and normally just signal the end of
> wal.

Hmm?  At least, I recall something like an "unexpected pageaddr" message
is sometimes logged when end-of-wal is found.  Why would other error
messages be hidden?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


