Thread: Re: [PATCHES] Fix "database is ready" race condition

Re: [PATCHES] Fix "database is ready" race condition

From
Tom Lane
Date:
Markus Schiltknecht <markus@bluegap.ch> writes:
> is there a good reason to print the "database system is ready" message 
> in StartupXLOG() in xact.c? It has a) nothing to do with xlog and b) 
> opens a small race condition: the message gets printed, while it still 
> take some CPU cycles until the postmaster really gets the SIGCHLD signal 
> and sets StartupPID = 0. If you (or rather: an automated test program) 
> try to connect within this timespan, you get a "database is starting up" 
> error, which clearly contradicts the "is ready" message.

I don't think there's any compelling reason for having that log message
in its current form.  What about redefining it to mean "postmaster is
ready to accept connections" --- either with that wording, or keeping
the old wording?  Then we could just put it in one place in postmaster.c
and be done.  I think your proposed patch is overcomplicated by trying
to have it still come out in bootstrap/standalone cases.  For a
standalone backend, getting a prompt is what tells you it's ready ;-)
        regards, tom lane


Re: [PATCHES] Fix "database is ready" race condition

From
Markus Schiltknecht
Date:
Hi,

Tom Lane wrote:
> I don't think there's any compelling reason for having that log message
> in its current form.  What about redefining it to mean "postmaster is
> ready to accept connections" --- either with that wording, or keeping
> the old wording?  Then we could just put it in one place in postmaster.c
> and be done.

Sure, I'm fine with that wording. As long as it's in the reaper() and 
thus the postmaster is really ready then it says so, I'm happy.

Markus



Re: [PATCHES] Fix "database is ready" race condition

From
"Simon Riggs"
Date:
On Sun, 2007-02-04 at 14:15 -0500, Tom Lane wrote:
> Markus Schiltknecht <markus@bluegap.ch> writes:
> > is there a good reason to print the "database system is ready" message 
> > in StartupXLOG() in xact.c? It has a) nothing to do with xlog and b) 
> > opens a small race condition: the message gets printed, while it still 
> > take some CPU cycles until the postmaster really gets the SIGCHLD signal 
> > and sets StartupPID = 0. If you (or rather: an automated test program) 
> > try to connect within this timespan, you get a "database is starting up" 
> > error, which clearly contradicts the "is ready" message.
> 
> I don't think there's any compelling reason for having that log message
> in its current form.  What about redefining it to mean "postmaster is
> ready to accept connections" --- either with that wording, or keeping
> the old wording?  Then we could just put it in one place in postmaster.c
> and be done.  I think your proposed patch is overcomplicated by trying
> to have it still come out in bootstrap/standalone cases.  For a
> standalone backend, getting a prompt is what tells you it's ready ;-)

I'm OK with moving the message to be executed from another place, but I
have some comments on the changed wording.

Firstly, "Database system" is great general wording. "Postmaster" only
means something if you know the architecture of PostgreSQL, which most
people don't. 

If we did change the wording, I'd want to have separate messages for the
two events of
- database can now accept connections
- recovery is complete

One of the TODO items is to allow the dbserver to be available for
read-only queries while still recovering, so any change to the wording
should be made with that in mind, so we don't need to change it too
often.

My suggestions would be
1. "Database system has completed recovery" and
2. "Database system is ready to accept connections"

Currently those messages would occur in that order and be issued by
StartupXLOG() for (1) and postmaster for (2). In the future they may be
issued in a different order.

If we stick with only a single message, we should keep it the same as
now, wherever the code and whatever the exact timing of its execution.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: [PATCHES] Fix "database is ready" race condition

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> My suggestions would be
> 1. "Database system has completed recovery" and
> 2. "Database system is ready to accept connections"

The second was in fact the wording I had in mind, sorry for not being
clear.  As to the first, the question is whether a log message at that
specific point has any use.  It's not "completion of recovery", exactly,
since we go through that spot whether it's a normal startup or recovery
(and there is another log message when we actually do any WAL replay).
AFAICS it's a fairly randomly chosen place in a long sequence of Things
That Must Happen.  Part of the reason Markus is seeing a race condition
is that this isn't the last thing done before the startup subprocess
exits --- see BootstrapMain.  So I'm for just getting rid of it.
        regards, tom lane


Re: [PATCHES] Fix "database is ready" race condition

From
Jim Nasby
Date:
On Feb 5, 2007, at 8:19 AM, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> My suggestions would be
>> 1. "Database system has completed recovery" and
>> 2. "Database system is ready to accept connections"
>
> The second was in fact the wording I had in mind, sorry for not being
> clear.  As to the first, the question is whether a log message at that
> specific point has any use.  It's not "completion of recovery",  
> exactly,
> since we go through that spot whether it's a normal startup or  
> recovery
> (and there is another log message when we actually do any WAL replay).
> AFAICS it's a fairly randomly chosen place in a long sequence of  
> Things
> That Must Happen.  Part of the reason Markus is seeing a race  
> condition
> is that this isn't the last thing done before the startup subprocess
> exits --- see BootstrapMain.  So I'm for just getting rid of it.

It is useful to know if the database had to do recovery, though, and  
if it did do recovery, it would be useful to know how long it took if  
the subsequent startup took a real amount of time.

BTW, this is a real problem I've seen on a database with 500k entries  
in pg_class... it takes several minutes to finish starting after the  
'Postmaster is ready' message.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




Re: [PATCHES] Fix "database is ready" race condition

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On Feb 5, 2007, at 8:19 AM, Tom Lane wrote:
> >"Simon Riggs" <simon@2ndquadrant.com> writes:
> >>My suggestions would be
> >>1. "Database system has completed recovery" and
> >>2. "Database system is ready to accept connections"
> >
> >The second was in fact the wording I had in mind, sorry for not being
> >clear.  As to the first, the question is whether a log message at
> >that specific point has any use.  It's not "completion of recovery",
> >exactly, since we go through that spot whether it's a normal startup
> >or  recovery (and there is another log message when we actually do
> >any WAL replay).  AFAICS it's a fairly randomly chosen place in a
> >long sequence of  Things That Must Happen.  Part of the reason Markus
> >is seeing a race  condition is that this isn't the last thing done
> >before the startup subprocess exits --- see BootstrapMain.  So I'm
> >for just getting rid of it.
> 
> It is useful to know if the database had to do recovery, though, and  
> if it did do recovery, it would be useful to know how long it took if  
> the subsequent startup took a real amount of time.

That's already logged elsewhere, so there's no loss of functionality.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support