Thread: Re: [COMMITTERS] pgsql: Check compulsory parameters in recovery.conf in standby_mode, per

On Sat, Apr 3, 2010 at 6:50 AM, Simon Riggs <sriggs@postgresql.org> wrote:
> Log Message:
> -----------
> Check compulsory parameters in recovery.conf in standby_mode, per docs.

On the recent discussion (*1), some people argued that specifying neither
primary_conninfo nor restore_command in the standby mode is not unreasonable,
and we reached the consensus that the setting should be allowed. So this
commit doesn't reflect the discussion. How about reverting the commit,
and restarting the discussion if you have complaint against the consensus?

(*1) http://archives.postgresql.org/pgsql-hackers/2010-03/msg01306.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Mon, 2010-04-05 at 17:08 +0900, Fujii Masao wrote:
> On Sat, Apr 3, 2010 at 6:50 AM, Simon Riggs <sriggs@postgresql.org> wrote:
> > Log Message:
> > -----------
> > Check compulsory parameters in recovery.conf in standby_mode, per docs.
> 
> On the recent discussion (*1), some people argued that specifying neither
> primary_conninfo nor restore_command in the standby mode is not unreasonable,
> and we reached the consensus that the setting should be allowed. So this
> commit doesn't reflect the discussion. How about reverting the commit,
> and restarting the discussion if you have complaint against the consensus?
> 
> (*1) http://archives.postgresql.org/pgsql-hackers/2010-03/msg01306.php

This change was made because on a separate thread Robert Haas complained
about the server hanging when only standby_mode was set. I verified the
hang and "fixed" the issue. I have added a comment to the previous
thread to restart the discussion.

If the capability that the server sit quietly doing nothing for ever was
somehow important then we should document that both in code and in docs.

I am very disconcerted that there are still no docs whatsoever to
describe how the server works in these new modes.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs wrote:
> I am very disconcerted that there are still no docs whatsoever to
> describe how the server works in these new modes.

I did add a few paragraphs last week, see
http://developer.postgresql.org/pgdocs/postgres/warm-standby.html#STANDBY-SERVER-OPERATION.
It doesn't explicitly talk about that configuration where both
primary_conninfo and restore_command are missing (or misconfigured), but
it does say that it polls pg_xlog. How does that read to you?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Mon, 2010-04-05 at 19:01 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I am very disconcerted that there are still no docs whatsoever to
> > describe how the server works in these new modes.
> 
> I did add a few paragraphs last week, see
> http://developer.postgresql.org/pgdocs/postgres/warm-standby.html#STANDBY-SERVER-OPERATION.
> It doesn't explicitly talk about that configuration where both
> primary_conninfo and restore_command are missing (or misconfigured), but
> it does say that it polls pg_xlog. How does that read to you?

It is better, though it might mislead others into thinking I mean
sufficient, which it is not. I do not mean to be rude, but we must be
clear that a major feature requires matching documentation.

Looking through the code some more I note that their are two timing
related parameters that are hardcoded into XLogPageRead(). At the very
least such things should be #defines, though one of them was previously
configurable using pg_standby, so I would like to see them both
accessible to user tuning.

I was also surprised to note that the Startup process is not signaled by
WALReceiver when new WAL is received, so it will continue sleeping even
though it has work to do.

-- Simon Riggs           www.2ndQuadrant.com



On Mon, 2010-04-05 at 17:08 +0900, Fujii Masao wrote:
> On Sat, Apr 3, 2010 at 6:50 AM, Simon Riggs <sriggs@postgresql.org> wrote:
> > Log Message:
> > -----------
> > Check compulsory parameters in recovery.conf in standby_mode, per docs.
>
> On the recent discussion (*1), some people argued that specifying neither
> primary_conninfo nor restore_command in the standby mode is not unreasonable,
> and we reached the consensus that the setting should be allowed. So this
> commit doesn't reflect the discussion. How about reverting the commit,
> and restarting the discussion if you have complaint against the consensus?

The attached patch changes the messages and downgrades FATAL to WARNING.

Comments?

--
 Simon Riggs           www.2ndQuadrant.com

Attachment
On Mon, 2010-04-05 at 19:29 +0100, Simon Riggs wrote:

> Looking through the code some more I note that their are two timing
> related parameters that are hardcoded into XLogPageRead(). At the very
> least such things should be #defines, though one of them was previously
> configurable using pg_standby, so I would like to see them both
> accessible to user tuning.

...the code says "we want to react quickly when the next WAL record
arrives" and then sleeps for 100ms.

> I was also surprised to note that the Startup process is not signaled by
> WALReceiver when new WAL is received, so it will continue sleeping even
> though it has work to do.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs escribió:
> On Mon, 2010-04-05 at 17:08 +0900, Fujii Masao wrote:
> > On Sat, Apr 3, 2010 at 6:50 AM, Simon Riggs <sriggs@postgresql.org> wrote:
> > > Log Message:
> > > -----------
> > > Check compulsory parameters in recovery.conf in standby_mode, per docs.
> > 
> > On the recent discussion (*1), some people argued that specifying neither
> > primary_conninfo nor restore_command in the standby mode is not unreasonable,
> > and we reached the consensus that the setting should be allowed. So this
> > commit doesn't reflect the discussion. How about reverting the commit,
> > and restarting the discussion if you have complaint against the consensus?
> 
> The attached patch changes the messages and downgrades FATAL to WARNING.
> 
> Comments?

Please note that errdetail must be a complete sentence.




-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


On Mon, 2010-04-05 at 15:02 -0400, Alvaro Herrera wrote:
> Simon Riggs escribió:
> > On Mon, 2010-04-05 at 17:08 +0900, Fujii Masao wrote:
> > > On Sat, Apr 3, 2010 at 6:50 AM, Simon Riggs <sriggs@postgresql.org> wrote:
> > > > Log Message:
> > > > -----------
> > > > Check compulsory parameters in recovery.conf in standby_mode, per docs.
> > >
> > > On the recent discussion (*1), some people argued that specifying neither
> > > primary_conninfo nor restore_command in the standby mode is not unreasonable,
> > > and we reached the consensus that the setting should be allowed. So this
> > > commit doesn't reflect the discussion. How about reverting the commit,
> > > and restarting the discussion if you have complaint against the consensus?
> >
> > The attached patch changes the messages and downgrades FATAL to WARNING.
> >
> > Comments?
>
> Please note that errdetail must be a complete sentence.

Better?

--
 Simon Riggs           www.2ndQuadrant.com

Attachment
On Mon, Apr 5, 2010 at 3:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2010-04-05 at 15:02 -0400, Alvaro Herrera wrote:
>> Simon Riggs escribió:
>> > On Mon, 2010-04-05 at 17:08 +0900, Fujii Masao wrote:
>> > > On Sat, Apr 3, 2010 at 6:50 AM, Simon Riggs <sriggs@postgresql.org> wrote:
>> > > > Log Message:
>> > > > -----------
>> > > > Check compulsory parameters in recovery.conf in standby_mode, per docs.
>> > >
>> > > On the recent discussion (*1), some people argued that specifying neither
>> > > primary_conninfo nor restore_command in the standby mode is not unreasonable,
>> > > and we reached the consensus that the setting should be allowed. So this
>> > > commit doesn't reflect the discussion. How about reverting the commit,
>> > > and restarting the discussion if you have complaint against the consensus?
>> >
>> > The attached patch changes the messages and downgrades FATAL to WARNING.
>> >
>> > Comments?
>>
>> Please note that errdetail must be a complete sentence.
>
> Better?

I haven't tested this (bad words to start out with, I know) but
assuming this message is going to print once on startup rather than
repeatedly, +1 from me.

...Robert


Simon Riggs <simon@2ndQuadrant.com> writes:
>                      (errmsg("recovery command file \"%s\" specified neither primary_conninfo nor restore_command",
> -                            RECOVERY_COMMAND_FILE)));
> +                            RECOVERY_COMMAND_FILE),
> +                     errdetail("The database server will regularly poll the pg_xlog subdirectory to check for files
placedthere.")));
 

That's not a "detail", as it has nothing to do with details of the error
condition.  It might pass muster as a "hint", though I'm not entirely
sure what the point is.
        regards, tom lane


On Mon, 2010-04-05 at 15:58 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> >                      (errmsg("recovery command file \"%s\" specified neither primary_conninfo nor
restore_command",
> > -                            RECOVERY_COMMAND_FILE)));
> > +                            RECOVERY_COMMAND_FILE),
> > +                     errdetail("The database server will regularly poll the pg_xlog subdirectory to check for
filesplaced there.")));
 
> 
> That's not a "detail", as it has nothing to do with details of the error
> condition.  It might pass muster as a "hint", though I'm not entirely
> sure what the point is.

The server sits around doing nothing and the point of the message is to
explain why that is and give a hint as to what might change that
situation, if anything.

-- Simon Riggs           www.2ndQuadrant.com



On Mon, Apr 5, 2010 at 4:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2010-04-05 at 15:58 -0400, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>> >                                     (errmsg("recovery command file \"%s\" specified neither primary_conninfo nor
restore_command",
>> > -                                                   RECOVERY_COMMAND_FILE)));
>> > +                                                   RECOVERY_COMMAND_FILE),
>> > +                                    errdetail("The database server will regularly poll the pg_xlog subdirectory
tocheck for files placed there."))); 
>>
>> That's not a "detail", as it has nothing to do with details of the error
>> condition.  It might pass muster as a "hint", though I'm not entirely
>> sure what the point is.
>
> The server sits around doing nothing and the point of the message is to
> explain why that is and give a hint as to what might change that
> situation, if anything.

Yeah, I think that's good information to provide.  But I have to admit
that I too considered suggesting changing it from errdetail to
errhint, so the fact that Tom had the same thought suggests to me that
that's probably better.

...Robert


On Tue, Apr 6, 2010 at 3:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I was also surprised to note that the Startup process is not signaled by
> WALReceiver when new WAL is received, so it will continue sleeping even
> though it has work to do.

I don't think this is so useful in 9.0 since synchronous replication
(i.e., transaction commit wait for WAL to be replayed by the standby)
is not supported.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Simon Riggs wrote:
> On Mon, 2010-04-05 at 19:29 +0100, Simon Riggs wrote:
> 
>> Looking through the code some more I note that their are two timing
>> related parameters that are hardcoded into XLogPageRead(). At the very
>> least such things should be #defines, though one of them was previously
>> configurable using pg_standby, so I would like to see them both
>> accessible to user tuning.
> 
> ...the code says "we want to react quickly when the next WAL record
> arrives" and then sleeps for 100ms.

The comment continues ", so sleep only a bit". That's in comparison to
the 5 s wait when polling the archive.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Fujii Masao wrote:
> On Tue, Apr 6, 2010 at 3:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I was also surprised to note that the Startup process is not signaled by
>> WALReceiver when new WAL is received, so it will continue sleeping even
>> though it has work to do.
> 
> I don't think this is so useful in 9.0 since synchronous replication
> (i.e., transaction commit wait for WAL to be replayed by the standby)
> is not supported.

Well, it would still be useful, as it would shorten the delay. But yeah,
there's a delay in asynchronous replication anyway, so we decided to
keep it simple and just poll. It's not ideal and definitely needs to be
revisited for synchronous mode in the future. Same for walsenders.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Tue, 2010-04-06 at 10:19 +0300, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > On Tue, Apr 6, 2010 at 3:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> I was also surprised to note that the Startup process is not signaled by
> >> WALReceiver when new WAL is received, so it will continue sleeping even
> >> though it has work to do.
> > 
> > I don't think this is so useful in 9.0 since synchronous replication
> > (i.e., transaction commit wait for WAL to be replayed by the standby)
> > is not supported.
> 
> Well, it would still be useful, as it would shorten the delay. But yeah,
> there's a delay in asynchronous replication anyway, so we decided to
> keep it simple and just poll. It's not ideal and definitely needs to be
> revisited for synchronous mode in the future. Same for walsenders.

A signal seems fairly straightforward to me, the archiver did this in
8.0 and it was not considered complex then. Quite why it would be
complex here is not clear.

I'm not happy that it waits, nor that the wait is non-tunable. I would
like to see a new parameter added for this.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs wrote:
> On Tue, 2010-04-06 at 10:19 +0300, Heikki Linnakangas wrote:
>> Fujii Masao wrote:
>>> On Tue, Apr 6, 2010 at 3:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> I was also surprised to note that the Startup process is not signaled by
>>>> WALReceiver when new WAL is received, so it will continue sleeping even
>>>> though it has work to do.
>>> I don't think this is so useful in 9.0 since synchronous replication
>>> (i.e., transaction commit wait for WAL to be replayed by the standby)
>>> is not supported.
>> Well, it would still be useful, as it would shorten the delay. But yeah,
>> there's a delay in asynchronous replication anyway, so we decided to
>> keep it simple and just poll. It's not ideal and definitely needs to be
>> revisited for synchronous mode in the future. Same for walsenders.
> 
> A signal seems fairly straightforward to me, the archiver did this in
> 8.0 and it was not considered complex then. Quite why it would be
> complex here is not clear.

The other side of the problem is that walsender polls too. Eliminating
the delay from walreceiver doesn't buy you much unless you eliminate the
delay from the walsender too. And things get complicated there. Do you
signal the walsenders at every commit? That would be a lot of volume,
and adds more work for every normal transaction in the primary. Maybe
not much, but it would be one more thing to worry about and test.

> I'm not happy that it waits, nor that the wait is non-tunable. I would
> like to see a new parameter added for this.

I wanted to keep it simple for users, but feel free to add a parameter
if you feel it must be configurable.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Tue, 2010-04-06 at 12:38 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Tue, 2010-04-06 at 10:19 +0300, Heikki Linnakangas wrote:
> >> Fujii Masao wrote:
> >>> On Tue, Apr 6, 2010 at 3:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>>> I was also surprised to note that the Startup process is not signaled by
> >>>> WALReceiver when new WAL is received, so it will continue sleeping even
> >>>> though it has work to do.
> >>> I don't think this is so useful in 9.0 since synchronous replication
> >>> (i.e., transaction commit wait for WAL to be replayed by the standby)
> >>> is not supported.
> >> Well, it would still be useful, as it would shorten the delay. But yeah,
> >> there's a delay in asynchronous replication anyway, so we decided to
> >> keep it simple and just poll. It's not ideal and definitely needs to be
> >> revisited for synchronous mode in the future. Same for walsenders.
> > 
> > A signal seems fairly straightforward to me, the archiver did this in
> > 8.0 and it was not considered complex then. Quite why it would be
> > complex here is not clear.
> 
> The other side of the problem is that walsender polls too. Eliminating
> the delay from walreceiver doesn't buy you much unless you eliminate the
> delay from the walsender too. And things get complicated there. Do you
> signal the walsenders at every commit? That would be a lot of volume,
> and adds more work for every normal transaction in the primary. Maybe
> not much, but it would be one more thing to worry about and test.

You are trying to connect two unrelated things.

We can argue that the WALSender's delay allows it to build up a good
sized batch of work to transfer.

Having the Startup process wait does not buy us anything at all.
Currently if the Startup process finishes more quickly than the
WALreceiver it will wait for 100ms.

I am surprised at your arguments for simplicity. With Hot Standby you
have insisted that everything should be in place. With SR you seem to
have just stopped at a barely working, poorly documented implementation.
We both know you can fix these things easily and quickly. Please do so.
Not because I say so, but because everybody else will soon notice that
you could have and did not.

-- Simon Riggs           www.2ndQuadrant.com



Simon Riggs wrote:
> I am surprised at your arguments for simplicity. With Hot Standby you
> have insisted that everything should be in place. With SR you seem to
> have just stopped at a barely working, poorly documented implementation.

That's opposite to my recollection of the hot standby development. I
simplified and ripped out a lot of stuff from the original patch.

If you insist, I'll work out a patch to send a signal to startup process
after every fsync(), but it really doesn't seem very important given
that there's always a delay there anyway.

> We both know you can fix these things easily and quickly. Please do so.

That's a plural form. What's the other thing you're referring to?

> Not because I say so, but because everybody else will soon notice that
> you could have and did not.

Bollocks.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Well, it would still be useful, as it would shorten the delay. But yeah,
> there's a delay in asynchronous replication anyway, so we decided to
> keep it simple and just poll. It's not ideal and definitely needs to be
> revisited for synchronous mode in the future. Same for walsenders.

Stop me if I misunderstood the case at hand, but while waiting some more
for having a sizeable batch to send makes a lot of sense to me, waiting
on the receiver side when there's some work to do will only forbids a
slow slave to keep up with the load, increasing lag artificially.

I'm used to asynchronous replication where you're never allowed to rest
if some batch is ready for you to process.

Regards,
-- 
dim


On Tue, Apr 6, 2010 at 8:06 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> If you insist, I'll work out a patch to send a signal to startup process
> after every fsync(), but it really doesn't seem very important given
> that there's always a delay there anyway.

Agreed. Even if we get rid of the delay of startup process, it would still
take time until the committed transaction has become visible in the standby.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Dimitri Fontaine wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Well, it would still be useful, as it would shorten the delay. But yeah,
>> there's a delay in asynchronous replication anyway, so we decided to
>> keep it simple and just poll. It's not ideal and definitely needs to be
>> revisited for synchronous mode in the future. Same for walsenders.
> 
> Stop me if I misunderstood the case at hand, but while waiting some more
> for having a sizeable batch to send makes a lot of sense to me, waiting
> on the receiver side when there's some work to do will only forbids a
> slow slave to keep up with the load, increasing lag artificially.
> 
> I'm used to asynchronous replication where you're never allowed to rest
> if some batch is ready for you to process.

When the startup process wakes up after sleep to replay WAL, it does
replay all the WAL streamed that far. And if more WAL if streamed during
the replay, it's replayed too before the next sleep. But when it does
reach the end of already-streamed WAL, it sleeps for 100ms, and doesn't
wake up earlier if a WAL record arrives during the sleep.

So, it does increase the lag artificially, but it will keep up with the
volume just fine.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


On Tue, 2010-04-06 at 14:06 +0300, Heikki Linnakangas wrote:

> If you insist, I'll work out a patch to send a signal to startup process
> after every fsync(), but it really doesn't seem very important given
> that there's always a delay there anyway.
> 
> > We both know you can fix these things easily and quickly. Please do so.
> 
> That's a plural form. What's the other thing you're referring to?

I mentioned 3 things right here:
* signal
* parameter to control delay (2 separate delays)
* docs

Many other things have been mentioned on other posts and I am unhappy
with the answer "lets defer everything til 9.1". Yes, some things need
to be deferred, but not everything. I feel for you both as developers,
and apologise if you don't like what I say, but we need to get things
into a better state for 9.0. Please.

-- Simon Riggs           www.2ndQuadrant.com



On Tue, Apr 6, 2010 at 7:06 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Simon Riggs wrote:
>> I am surprised at your arguments for simplicity. With Hot Standby you
>> have insisted that everything should be in place. With SR you seem to
>> have just stopped at a barely working, poorly documented implementation.
>
> That's opposite to my recollection of the hot standby development. I
> simplified and ripped out a lot of stuff from the original patch.
>
> If you insist, I'll work out a patch to send a signal to startup process
> after every fsync(), but it really doesn't seem very important given
> that there's always a delay there anyway.

I would like to vote strongly against doing this.  We all know that
Streaming Replication in 9.0 is not going to be as mature as we'd like
it to be; but we should be putting our time into fixing things that
are broken rather than tinkering with the avoidance of 100 ms waits.

>> We both know you can fix these things easily and quickly. Please do so.
>
> That's a plural form. What's the other thing you're referring to?
>
>> Not because I say so, but because everybody else will soon notice that
>> you could have and did not.
>
> Bollocks.

I've been thinking that the reason we weren't going to beta was
because of the SR open items, but I'm starting to think there's not
much left that really needs to be dealt with.  The ones from that list
I think we should fix yet are:

- Walreceiver and dblink are not interruptible on win32.
- The documentation needs to be improved (if there's still more to do)
- Redefine smart shutdown in standby mode? (i'm working on this)
- The replication connections consume superuser_reserved_connections slots.

The other stuff strikes me as all window dressing.  I also think we
need to deal with the shutdown checkpoint issue, which is HS rather
than SR.

...Robert


On Tue, 2010-04-06 at 07:47 -0400, Robert Haas wrote:

> I've been thinking that the reason we weren't going to beta was
> because of the SR open items, but I'm starting to think there's not
> much left that really needs to be dealt with.  The ones from that list
> I think we should fix yet are:
> 
> - Walreceiver and dblink are not interruptible on win32.
> - The documentation needs to be improved (if there's still more to do)
> - Redefine smart shutdown in standby mode? (i'm working on this)
> - The replication connections consume superuser_reserved_connections slots.
> 
> The other stuff strikes me as all window dressing. 

I'm not happy that "other stuff" just gets punted. Things should
definitely not be removed from Open Items list when there is still
discussion/objection on them. The purpose of discussion on hackers is so
that we take note of those items, not just shrug and walk away from them
because some weeks have passed since they were mentioned. I shouldn't
have to keep a personal list of things I've objected to, so I can refute
what other people say.

-- Simon Riggs           www.2ndQuadrant.com



On Tue, Apr 6, 2010 at 8:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2010-04-06 at 07:47 -0400, Robert Haas wrote:
>
>> I've been thinking that the reason we weren't going to beta was
>> because of the SR open items, but I'm starting to think there's not
>> much left that really needs to be dealt with.  The ones from that list
>> I think we should fix yet are:
>>
>> - Walreceiver and dblink are not interruptible on win32.
>> - The documentation needs to be improved (if there's still more to do)
>> - Redefine smart shutdown in standby mode? (i'm working on this)
>> - The replication connections consume superuser_reserved_connections slots.
>>
>> The other stuff strikes me as all window dressing.
>
> I'm not happy that "other stuff" just gets punted. Things should
> definitely not be removed from Open Items list when there is still
> discussion/objection on them. The purpose of discussion on hackers is so
> that we take note of those items, not just shrug and walk away from them
> because some weeks have passed since they were mentioned. I shouldn't
> have to keep a personal list of things I've objected to, so I can refute
> what other people say.

If we never removed anything upon which there wasn't 100% consensus,
we would never release.  But no one is talking about removing items
without discussing them.  We are talking about discussing the
possibility of removing some of them.

...Robert


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> When the startup process wakes up after sleep to replay WAL, it does
> replay all the WAL streamed that far. And if more WAL if streamed during
> the replay, it's replayed too before the next sleep. But when it does
> reach the end of already-streamed WAL, it sleeps for 100ms, and doesn't
> wake up earlier if a WAL record arrives during the sleep.

Thanks for the clear picture. It then works the same as PGQ based
consumers, including londiste. I'm now happy ☻

> So, it does increase the lag artificially, but it will keep up with the
> volume just fine.

Great. No further complain from me there. I guess it now entirely
depends on how easy it is to wake up before the end of the 100ms: it
might be that it's easier to code the signaling than to add a GUC for
the value?

Regards,
--
dim


Simon Riggs wrote:
> On Tue, 2010-04-06 at 12:38 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Tue, 2010-04-06 at 10:19 +0300, Heikki Linnakangas wrote:
>>>> Fujii Masao wrote:
>>>>> On Tue, Apr 6, 2010 at 3:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>>> I was also surprised to note that the Startup process is not signaled by
>>>>>> WALReceiver when new WAL is received, so it will continue sleeping even
>>>>>> though it has work to do.
>>>>> I don't think this is so useful in 9.0 since synchronous replication
>>>>> (i.e., transaction commit wait for WAL to be replayed by the standby)
>>>>> is not supported.
>>>> Well, it would still be useful, as it would shorten the delay. But yeah,
>>>> there's a delay in asynchronous replication anyway, so we decided to
>>>> keep it simple and just poll. It's not ideal and definitely needs to be
>>>> revisited for synchronous mode in the future. Same for walsenders.
>>> A signal seems fairly straightforward to me, the archiver did this in
>>> 8.0 and it was not considered complex then. Quite why it would be
>>> complex here is not clear.
>> The other side of the problem is that walsender polls too. Eliminating
>> the delay from walreceiver doesn't buy you much unless you eliminate the
>> delay from the walsender too. And things get complicated there. Do you
>> signal the walsenders at every commit? That would be a lot of volume,
>> and adds more work for every normal transaction in the primary. Maybe
>> not much, but it would be one more thing to worry about and test.
>
> You are trying to connect two unrelated things.
>
> We can argue that the WALSender's delay allows it to build up a good
> sized batch of work to transfer.
>
> Having the Startup process wait does not buy us anything at all.
> Currently if the Startup process finishes more quickly than the
> WALreceiver it will wait for 100ms.

Ok, here's a patch to add signaling between walreceiver and startup
process. It indeed isn't much code, and seems pretty safe, so if no-one
objects strongly, I'll commit. It won't help on platforms where
pg_usleep() isn't interrupted by signals, though, but we can live with that.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index abdf4d8..47cd6e9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8682,6 +8682,16 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
         shutdown_requested = true;
 }

+/*
+ * SIGUSR1: do nothing, but receiving the signal will wake us up if we're
+ * sleeping (on platforms where usleep() is interrupted by sleep, on other
+ * platforms this is useless).
+ */
+static void
+DoNothingHandler(SIGNAL_ARGS)
+{
+}
+
 /* Handle SIGHUP and SIGTERM signals of startup process */
 void
 HandleStartupProcInterrupts(void)
@@ -8735,7 +8745,7 @@ StartupProcessMain(void)
     else
         pqsignal(SIGALRM, SIG_IGN);
     pqsignal(SIGPIPE, SIG_IGN);
-    pqsignal(SIGUSR1, SIG_IGN);
+    pqsignal(SIGUSR1, DoNothingHandler);    /* WAL record has been streamed */
     pqsignal(SIGUSR2, SIG_IGN);

     /*
@@ -8859,8 +8869,10 @@ retry:
                         goto triggered;

                     /*
-                     * When streaming is active, we want to react quickly when
-                     * the next WAL record arrives, so sleep only a bit.
+                     * Sleep until the next WAL record arrives, or
+                     * walreceiver dies. On some platforms, signals don't
+                     * interrupt sleep, so poll every 100 ms to ensure we
+                     * respond promptly on such platforms.
                      */
                     pg_usleep(100000L); /* 100ms */
                 }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 98ab484..39d8fb9 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -205,8 +205,8 @@ bool        enable_bonjour = false;
 char       *bonjour_name;

 /* PIDs of special child processes; 0 when not running */
-static pid_t StartupPID = 0,
-            BgWriterPID = 0,
+pid_t        StartupPID = 0;
+static pid_t BgWriterPID = 0,
             WalWriterPID = 0,
             WalReceiverPID = 0,
             AutoVacPID = 0,
@@ -433,6 +433,7 @@ typedef struct
     PMSignalData *PMSignalState;
     InheritableSocket pgStatSock;
     pid_t        PostmasterPid;
+    pid_t        StartupPid;
     TimestampTz PgStartTime;
     TimestampTz PgReloadTime;
     bool        redirection_done;
@@ -4595,6 +4596,7 @@ save_backend_variables(BackendParameters *param, Port *port,
         return false;

     param->PostmasterPid = PostmasterPid;
+    param->StartupPid = StartupPid;
     param->PgStartTime = PgStartTime;
     param->PgReloadTime = PgReloadTime;

@@ -4809,6 +4811,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
     read_inheritable_socket(&pgStatSock, ¶m->pgStatSock);

     PostmasterPid = param->PostmasterPid;
+    StartupPid = param->StartupPid;
     PgStartTime = param->PgStartTime;
     PgReloadTime = param->PgReloadTime;

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index c0e7e0b..c4b4614 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -41,6 +41,7 @@
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walreceiver.h"
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
@@ -532,6 +533,9 @@ XLogWalRcvFlush(void)
         walrcv->receivedUpto = LogstreamResult.Flush;
         SpinLockRelease(&walrcv->mutex);

+        /* Wake up startup process to process the arrived records */
+        kill(StartupPID, SIGUSR1);
+
         /* Report XLOG streaming progress in PS display */
         snprintf(activitymsg, sizeof(activitymsg), "streaming %X/%X",
                  LogstreamResult.Write.xlogid, LogstreamResult.Write.xrecoff);
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 56d7d8e..fcfca78 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -36,6 +36,8 @@ extern HANDLE PostmasterHandle;

 extern const char *progname;

+extern pid_t StartupPID;
+
 extern int    PostmasterMain(int argc, char *argv[]);
 extern void ClosePostmasterPorts(bool am_syslogger);


On Tue, 2010-04-06 at 16:01 +0300, Heikki Linnakangas wrote:

> > Having the Startup process wait does not buy us anything at all.
> > Currently if the Startup process finishes more quickly than the
> > WALreceiver it will wait for 100ms.
> 
> Ok, here's a patch to add signaling between walreceiver and startup
> process. It indeed isn't much code, and seems pretty safe, so if no-one
> objects strongly, I'll commit. It won't help on platforms where
> pg_usleep() isn't interrupted by signals, though, but we can live with that.

Looks good.

There is also the fixed 5 sec wait when polling the archive. I would
like to make that a parameter, since that was previously controllable
with pg_standby.

-- Simon Riggs           www.2ndQuadrant.com



Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Ok, here's a patch to add signaling between walreceiver and startup
> process. It indeed isn't much code, and seems pretty safe, so if no-one
> objects strongly, I'll commit.

I object --- this seems like a large change to be sticking in at this
point with no testing.  I'm concerned about exactly how often the signal
will happen (ie, how much overhead is being added).  I'm also concerned
about the fact that the startup process will now be receiving a constant
storm of "no-op" signals, an operational behavior that is completely
untested.  If there's even one place that is failing to deal with EINTR
retry, for instance, we'll have a problem.  Plus I don't care for the
platform dependency of the "fix".  Being interruptable by signals is
not part of the defined API for pg_usleep.

I agree with the previous opinion that trying to get rid of that delay
is an entirely inappropriate task at this point.  Leave it for 9.1.
        regards, tom lane


Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Ok, here's a patch to add signaling between walreceiver and startup
> process. It indeed isn't much code, and seems pretty safe

Great news! Thanks,
-- 
dim