On Thu, 2010-02-11 at 15:28 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > If you were running pg_standby as the restore_command then this error
> > wouldn't happen. So you need to explain why running pg_standby cannot
> > solve your problem and why we must fix it by replicating code that has
> > previously existed elsewhere.
>
> pg_standby cannot be used with streaming replication.
> I guess your next question is: why not?
>
> The startup process alternates between streaming, and restoring files
> from archive using restore_command. It will progress using streaming as
> long as it can, but if the connection is lost, it will try to poll the
> archive until the connection is established again. The startup process
> expects the restore_command to try to restore the file and fail if it's
> not found. If the restore_command goes into sleep, waiting for the file
> to arrive, that will defeat the retry logic in the server because the
> startup process won't get control again to retry establishing the
> connection.
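
To make the control flow under discussion concrete, here is a minimal
sketch of the alternation described above. It is not the server's code:
recover(), try_stream, restore_from_archive() and max_attempts are
hypothetical names, and the real startup process is more involved. The
point it illustrates is that the restore step must return promptly on a
missing file so the loop can go back and retry streaming.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def restore_from_archive(archive_dir: Path, dest_dir: Path, segment: str) -> bool:
    """Fail-fast restore, in the spirit of 'cp': copy the file if it is
    present, otherwise report failure immediately instead of sleeping."""
    src = archive_dir / segment
    if not src.is_file():
        return False
    shutil.copy(src, dest_dir / segment)
    return True


def recover(
    segments: Iterable[str],
    try_stream: Callable[[str], bool],
    restore: Callable[[str], bool],
    max_attempts: int = 5,
) -> List[str]:
    """Alternate between streaming and archive restore for each segment.

    Hypothetical simplification: a real server would pause between
    attempts and keep retrying rather than give up after max_attempts.
    """
    log: List[str] = []
    for seg in segments:
        for _ in range(max_attempts):
            if try_stream(seg):      # connection is up: use streaming
                log.append(f"stream:{seg}")
                break
            if restore(seg):         # connection lost: poll the archive once
                log.append(f"archive:{seg}")
                break
            # Neither source had the segment. Because restore() returned
            # instead of blocking, control comes back here and the next
            # iteration retries the streaming connection.
        else:
            log.append(f"gave-up:{seg}")
            break
    return log
```

If restore() slept until the file arrived, as pg_standby does, the inner
loop would never reach its next iteration, which is the behaviour being
questioned below.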

Why does the startup process need to regain control? Why not just let it
sit and wait? Have you seen that if someone does use pg_standby or a
similar script in the restore_command, the server will never regain
control in the way you hope? Would that cause a sporadic hang?

The overall design was previously that the solution implementor was in
charge of the archive and only they knew its characteristics.

It seems strange that we will be forced to explicitly ban people from
using a utility they were previously used to using and that is still
included with the distro. Then we implement in the server the very
things the utility did. Only this time the solution implementor will not
be in control.

I would not be against implementing all aspects of pg_standby in the
server. It would make life easier in some ways. I am against
implementing only a *few* of the aspects because that leaves solution
architects in a difficult position to know what to do.

Please lay out some options here for discussion by the community. This
seems like a difficult area and not one to be patched up quickly.

> That's the essence of my proposal here:
> http://archives.postgresql.org/message-id/4B50AFB4.4060902@enterprisedb.com
> which is what has now been implemented.
>
> To support a restore_command that does the sleeping itself, like
> pg_standby, would require a major rearchitecting of the retry logic. And
> I don't see why that'd be desirable anyway. It's easier for the admin to
> set up using simple commands like 'cp' or 'scp', than require him/her to
> write scripts that handle the sleeping and retry logic.
>
>
> The real problem we have right now is missing documentation. It's
> starting to hurt us more and more every day, as more people start to
> test this. As shown by this thread and some other recent posts.

--
 Simon Riggs           www.2ndQuadrant.com