Thread: Missing docs for SR

Missing docs for SR

From
Josh Berkus
Date:
Hackers,

So, here's a must-fix item for SR for release: we need adequate docs.
I'm happy to write these but *I* need to understand the answers first.

The current docs and wiki page do not explain:

* How (technically) the slave listens for LSNs
* Does the walreceiver need the archive (via archive_command) copies of
the WAL files after it's caught up with the master?
** If not, then what are we supposed to do with them?  Write a cron job
to delete them every minute?
** If so, can we somehow use pg_standby with SR?

I've tried to dig this information out of the wiki and mailing list
archives and can't quite figure it out.  Is there a tech doc which was
not posted anywhere public, or do I need to just RTFC?

Also, given that recovery.conf has become a key part of our replication,
shouldn't we supply a recovery.example file which people can rename and
edit?  Happy to write this too.

--Josh Berkus



Re: Missing docs for SR

From
Devrim GÜNDÜZ
Date:
On Wed, 2010-01-20 at 11:30 +1300, Josh Berkus wrote:
>
> Also, given that recovery.conf has become a key part of our
> replication, shouldn't we supply a recovery.example file which people
> can rename and edit?  Happy to write this too.

We ship it already:

src/backend/access/transam/recovery.conf.sample

and it has sections for HS and SR.
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: Missing docs for SR

From
Josh Berkus
Date:
On 1/20/10 12:00 PM, Devrim GÜNDÜZ wrote:
> On Wed, 2010-01-20 at 11:30 +1300, Josh Berkus wrote:
>> Also, given that recovery.conf has become a key part of our
>> replication, shouldn't we supply a recovery.example file which people
>> can rename and edit?  Happy to write this too. 
> 
> We ship it already:
> 
> src/backend/access/transam/recovery.conf.sample
> 
> and it has sections for HS and SR.

Well, it doesn't get created in PGDATA, and the docs don't mention the
above location that I could find.

Can we get initdb to copy it to wherever the config location is?  That
seems a lot more administrator-friendly than asking people to look in
the source code.

--Josh Berkus


Re: Missing docs for SR

From
Devrim GÜNDÜZ
Date:
On Wed, 2010-01-20 at 13:36 +1300, Josh Berkus wrote:

> Well, it doesn't get created in PGDATA, and the docs don't mention the
> above location that I could find.

It is installed under share/ directory for source installations. RPMs
install it under /usr/share/pgsql.

> Can we get initdb to copy it to wherever the config location is?
> That seems a lot more administrator-friendly than asking people to
> look in the source code.

Even though I agree that this file became really important for us, I
think we don't copy any .sample files under $PGDATA.

Maybe we should clarify docs and point admins to relevant dirs.
--
Devrim GÜNDÜZ, RHCE
Command Prompt - http://www.CommandPrompt.com
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz

Re: Missing docs for SR

From
Andrew Dunstan
Date:

Josh Berkus wrote:
> On 1/20/10 12:00 PM, Devrim GÜNDÜZ wrote:
>> On Wed, 2010-01-20 at 11:30 +1300, Josh Berkus wrote:
>>> Also, given that recovery.conf has become a key part of our
>>> replication, shouldn't we supply a recovery.example file which people
>>> can rename and edit?  Happy to write this too.
>> We ship it already:
>>
>> src/backend/access/transam/recovery.conf.sample
>>
>> and it has sections for HS and SR.
>
> Well, it doesn't get created in PGDATA, and the docs don't mention the
> above location that I could find.
>
> Can we get initdb to copy it to wherever the config location is?  That
> seems a lot more administrator-friendly than asking people to look in
> the source code.

Devrim gave you the wrong location. It is already installed where we put 
other samples like postgresql.conf.sample, namely $SHAREDIR.

Nobody has to look in the source.

I don't think initdb should copy it by default, many users won't want 
such an animal at all. Maybe an initdb switch would meet the case.

cheers

andrew


Re: Missing docs for SR

From
Josh Berkus
Date:
> Devrim gave you the wrong location. It is already installed where we put
> other samples like postgresql.conf.sample, namely $SHAREDIR.
> 
> Nobody has to look in the source.
> 
> I don't think initdb should copy it by default, many users won't want
> such an animal at all. Maybe an initdb switch would meet the case.

OK, then it just needs to be in the docs.

So my other questions about the recovery files remain ...

--Josh


Re: Missing docs for SR

From
Heikki Linnakangas
Date:
Josh Berkus wrote:
> So, here's a must-fix item for SR for release: we need adequate docs.
> I'm happy to write these but *I* need to understand the answers first.

Many thanks!

> The current docs and wiki page do not explain:
> 
> * How (technically) the slave listens for LSNs

Hmm, not sure what you mean. In a nutshell, the slave connects to the
master over a libpq connection, and master sends the logs as they are
generated.

> * Does the walreceiver need the archive (via archive_command) copies of
> the WAL files after it's caught up with the master?

No. But if the connection is lost and the standby falls behind enough
that the logs have already been deleted on the master, it will need the
archive to catch up again.

> ** If so, can we somehow use pg_standby with SR?

No. pg_standby would do the waiting, and the streaming would never start.

I've been wondering if we should actually deprecate pg_standby in favor
of having the startup process do the retrying. So you would configure
restore_command to use 'cp' as in normal archive recovery, and the
server would retry running it every few seconds. It seems we're going to
need that retry logic in the slave anyway, to catch up automatically
after a lost connection. Once it's in there, you could use it without
streaming replication just as well.
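
The retry idea described above can be sketched roughly as follows. This is
only an illustration in Python, not the server's code: the function name
restore_with_retry, its parameters, and the use of a plain file copy to
stand in for a 'cp'-style restore_command are all assumptions made for the
example.

```python
import os
import shutil
import time


def restore_with_retry(archive_dir, wal_name, dest_dir, interval=5.0,
                       max_attempts=None, sleep=time.sleep):
    """Copy wal_name from archive_dir into dest_dir, retrying until it exists.

    Hypothetical helper: models a 'cp'-style restore_command that the
    startup process re-runs every few seconds. Returns the destination
    path, or None if max_attempts is reached without finding the file.
    """
    src = os.path.join(archive_dir, wal_name)
    dst = os.path.join(dest_dir, wal_name)
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if os.path.exists(src):
            shutil.copy(src, dst)  # the plain 'cp' step
            return dst
        sleep(interval)            # the waiting that pg_standby does today
    return None
```

With max_attempts left as None the loop waits indefinitely, which is the
behaviour a standby would want; the bounded form is there only so the
sketch can be exercised in isolation.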

> I've tried to dig this information out of the wiki and mailing list
> archives and can't quite figure it out.  Is there a tech doc which was
> not posted anywhere public, or do I need to just RTFC?

Feel free to ask.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Missing docs for SR

From
Fujii Masao
Date:
On Wed, Jan 20, 2010 at 7:30 AM, Josh Berkus <josh@agliodbs.com> wrote:
> So, here's a must-fix item for SR for release: we need adequate docs.
> I'm happy to write these but *I* need to understand the answers first.

Thanks a lot!

> The current docs and wiki page do not explain:
>
> * How (technically) the slave listens for LSNs

(Though I might not have understood your point correctly.) The LSN is
sent from the master together with the WAL records. The protocol that SR
uses is documented on the following page:
http://developer.postgresql.org/pgdocs/postgres/protocol-replication.html

> * Does the walreceiver need the archive (via archive_command) copies of
> the WAL files after it's caught up with the master?

No. And, an archived WAL file is not required for the slave even before
it's caught up with the master.

When the slave is started from the base backup:

1. The startup process tries to perform a normal archive recovery. If
   restore_command is not supplied in recovery.conf, only the WAL files
   in pg_xlog are replayed. So restore_command is optional for SR.

2. When the startup process finds an invalid record (including "ENOENT"
   for the next WAL file), it requests the postmaster to start the
   walreceiver process.

3. The walreceiver connects to the master and requests the WAL following
   the LSN of that invalid record. Then the WAL records are shipped
   continuously to the walreceiver, and written to the slave's disk.
   OTOH, the startup process waits until the next record has been written
   by the walreceiver, and then reads and applies it. The startup process
   continues this stop-and-go recovery.

When you use an old base backup for the slave, not all of the WAL files
required by the slave still exist in the master's pg_xlog, i.e., the
master might be unable to ship some of those files. In this case, if you
use a restore_command which accesses the master's archive, those missing
files can be applied on the slave in phase #1.
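
The sequence above can be modelled with a small Python sketch. It is a
simplified illustration, not the actual startup code: it works on whole
segment numbers rather than LSNs and records, the name plan_recovery and
its arguments are hypothetical, and trying the archive before pg_xlog is
an assumption of the sketch.

```python
def plan_recovery(start, pg_xlog, archive, master, target):
    """Return [(segment, source)] for replaying segments start..target.

    pg_xlog, archive and master are sets of segment numbers available in
    the slave's pg_xlog, in the archive, and on the master.
    archive=None models a recovery.conf without restore_command.
    """
    plan = []
    seg = start

    # Phase 1: normal archive recovery by the startup process.
    while seg <= target:
        if archive is not None and seg in archive:
            plan.append((seg, "archive"))
        elif seg in pg_xlog:
            plan.append((seg, "pg_xlog"))
        else:
            break  # next WAL not found: treated as the "invalid record"
        seg += 1

    # Phases 2 and 3: walreceiver is started and streams from that point.
    while seg <= target:
        if seg not in master:
            raise LookupError(
                f"segment {seg} is no longer on the master; archive needed")
        plan.append((seg, "stream"))
        seg += 1
    return plan
```

For example, with a fresh base backup and no restore_command,
plan_recovery(1, {1, 2}, None, {3, 4, 5}, 5) replays segments 1 and 2 from
pg_xlog and streams 3 to 5. With an old base backup, an empty pg_xlog and
a master that only retains {4, 5}, the call fails unless an archive
holding segments 1 to 3 is supplied.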

> I've tried to dig this information out of the wiki and mailing list
> archives and can't quite figure it out.  Is there a tech doc which was
> not posted anywhere public, or do I need to just RTFC?

Nope. If you have any questions, please feel free to get back to me.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center