Thread: Help setting up warm standby replication

Help setting up warm standby replication

From

Nico Sabbi

Date:

14 April 2007, 19:43:52

Hi,
after having read the documentation and the few posts in this list I
tried to set up
a warm standby replication between two instances of  postgres running on
my pc
in 2 different base directories and 2 different ports.
(The second one was a cp -a of the first one after having created a
backup point) .
I don't need a realtime update: even few minutes of lag are good enough
for my needs.


To begin with I followed the example of the docs: in the recovery.conf
file of the
slave instance I  set

restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f  %p'

but at restart I got these messages in the logs (some translation from
italian to english by me) :
LOG:  system shutdown at 2007-04-13 13:02:51 CEST
LOG:  starting archive recovery
LOG:  restore_command = "cp -av /var/lib/pgsql/data/pg_xlog/%f %p"
cp: impossibile fare stat di
`/var/lib/pgsql/data/pg_xlog/00000001.history': No such file or directory
`/var/lib/pgsql/data/pg_xlog/000000010000000000000007' ->
`pg_xlog/RECOVERYXLOG'
LOG:  restored file "000000010000000000000007" from archive
LOG:  invalid record length at 0/7000218
LOG:  invalid primary record checkpoint
`/var/lib/pgsql/data/pg_xlog/000000010000000000000007' ->
`pg_xlog/RECOVERYXLOG'
LOG:  restored file "000000010000000000000007" from archive
LOG:  invalid record length at 0/70001D0
LOG:  invalid primary record checkpoint
PANIC:  impossible locating a valid record checkpoint
LOG:  startup process (PID 20250) was terminated by signal 6
LOG:  aborting startup due to startup process failure

same thing if I empty the pg_xlog/ dir (except archive_status/)

I hope someone can explain what  I did wrong.

Another thing I didn't understand is the continous replication mechanism:
will the slave server periodically call the recovery_command or do I
have to setup a cronjob to instruct it to search updates?

Sorry if these questions are stupid :) and thanks in advance.

    Nico

Re: Help setting up warm standby replication

From

Tom Lane

Date:

14 April 2007, 22:09:36

Nico Sabbi <nsabbi@officinedigitali.it> writes:
> To begin with I followed the example of the docs: in the recovery.conf
> file of the
> slave instance I  set

> restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f  %p'

Hm, it looks like you are trying to copy xlog segments straight from the
pg_xlog directory of the master?  That's not going to work, because the
master will rename/overwrite those files as soon as it thinks it's done
with them.  You need to have an archive_command on the master that is
really truly copying the data to somewhere else, and then the
restore_command should copy from the somewhere else.

            regards, tom lane

Re: Help setting up warm standby replication

From

Nico Sabbi

Date:

16 April 2007, 11:11:24

Tom Lane wrote:

>Nico Sabbi <nsabbi@officinedigitali.it> writes:
>
>
>>To begin with I followed the example of the docs: in the recovery.conf
>>file of the
>>slave instance I  set
>>
>>
>
>
>
>>restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f  %p'
>>
>>
>
>Hm, it looks like you are trying to copy xlog segments straight from the
>pg_xlog directory of the master?  That's not going to work, because the
>master will rename/overwrite those files as soon as it thinks it's done
>with them.  You need to have an archive_command on the master that is
>really truly copying the data to somewhere else, and then the
>restore_command should copy from the somewhere else.
>
>
>
after having set up shared directory (one for archive_command and one
for recover_command)
I could successfully start and sync the slave dbms, but after the
initial recovery
the file recovery.conf is renamed to recovery.done and no more
updates/WAL are
asked (the slave's log reads:

LOG:  archive recovery complete
LOG:  system database ready

)

Is there any parameter that I have to pass to the second server to keep
on requesting WALs?
I still don't understand what instructs the server to continously
request the master's logs.

Re: Help setting up warm standby replication

From

"Merlin Moncure"

Date:

16 April 2007, 12:42:44

On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:
> Is there any parameter that I have to pass to the second server to keep
> on requesting WALs?
> I still don't understand what instructs the server to continously
> request the master's logs.

google pg_standby.  I've set it up and it works.

merlin

Re: Help setting up warm standby replication

From

Nico Sabbi

Date:

16 April 2007, 14:24:28

Merlin Moncure wrote:

> On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:
>
>> Is there any parameter that I have to pass to the second server to keep
>> on requesting WALs?
>> I still don't understand what instructs the server to continously
>> request the master's logs.
>
>
> google pg_standby.  I've set it up and it works.
>
> merlin
>

Hi,
I installed the second version of pg_standby.tar that you posted here.

In recovery.conf I set
restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p'

and I see in the log file that something goes on after every update:

Trigger file         : (null)
Waiting for WAL file : /tmp/pg//000000010000000000000021
WAL file path        : 000000010000000000000021
Restoring to...      : pg_xlog/RECOVERYXLOG
Sleep interval       : 5 seconds
Max wait interval    : 0 forever
Command for restore  : cp /tmp/pg//000000010000000000000021
pg_xlog/RECOVERYXLOG
running restore      : success
LOG:  restored file "000000010000000000000021" from archivio

but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks
suspicious to me.

Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ?
If I stop the replication and start the slave after having removed the
recovery.conf
the server doesn't come up:

LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted while in recovery at log time
2007-04-16 15:29:42 CEST
HINT:  If this has occurred more than once some data may be corrupted
and you may need to choose an earlier recovery target.
LOG:  impossibile aprire il file "pg_xlog/000000010000000000000009"
(file registro 0, segmento 9): No such file or directory
LOG:  invalid primary checkpoint record
LOG:  impossibile aprire il file "pg_xlog/000000010000000000000021"
(file registro 0, segmento 33): No such file or directory
LOG:  invalid secondary checkpoint record
PANIC:  impossibile localizzare un checkpoint record valido
LOG:  startup process (PID 10824) was terminated by signal 6
LOG:  aborting startup due to startup process failure

infact pg_xlog/ of the slave contains only RECOVERHISTORY:

cat pg_xlog/RECOVERYHISTORY
START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A)
CHECKPOINT LOCATION: 0/9000020
START TIME: 2007-04-16 15:29:42 CEST
LABEL: label
STOP TIME: 2007-04-16 15:31:39 CEST

Did I do something wrong?
Thanks for your help,
    Nico

Re: Help setting up warm standby replication

From

"Merlin Moncure"

Date:

16 April 2007, 14:35:13

On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:
> Merlin Moncure wrote:
>
> > On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:
> >
> >> Is there any parameter that I have to pass to the second server to keep
> >> on requesting WALs?
> >> I still don't understand what instructs the server to continously
> >> request the master's logs.
> >
> >
> > google pg_standby.  I've set it up and it works.
> >
> > merlin
> >
>
> Hi,
> I installed the second version of pg_standby.tar that you posted here.
>
> In recovery.conf I set
> restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p'
>
> and I see in the log file that something goes on after every update:
>
> Trigger file         : (null)
> Waiting for WAL file : /tmp/pg//000000010000000000000021
> WAL file path        : 000000010000000000000021
> Restoring to...      : pg_xlog/RECOVERYXLOG
> Sleep interval       : 5 seconds
> Max wait interval    : 0 forever
> Command for restore  : cp /tmp/pg//000000010000000000000021
> pg_xlog/RECOVERYXLOG
> running restore      : success
> LOG:  restored file "000000010000000000000021" from archivio
>
>
> but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks
> suspicious to me.
>
> Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ?
> If I stop the replication and start the slave after having removed the
> recovery.conf
> the server doesn't come up:
>
> LOG:  aborting startup due to startup process failure
> LOG:  database system was interrupted while in recovery at log time
> 2007-04-16 15:29:42 CEST
> HINT:  If this has occurred more than once some data may be corrupted
> and you may need to choose an earlier recovery target.
> LOG:  impossibile aprire il file "pg_xlog/000000010000000000000009"
> (file registro 0, segmento 9): No such file or directory
> LOG:  invalid primary checkpoint record
> LOG:  impossibile aprire il file "pg_xlog/000000010000000000000021"
> (file registro 0, segmento 33): No such file or directory
> LOG:  invalid secondary checkpoint record
> PANIC:  impossibile localizzare un checkpoint record valido
> LOG:  startup process (PID 10824) was terminated by signal 6
> LOG:  aborting startup due to startup process failure
>
>
> infact pg_xlog/ of the slave contains only RECOVERHISTORY:
>
> cat pg_xlog/RECOVERYHISTORY
> START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
> STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A)
> CHECKPOINT LOCATION: 0/9000020
> START TIME: 2007-04-16 15:29:42 CEST
> LABEL: label
> STOP TIME: 2007-04-16 15:31:39 CEST
>
> Did I do something wrong?
> Thanks for your help,


try link mode, not copy mode (-l). make sure you read about the -k switch.

merlin

Re: Help setting up warm standby replication

From

Nico Sabbi

Date:

16 April 2007, 15:02:15

Merlin Moncure wrote:

>
>
> try link mode, not copy mode (-l). make sure you read about the -k
> switch.
>
> merlin
>
replaced -c with -l, but the result was the same.
I assume that with -k you mean the socket dir for postgres - since
pg_standby doesn't
seek to recognize that switch -  but I didn't need it: I connected to -p
5433.

Anyway, after having stopped the replication I had the same problem;
in order to make the slave start I had to manually copy the log files
from the
archive to pg_xlog/ .

Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?

Re: Help setting up warm standby replication

From

Nico Sabbi

Date:

16 April 2007, 15:26:55

Nico Sabbi wrote:

> Merlin Moncure wrote:
>
>>
>>
>> try link mode, not copy mode (-l). make sure you read about the -k
>> switch.
>>
>> merlin
>>
> replaced -c with -l, but the result was the same.
> I assume that with -k you mean the socket dir for postgres - since
> pg_standby doesn't
> seek to recognize that switch -  but I didn't need it: I connected to
> -p 5433.
>
> Anyway, after having stopped the replication I had the same problem;
> in order to make the slave start I had to manually copy the log files
> from the
> archive to pg_xlog/ .
>
> Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?
>
>
update: touching the right trigger makes the replication stop, thus I
can connect to the server
and see the updated data.
Thanks a lot,
    Nico