Thread: streaming replication question

streaming replication question

From

Rajesh Kumar Mallah

Date:

02 October 2010, 13:02:53

Dear List,

Firstly thanks to the postgresql global development team for releasing postgresql
with such a great enterprise feature of SR & HS.

As an enduser i setup SR based on http://wiki.postgresql.org/wiki/Streaming_Replication
The master & standby are both powerful machines and are on same gigabit switch.
Things worked as expected I updated 1000's of records in single txn and saw them
appearing on the standby instantly.

Then i wanted to see the impact of shutting down the standby momentarily . This apparently
failed with following messages in standby logs:

DETAIL: last completed transaction was at log time 2010-10-02 20:57:26.17677+05:30
LOG: restartpoint starting: time
LOG: received smart shutdown request
FATAL: terminating walreceiver process due to administrator command
LOG: restartpoint complete: wrote 1568 buffers (0.3%); write=146.237 s, sync=0.251 s, total=146.489 s
LOG: recovery restart point at 25/EA87BA18
DETAIL: last completed transaction was at log time 2010-10-02 21:02:21.89303+05:30
LOG: shutting down <--- - - - - - - - - - - - - -- - - shutdown command issued here
LOG: database system is shut down
LOG: database system was shut down in recovery at 2010-10-02 21:02:22 IST <---------------- first attempt to start was made
LOG: entering standby mode
LOG: restored log file "0000000100000025000000EA" from archive
LOG: redo starts at 25/EA87BA18
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16399/303892827; tid 1503/119
LOG: startup process (PID 20527) exited with exit code 1
LOG: terminating any other active server processes

LOG: database system was interrupted while in recovery at log time 2010-10-02 20:57:12 IST <--------------- second attempt was made.
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
LOG: restored log file "0000000100000025000000EA" from archive
LOG: redo starts at 25/EA87BA18
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16399/303892827; tid 1503/119
LOG: startup process (PID 20693) exited with exit code 1
LOG: terminating any other active server processes

I confess that i have not RT(Fine)M yet .

Regds
Rajesh Kumar Mallah.

Re: streaming replication question

From

Fujii Masao

Date:

04 October 2010, 09:52:18

On Sun, Oct 3, 2010 at 1:02 AM, Rajesh Kumar Mallah
<mallah.rajesh@gmail.com> wrote:
> As an enduser i setup SR based on
> http://wiki.postgresql.org/wiki/Streaming_Replication
> The master & standby are both powerful machines and are on same gigabit
> switch.
> Things worked as expected I updated 1000's of records in single txn and saw
> them
> appearing on the standby instantly.
>
> Then i wanted  to see the impact of shutting down the standby momentarily .
> This apparently
> failed with following messages in standby logs:
>
> DETAIL:  last completed transaction was at log time 2010-10-02
> 20:57:26.17677+05:30
> LOG:  restartpoint starting: time
> LOG:  received smart shutdown request
> FATAL:  terminating walreceiver process due to administrator command
> LOG:  restartpoint complete: wrote 1568 buffers (0.3%); write=146.237 s,
> sync=0.251 s, total=146.489 s
> LOG:  recovery restart point at 25/EA87BA18
> DETAIL:  last completed transaction was at log time 2010-10-02
> 21:02:21.89303+05:30
> LOG:  shutting down <--- - - - - - - - - - - - - -- - -  shutdown command
> issued here
> LOG:  database system is shut down
> LOG:  database system was shut down in recovery at 2010-10-02 21:02:22 IST
> <---------------- first attempt to start was made
> LOG:  entering standby mode
> LOG:  restored log file "0000000100000025000000EA" from archive
> LOG:  redo starts at 25/EA87BA18
> FATAL:  too many KnownAssignedXids
> CONTEXT:  xlog redo insert: rel 1663/16399/303892827; tid 1503/119
> LOG:  startup process (PID 20527) exited with exit code 1
> LOG:  terminating any other active server processes
>
> LOG:  database system was interrupted while in recovery at log time
> 2010-10-02 20:57:12 IST <--------------- second attempt was made.
> HINT:  If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOG:  entering standby mode
> LOG:  restored log file "0000000100000025000000EA" from archive
> LOG:  redo starts at 25/EA87BA18
> FATAL:  too many KnownAssignedXids
> CONTEXT:  xlog redo insert: rel 1663/16399/303892827; tid 1503/119
> LOG:  startup process (PID 20693) exited with exit code 1
> LOG:  terminating any other active server processes

This looks like the bug of HS. But I'm not sure why KnownAssignedXids
overflowed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: streaming replication question

From

Rajesh Kumar Mallah

Date:

04 October 2010, 14:57:42

Dear Masao ,

Thanks so much for the response ,

I actually i am in a position to do rigorous stress testing because i have the machines at my disposal.
I am sorry that i was not in touch with the lists for a prolong duration and could not followup much
on the discussions while HS & SR were being developed.hence i am mostly relying on the published
documents only. The wiki that i used does not say much about how to manage the wal files that are
archived in primary and standby(possibly).

I am currently aiming to setup only SR between 2 servers only.

in primary server i have
---------------------------------------------------------------------
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode    = on
archive_command = 'cp %p /mnt/disk1/pgdatadir/wal_archives/%f'
-----------------------------------------------------------------------------

in slave sever i have   additionally
hot_standby = on # in postgresql.conf

and recovery.con in slave is:
--------------
standby_mode          = 'on'
primary_conninfo      = 'host=172.16.10.139 port=5432 user=postgres'
trigger_file = '/mnt/disk1/pgdatadir/trigger'
restore_command = 'cp /mnt/disk1/pgdatadir/wal_archives/%f "%p"'
-----------------------

My question is in SR setup do i need to keep transferring the archived WAL files also ?
when the slave requests WAL records from the master does the master consult the archived
wal files also for sending the records ?

anticipating your enlightening answers

your respectfully
Rajesh Kumar Mallah.

Re: streaming replication question

From

Rajesh Kumar Mallah

Date:

04 October 2010, 15:03:09

Dear Masao,
I would also like to inform that i observed the phenomenon only once. And i am still not
too clear on the correct setup.
Regds
Rajesh Kumar Mallah.

This looks like the bug of HS. But I'm not sure why KnownAssignedXids
overflowed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: streaming replication question

From

Fujii Masao

Date:

05 October 2010, 01:10:06

On Tue, Oct 5, 2010 at 2:57 AM, Rajesh Kumar Mallah
<mallah.rajesh@gmail.com> wrote:
> I am currently aiming to setup only SR  between 2 servers only.

"only SR" means that you don't need Hot Standby (i.e., you don't need to
run any query on the standby server)? If so, you can set wal_level to
archive instead of hot_standby, and disable hot_standby parameter on the
standby.

> My question is in SR setup do i need to keep transferring the archived WAL
> files also ?

No if wal_keep_segments is high enough that WAL files required for the standby
can be saved in pg_xlog directory of the master.

> when the slave requests WAL records from the master does the master consult
> the archived
> wal files also for sending the records ?

No. In 9.0, the master doesn't read and send the archived WAL files.
But I agree to make the master send the archived WAL files, and I'm
now proposing that.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg02040.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: streaming replication question

From

Rajesh Kumar Mallah

Date:

05 October 2010, 04:05:07

On Tue, Oct 5, 2010 at 12:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Oct 5, 2010 at 2:57 AM, Rajesh Kumar Mallah
<mallah.rajesh@gmail.com> wrote:
> I am currently aiming to setup only SR between 2 servers only.

"only SR" means that you don't need Hot Standby (i.e., you don't need to
run any query on the standby server)? If so, you can set wal_level to
archive instead of hot_standby, and disable hot_standby parameter on the
standby.

Thanks for the kind replies,
May be I am mistaken in understanding of the terms. I do issue read only
queries to the standby server. So I think its HS. But I also stream the
wal records.

> My question is in SR setup do i need to keep transferring the archived WAL
> files also ?

No if wal_keep_segments is high enough that WAL files required for the standby
can be saved in pg_xlog directory of the master.

> when the slave requests WAL records from the master does the master consult
> the archived
> wal files also for sending the records ?

No. In 9.0, the master doesn't read and send the archived WAL files.
But I agree to make the master send the archived WAL files, and I'm
now proposing that.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg02040.php

Regards,

I just wanted to let you know how i am now managing the wal logs.
i have mounted the wal archive folder(Xp) of primary on the slave via NFS to
a folder (Ys).
The folder Xp of primary is periodically rsynced to folder Xs on slave with option --delete
in recovery.conf of slave archive_cleanup_command has been specified to cleanup Ys
(not Xs), since it is a NFS rw mount it removes the unneeded archived WAL files from the
source also ie Xp . subsequently the files are removed from Xs also because of the
--delete option of rsync.

My original requirement is that , I should be able to divert certain amount of Read Only
queries from the master to salve so as to reduce load in primary. Our pilot applications
have been modified so that they always contact master when that have to modify data
and slave when they have to read data. The current setup provides for the requirement
but I am concerned what happens if the slave is shutdown for a prolonged duration how
will i get to know when a base backup is required. I am trying to read the docs and
practically observing also the effects of various steps. We have a switch in the software
that lets was not to use the slave at all ! .

Regds
Rajesh Kumar Mallah.
Tradeindia.com - India's Largest B2B MarketPlace.
(uses PostgreSQL for past 10 years)

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: streaming replication question

From

Fujii Masao

Date:

05 October 2010, 05:38:57

On Tue, Oct 5, 2010 at 4:04 PM, Rajesh Kumar Mallah
<mallah.rajesh@gmail.com> wrote:
>> > I am currently aiming to setup only SR  between 2 servers only.
>>
>> "only SR" means that you don't need Hot Standby (i.e., you don't need to
>> run any query on the standby server)? If so, you can set wal_level to
>> archive instead of hot_standby, and disable hot_standby parameter on the
>> standby.
>
> Thanks for the kind replies,
> May be I am mistaken in understanding of the terms. I do issue read only
> queries to the standby server. So I think its HS. But I also stream the
> wal records.

SR is capability to stream WAL records from the master to the standby and
keep the database on the standby up to date by applying the WAL records.
HS is capability to allow us to run read-only queries on the standby.
You seem to need both SR and HS. So you don't need to change wal_level and
hot_standby parameters.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: streaming replication question

From

Simon Riggs

Date:

05 October 2010, 06:08:12

On Sat, 2010-10-02 at 12:02 -0400, Rajesh Kumar Mallah wrote:
> FATAL:  too many KnownAssignedXids

That's a bug.

I've a few guesses about that, but I'll put some better instrumentation
in to see if we can prove what's causing it.

Can you reproduce that again, or was that failure isolated to that
particular point in the WAL stream? You can try turning HS off, then
turning it back on again later.

If you suspect a bug in Hot Standby, please set
    trace_recovery_messages = DEBUG2
in postgresql.conf and repeat the action

Always useful to know
* max_connections
* current number of sessions
* whether we have two phase commits happening

Thanks

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services

Re: streaming replication question

From

Rajesh Kumar Mallah

Date:

05 October 2010, 07:16:17

Dear Riggs,
I am so sorry that i am not able to replicate it this time. May be i did something really stupid that day .
I remember at one point of time i ran the standy without any recorvery.conf even.
If i ever get that again i will surely try to replicate it and inform.

Regds
Rajesh Kumar Mallah.

On Tue, Oct 5, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2010-10-02 at 12:02 -0400, Rajesh Kumar Mallah wrote:
> FATAL: too many KnownAssignedXids

That's a bug.

I've a few guesses about that, but I'll put some better instrumentation
in to see if we can prove what's causing it.

Can you reproduce that again, or was that failure isolated to that
particular point in the WAL stream? You can try turning HS off, then
turning it back on again later.

If you suspect a bug in Hot Standby, please set
trace_recovery_messages = DEBUG2
in postgresql.conf and repeat the action

Always useful to know
* max_connections
* current number of sessions
* whether we have two phase commits happening

Thanks

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services