Thread: Streaming replication: sequences on slave seemingly ahead of sequences on master

Hi list,

we have two 9.1.2 servers on debian squeeze, and are setting up a simple
streaming replication between the two.

* wal_keep_segments is set high on the master
* the slave's recovery.conf contains just standby_mode=on and
  primary_conninfo=foo
* we use a simple start_backup/rsync/stop_backup to create the base copy
  before starting the slave.


It all seems to be working fine, except that when checking the data (selecting
latest primary key and sequence value for all tables) on master and slave,
some sequence ids are higher on the slave than on the master. I could
understand if they were lower, but this is weird.

* The slave's sequences can be anywhere between 1 and 50 ids ahead.
* The actual table data is properly in sync.
* We look at the slave before the master.
* We ignore readings where pg_current_xlog_location() !=
  pg_last_xlog_replay_location().
* It only happens on frequently-updated sequences.
* During recovery, we have warnings of the form:
  2012-05-04 10:32:08 CEST WARNING: xlog min recovery request 16A/2A03BDD0 is
  past current point 16A/1E72A880
  2012-05-04 10:32:08 CEST CONTEXT: writing block 0 of relation
  base/35355/42224_vm
  xlog redo vacuum: rel 1663/1562168/1563037; blk 12122, lastBlockVacuumed
  12070
  2012-05-04 10:32:12 CEST WARNING: xlog min recovery request 16A/469F2120 is
  past current point 16A/1E9B6EB8
  2012-05-04 10:32:12 CEST CONTEXT: writing block 0 of relation
  base/56308/57181_vm
  xlog redo vacuum: rel 1663/1562168/1563037; blk 21875, lastBlockVacuumed
  21329
  2012-05-04 10:32:17 CEST WARNING: xlog min recovery request 16A/22D497B8 is
  past current point 16A/1FF69258
* servers have near-identical hardware and software
* monitoring via munin shows at most 1-2 KB of replication lag
* we retried the base backup twice
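
The xlog locations used in the check and printed in the warnings above are two hexadecimal numbers separated by a slash, so "which side is ahead?" can be decided by comparing the two parts numerically. A minimal sketch, assuming bash; lsn_cmp is a hypothetical helper written for this illustration, not something PostgreSQL provides:

```shell
#!/usr/bin/env bash
# lsn_cmp A B: compare two xlog locations written as "HI/LO" in hexadecimal.
# Prints -1, 0 or 1 when A is behind, equal to, or ahead of B.
lsn_cmp() {
    local a_hi=${1%/*} a_lo=${1#*/}
    local b_hi=${2%/*} b_lo=${2#*/}
    # Compare the high part first, then the low part, both as base-16 numbers.
    if ((16#$a_hi != 16#$b_hi)); then
        ((16#$a_hi < 16#$b_hi)) && echo -1 || echo 1
    elif ((16#$a_lo != 16#$b_lo)); then
        ((16#$a_lo < 16#$b_lo)) && echo -1 || echo 1
    else
        echo 0
    fi
}

# Locations taken from the first warning quoted above.
lsn_cmp 16A/2A03BDD0 16A/1E72A880
```

A result of 0 for the master's current location against the slave's replay location corresponds to the readings that are kept; anything else is discarded as "not caught up yet".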


So...
* any likely mistake on our side ?
* can it be fixed ?
* is this harmless and to be ignored ?


Thank you.

--
Vincent de Phily


On Fri, May 4, 2012 at 8:55 AM, Vincent de Phily
<vincent.dephily@mobile-devices.fr> wrote:
> Hi list,
>
> we have two 9.1.2 servers on debian squeeze, and are setting up a simple
> streaming replication between the two.
>
> * wal_keep_segments is set high on the master
> * the slave's recovery.conf contains just standby_mode=on and
>  primary_conninfo=foo
> * we use a simple start_backup/rsync/stop_backup to create the base copy
>  before starting the slave.
>
>
> It all seems to be working fine, except that when checking the data (selecting
> latest primary key and sequence value for all tables) on master and slave,
> some sequence ids are higher on the slave than on the master. I could
> understand if they were lower, but this is weird.
>
> * The slave's sequences can be anywhere between 1 and 50 ids ahead.

how did you determine that exactly?  how do you know the transactions
are committing in sequence order?

merlin

On Friday 04 May 2012 09:47:16 Merlin Moncure wrote:
> On Fri, May 4, 2012 at 8:55 AM, Vincent de Phily
> <vincent.dephily@mobile-devices.fr> wrote:
> > Hi list,
> >
> > we have two 9.1.2 servers on debian squeeze, and are setting up a simple
> > streaming replication between the two.
> >
> > * wal_keep_segments is set high on the master
> > * the slave's recovery.conf contains just standby_mode=on and
> >  primary_conninfo=foo
> > * we use a simple start_backup/rsync/stop_backup to create the base copy
> >  before starting the slave.
> >
> >
> > It all seems to be working fine, except that when checking the data
> > (selecting latest primary key and sequence value for all tables) on
> > master and slave, some sequence ids are higher on the slave than on the
> > master. I could understand if they were lower, but this is weird.
> >
> > * The slave's sequences can be anywhere between 1 and 50 ids ahead.
>
> how did you determine that exactly?

Quick and dirty :

# Generate one "latest key" query per table and one "last_value" query per
# sequence in the public schema, using the master's catalog.
SQL=$(psql -tA -h "$MASTER" "$DB" <<< "
select E'select \''||table_name||E'\', '||column_name||' from '||table_name
       ||' order by '||column_name||' desc limit 1;'
  from information_schema.columns
 where table_schema='public' and ordinal_position=1 order by table_name;
select E'select \''||sequence_name||E'\', last_value from '||sequence_name||';'
  from information_schema.sequences
 where sequence_schema='public' order by sequence_name;")
# Query the slave first, then the master. The first line of each file is the
# xlog location, so the files only match when replay has fully caught up.
psql -tA -h "$SLAVE" "$DB" <<< "select pg_last_xlog_replay_location();$SQL" > "$SLAVE.check"
psql -tA -h "$MASTER" "$DB" <<< "select pg_current_xlog_location();$SQL" > "$MASTER.check"
if diff -u "$MASTER.check" "$SLAVE.check"; then
    cat "$MASTER.check"
    echo -e "\e[32msync ok\e[m"
else
    echo -e "\e[31msync bad\e[m"
fi


> how do you know the transactions
> are committing in sequence order?

I don't, actually. But whichever order the transactions eventually commit in,
I'd expect that order to be the same on the slave and the master ? And I
wouldn't expect anything to finish on the slave before it finishes on the
master ?

--
Vincent de Phily
Mobile Devices
+33 (0) 142 119 325
+353 (0) 85 710 6320


This is due to how sequences are pre-allocated in blocks to sessions running on the master.

Since the slave is updated via the WALs, and not via 'nextval' function calls in queries, the sequences that are actually used will remain in sync with the master.
--
Mike Nolan

On 4 May 2012 14:55, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:

> It all seems to be working fine, except that when checking the data (selecting
> latest primary key and sequence value for all tables) on master and slave,
> some sequence ids are higher on the slave than on the master. I could
> understand if they were lower, but this is weird.
>
> * The slave's sequences can be anywhere between 1 and 50 ids ahead.

This is normal. The sequences are advanced in chunks of 100, so the
master's value will be the nextval() while the value on standby will
be the start of the next chunk, so as you say, slightly ahead of the
master.

The same thing would also happen in case of a crash.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

On Sunday 06 May 2012 10:29:17 Simon Riggs wrote:
> On 4 May 2012 14:55, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
> > It all seems to be working fine, except that when checking the data
> > (selecting latest primary key and sequence value for all tables) on
> > master and slave, some sequence ids are higher on the slave than on the
> > master. I could understand if they were lower, but this is weird.
> >
> > * The slave's sequences can be anywhere between 1 and 50 ids ahead.
>
> This is normal. The sequences are advanced in chunks of 100, so the
> master's value will be the nextval() while the value on standby will
> be the start of the next chunk, so as you say, slightly ahead of the
> master.
>
> The same thing would also happen in case of a crash.

Thanks for the explanation (Michael's too).

Would be nice to see it added to the documentation (unless I just didn't find
it ?), as it is quite surprising, and might lead to problems if people expect
to be able to read sequence values from the slave.

As a bonus question, I guess it would be the same if using synchronous
replication ?

--
Vincent de Phily

On 7 May 2012 09:01, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:

> Would be nice to see it added to the documentation (unless I just didn't find
> it ?), as it is quite surprising, and might lead to problems if people expect
> to be able to read sequence values from the slave.

If you think so, please submit a patch. That's how it works here.

> As a bonus question, I guess it would be the same if using synchronous
> replication ?

Yes


--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

On Mon, May 7, 2012 at 10:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 7 May 2012 09:01, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
>
>> Would be nice to see it added to the documentation (unless I just didn't find
>> it ?), as it is quite surprising, and might lead to problems if people expect
>> to be able to read sequence values from the slave.
>
> If you think so, please submit a patch. That's how it works here.

FWIW, I think this would be a reasonable thing to document, given that
it violates the principle of least surprise for people who are not
intimately familiar with how replication and/or sequences work wrt wal
logging in postgresql. (but no, I'm not actually volunteering at this
point to write said patch due to my backlog already being too large
:P)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

On 7 May 2012 09:19, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, May 7, 2012 at 10:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 7 May 2012 09:01, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
>>
>>> Would be nice to see it added to the documentation (unless I just didn't find
>>> it ?), as it is quite surprising, and might lead to problems if people expect
>>> to be able to read sequence values from the slave.
>>
>> If you think so, please submit a patch. That's how it works here.
>
> FWIW, I think this would be a reasonable thing to document, given that
> it violates the principle of least surprise for people who are not
> intimately familiar with how replication and/or sequences work wrt wal
> logging in postgresql. (but no, I'm not actually volunteering at this
> point to write said patch due to my backlog already being too large
> :P)

Mine also. It's important that everybody understands that submitting a
patch is the way to get change, plus it's a good test of whether the
change is actually worth the effort to make it happen.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 7 May 2012 09:01, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
>> Would be nice to see it added to the documentation (unless I just didn't find
>> it ?), as it is quite surprising, and might lead to problems if people expect
>> to be able to read sequence values from the slave.

> If you think so, please submit a patch. That's how it works here.

Any documentation patch should be written by somebody who's actually
researched the behavior a bit; in particular I believe this can be
adjusted with the sequence CACHE setting.

            regards, tom lane



On Mon, May 7, 2012 at 4:01 AM, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
> On Sunday 06 May 2012 10:29:17 Simon Riggs wrote:
> > On 4 May 2012 14:55, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
>
> Would be nice to see it added to the documentation (unless I just didn't find
> it ?), as it is quite surprising, and might lead to problems if people expect
> to be able to read sequence values from the slave.

What people need to understand is that there is no way to 'read' a sequence value from a slave.  'SELECT * from sequence_name' will not reliably give you either the most recently assigned or the next sequence value.  This is currently covered in the documentation for sequences, but could probably be improved upon and mentioned somewhere in the documentation on setting up slave servers.  (I will look at adding it to the binary replication tutorial wiki page.)

Since 'nextval' cannot be called on a sequence on a slave (because a slave can only support read-only transactions), 'currval' will by definition return an error.
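
If sequence values read on the slave are not meaningful, a sync check can limit itself to table data. A rough sketch of that idea, assuming bash and an input format of one "table column" pair per line; table_checks and the table names in the usage line are hypothetical, and the generated queries follow the shape of the ones in the check script posted earlier in the thread:

```shell
#!/usr/bin/env bash
# Sketch: emit one "latest key" query per table and leave sequences out.
# The "table column" input format is an assumption made for this example.
table_checks() {
    local table column
    while read -r table column; do
        # Skip blank lines.
        [ -n "$table" ] || continue
        printf "select '%s', %s from %s order by %s desc limit 1;\n" \
            "$table" "$column" "$table" "$column"
    done
}

# Hypothetical tables, for illustration only.
printf 'orders id\nusers user_id\n' | table_checks
```

The output can be fed to psql on both servers in the same way as the original script does, so that only values which are actually replicated as table rows get compared.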

To cross-pollinate with another thread, if temporary tables (and insert/delete/update transactions to them) are to be supported on a slave, will the applications using those temporary tables expect to be able to use 'nextval' on inserts to temporary tables as well?
 
> As a bonus question, I guess it would be the same if using synchronous
> replication ?

Yes.
--
Mike Nolan

On Mon, May 7, 2012 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 7 May 2012 09:01, Vincent de Phily <vincent.dephily@mobile-devices.fr> wrote:
>>> Would be nice to see it added to the documentation (unless I just didn't find
>>> it ?), as it is quite surprising, and might lead to problems if people expect
>>> to be able to read sequence values from the slave.
>
>> If you think so, please submit a patch. That's how it works here.
>
> Any documentation patch should be written by somebody who's actually
> researched the behavior a bit; in particular I believe this can be
> adjusted with the sequence CACHE setting.

No. That behavior is caused by the hard-coded value SEQ_LOG_VALS
(= 32 in sequence.c) rather than CACHE setting.
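
The effect of logging ahead can be pictured with a small model. This is only a sketch under stated assumptions: it assumes a sequence with CACHE 1 and INCREMENT 1, it is not PostgreSQL's code, and the function and variable names are made up for the illustration. Only the constant 32 comes from this thread.

```shell
#!/usr/bin/env bash
# Simplified model of a sequence that logs values ahead in WAL.
# Illustrative only; assumes CACHE 1 and INCREMENT 1.
SEQ_LOG_VALS=32   # value quoted above for sequence.c

# simulate N: call a modelled nextval() N times on a fresh sequence, then
# print what the master and a fully caught-up standby would show.
simulate() {
    local calls=$1
    local last_value=0   # last value handed out on the master
    local log_cnt=0      # values still covered by the last WAL record
    local wal_value=0    # value written to WAL, which is what the standby sees
    local i
    for ((i = 0; i < calls; i++)); do
        if ((log_cnt == 0)); then
            # Log ahead: one WAL record covers this value plus SEQ_LOG_VALS more.
            wal_value=$((last_value + 1 + SEQ_LOG_VALS))
            log_cnt=$SEQ_LOG_VALS
        else
            log_cnt=$((log_cnt - 1))
        fi
        last_value=$((last_value + 1))
    done
    echo "master=$last_value standby=$wal_value ahead=$((wal_value - last_value))"
}

for n in 1 10 33 34; do
    simulate "$n"
done
```

In this model the standby is never behind the master and is ahead by at most SEQ_LOG_VALS, because a new WAL record is only written once the previously logged range has been used up.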

Regards,

--
Fujii Masao

On Mon, May 7, 2012 at 8:52 AM, Michael Nolan <htfoot@gmail.com> wrote:
> To cross-pollinate with another thread, if temporary tables (and
> insert/delete/update transactions to them) are to be supported on a slave,
> will the applications using those temporary tables expect to be able to use
> 'nextval' on inserts to temporary tables as well?

That's a good question.  I re-asked it for you on -hackers on the GTT
thread. Look for follow ups there.

merlin