Thread: master and sync-replica diverging

master and sync-replica diverging

From
"Erik Rijkers"
Date:
AMD FX 8120 / centos 6.2 / latest source (git head)


It seems to be quite easy to force a 'sync' replica to not be equal to master by
recreating+loading a table in a while loop.


For this test I compiled+checked+installed three separate instances on the same machine.  The
replica application_name are names 'wal_receiver_$copy' where $copy is 01, resp. 02.

$ ./sync_state.sh pid  | application_name |   state   | sync_state
-------+------------------+-----------+------------19520 | wal_receiver_01  | streaming | sync19567 | wal_receiver_02
|streaming | async
 
(2 rows)
port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------6564 | on                 | wal_receiver_01
(1 row)
port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------6565 | off                |
(1 row)
port | synchronous_commit | synchronous_standby_names
------+--------------------+---------------------------6566 | off                |
(1 row)



The test consists of creating a table and loading tab-separated data from file with COPY and then
taking the rowcount of that table (13 MB, almost 200k rows) in all three instances:


# wget http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz

slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz

zcat $slurp_file \| grep -v '^#' \| grep -Ev '^[[:space:]]*$' \| psql -c "   drop table if exists $table cascade;
createtable $table (            gene_symbol      text       ,    primary_fbgn     text       ,    secondary_fbgns  text
     ,    annotation_id    text       ,    secondary_annotation_ids text   );   copy $table from stdin csv delimiter
E'\t';";

# count on master:
echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6564

# count on wal_receiver_01 (sync replica):
echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6565

# count on wal_receiver_02 (async replica):
echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6566



I expected the rowcounts from master and sync replica to always be the same.

Initially this seemed to be the case, but when I run the above sequence in a while loop for a few
minutes about 10% of rowcounts from the sync-replica are not equal to the master.

Perhaps not a likely scenario, but surely such a deviating rowcount on a sync replica should not
be possible?


thank you,


Erik Rijkers






Re: master and sync-replica diverging

From
Joshua Berkus
Date:
Erik,

Are you taking the counts *while* the table is loading?  In sync replication, it's possible for the counts to differ
fora short time due to one of three things:
 

* transaction has been saved to the replica and confirm message hasn't reached the master yet
* replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet applied it to the tables
inmemory.
 
* updating the master with synchronous_commit = local.

----- Original Message -----
> AMD FX 8120 / centos 6.2 / latest source (git head)
> 
> 
> It seems to be quite easy to force a 'sync' replica to not be equal
> to master by
> recreating+loading a table in a while loop.
> 
> 
> For this test I compiled+checked+installed three separate instances
> on the same machine.  The
> replica application_name are names 'wal_receiver_$copy' where $copy
> is 01, resp. 02.
> 
> $ ./sync_state.sh
>   pid  | application_name |   state   | sync_state
> -------+------------------+-----------+------------
>  19520 | wal_receiver_01  | streaming | sync
>  19567 | wal_receiver_02  | streaming | async
> (2 rows)
> 
>  port | synchronous_commit | synchronous_standby_names
> ------+--------------------+---------------------------
>  6564 | on                 | wal_receiver_01
> (1 row)
> 
>  port | synchronous_commit | synchronous_standby_names
> ------+--------------------+---------------------------
>  6565 | off                |
> (1 row)
> 
>  port | synchronous_commit | synchronous_standby_names
> ------+--------------------+---------------------------
>  6566 | off                |
> (1 row)
> 
> 
> 
> The test consists of creating a table and loading tab-separated data
> from file with COPY and then
> taking the rowcount of that table (13 MB, almost 200k rows) in all
> three instances:
> 
> 
> # wget
> http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz
> 
> slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz
> 
> zcat $slurp_file \
>  | grep -v '^#' \
>  | grep -Ev '^[[:space:]]*$' \
>  | psql -c "
>     drop table if exists $table cascade;
>     create table $table (
>              gene_symbol      text
>         ,    primary_fbgn     text
>         ,    secondary_fbgns  text
>         ,    annotation_id    text
>         ,    secondary_annotation_ids text
>     );
>     copy $table from stdin csv delimiter E'\t';
>  ";
> 
> # count on master:
> echo "select current_setting('port') port,count(*) from $table"|psql
> -qtXp 6564
> 
> # count on wal_receiver_01 (sync replica):
> echo "select current_setting('port') port,count(*) from $table"|psql
> -qtXp 6565
> 
> # count on wal_receiver_02 (async replica):
> echo "select current_setting('port') port,count(*) from $table"|psql
> -qtXp 6566
> 
> 
> 
> I expected the rowcounts from master and sync replica to always be
> the same.
> 
> Initially this seemed to be the case, but when I run the above
> sequence in a while loop for a few
> minutes about 10% of rowcounts from the sync-replica are not equal to
> the master.
> 
> Perhaps not a likely scenario, but surely such a deviating rowcount
> on a sync replica should not
> be possible?
> 
> 
> thank you,
> 
> 
> Erik Rijkers
> 
> 
> 
> 
> 
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


Re: master and sync-replica diverging

From
"Erik Rijkers"
Date:
On Thu, May 17, 2012 14:32, Joshua Berkus wrote:
> Erik,
>
> Are you taking the counts *while* the table is loading?  In sync replication, it's possible for
> the counts to differ for a short time due to one of three things:
>
> * transaction has been saved to the replica and confirm message hasn't reached the master yet

The count(*) was done in the way that I showed, i.e. *after* psql had exited.  My understanding is
that, with synchronous replication 'on' and configured properly, psql could only return *after*
the sync-replica had the data safely on disk.  Either my understanding is not correct or there is
a bug in postgres sync-rep.

> * replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet
> applied it to the tables in memory.

settings are untouched.

> * updating the master with synchronous_commit = local.

During my test:  synchronous_commit = on


Thanks,


Erik Rijkers

> ----- Original Message -----
>> AMD FX 8120 / centos 6.2 / latest source (git head)
>>
>>
>> It seems to be quite easy to force a 'sync' replica to not be equal
>> to master by
>> recreating+loading a table in a while loop.
>>
>>
>> For this test I compiled+checked+installed three separate instances
>> on the same machine.  The
>> replica application_name are names 'wal_receiver_$copy' where $copy
>> is 01, resp. 02.
>>
>> $ ./sync_state.sh
>>   pid  | application_name |   state   | sync_state
>> -------+------------------+-----------+------------
>>  19520 | wal_receiver_01  | streaming | sync
>>  19567 | wal_receiver_02  | streaming | async
>> (2 rows)
>>
>>  port | synchronous_commit | synchronous_standby_names
>> ------+--------------------+---------------------------
>>  6564 | on                 | wal_receiver_01
>> (1 row)
>>
>>  port | synchronous_commit | synchronous_standby_names
>> ------+--------------------+---------------------------
>>  6565 | off                |
>> (1 row)
>>
>>  port | synchronous_commit | synchronous_standby_names
>> ------+--------------------+---------------------------
>>  6566 | off                |
>> (1 row)
>>
>>
>>
>> The test consists of creating a table and loading tab-separated data
>> from file with COPY and then
>> taking the rowcount of that table (13 MB, almost 200k rows) in all
>> three instances:
>>
>>
>> # wget
>> http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz
>>
>> slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz
>>
>> zcat $slurp_file \
>>  | grep -v '^#' \
>>  | grep -Ev '^[[:space:]]*$' \
>>  | psql -c "
>>     drop table if exists $table cascade;
>>     create table $table (
>>              gene_symbol      text
>>         ,    primary_fbgn     text
>>         ,    secondary_fbgns  text
>>         ,    annotation_id    text
>>         ,    secondary_annotation_ids text
>>     );
>>     copy $table from stdin csv delimiter E'\t';
>>  ";
>>
>> # count on master:
>> echo "select current_setting('port') port,count(*) from $table"|psql
>> -qtXp 6564
>>
>> # count on wal_receiver_01 (sync replica):
>> echo "select current_setting('port') port,count(*) from $table"|psql
>> -qtXp 6565
>>
>> # count on wal_receiver_02 (async replica):
>> echo "select current_setting('port') port,count(*) from $table"|psql
>> -qtXp 6566
>>
>>
>>
>> I expected the rowcounts from master and sync replica to always be
>> the same.
>>
>> Initially this seemed to be the case, but when I run the above
>> sequence in a while loop for a few
>> minutes about 10% of rowcounts from the sync-replica are not equal to
>> the master.
>>
>> Perhaps not a likely scenario, but surely such a deviating rowcount
>> on a sync replica should not
>> be possible?
>>
>>
>> thank you,
>>
>>
>> Erik Rijkers
>>
>>
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>




Re: master and sync-replica diverging

From
Ants Aasma
Date:
On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers <er@xs4all.nl> wrote:
> The count(*) was done in the way that I showed, i.e. *after* psql had exited.  My understanding is
> that, with synchronous replication 'on' and configured properly, psql could only return *after*
> the sync-replica had the data safely on disk.  Either my understanding is not correct or there is
> a bug in postgres sync-rep.

Commit can only return when sync-replica has the data safely on disk,
but this doesn't mean that it's visible yet.

The sequence of events is in dot notation:
commit_command -> master_wal_sync -> replica_wal_sync ->
master_commit_visible -> commit_response
replica_wal_sync -> replica_replay_wal -> replica_commit_visible

If you issue a select on the replica after getting a commit response
from master you can see that the query getting a snapshot races with
replay of the commit record.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


Re: master and sync-replica diverging

From
"Erik Rijkers"
Date:
On Thu, May 17, 2012 16:10, Ants Aasma wrote:
> On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers <er@xs4all.nl> wrote:
>> The count(*) was done in the way that I showed, i.e. *after* psql had exited.  My understanding
>> is
>> that, with synchronous replication 'on' and configured properly, psql could only return *after*
>> the sync-replica had the data safely on disk.  Either my understanding is not correct or there
>> is
>> a bug in postgres sync-rep.
>
> Commit can only return when sync-replica has the data safely on disk,
> but this doesn't mean that it's visible yet.
>
> The sequence of events is in dot notation:
> commit_command -> master_wal_sync -> replica_wal_sync ->
> master_commit_visible -> commit_response
> replica_wal_sync -> replica_replay_wal -> replica_commit_visible
>
> If you issue a select on the replica after getting a commit response
> from master you can see that the query getting a snapshot races with
> replay of the commit record.
>

Ah yes, that makes sense. I hadn't thought of that.

Thank you for that explanation.