Thread: master and sync-replica diverging
AMD FX 8120 / centos 6.2 / latest source (git head) It seems to be quite easy to force a 'sync' replica to not be equal to master by recreating+loading a table in a while loop. For this test I compiled+checked+installed three separate instances on the same machine. The replica application_name are names 'wal_receiver_$copy' where $copy is 01, resp. 02. $ ./sync_state.sh pid | application_name | state | sync_state -------+------------------+-----------+------------19520 | wal_receiver_01 | streaming | sync19567 | wal_receiver_02 |streaming | async (2 rows) port | synchronous_commit | synchronous_standby_names ------+--------------------+---------------------------6564 | on | wal_receiver_01 (1 row) port | synchronous_commit | synchronous_standby_names ------+--------------------+---------------------------6565 | off | (1 row) port | synchronous_commit | synchronous_standby_names ------+--------------------+---------------------------6566 | off | (1 row) The test consists of creating a table and loading tab-separated data from file with COPY and then taking the rowcount of that table (13 MB, almost 200k rows) in all three instances: # wget http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz zcat $slurp_file \| grep -v '^#' \| grep -Ev '^[[:space:]]*$' \| psql -c " drop table if exists $table cascade; createtable $table ( gene_symbol text , primary_fbgn text , secondary_fbgns text , annotation_id text , secondary_annotation_ids text ); copy $table from stdin csv delimiter E'\t';"; # count on master: echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6564 # count on wal_receiver_01 (sync replica): echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6565 # count on wal_receiver_02 (async replica): echo "select current_setting('port') port,count(*) from $table"|psql -qtXp 6566 I expected the rowcounts from master and sync replica to always be the same. Initially this seemed to be the case, but when I run the above sequence in a while loop for a few minutes about 10% of rowcounts from the sync-replica are not equal to the master. Perhaps not a likely scenario, but surely such a deviating rowcount on a sync replica should not be possible? thank you, Erik Rijkers
Erik, Are you taking the counts *while* the table is loading? In sync replication, it's possible for the counts to differ fora short time due to one of three things: * transaction has been saved to the replica and confirm message hasn't reached the master yet * replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet applied it to the tables inmemory. * updating the master with synchronous_commit = local. ----- Original Message ----- > AMD FX 8120 / centos 6.2 / latest source (git head) > > > It seems to be quite easy to force a 'sync' replica to not be equal > to master by > recreating+loading a table in a while loop. > > > For this test I compiled+checked+installed three separate instances > on the same machine. The > replica application_name are names 'wal_receiver_$copy' where $copy > is 01, resp. 02. > > $ ./sync_state.sh > pid | application_name | state | sync_state > -------+------------------+-----------+------------ > 19520 | wal_receiver_01 | streaming | sync > 19567 | wal_receiver_02 | streaming | async > (2 rows) > > port | synchronous_commit | synchronous_standby_names > ------+--------------------+--------------------------- > 6564 | on | wal_receiver_01 > (1 row) > > port | synchronous_commit | synchronous_standby_names > ------+--------------------+--------------------------- > 6565 | off | > (1 row) > > port | synchronous_commit | synchronous_standby_names > ------+--------------------+--------------------------- > 6566 | off | > (1 row) > > > > The test consists of creating a table and loading tab-separated data > from file with COPY and then > taking the rowcount of that table (13 MB, almost 200k rows) in all > three instances: > > > # wget > http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz > > slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz > > zcat $slurp_file \ > | grep -v '^#' \ > | grep -Ev '^[[:space:]]*$' \ > | psql -c " > drop table if exists $table cascade; > create table $table ( > gene_symbol text > , primary_fbgn text > , secondary_fbgns text > , annotation_id text > , secondary_annotation_ids text > ); > copy $table from stdin csv delimiter E'\t'; > "; > > # count on master: > echo "select current_setting('port') port,count(*) from $table"|psql > -qtXp 6564 > > # count on wal_receiver_01 (sync replica): > echo "select current_setting('port') port,count(*) from $table"|psql > -qtXp 6565 > > # count on wal_receiver_02 (async replica): > echo "select current_setting('port') port,count(*) from $table"|psql > -qtXp 6566 > > > > I expected the rowcounts from master and sync replica to always be > the same. > > Initially this seemed to be the case, but when I run the above > sequence in a while loop for a few > minutes about 10% of rowcounts from the sync-replica are not equal to > the master. > > Perhaps not a likely scenario, but surely such a deviating rowcount > on a sync replica should not > be possible? > > > thank you, > > > Erik Rijkers > > > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Thu, May 17, 2012 14:32, Joshua Berkus wrote: > Erik, > > Are you taking the counts *while* the table is loading? In sync replication, it's possible for > the counts to differ for a short time due to one of three things: > > * transaction has been saved to the replica and confirm message hasn't reached the master yet The count(*) was done in the way that I showed, i.e. *after* psql had exited. My understanding is that, with synchronous replication 'on' and configured properly, psql could only return *after* the sync-replica had the data safely on disk. Either my understanding is not correct or there is a bug in postgres sync-rep. > * replica has synched the transaction to the WAL log, but due to wal_delay settings hasn't yet > applied it to the tables in memory. settings are untouched. > * updating the master with synchronous_commit = local. During my test: synchronous_commit = on Thanks, Erik Rijkers > ----- Original Message ----- >> AMD FX 8120 / centos 6.2 / latest source (git head) >> >> >> It seems to be quite easy to force a 'sync' replica to not be equal >> to master by >> recreating+loading a table in a while loop. >> >> >> For this test I compiled+checked+installed three separate instances >> on the same machine. The >> replica application_name are names 'wal_receiver_$copy' where $copy >> is 01, resp. 02. >> >> $ ./sync_state.sh >> pid | application_name | state | sync_state >> -------+------------------+-----------+------------ >> 19520 | wal_receiver_01 | streaming | sync >> 19567 | wal_receiver_02 | streaming | async >> (2 rows) >> >> port | synchronous_commit | synchronous_standby_names >> ------+--------------------+--------------------------- >> 6564 | on | wal_receiver_01 >> (1 row) >> >> port | synchronous_commit | synchronous_standby_names >> ------+--------------------+--------------------------- >> 6565 | off | >> (1 row) >> >> port | synchronous_commit | synchronous_standby_names >> ------+--------------------+--------------------------- >> 6566 | off | >> (1 row) >> >> >> >> The test consists of creating a table and loading tab-separated data >> from file with COPY and then >> taking the rowcount of that table (13 MB, almost 200k rows) in all >> three instances: >> >> >> # wget >> http://flybase.org/static_pages/downloads/FB2012_03/genes/fbgn_annotation_ID_fb_2012_03.tsv.gz >> >> slurp_file=fbgn_annotation_ID_fb_2012_03.tsv.gz >> >> zcat $slurp_file \ >> | grep -v '^#' \ >> | grep -Ev '^[[:space:]]*$' \ >> | psql -c " >> drop table if exists $table cascade; >> create table $table ( >> gene_symbol text >> , primary_fbgn text >> , secondary_fbgns text >> , annotation_id text >> , secondary_annotation_ids text >> ); >> copy $table from stdin csv delimiter E'\t'; >> "; >> >> # count on master: >> echo "select current_setting('port') port,count(*) from $table"|psql >> -qtXp 6564 >> >> # count on wal_receiver_01 (sync replica): >> echo "select current_setting('port') port,count(*) from $table"|psql >> -qtXp 6565 >> >> # count on wal_receiver_02 (async replica): >> echo "select current_setting('port') port,count(*) from $table"|psql >> -qtXp 6566 >> >> >> >> I expected the rowcounts from master and sync replica to always be >> the same. >> >> Initially this seemed to be the case, but when I run the above >> sequence in a while loop for a few >> minutes about 10% of rowcounts from the sync-replica are not equal to >> the master. >> >> Perhaps not a likely scenario, but surely such a deviating rowcount >> on a sync replica should not >> be possible? >> >> >> thank you, >> >> >> Erik Rijkers >> >> >> >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers <er@xs4all.nl> wrote: > The count(*) was done in the way that I showed, i.e. *after* psql had exited. My understanding is > that, with synchronous replication 'on' and configured properly, psql could only return *after* > the sync-replica had the data safely on disk. Either my understanding is not correct or there is > a bug in postgres sync-rep. Commit can only return when sync-replica has the data safely on disk, but this doesn't mean that it's visible yet. The sequence of events is in dot notation: commit_command -> master_wal_sync -> replica_wal_sync -> master_commit_visible -> commit_response replica_wal_sync -> replica_replay_wal -> replica_commit_visible If you issue a select on the replica after getting a commit response from master you can see that the query getting a snapshot races with replay of the commit record. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
On Thu, May 17, 2012 16:10, Ants Aasma wrote: > On Thu, May 17, 2012 at 4:53 PM, Erik Rijkers <er@xs4all.nl> wrote: >> The count(*) was done in the way that I showed, i.e. *after* psql had exited. My understanding >> is >> that, with synchronous replication 'on' and configured properly, psql could only return *after* >> the sync-replica had the data safely on disk. Either my understanding is not correct or there >> is >> a bug in postgres sync-rep. > > Commit can only return when sync-replica has the data safely on disk, > but this doesn't mean that it's visible yet. > > The sequence of events is in dot notation: > commit_command -> master_wal_sync -> replica_wal_sync -> > master_commit_visible -> commit_response > replica_wal_sync -> replica_replay_wal -> replica_commit_visible > > If you issue a select on the replica after getting a commit response > from master you can see that the query getting a snapshot races with > replay of the commit record. > Ah yes, that makes sense. I hadn't thought of that. Thank you for that explanation.