Home > mailing lists

"no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication - Mailing list pgsql-bugs

From	Sékine Coulibaly
Subject	"no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication
Date	March 25, 2015 12:21:10
Msg-id	CAD8n-Frk1Sb5Sgd9P_H7+QR4m=6kGZBSkcZdpcGA_KLhAxJ6fw@mail.gmail.com Whole thread Raw
Responses	Re: "no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication (Kevin Grittner <kgrittn@ymail.com>)
List	pgsql-bugs

Tree view

Hi there !

I have a master/slave setup. I use Corosync/Pacemaker for the clustering layer, and repmgr for the PsotgreSQL 9.2.4 synchronous replication. When a transaction is received by the master before the slave is up and running, the transaction seems blocked forever on the backend.

The setup is as follows :

- The master node run a PostgreSQL master instance and an in-house application. The clustering first starts PostgreSQL, then the application. The application then connects to the database, and starts creating tables.

- The slave node only runs a PostgreSQL synchronous streamed replica. The in-house application doesn't run on the slave node.

My understanding of synchronous replication is that unless the slave is up, the master can not commit any transaction (unless the configurations forces it, which is nonesense).

This is the test scenario :

- The master node is started, so repmgr, through the clustering layer, starts PostgreSQL as a master.

- Once the master is started, the application is started. The latter connects to the master and issues a CREATE TABLE query.

- On the slave, repmgr then starts the PostgreSQL slave instance : it first performs a clone from the running master, and then starts streaming.

At this point, a ps auxw | grep "post" command shows the following :

- On the MASTER :

postgres 14495 0.0 0.0 220344 8616 ? S 11:46 0:00 /usr/pgsql-9.2/bin/postgres -D /opt/analytics/pgdata -p 5432

postgres 14496 0.0 0.0 177900 1172 ? Ss 11:46 0:00 postgres: logger process

postgres 14498 0.0 0.0 220488 2420 ? Ss 11:46 0:00 postgres: checkpointer process

postgres 14499 0.0 0.0 220344 1636 ? Ss 11:46 0:00 postgres: writer process

postgres 14500 0.0 0.0 220344 1404 ? Ss 11:46 0:00 postgres: wal writer process

postgres 14501 0.0 0.0 221168 2672 ? Ss 11:46 0:00 postgres: autovacuum launcher process

postgres 14502 0.0 0.0 180000 1260 ? Ss 11:46 0:00 postgres: archiver process last was 000000010000000000000040

postgres 14503 0.0 0.0 180136 1412 ? Ss 11:46 0:00 postgres: stats collector process

postgres 14637 0.0 0.0 222576 7812 ? Ss 11:46 0:00 postgres: postgres logs 127.0.0.1(40436) CREATE TABLE waiting for 0/3F01E730

postgres 15978 0.0 0.0 221316 3132 ? Ss 11:48 0:00 postgres: wal sender process repmgr 10.15.35.5(50844) streaming 0/41002048

- On the SLAVE :

postgres 8004 0.0 0.0 220340 8612 ? S 11:48 0:00 /usr/pgsql-9.2/bin/postgres -D /opt/analytics/pgdata -p 5432

postgres 8005 0.0 0.0 177896 1168 ? Ss 11:48 0:00 postgres: logger process

postgres 8006 0.0 0.0 220420 1968 ? Ss 11:48 0:00 postgres: startup process recovering 000000010000000000000041

postgres 8007 0.0 0.0 227736 3132 ? Ss 11:48 0:02 postgres: wal receiver process streaming 0/41002048

postgres 8008 0.0 0.0 220340 1808 ? Ss 11:48 0:00 postgres: checkpointer process

postgres 8009 0.0 0.0 220340 1632 ? Ss 11:48 0:00 postgres: writer process

postgres 8010 0.0 0.0 180132 1408 ? Ss 11:48 0:00 postgres: stats collector process

My observation here is that :

- the master has a WAL sender up-and-running, streaming WAL 0/41002048.

- on the master node, the in-house application is waiting for the CREATE TABLE transaction to be commited (CREATE TABLE waiting for 0/3F01E730). PostgreSQL masters seems to wait for a WAL that was probably never received (since the data of the WAL was transferred during the cloning by RSYNC'ing the master's filesystem).

- the slave has a WAL receiver up-and-running, streaming WAL 0/41002048.

I may be wrong here, but the expected behaviour in such a case wouldn't be to unblock the CREATE TABLE transaction ? It looks like this transaction will never be able to complete.

I'm running RHEL 6.5, and PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit.

I've not tried to reproduce on 9.4.1+ yet. Is it worth trying or is this a known and solved issue ?

Any hint will be highly appreciated !

Regards,

pgsql-bugs by date:

From: kimthangatm@gmail.com
Date: 25 March 2015, 07:07:07
Subject: BUG #12895: Error: systemctl command not found

From: Michael Paquier
Date: 25 March 2015, 14:46:21
Subject: Re: BUG #12885: The result of casting a double to an integer depends on the database version

"no-slave yet" early CREATE TABLE transaction gets blocked when synchronous replication - Mailing list pgsql-bugs

Previous

Next