Re: [HACKERS] More replication race conditions - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] More replication race conditions
Date
Msg-id a37dbe7c-3bc1-553b-6d3c-fd070b7101a6@2ndquadrant.com
In response to [HACKERS] More replication race conditions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] More replication race conditions  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On 24/08/17 19:54, Tom Lane wrote:
> sungazer just failed with
> 
> pg_recvlogical exited with code '256', stdout '' and stderr 'pg_recvlogical: could not send replication command
> "START_REPLICATION SLOT "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": ERROR:  replication slot
> "test_slot" is active for PID 8913148
> pg_recvlogical: disconnected
> ' at /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm line 1657.
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2017-08-24%2015%3A16%3A10
> 
> Looks like we're still not there on preventing replication startup
> race conditions.

Hmm, that looks like "by design" behavior. Acquiring a slot throws an
error if the slot is already in use by another process (slots use their
own locking mechanism, which does not wait). In this case it seems the
walsender that was using the slot for the previous step hadn't finished
releasing it by the time we started the new command. We can work around
this by changing the test to wait for the slot to be released, perhaps.
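
To illustrate the kind of wait I mean, here is a rough sketch (in Python
rather than the Perl the TAP tests use, and not the actual test code). It
polls until the slot no longer has an active PID before the next command is
started. The helper name wait_for_slot_release and the run_query callback
are made up for illustration; the only real thing assumed is the
pg_replication_slots view with its slot_name and active_pid columns.

```python
import time
from typing import Callable

# Query a test could poll: active_pid is NULL once the walsender
# that held the slot has released it.
SLOT_FREE_QUERY = (
    "SELECT EXISTS (SELECT 1 FROM pg_replication_slots "
    "WHERE slot_name = 'test_slot' AND active_pid IS NULL)"
)


def wait_for_slot_release(
    run_query: Callable[[str], bool],
    timeout: float = 180.0,
    interval: float = 0.1,
) -> bool:
    """Poll until the slot is free or the timeout expires.

    run_query is a hypothetical callback that executes the SQL and
    returns the boolean result; returns True if the slot became free.
    """
    deadline = time.monotonic() + timeout
    while True:
        if run_query(SLOT_FREE_QUERY):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The test would call something like this right before starting
pg_recvlogical again, and fail with a clear message if it returns False.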

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


