Re: CREATE REPLICATION SLOT fails on a timeout - Mailing list pgsql-hackers

From Andres Freund
Subject Re: CREATE REPLICATION SLOT fails on a timeout
Msg-id 20140516204331.GE13967@awork2.anarazel.de
In response to CREATE REPLICATION SLOT fails on a timeout  (Steve Singer <steve@ssinger.info>)
List pgsql-hackers
Hi,

On 2014-05-16 16:37:16 -0400, Steve Singer wrote:
> I am finding that my logical walsender connections are being terminated due
> to a timeout on the CREATE REPLICATION SLOT command. with "terminating
> walsender process due to replication timeout"
> 
> Below is the stack trace when this happens
> 
> #3  0x000000000067df28 in WalSndCheckTimeOut (now=now@entry=453585463823871)
> at walsender.c:1748
> #4  0x000000000067eedc in WalSndWaitForWal (loc=691727888) at
> walsender.c:1216
> ...
> #9  0x0000000000680f16 in CreateReplicationSlot (cmd=0x1798b50) at
> walsender.c:800
> #10 exec_replication_command ()  at walsender.c:1291
> #11 0x00000000006bf4a1 in PostgresMain (argc=<optimized out>,
> argv=argv@entry=0x177db50, dbname=0x177db30 "test1",
> 
> (gdb) p last_reply_timestamp
> $1 = 0
> 
> 
> I propose the attached patch sets last_reply_timestamp to now() it starts
> processing a command.  Since receiving a command is hearing something from
> the client.

Hm. Yes, that's a problem.

> diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> new file mode 100644
> index 5c11d68..56a2f10
> *** a/src/backend/replication/walsender.c
> --- b/src/backend/replication/walsender.c
> *************** exec_replication_command(const char *cmd
> *** 1276,1281 ****
> --- 1276,1282 ----
>                                     parse_rc))));
>   
>       cmd_node = replication_parse_result;
> +     last_reply_timestamp = GetCurrentTimestamp();
>   
>       switch (cmd_node->type)
>       {

I don't think that's going to cut it though. Creating the slot can take
longer than whatever wal_sender_timeout is set to (when there are lots of
long-running transactions). I think checking whether last_reply_timestamp
is 0 during timeout checking is more robust.
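
The suggestion above could look roughly like the following. This is a minimal standalone sketch, not the actual walsender.c code: the function name walsnd_timed_out and its parameter list are hypothetical, TimestampTz is modelled as microseconds in an int64_t, and the timeout is assumed to be given in milliseconds, as wal_sender_timeout is. The idea is that a zero last_reply_timestamp means "no reply seen yet", so the timeout check is skipped instead of comparing against a deadline computed from 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's TimestampTz: microseconds in a 64-bit integer. */
typedef int64_t TimestampTz;

/*
 * Hypothetical helper: decide whether the connection should be treated as
 * timed out.
 *
 * now          - current time, in microseconds
 * last_reply   - time of the last client reply, or 0 if none was recorded yet
 * timeout_ms   - timeout in milliseconds; <= 0 disables the check
 */
static bool
walsnd_timed_out(TimestampTz now, TimestampTz last_reply, int timeout_ms)
{
    TimestampTz deadline;

    /* Timeout disabled: never terminate. */
    if (timeout_ms <= 0)
        return false;

    /*
     * No reply recorded yet (for example, while a command such as slot
     * creation is still running): skip the check rather than comparing
     * against a deadline derived from 0.
     */
    if (last_reply <= 0)
        return false;

    /* Convert the timeout from milliseconds to microseconds. */
    deadline = last_reply + (TimestampTz) timeout_ms * 1000;

    return now >= deadline;
}
```

With last_reply at 0, as in the gdb output quoted above, this check never reports a timeout no matter how long the command takes, which is the property that setting the timestamp once at command start does not give.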

Greetings,

Andres Freund


