Logical replication wal sender timestamp bug - Mailing list pgsql-hackers

From Jeff Janes
Subject Logical replication wal sender timestamp bug
Date
Msg-id CAMkU=1z=WMn8jt7iEdC5sYNaPgAgOASb_OW5JYv-vMdYaJSL-w@mail.gmail.com
Whole thread Raw
Responses Re: Logical replication wal sender timestamp bug  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
While monitoring pg_stat_subscription, I noticed that last_msg_send_time was usually NULL, which doesn't make sense.  Why would we have a message, but not know when it was sent?

Looking in src/backend/replication/walsender.c, there is this:

    /* output previously gathered data in a CopyData packet */
    pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);

    /*
     * Fill the send timestamp last, so that it is taken as late as possible.
     * This is somewhat ugly, but the protocol is set as it's already used for
     * several releases by streaming physical replication.
     */
    resetStringInfo(&tmpbuf);
    now = GetCurrentTimestamp();
    pq_sendint64(&tmpbuf, now);
    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
           tmpbuf.data, sizeof(int64));

Filling out the timestamp after the message has already been sent is taking "as late as possible" a little too far.  It results in every message having a zero timestamp, other than keep-alives which go through a different path.

Re-ordering the code blocks as in the attached seems to fix it.  But I have to wonder, if this has been broken from the start and no one ever noticed, is this even valuable information to relay in the first place?  We could just take the column out of the view, and not bother calling  GetCurrentTimestamp() for each message.

Cheers,

Jeff
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Remove configure --disable-float4-byval and--disable-float8-byval
Next
From: Artur Zakirov
Date:
Subject: Re: CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+