Re: Inconsistent error handling in START_REPLICATION command - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Inconsistent error handling in START_REPLICATION command
Date
Msg-id CAMsr+YEb19wo0HvN0FXnN9FYxMaEUL7x-VtuDvj1Uu3nK5n8nQ@mail.gmail.com
Whole thread Raw
In response to Re: Inconsistent error handling in START_REPLICATION command  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: Inconsistent error handling in START_REPLICATION command
Re: Inconsistent error handling in START_REPLICATION command
List pgsql-hackers


On 5 January 2016 at 18:35, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
 
psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
unexpected PQresultStatus: 8

I think the point here is that START_REPLICATION issues useful errors and returns to the normal protocol up until the logical decoding startup callback is invoked.

It enters COPY BOTH mode before it invokes the startup callback. The client has no way to unilaterally terminate COPY BOTH mode and return to the normal walsender protocol. The server doesn't do it when there's an ERROR. So the only option the client has for recovery is to disconnect and reconnect.
 
I recall Craig Rigner mentioning this issue in context of the pglogical_output plugin, but I thought that was something to do with the startup packet.  The behavior above doesn't strike me as very consistent, we should be able to detect and report errors during output plugin startup and let the client retry with the corrected command as we do for syntax or other errors.

I agree. This would also let clients have some limited facility for negotiating options with the output plugin, albeit via the very clumsy channel of libpq protocol error message records and their text.


 
Looks like the attached patch fixes it for me:

For those who didn't read it, the patch moves the start of COPY BOTH mode to after the logical decoding startup callback is invoked.
 
This certainly fixes the immediate issue. It also forecloses something else I'd really like though, which is the ability for plugins to send output to clients from within their decoding startup hook.

For that reason I'd actually like to enter COPY BOTH mode before the startup callback, as is currently the case. So I'd like to wrap the decoding startup callback in a PG_TRY that catches an ERROR raised by the startup callback (if any) and exits COPY BOTH mode before re-throwing.

Sound reasonable?


I'm looking toward a future where the decoding startup callback can emit a structured, plugin-specific CopyData-wrapped message with information the downstream might need to negotiate a connection. It could then exit back to walsender mode by throwing an ERROR. Or, better, by setting some new decoding context field that's checked after each callback to see if the plugin wants to exit COPY BOTH mode and return to walsender protocol mode without spamming the logs with an ERROR. That way we can finally do two-way negotiation between downstream and upstream.

(Related to this, I also want to add a new decoding callback that lets the downstream send arbitrary CopyData back up the wire to the upstream where it's passed to the decoding callback. This would give the downstream a backchannel to change settings, request things from the output plugin, etc, w/o lots of disconnects and reconnects.)

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Logical decoding on standby
Next
From: Andres Freund
Date:
Subject: Re: Logical decoding on standby