Inconsistent error handling in START_REPLICATION command - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Inconsistent error handling in START_REPLICATION command
Date
Msg-id CACACo5QVAEOo0n6HFkuX_0_RO6oDLEbg9qk1CREyxz0i-zuVDw@mail.gmail.com
Responses Re: Inconsistent error handling in START_REPLICATION command  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-hackers
Hackers,

It looks like there's an inconsistency in error handling in the START_REPLICATION command of the replication protocol:

$ psql postgres://localhost/psycopg2test?replication=database
psql (9.6devel)
Type "help" for help.

psycopg2test=# IDENTIFY_SYSTEM;
      systemid       | timeline |  xlogpos  |    dbname    
---------------------+----------+-----------+--------------
 6235978519197579707 |        1 | 0/2CE0F78 | psycopg2test
(1 row)

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid" "value");
ERROR:  syntax error

1) syntax errors are reported and the client can retry with a corrected command:

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid" 'value');
ERROR:  replication slot name "TEST1" contains invalid character
HINT:  Replication slot names may only contain lower case letters, numbers, and the underscore character.

2) further errors are reported and we can retry:

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
ERROR:  replication slot "test1" does not exist

psycopg2test=# CREATE_REPLICATION_SLOT "test1" LOGICAL "test_decoding";
 slot_name | consistent_point | snapshot_name | output_plugin 
-----------+------------------+---------------+---------------
 test1     | 0/2CF5340        | 0000088C-1    | test_decoding
(1 row)

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
unexpected PQresultStatus: 8
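
For reference, status 8 in libpq's ExecStatusType is PGRES_COPY_BOTH, so what psql reports here is the switch to COPY BOTH mode rather than an error result. A small lookup that mirrors the enum values from libpq-fe.h (the Python class and the describe_status helper are only illustrative, not part of libpq):

```python
from enum import IntEnum


class ExecStatusType(IntEnum):
    """Mirror of libpq's ExecStatusType values (illustrative, not libpq itself)."""
    PGRES_EMPTY_QUERY = 0
    PGRES_COMMAND_OK = 1
    PGRES_TUPLES_OK = 2
    PGRES_COPY_OUT = 3
    PGRES_COPY_IN = 4
    PGRES_BAD_RESPONSE = 5
    PGRES_NONFATAL_ERROR = 6
    PGRES_FATAL_ERROR = 7
    PGRES_COPY_BOTH = 8
    PGRES_SINGLE_TUPLE = 9


def describe_status(code: int) -> str:
    """Hypothetical helper: translate a numeric PQresultStatus into its name."""
    return ExecStatusType(code).name


if __name__ == "__main__":
    # The value psql printed as "unexpected PQresultStatus: 8"
    print(describe_status(8))
```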

The last command results in the following output sent to the server log:

ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup callback

But the client has no way to learn about the error, nor can it retry with a corrected command (the server has entered COPY protocol mode):

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
PQexec not allowed during COPY BOTH
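
As a sketch only: if the server does follow its CopyBothResponse ('W') with an ErrorResponse ('E'), a client that reads protocol messages itself could pick the error out as below. That ordering is an assumption I have not verified, the byte stream is synthetic, and the helper names are made up for illustration. Only the message framing (type byte, 4-byte length that counts itself, payload) and the ErrorResponse field layout follow the documented frontend/backend protocol.

```python
import struct


def build_message(msg_type: bytes, payload: bytes) -> bytes:
    """Frame a message: 1 type byte, Int32 length (includes itself), payload."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def parse_messages(buf: bytes):
    """Yield (type, payload) pairs from a buffer of complete messages."""
    pos = 0
    while pos + 5 <= len(buf):
        msg_type = buf[pos:pos + 1]
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        yield msg_type, buf[pos + 5:pos + 1 + length]
        pos += 1 + length


def parse_error_fields(payload: bytes) -> dict:
    """Decode ErrorResponse fields: (code byte, C string)*, ended by a zero byte."""
    fields = {}
    pos = 0
    while pos < len(payload) and payload[pos] != 0:
        end = payload.index(b"\x00", pos + 1)
        fields[chr(payload[pos])] = payload[pos + 1:end].decode("utf-8")
        pos = end + 1
    return fields


def first_error(stream: bytes):
    """Return the fields of the first ErrorResponse in the stream, or None."""
    for msg_type, payload in parse_messages(stream):
        if msg_type == b"E":
            return parse_error_fields(payload)
    return None


# Synthetic stream (assumed ordering): CopyBothResponse, then ErrorResponse.
copy_both = build_message(b"W", struct.pack("!bh", 0, 0))  # text format, 0 columns
error = build_message(
    b"E",
    b"SERROR\x00" + b'Moption "invalid" = "value" is unknown\x00' + b"\x00",
)
stream = copy_both + error

if __name__ == "__main__":
    print(first_error(stream))
```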

I recall Craig Ringer mentioning this issue in the context of the pglogical_output plugin, but I thought that had something to do with the startup packet.  The behavior above doesn't strike me as very consistent: we should be able to detect and report errors during output plugin startup and let the client retry with a corrected command, as we do for syntax and other errors.

I haven't looked at the code yet, but if someone knows off the top of their head the reason for this behavior, I'd be glad to learn it.

Cheers.
--
Oleksandr "Alex" Shulgin
Database Engineer

Mobile: +49 160 84-90-639
Email: oleksandr.shulgin@zalando.de
