Re: COPY IN/BOTH vs. extended query mode - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: COPY IN/BOTH vs. extended query mode
Date
Msg-id dac86f9315450b12d419c602281fe682f2b24e84.camel@j-davis.com
Whole thread Raw
In response to [HACKERS] COPY IN/BOTH vs. extended query mode  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, 2017-01-23 at 21:12 -0500, Robert Haas wrote:
> According to the documentation for COPY IN mode, "If the COPY command
> was issued via an extended-query message, the backend will now
> discard
> frontend messages until a Sync message is received, then it will
> issue
> ReadyForQuery and return to normal processing."  I added a similar
> note to the documentation for COPY BOTH mode in
> 91fa8532f4053468acc08534a6aac516ccde47b7, and the documentation
> accurately describes the behavior of the server.  However, this seems
> to make fully correct error handling for clients using libpq almost
> impossible, because PQsendQueryGuts() sends
> Parse-Bind-Describe-Execute-Sync in one shot without regard to
> whether
> the command that was just sent invoked COPY mode (cf. the note in
> CopyGetData about why we ignore Flush and Sync in that function).
> 
> So imagine that the client uses libpq to send (via the extended query
> protocol) a COPY IN command (or some hypothetical command that starts
> COPY BOTH mode to begin).  If the server throws an error before the
> Sync message is consumed, it will bounce back to PostgresMain which
> will set doing_extended_query_message = true after which it will
> consume messages, find the Sync, reset that flag, and send
> ReadyForQuery.  On the other hand, if the server enters CopyBoth
> mode,
> consumes the Sync message in CopyGetData (or a similar function), and
> *then* throws an ERROR, the server will wait for a second Sync
> message
> from the client before issuing ReadyForQuery.  There is no sensible
> way of coping with this problem in libpq, because there is no way for
> the client to know which part of the server code consumed the Sync
> message that it already sent.  In short, from the client's point of
> view, if it enters COPY IN or COPY BOTH mode via the extend query
> protocol, and an error occurs on the server, the server MAY OR MAY
> NOT
> expect a further Sync message before issuing ReadyForQuery, and the
> client has no way of knowing -- except maybe waiting for a while to
> see what happens.

I investigated a bit deeper here, and I'm not sure this is a real
problem (aside from ambiguity in the protocol docs).

If you send "COPY ... FROM STDIN" using the extended query protocol in
libpq, the non-error message flow is something like:

  -> Parse + Bind + Describe + Execute + Sync
  [ server processes Parse + Bind + Describe + Execute ]
  [ server enters copy-in mode ]
  <- CopyInResponse
  [ server ignores Sync ]
  -> CopyData
  [ server processes CopyData ]
  -> CopyDone
  [ server processes CopyDone ]
  [ server exits copy-in mode ]
  -> Sync
  [ server processes Sync ]
  <- ReadyForQuery

If an error happens before the server enters copy-in mode (e.g. syntax
error), then you get something like:

  -> Parse + Bind + Describe + Execute + Sync
  [ server processes
Parse, encounters error ]
  <- ErrorResponse
  [ server ignores Bind +
Describe + Execute ]
  [ server processes Sync ]
  <- ReadyForQuery
  [
client never got CopyInResponse, so never sent copy messages ]

If an error happens after the CopyInResponse is sent (e.g. malformed
data), you get something like:

  -> Parse + Bind + Describe + Execute + Sync
  [ server processes Bind + Describe + Execute ]
  [ server enters copy-in mode ]
  <- CopyInResponse
  [ server ignores Sync ]
  -> CopyData
  [ server processes CopyData, encounters error ]
  [ server exits copy-in mode ]
  <- ErrorResponse
  -> CopyDone
  [ server ignores CopyDone ]
  -> Sync
  [ server processes Sync ]
  <- ReadyForQuery

If the backend is canceled after the server sends CopyInResponse but
before it consumes (and ignores) the Sync, you get something like:

  -> Parse + Bind + Describe + Execute + Sync
  [ server processes Bind + Describe + Execute ]
  [ server enters copy-in mode ]
  <- CopyInResponse
  [ SIGINT, server encounters error ]
  <- ErrorResponse
  [ server exits copy-in mode ]
  [ server processes Sync ]
  <- ReadyForQuery
  -> CopyData
  [ server ignores CopyData ]
  -> CopyDone
  [ server ignores CopyDone ]
  -> Sync
  [ server processes Sync ]
  <- ReadyForQuery

The last case is not great, because I could imagine it confusing a
client, but I'm not sure about exactly how, and maybe it's something we
can document around?

Regards,
    Jeff Davis







pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: A micro-optimisation for ProcSendSignal()
Next
From: Greg Stark
Date:
Subject: Re: [PATCH] Document heuristics for parameterized paths