Thread: PQputCopyEnd doesn't adhere to its API contract

PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
According to the documentation for PQputCopyEnd:

> The result is 1 if the termination data was sent, zero if it was not sent because the attempt would block (this case
isonly possible if the connection is in
 
> nonblocking mode), or -1 if an error occurred. (Use PQerrorMessage to retrieve details if the return value is -1. If
thevalue is zero, wait for write-ready and try again.)
 

However, pqPutCopyEnd contains no return statement that can ever
possibly return 0.  I think the problem is approximately here:
   /* Try to flush data */   if (pqFlush(conn) < 0)       return -1;

pqFlush() returns 0 if no data is waiting to be sent, or otherwise the
return value of pqSendSome().  pqSendSome() returns -1 if an error
occurs, 0 if all data is sent, or 1 if some data was sent but the
socket is non-blocking and the caller must try again later.  It seems
to me that when pqSendSome() returns 1, pqPutCopyEnd ought to return 0
in order to meet its API contract - and then the client, presumably,
should repeatedly wait for the socket to become write-ready and then
try PQflush() until PQflush() returns non-zero.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> According to the documentation for PQputCopyEnd:
>> The result is 1 if the termination data was sent, zero if it was not sent because the attempt would block (this case
isonly possible if the connection is in
 
>> nonblocking mode), or -1 if an error occurred. (Use PQerrorMessage to retrieve details if the return value is -1. If
thevalue is zero, wait for write-ready and try again.)
 

> However, pqPutCopyEnd contains no return statement that can ever
> possibly return 0.

IIRC, for some of the libpq functions, the API spec was intended to allow
for future async behavior that wasn't actually implemented.  So the mere
fact that it can't return zero today is not a problem.  If it might block
instead, then we have a problem, but that's not what you're saying.

Also, what the API spec says is that clients should retry PQputCopyEnd
on a zero return.  We do *not* want them to do that here; once we've
changed asyncStatus, a retry would report an error.  So the API spec
in this case is intended to require a retry loop that the current
implementation doesn't actually need, but that conceivably could be
needed after some future refactoring.  In particular, we might want
to fix the code so that it returns zero if it fails to queue the
COPY END message at all.

> pqFlush() returns 0 if no data is waiting to be sent, or otherwise the
> return value of pqSendSome().  pqSendSome() returns -1 if an error
> occurs, 0 if all data is sent, or 1 if some data was sent but the
> socket is non-blocking and the caller must try again later.  It seems
> to me that when pqSendSome() returns 1, pqPutCopyEnd ought to return 0
> in order to meet its API contract - and then the client, presumably,
> should repeatedly wait for the socket to become write-ready and then
> try PQflush() until PQflush() returns non-zero.

That would be a change in the API spec, not just the implementation,
and it's not clear to me that it would actually be helpful to any
clients.
        regards, tom lane



Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, May 8, 2014 at 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> According to the documentation for PQputCopyEnd:
>>> The result is 1 if the termination data was sent, zero if it was not sent because the attempt would block (this
caseis only possible if the connection is in
 
>>> nonblocking mode), or -1 if an error occurred. (Use PQerrorMessage to retrieve details if the return value is -1.
Ifthe value is zero, wait for write-ready and try again.)
 
>
>> However, pqPutCopyEnd contains no return statement that can ever
>> possibly return 0.
>
> IIRC, for some of the libpq functions, the API spec was intended to allow
> for future async behavior that wasn't actually implemented.  So the mere
> fact that it can't return zero today is not a problem.  If it might block
> instead, then we have a problem, but that's not what you're saying.
>
> Also, what the API spec says is that clients should retry PQputCopyEnd
> on a zero return.  We do *not* want them to do that here; once we've
> changed asyncStatus, a retry would report an error.  So the API spec
> in this case is intended to require a retry loop that the current
> implementation doesn't actually need, but that conceivably could be
> needed after some future refactoring.  In particular, we might want
> to fix the code so that it returns zero if it fails to queue the
> COPY END message at all.

Yeah, good point.  I think what confused me is the fact that the
documentation says that when PQputCopyEnd() returns 1, that means that
"the termination data was sent".  So my application sat there and
waited for a server response that never showed up, because in fact the
termination data had *not* been sent.

What I'm now thinking I need to do is something like this:

1. If PQputCopyEnd returns -1, error.
2. while ((rc = PQflush(conn)) != 0) { if (rc < 0) { error; } else {
wait for socket to become read-ready or write-ready; } }
3. while (PQisBusy(conn)) { wait for the socket to become read-ready; }
4. PQgetResult()

Does that sound right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> What I'm now thinking I need to do is something like this:

> 1. If PQputCopyEnd returns -1, error.
> 2. while ((rc = PQflush(conn)) != 0) { if (rc < 0) { error; } else {
> wait for socket to become read-ready or write-ready; } }
> 3. while (PQisBusy(conn)) { wait for the socket to become read-ready; }
> 4. PQgetResult()

> Does that sound right?

Yeah, more or less --- I think you need a PQconsumeInput there somewhere.

There is a PQflush call in PQconsumeInput that is intended to keep clients
from having to do that for themselves; but I'm not sure that it helps,
since clients probably only call PQconsumeInput when the socket is
read-ready --- and it wouldn't be in this situation.
        regards, tom lane



Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, May 8, 2014 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What I'm now thinking I need to do is something like this:
>
>> 1. If PQputCopyEnd returns -1, error.
>> 2. while ((rc = PQflush(conn)) != 0) { if (rc < 0) { error; } else {
>> wait for socket to become read-ready or write-ready; } }
>> 3. while (PQisBusy(conn)) { wait for the socket to become read-ready; }
>> 4. PQgetResult()
>
>> Does that sound right?
>
> Yeah, more or less --- I think you need a PQconsumeInput there somewhere.
>
> There is a PQflush call in PQconsumeInput that is intended to keep clients
> from having to do that for themselves; but I'm not sure that it helps,
> since clients probably only call PQconsumeInput when the socket is
> read-ready --- and it wouldn't be in this situation.

OK.  It still seems to me that there's a doc fix needed here, if
nothing else.  The documentation for PQputCopyEnd() clearly implies
that if you get a return value of 1, the message is sent, and that's
just not true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> OK.  It still seems to me that there's a doc fix needed here, if
> nothing else.  The documentation for PQputCopyEnd() clearly implies
> that if you get a return value of 1, the message is sent, and that's
> just not true.

That's fair.
        regards, tom lane



Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, May 8, 2014 at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> OK.  It still seems to me that there's a doc fix needed here, if
>> nothing else.  The documentation for PQputCopyEnd() clearly implies
>> that if you get a return value of 1, the message is sent, and that's
>> just not true.
>
> That's fair.

Something like the attached?

I don't really know what to say about 0, since it's never actually
returned, so I left that out.  But I'm up for suggestions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: PQputCopyEnd doesn't adhere to its API contract

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 8, 2014 at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> OK.  It still seems to me that there's a doc fix needed here, if
>>> nothing else.  The documentation for PQputCopyEnd() clearly implies
>>> that if you get a return value of 1, the message is sent, and that's
>>> just not true.

>> That's fair.

> Something like the attached?

> I don't really know what to say about 0, since it's never actually
> returned, so I left that out.  But I'm up for suggestions.

I think you should leave the existing verbiage about zero alone.
If we take it out, we're painting ourselves into a corner about
ever fixing the buffer-is-too-full failure case.

Perhaps the text should be like this:

The result is 1 if the termination message was sent; or in nonblocking
mode, this may only indicate that the termination message was successfully
queued.  (In nonblocking mode, to be certain that the data has been sent,
you should next wait for write-ready and call <function>PQflush</>,
repeating until it returns zero.)  Zero indicates that the function could
not queue the termination message because of full buffers; this will only
happen in nonblocking mode.  (In this case, wait for write-ready and try
the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
can use <function>PQerrorMessage</function> to retrieve details.

If it bothers you to document behavior that's not actually there yet,
maybe you should go fix the bug ;-).  I had thought this would require
a lot of refactoring, but it might work to just check if there's enough
buffer space for the message and return zero immediately if not.
        regards, tom lane



Re: PQputCopyEnd doesn't adhere to its API contract

From
David G Johnston
Date:
Tom Lane-2 wrote
> Robert Haas <

> robertmhaas@

> > writes:
>> On Thu, May 8, 2014 at 1:51 PM, Tom Lane <

> tgl@.pa

> > wrote:
>>> Robert Haas <

> robertmhaas@

> > writes:
>>>> OK.  It still seems to me that there's a doc fix needed here, if
>>>> nothing else.  The documentation for PQputCopyEnd() clearly implies
>>>> that if you get a return value of 1, the message is sent, and that's
>>>> just not true.
> 
>>> That's fair.
> 
>> Something like the attached?
> 
>> I don't really know what to say about 0, since it's never actually
>> returned, so I left that out.  But I'm up for suggestions.
> 
> I think you should leave the existing verbiage about zero alone.
> If we take it out, we're painting ourselves into a corner about
> ever fixing the buffer-is-too-full failure case.
> 
> Perhaps the text should be like this:
> 
> The result is 1 if the termination message was sent; or in nonblocking
> mode, this may only indicate that the termination message was successfully
> queued.  (In nonblocking mode, to be certain that the data has been sent,
> you should next wait for write-ready and call 
> <function>
> PQflush
> </>
> ,
> repeating until it returns zero.)  Zero indicates that the function could
> not queue the termination message because of full buffers; this will only
> happen in nonblocking mode.  (In this case, wait for write-ready and try
> the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> can use 
> <function>
> PQerrorMessage
> </function>
>  to retrieve details.
> 
> If it bothers you to document behavior that's not actually there yet,
> maybe you should go fix the bug ;-).  I had thought this would require
> a lot of refactoring, but it might work to just check if there's enough
> buffer space for the message and return zero immediately if not.

Had trouble comprehending Tom's wording so decided to word-smith a little:

The return value can be one of [-1, 0 , 1] whose meaning depends on whether
you are in blocking or non-blocking mode.  
In blocking mode a 1 means success and -1 means permanent failure; it is not
possible to receive a 0 in blocking mode - all calls will either succeed (1)
or fail (-1).
In non-blocking mode a 1 means that the termination message was placed in
the queue while a -1 means an immediate and permanent failure. A return
value of 0 means that the queue was full and that you need to try again at
the next wait-ready.
[?]While in non-blocking mode once you have received a return value of 1 you
will need to continually poll, at each wait-ready, <function>PQflush</>
until it returns 0 (queue empty) or -1 (failure).
The error associated with the -1 return can be retrieved using
<function>PQerrorMessage</>

[?]Isn't the whole PQflush comment simply repeating what is written
elsewhere?  The requirement is standard when working in non-blocking mode.

It does seem like the "buffer full" case would be fairly direct to
resolve...and I'm not sure what the explanation would be, in the doc above,
as to why "0 is not returned by the current implementation because (the
queue can never be full, or, even in non-blocking mode the system will block
to add to the queue in this function)....

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/PQputCopyEnd-doesn-t-adhere-to-its-API-contract-tp5803240p5803297.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, May 8, 2014 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Perhaps the text should be like this:
>
> The result is 1 if the termination message was sent; or in nonblocking
> mode, this may only indicate that the termination message was successfully
> queued.  (In nonblocking mode, to be certain that the data has been sent,
> you should next wait for write-ready and call <function>PQflush</>,
> repeating until it returns zero.)  Zero indicates that the function could
> not queue the termination message because of full buffers; this will only
> happen in nonblocking mode.  (In this case, wait for write-ready and try
> the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> can use <function>PQerrorMessage</function> to retrieve details.

That looks pretty good.   However, I'm realizing this isn't the only
place where we probably need to clarify the language.  Just to take
one example near at hand, PQputCopyData may also return 1 when it's
only queued the data; it seems to try even less hard than PQputCopyEnd
to ensure that the data is actually sent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> That looks pretty good.   However, I'm realizing this isn't the only
> place where we probably need to clarify the language.  Just to take
> one example near at hand, PQputCopyData may also return 1 when it's
> only queued the data; it seems to try even less hard than PQputCopyEnd
> to ensure that the data is actually sent.

Well, we certainly do NOT want a flush in PQputCopyData.
        regards, tom lane



Re: PQputCopyEnd doesn't adhere to its API contract

From
Bruce Momjian
Date:
On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:
> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Perhaps the text should be like this:
> >
> > The result is 1 if the termination message was sent; or in nonblocking
> > mode, this may only indicate that the termination message was successfully
> > queued.  (In nonblocking mode, to be certain that the data has been sent,
> > you should next wait for write-ready and call <function>PQflush</>,
> > repeating until it returns zero.)  Zero indicates that the function could
> > not queue the termination message because of full buffers; this will only
> > happen in nonblocking mode.  (In this case, wait for write-ready and try
> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> > can use <function>PQerrorMessage</function> to retrieve details.
> 
> That looks pretty good.   However, I'm realizing this isn't the only
> place where we probably need to clarify the language.  Just to take
> one example near at hand, PQputCopyData may also return 1 when it's
> only queued the data; it seems to try even less hard than PQputCopyEnd
> to ensure that the data is actually sent.

Uh, where are we on this?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: PQputCopyEnd doesn't adhere to its API contract

From
David G Johnston
Date:
On Wed, Sep 3, 2014 at 6:25 PM, Bruce Momjian [via PostgreSQL] <[hidden email]> wrote:
On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:

> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <[hidden email]> wrote:
> > Perhaps the text should be like this:
> >
> > The result is 1 if the termination message was sent; or in nonblocking
> > mode, this may only indicate that the termination message was successfully
> > queued.  (In nonblocking mode, to be certain that the data has been sent,
> > you should next wait for write-ready and call <function>PQflush</>,
> > repeating until it returns zero.)  Zero indicates that the function could
> > not queue the termination message because of full buffers; this will only
> > happen in nonblocking mode.  (In this case, wait for write-ready and try
> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> > can use <function>PQerrorMessage</function> to retrieve details.
>
> That looks pretty good.   However, I'm realizing this isn't the only
> place where we probably need to clarify the language.  Just to take
> one example near at hand, PQputCopyData may also return 1 when it's
> only queued the data; it seems to try even less hard than PQputCopyEnd
> to ensure that the data is actually sent.
Uh, where are we on this?


​The comment for "​PQsetnonblocking" in 31.4 says:

"In the nonblocking state, calls to PQsendQuery, PQputline, PQputnbytes, and PQendcopy will not block but instead return an error if they need to be called again."

This is apparently false for PQendcopy - I did not go look at the others

​I stand by my belief that someone who is using "Non-Blocking Mode" on read
​should 
understand how to proceed if they receive a
​1 or 0
 result from one of the "put" calls.  A cross-reference to the relevant section of the docs may be in order if that assumption is felt to be too optimistic.
​  While it could possibly be more prominent the last sentence in 31.4 states:

"​
After sending any command or data on a nonblocking connection, call PQflush. If it returns 1, wait for the socket to be write-ready and call it again; repeat until it returns 0. Once PQflush returns 0, wait for the socket to be read-ready and then read the response as described above.
​"

That said, I imagine a "tip" section for PQputCopyData may be in order if the caller can gain efficiencies by filling up the queue before going and polling with PQflush. I imagine this is the exact reason that the potential for a 0 result exists.  From the comment in 31.4 each call to PQputCopyData should be followed by the call to PQflush...



As is my usual I decided to use my fresh perspective to see if I could organize the material in a more structured, and in this case less repetitive, way.  Sending a diff/patch and a PDF of the result of "make html"

I did not look at the 31.9.3 - Obsolete Functions section

​Is there any particular rule-of-thumb for choosing "0" or "zero" that I should consider?  I tended to pick "0" in almost all cases and even fixed a few otherwise untouched blocks of text.

The adjective "network" in "...to read from or write to the network connection used by libpq." seems redundant...

David J.

P.S. I am also curious as to why the documents generated using "make html" do not more closely (and there is quite a gap) match the style of the online website....




libpq_sgml_31_9_functions_associated_with_the_copy_command.diff (14K) Download Attachment
libpq_sgml_31_9_functions_associated_with_the_copy_command.pdf (263K) Download Attachment


View this message in context: Re: PQputCopyEnd doesn't adhere to its API contract
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Wed, Sep 3, 2014 at 6:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:
>> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Perhaps the text should be like this:
>> >
>> > The result is 1 if the termination message was sent; or in nonblocking
>> > mode, this may only indicate that the termination message was successfully
>> > queued.  (In nonblocking mode, to be certain that the data has been sent,
>> > you should next wait for write-ready and call <function>PQflush</>,
>> > repeating until it returns zero.)  Zero indicates that the function could
>> > not queue the termination message because of full buffers; this will only
>> > happen in nonblocking mode.  (In this case, wait for write-ready and try
>> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
>> > can use <function>PQerrorMessage</function> to retrieve details.
>>
>> That looks pretty good.   However, I'm realizing this isn't the only
>> place where we probably need to clarify the language.  Just to take
>> one example near at hand, PQputCopyData may also return 1 when it's
>> only queued the data; it seems to try even less hard than PQputCopyEnd
>> to ensure that the data is actually sent.
>
> Uh, where are we on this?

I think someone needs to take Tom's proposed language and make it into
a patch.  And figure out which other functions in the documentation
need similar updates.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
Bruce Momjian
Date:
On Thu, Sep  4, 2014 at 12:52:14PM -0400, Robert Haas wrote:
> On Wed, Sep 3, 2014 at 6:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:
> >> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Perhaps the text should be like this:
> >> >
> >> > The result is 1 if the termination message was sent; or in nonblocking
> >> > mode, this may only indicate that the termination message was successfully
> >> > queued.  (In nonblocking mode, to be certain that the data has been sent,
> >> > you should next wait for write-ready and call <function>PQflush</>,
> >> > repeating until it returns zero.)  Zero indicates that the function could
> >> > not queue the termination message because of full buffers; this will only
> >> > happen in nonblocking mode.  (In this case, wait for write-ready and try
> >> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> >> > can use <function>PQerrorMessage</function> to retrieve details.
> >>
> >> That looks pretty good.   However, I'm realizing this isn't the only
> >> place where we probably need to clarify the language.  Just to take
> >> one example near at hand, PQputCopyData may also return 1 when it's
> >> only queued the data; it seems to try even less hard than PQputCopyEnd
> >> to ensure that the data is actually sent.
> >
> > Uh, where are we on this?
> 
> I think someone needs to take Tom's proposed language and make it into
> a patch.  And figure out which other functions in the documentation
> need similar updates.

OK, did David G Johnston email comments from today help here?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, Sep 4, 2014 at 12:53 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep  4, 2014 at 12:52:14PM -0400, Robert Haas wrote:
>> On Wed, Sep 3, 2014 at 6:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:
>> >> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> > Perhaps the text should be like this:
>> >> >
>> >> > The result is 1 if the termination message was sent; or in nonblocking
>> >> > mode, this may only indicate that the termination message was successfully
>> >> > queued.  (In nonblocking mode, to be certain that the data has been sent,
>> >> > you should next wait for write-ready and call <function>PQflush</>,
>> >> > repeating until it returns zero.)  Zero indicates that the function could
>> >> > not queue the termination message because of full buffers; this will only
>> >> > happen in nonblocking mode.  (In this case, wait for write-ready and try
>> >> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
>> >> > can use <function>PQerrorMessage</function> to retrieve details.
>> >>
>> >> That looks pretty good.   However, I'm realizing this isn't the only
>> >> place where we probably need to clarify the language.  Just to take
>> >> one example near at hand, PQputCopyData may also return 1 when it's
>> >> only queued the data; it seems to try even less hard than PQputCopyEnd
>> >> to ensure that the data is actually sent.
>> >
>> > Uh, where are we on this?
>>
>> I think someone needs to take Tom's proposed language and make it into
>> a patch.  And figure out which other functions in the documentation
>> need similar updates.
>
> OK, did David G Johnston email comments from today help here?

I didn't look at them in detail, but they don't seem to match the
style of our documentation generally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
David G Johnston
Date:
On Thu, Sep 4, 2014 at 1:00 PM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
On Thu, Sep 4, 2014 at 12:53 PM, Bruce Momjian <[hidden email]> wrote:

> On Thu, Sep  4, 2014 at 12:52:14PM -0400, Robert Haas wrote:
>> On Wed, Sep 3, 2014 at 6:24 PM, Bruce Momjian <[hidden email]> wrote:
>> > On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:
>> >> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <[hidden email]> wrote:
>> >> > Perhaps the text should be like this:
>> >> >
>> >> > The result is 1 if the termination message was sent; or in nonblocking
>> >> > mode, this may only indicate that the termination message was successfully
>> >> > queued.  (In nonblocking mode, to be certain that the data has been sent,
>> >> > you should next wait for write-ready and call <function>PQflush</>,
>> >> > repeating until it returns zero.)  Zero indicates that the function could
>> >> > not queue the termination message because of full buffers; this will only
>> >> > happen in nonblocking mode.  (In this case, wait for write-ready and try
>> >> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
>> >> > can use <function>PQerrorMessage</function> to retrieve details.
>> >>
>> >> That looks pretty good.   However, I'm realizing this isn't the only
>> >> place where we probably need to clarify the language.  Just to take
>> >> one example near at hand, PQputCopyData may also return 1 when it's
>> >> only queued the data; it seems to try even less hard than PQputCopyEnd
>> >> to ensure that the data is actually sent.
>> >
>> > Uh, where are we on this?
>>
>> I think someone needs to take Tom's proposed language and make it into
>> a patch.  And figure out which other functions in the documentation
>> need similar updates.
>
> OK, did David G Johnston email comments from today help here?
I didn't look at them in detail, but they don't seem to match the
style of our documentation generally.


​Specific observations would help though that is partly the idea - I've been more focused on clarity and organization even if it requires deviating from the current general documentation style.​
 
​If this is not acceptable I'm happy to incorporate the ideas of others to try and get the best of both worlds.

David J.



View this message in context: Re: PQputCopyEnd doesn't adhere to its API contract
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, Sep 4, 2014 at 1:18 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> Specific observations would help though that is partly the idea - I've been
> more focused on clarity and organization even if it requires deviating from
> the current general documentation style.

OK.

-   to the network connection used by <application>libpq</application>.
+   to the connection used by <application>libpq</application>.

This change is unrelated to the topic at hand and seems not to be an
improvement.

+  <note>
+   <para>
+    Please review the function notes for specific interface protocols -
+    the following is a simplified overview.
+   </para>
+  </note>

This seems pointless.  Of course general documentation will be less
specific than documentation for specific functions.

+   First, the application issues the SQL   <command>COPY</command> command via <function>PQexec</function> or one
+   of the equivalent functions.  The response
+   will either be an error or a <structname>PGresult</> object bearing
+   a status code for <literal>PGRES_COPY_OUT</literal> or
+   <literal>PGRES_COPY_IN</literal> call implied by the specified copy
+   direction (TO or FROM respectively).

This implies that the response won't be a PGresult in the error case,
but of course the function has the same C result type either way.

+  <para>
+   Second, the application should use the functions in this
+   section to receive data rows or transmit buffer loads.  Buffer loads are
+   not guaranteed to be processed until the copy transfer is completed.
+  </para>

The main change here vs. the existing text is that you're now using
the phase "buffer loads" to refer to what gets transmitted, and "data
rows" to refer to what gets received.  The existing text uses the term
"data rows" for both, which seems correct to me.  My first reaction on
reading your revised text was "wait, what's a buffer load?".

+   Third, as lastly, when the data transfer is
+   complete the client must issue a PQgetResult to "commit" the copy transfer
+   and get the final <structname>PGresult</> object that indicates

I assume you mean "and lastly", since "as lastly" doesn't sound like
good English.

-   At this point further SQL commands can be issued via
-   <function>PQexec</function>.  (It is not possible to execute other SQL
-   commands using the same connection while the <command>COPY</command>
-   operation is in progress.)

Removing this text doesn't seem like a good idea.  It's a quite
helpful clarification.  The <note> you've added in its place doesn't
seem like a good substitute for it, and more generally, I think we
should avoid the overuse of constructs like <note>.  Emphasis needs to
be used minimally or it loses its meaning.

> If this is not acceptable I'm happy to incorporate the ideas of others to
> try and get the best of both worlds.

+   <para>
+    The return value of both these function can be one of [-1, 0, 1]
+    whose meaning depends on whether you are in blocking or non-blocking mode.
+   </para>

The use of braces to list a set of possible values is not standard in
mathematics generally, our documentation, or any other documentation I
have seen.

+   <para>
+    Non-Blocking Mode: A value of 1 means that the payload was
+    placed in the queue while a -1 means an immediate and permanent failure.
+    A return value of 0 means that the queue was full: you need to try again
+    at the next wait-ready.
+   </para>

We generally avoid emphasizing captions or headings in this way.  The
markup engine has no knowledge that "Non-Blocking Mode" is special; it
will format and style this just as if you had written "This is how it
works: a value of 1 means...".  That's probably not appropriate. Our
style is to write English prose using full sentences, e.g. "In
non-blocking mode, a value of 1 means...".

-       into buffer loads of any convenient size.  Buffer-load boundaries
+       into buffer-loads of any convenient size.  Buffer-load boundaries

Well, OK, here's a mention of buffer loads in the existing
documentation.  But, when you previously used the term, you didn't
hyphenate it, but here you are changing it to be hyphenated.  Note
that the fact that it's hyphenated the second time on this line is
because the two-word term is being used as an adjective modifying
boundaries, not because every use of those two words should be
hyphenated.  (Consider how odd the previous sentence would look if I
had written it this way: ...not because every use of those two-words
should be hyphenated.  Yet the earlier hyphenation of two-word looks
correct, at least to me, for the same reasons as in the
documentation.)

-       Ends the <literal>COPY_IN</> operation successfully if
-       <parameter>errormsg</> is <symbol>NULL</symbol>.  If
-       <parameter>errormsg</> is not <symbol>NULL</symbol> then the
-       <command>COPY</> is forced to fail, with the string pointed to by
-       <parameter>errormsg</> used as the error message.  (One should not
-       assume that this exact error message will come back from the server,
-       however, as the server might have already failed the
-       <command>COPY</> for its own reasons.  Also note that the option
-       to force failure does not work when using pre-3.0-protocol
-       connections.)
+       If <parameter>errormsg</> is <symbol>NULL</symbol> this ends
+       the <literal>COPY_IN</> operation successfully; otherwise the
+       <command>COPY</> is forced to fail with the string pointed to by
+       <parameter>errormsg</> used as the error message - though
+       this exact error message may not come back
+       as server might have already failed the
+       <command>COPY</> for its own reasons.

I agree that the current wording isn't great, in that you've kind of
got to parse through those first two sentences to understand what it's
really saying.  But I don't see your rewording as an imprvement.  The
current wording at least makes one thing clear right at the outset:
we're going to end a COPY_IN.  If I were going to try to improve this
I'd probably say something like "Ends the COPY_IN operation.
Normally, errormsg will be NULL, in which case the COPY will succeed
if no error has occurred on the server side.  However, errormsg can
also be non-NULL if the client wishes to force the server to fail the
copy with an ERROR." ...and I'd keep the rest of the text as it is.

+       Note: The option to force failure does not work
+       when using pre-3.0-protocol connections.

If you want a note, you have to use <note>.  But as above, it's better
that this *isn't* a note but just an English sentence, to avoid
overuse of emphasis.

I won't go through all of the remaining hunks as the issues are
basically similar.  Most of these changes are unrelated to the actual
problem that I was complaining about here.  If you want to improve the
wording of the documentation generally, that should be a separate
patch, not tied to whatever we're doing about this specific issue;
whatever we do about this specific issue should be a minimal patch
that does only that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
David Johnston
Date:
On Thu, Sep 4, 2014 at 5:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 4, 2014 at 1:18 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> Specific observations would help though that is partly the idea - I've been
> more focused on clarity and organization even if it requires deviating from
> the current general documentation style.

OK.

-   to the network connection used by <application>libpq</application>.
+   to the connection used by <application>libpq</application>.

This change is unrelated to the topic at hand and seems not to be an
improvement.

​Fair point - though I did question the necessity of "network" in the accompanying e-mail.

​The implied suggestion is that if I do find any other areas that look like they need fixing - even in the same file - I should separate them out into a separate patch.  Though I have seen various "while I was in there I also fixed such-and-such" commits previously so the line is at least somewhat fluid.​


+  <note>
+   <para>
+    Please review the function notes for specific interface protocols -
+    the following is a simplified overview.
+   </para>
+  </note>

This seems pointless.  Of course general documentation will be less
specific than documentation for specific functions.

​The existing wording was being verbose in order to be correct.  In a summary like this I'd trade being reasonably accurate and general for the precision that is included elsewhere.
 

+   First, the application issues the SQL
    <command>COPY</command> command via <function>PQexec</function> or one
+   of the equivalent functions.  The response
+   will either be an error or a <structname>PGresult</> object bearing
+   a status code for <literal>PGRES_COPY_OUT</literal> or
+   <literal>PGRES_COPY_IN</literal> call implied by the specified copy
+   direction (TO or FROM respectively).

This implies that the response won't be a PGresult in the error case,
but of course the function has the same C result type either way.


​One of the trade-offs I mentioned...its more style than anything but removing the parenthetical (if there is not error in the command) and writing it more directly seemed preferable in an overview such as this.

Better:  The function will either throw an error or return a PGresult object[...]​

+  <para>
+   Second, the application should use the functions in this
+   section to receive data rows or transmit buffer loads.  Buffer loads are
+   not guaranteed to be processed until the copy transfer is completed.
+  </para>

The main change here vs. the existing text is that you're now using
the phase "buffer loads" to refer to what gets transmitted, and "data
rows" to refer to what gets received.  The existing text uses the term
"data rows" for both, which seems correct to me.  My first reaction on
reading your revised text was "wait, what's a buffer load?".

​So, my generalization policy working in reverse - since the transmit side does not have to be in complete rows implying that they are here is (albeit acceptably) inaccurate.​

 

+   Third, as lastly, when the data transfer is
+   complete the client must issue a PQgetResult to "commit" the copy transfer
+   and get the final <structname>PGresult</> object that indicates

I assume you mean "and lastly", since "as lastly" doesn't sound like
good English.

​Yep.

 

-   At this point further SQL commands can be issued via
-   <function>PQexec</function>.  (It is not possible to execute other SQL
-   commands using the same connection while the <command>COPY</command>
-   operation is in progress.)

Removing this text doesn't seem like a good idea.  It's a quite
helpful clarification.  The <note> you've added in its place doesn't
seem like a good substitute for it, and more generally, I think we
should avoid the overuse of constructs like <note>.  Emphasis needs to
be used minimally or it loses its meaning.

​Was trying to remove repetition here - happy to consider alternative way of doing so if the note is objectionable.​

 
> If this is not acceptable I'm happy to incorporate the ideas of others to
> try and get the best of both worlds.

+   <para>
+    The return value of both these function can be one of [-1, 0, 1]
+    whose meaning depends on whether you are in blocking or non-blocking mode.
+   </para>

The use of braces to list a set of possible values is not standard in
mathematics generally, our documentation, or any other documentation I
have seen.

Agreed

 

+   <para>
+    Non-Blocking Mode: A value of 1 means that the payload was
+    placed in the queue while a -1 means an immediate and permanent failure.
+    A return value of 0 means that the queue was full: you need to try again
+    at the next wait-ready.
+   </para>

We generally avoid emphasizing captions or headings in this way.  The
markup engine has no knowledge that "Non-Blocking Mode" is special; it
will format and style this just as if you had written "This is how it
works: a value of 1 means...".  That's probably not appropriate. Our
style is to write English prose using full sentences, e.g. "In
non-blocking mode, a value of 1 means...".

​Was a little lazy on the first pass; was planing on emphasizing it - possibly as list items under the preceding paragraph since that is where the terms are used and this further explains the difference between the two.​

Prose and "headers" are not mutually exclusive - though as noted I didn't get all of the proper syntax laid in on this pass to make that clear.

Its great to be able to read the documentation like a book but it also wants to be useful for scanning and a total lack of emphasis makes that more difficult.  I'm trying to see if there is middle ground to be had.


-       into buffer loads of any convenient size.  Buffer-load boundaries
+       into buffer-loads of any convenient size.  Buffer-load boundaries

Well, OK, here's a mention of buffer loads in the existing
documentation.  But, when you previously used the term, you didn't
hyphenate it, but here you are changing it to be hyphenated.  Note
that the fact that it's hyphenated the second time on this line is
because the two-word term is being used as an adjective modifying
boundaries, not because every use of those two words should be
hyphenated.  (Consider how odd the previous sentence would look if I
had written it this way: ...not because every use of those two-words
should be hyphenated.  Yet the earlier hyphenation of two-word looks
correct, at least to me, for the same reasons as in the
documentation.)

I'll defer on this one - the logic is sound and my sense of when to hyphenate is not particularly strong.


-       Ends the <literal>COPY_IN</> operation successfully if
-       <parameter>errormsg</> is <symbol>NULL</symbol>.  If
-       <parameter>errormsg</> is not <symbol>NULL</symbol> then the
-       <command>COPY</> is forced to fail, with the string pointed to by
-       <parameter>errormsg</> used as the error message.  (One should not
-       assume that this exact error message will come back from the server,
-       however, as the server might have already failed the
-       <command>COPY</> for its own reasons.  Also note that the option
-       to force failure does not work when using pre-3.0-protocol
-       connections.)
+       If <parameter>errormsg</> is <symbol>NULL</symbol> this ends
+       the <literal>COPY_IN</> operation successfully; otherwise the
+       <command>COPY</> is forced to fail with the string pointed to by
+       <parameter>errormsg</> used as the error message - though
+       this exact error message may not come back
+       as server might have already failed the
+       <command>COPY</> for its own reasons.

I agree that the current wording isn't great, in that you've kind of
got to parse through those first two sentences to understand what it's
really saying.  But I don't see your rewording as an imprvement.  The
current wording at least makes one thing clear right at the outset:
we're going to end a COPY_IN.  If I were going to try to improve this
I'd probably say something like "Ends the COPY_IN operation.
Normally, errormsg will be NULL, in which case the COPY will succeed
if no error has occurred on the server side.  However, errormsg can
also be non-NULL if the client wishes to force the server to fail the
copy with an ERROR." ...and I'd keep the rest of the text as it is.


​I like.​

 
+       Note: The option to force failure does not work
+       when using pre-3.0-protocol connections.

If you want a note, you have to use <note>.  But as above, it's better
that this *isn't* a note but just an English sentence, to avoid
overuse of emphasis.

​Mostly I didn't like having half the paragraph is a parenthetical...agree my implementation needs to be reworked.​


I won't go through all of the remaining hunks as the issues are
basically similar.  Most of these changes are unrelated to the actual
problem that I was complaining about here.  If you want to improve the
wording of the documentation generally, that should be a separate
patch, not tied to whatever we're doing about this specific issue;
whatever we do about this specific issue should be a minimal patch
that does only that.

​Will work on forcing myself to submit a surgical patch in response the triggering complaint and then, if desired, a more comprehensive patch.

I appreciate the time you have taken on this and will look at my thoughts with the new understanding you have given me.

Thank You!

David J.

 

Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Thu, Sep 4, 2014 at 6:38 PM, David Johnston
<david.g.johnston@gmail.com> wrote:
> The implied suggestion is that if I do find any other areas that look like
> they need fixing - even in the same file - I should separate them out into a
> separate patch.

Yes.

> Though I have seen various "while I was in there I also
> fixed such-and-such" commits previously so the line is at least somewhat
> fluid.

Yep, it's a judgement call.  As a general rule though, I think there's
often a pretty clear connection between the main topic of the commit
and the changes folded in.  Another way to think about this is that,
at least for non-committers (and frequently also for committers), any
patch someone writes is going to need an upvote from *at least* one
other person in order to go in.  If somebody can look at the patch and
easily determine that it solves some problem and doesn't make anything
worse, then they're likely to like it (and if they're a committer,
maybe commit it).  But if they see changes they like mixed in with
changes they don't like, then they're likely to either bounce it back,
or just say, hmm, this looks like it will take some time to deal with,
let me put that on my TODO list.  And of course sometimes it never
makes it off the TODO list.

Now, it's a fair point that this makes it hard to get large-scale
changes done.  But, to some extent, that's a good thing.  Prudence,
indeed, will dictate that source code or documentation long
established should not be changed for light or transient causes.  For
the rest, if you feel a large scale change is really needed, it's best
to start with a proposal: "I think we need to rehash the documentation
in section X because of reason Y."  You've made a few proposals to
rehash sections of the documentation on pgsql-hackers, but I haven't
actually seen clear and compelling justification for those reworkings.
Clearly, you like it better the new way, but the person who did it the
existing way likely prefers their version, and they've been around
longer than you.  :-)  By stating your objectives up-front, you can
see whether people agree with those objectives or not.  If they do,
you can hope to find the final patch criticized only on fine details
which are easily remedied; but if they don't, then some rethinking may
be needed.

>> This seems pointless.  Of course general documentation will be less
>> specific than documentation for specific functions.
>
> The existing wording was being verbose in order to be correct.  In a summary
> like this I'd trade being reasonably accurate and general for the precision
> that is included elsewhere.

This is an example of a goal where you might solicit people's general
thoughts before starting.  Maybe people will agree that removing
details in a certain place is useful, or maybe they won't, but it
needs discussion.

> One of the trade-offs I mentioned...its more style than anything but
> removing the parenthetical (if there is not error in the command) and
> writing it more directly seemed preferable in an overview such as this.
>
> Better:  The function will either throw an error or return a PGresult
> object[...]

Nope.  This is not C++, nor is it the backend.  It will not throw anything.

>> +  <para>
>> +   Second, the application should use the functions in this
>> +   section to receive data rows or transmit buffer loads.  Buffer loads
>> are
>> +   not guaranteed to be processed until the copy transfer is completed.
>> +  </para>
>>
>> The main change here vs. the existing text is that you're now using
>> the phase "buffer loads" to refer to what gets transmitted, and "data
>> rows" to refer to what gets received.  The existing text uses the term
>> "data rows" for both, which seems correct to me.  My first reaction on
>> reading your revised text was "wait, what's a buffer load?".
>
> So, my generalization policy working in reverse - since the transmit side
> does not have to be in complete rows implying that they are here is (albeit
> acceptably) inaccurate.

The existing wording doesn't say that each call to one of the
functions in question must contain only whole data rows of itself.  It
merely says that these are the functions you need to use to send data
rows, which is true.

>> -   At this point further SQL commands can be issued via
>> -   <function>PQexec</function>.  (It is not possible to execute other SQL
>> -   commands using the same connection while the <command>COPY</command>
>> -   operation is in progress.)
>>
>> Removing this text doesn't seem like a good idea.  It's a quite
>> helpful clarification.  The <note> you've added in its place doesn't
>> seem like a good substitute for it, and more generally, I think we
>> should avoid the overuse of constructs like <note>.  Emphasis needs to
>> be used minimally or it loses its meaning.
>
> Was trying to remove repetition here - happy to consider alternative way of
> doing so if the note is objectionable.

I guess it doesn't seem repetitive to me.

> Its great to be able to read the documentation like a book but it also wants
> to be useful for scanning and a total lack of emphasis makes that more
> difficult.  I'm trying to see if there is middle ground to be had.

Maybe, but if so, that's a general issue that should be discussed
aside from this patch.  You can't create a new policy without
discussion and start trying to enforce it on the documentation one bit
at a time.  That will drive everyone crazy.

> I appreciate the time you have taken on this and will look at my thoughts
> with the new understanding you have given me.
>
> Thank You!

Thanks for getting involved.  I apologize if I was brusque here or at
other times in the past (or future).  Unfortunately I've been insanely
busy and it doesn't bring out the best in me, but I really do value
your (and everyone's efforts) to move PostgreSQL forward.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
David G Johnston
Date:
On Mon, Sep 8, 2014 at 11:45 AM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
On Thu, Sep 4, 2014 at 6:38 PM, David Johnston
<[hidden email]> wrote:

> One of the trade-offs I mentioned...its more style than anything but
> removing the parenthetical (if there is not error in the command) and
> writing it more directly seemed preferable in an overview such as this.
>
> Better:  The function will either throw an error or return a PGresult
> object[...]

Nope.  This is not C++, nor is it the backend.  It will not throw anything.


​The current sentence reads:
"​The response to this (if there is no error in the command) will be a PGresult object bearing a status code of PGRES_COPY_OUT or PGRES_COPY_IN (depending on the specified copy direction)."

Maybe this is taken for granted by others, and thus can be excluded here, but I'm trying to specify what happens if there is an error in the command...  Apparently one does not get back a PGresult object...

Would simply saying: "A successful response to this will be a PGresult object..." be accurate (enough)?


> I appreciate the time you have taken on this and will look at my thoughts
> with the new understanding you have given me.
>
> Thank You!

Thanks for getting involved.  I apologize if I was brusque here or at
other times in the past (or future).  Unfortunately I've been insanely
busy and it doesn't bring out the best in me, but I really do value
your (and everyone's efforts) to move PostgreSQL forward.


​The tone of all your responses, including the first one pointing out my lack of supporting context and initially over-ambitious patch, have been very professional.  A lot of what you've said resonates since I think subconsciously I understood that I needed to make fundamental changes to my approach and style but it definitely helps to have someone point out specific items and concerns to move those thoughts to the conscious part of the mind.  Thank you again for doing that for me.

​Ignoring style and such did anything I wrote help to clarify your understanding of why pgPutCopyEnd does what it does?  As I say this and start to try and read the C code I think that it is a good source for understanding novice assumptions but there is a gap between the docs and the code - though I'm not sure you've identified the only (or even proper) location.



Unlike PQputCopyData there is no explicit logic (pqIsnonblocking(conn)" in PQputCopyEnd for non-blocking mode (there is an implicit one via pqFlush).  This does seem like an oversight - if a minor one since the likihood of not being able to add the EOF to the connection's buffer seems highly unlikely - but I would expect on the basis of symmetry alone - for both of them to have buffer filled testing logic.  And, depending on how large *errormsg is, the risk of being unable to place the data in the buffer isn't as small and the expected EOF case.

I'm getting way beyond my knowledge level here but my assumption from the documentation was that the async mode behavior of returning zero revolved around retrying in order to place the data into the buffer - not retrying to make sure the buffer is empty.  The caller deals with that on their own based upon their blocking mode decision.  Though we would want to call pqFlush for blocking mode callers here since this function should block until the buffer is clear.

​So, I thought I agreed with your suggestion that if the final pqFlush returns a 1 that pqPutCopyEnd should return 0.  Additionally, if in non-blocking mode, and especially if *errormsg is not NULL, the available connection buffer length logic in pqPutCopyData should be evaluated here as well.

However, the 0 being returned due to the pqFlush really isn't anything special for a non-blocking mode user (they have already been told to call pqFlush themselves after calling pqPutCopyEnd) and doesn't apply for blocking mode.  Admittedly they could skip their call if they get a return value of 1 from pqPutCopyEnd but I'm not sure recommending that optimization has much going for it.  And, again as you said (I am just discovering this for myself as much as possible), if the pqFlush caused the 0 you would not want to retry whereas in the filled buffer version you would.  So we do have to pick one or the other situation and adjust the documentation accordingly.

The most useful and compatible solution is to make pqPutCopyEnd synchronous regardless of the user selected blocking mode - which it mostly is now but the final pqFlush should be in a loop - and adjust the documentation in the areas noted in my patch-email to accommodate that fact.

Regardless, the logic surrounding placing the *errormsg string into the buffer needs affirmation and a note whether it will block waiting to be put on a full connection buffer.

Note that non-blocking users seeing a 1 on the pqPutCopyEnd probably still end up doing their own pqFlush looping calls but that is not something that we can be certain of.  What happens if the buffer isn't empty and the non-blocking caller invokes pqGetResult(...) on the connection because we returned a 1 from pqPutCopyEnd?  Two situations, the buffer would be expected to eventually clear without error and the buffer would eventually clear with an error (i.e., the pqPutCopyEnd should have returned -1)?  Does it even matter what the end result of the copy should have been?

David J.






View this message in context: Re: PQputCopyEnd doesn't adhere to its API contract
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: PQputCopyEnd doesn't adhere to its API contract

From
David G Johnston
Date:
On Mon, Sep 8, 2014 at 6:19 PM, David Johnston <[hidden email]> wrote:
On Mon, Sep 8, 2014 at 11:45 AM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
On Thu, Sep 4, 2014 at 6:38 PM, David Johnston
<[hidden email]> wrote:

> One of the trade-offs I mentioned...its more style than anything but
> removing the parenthetical (if there is not error in the command) and
> writing it more directly seemed preferable in an overview such as this.
>
> Better:  The function will either throw an error or return a PGresult
> object[...]

Nope.  This is not C++, nor is it the backend.  It will not throw anything.


​The current sentence reads:
"​The response to this (if there is no error in the command) will be a PGresult object bearing a status code of PGRES_COPY_OUT or PGRES_COPY_IN (depending on the specified copy direction)."

Maybe this is taken for granted by others, and thus can be excluded here, but I'm trying to specify what happens if there is an error in the command...  Apparently one does not get back a PGresult object...

Would simply saying: "A successful response to this will be a PGresult object..." be accurate (enough)?


​Apparently, NULL is the correct answer.  Cannot that just be assumed to be the case or does saying that a failure scenario here returns NULL (or something other than pqResult object) impart useful knowledge?
Dave



View this message in context: Re: PQputCopyEnd doesn't adhere to its API contract
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: PQputCopyEnd doesn't adhere to its API contract

From
Robert Haas
Date:
On Mon, Sep 8, 2014 at 6:20 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> On Mon, Sep 8, 2014 at 11:45 AM, Robert Haas [via PostgreSQL] <[hidden
> email]> wrote:
>>
>> On Thu, Sep 4, 2014 at 6:38 PM, David Johnston
>> <[hidden email]> wrote:
>>
>> > One of the trade-offs I mentioned...its more style than anything but
>> > removing the parenthetical (if there is not error in the command) and
>> > writing it more directly seemed preferable in an overview such as this.
>> >
>> > Better:  The function will either throw an error or return a PGresult
>> > object[...]
>>
>> Nope.  This is not C++, nor is it the backend.  It will not throw
>> anything.
>>
> The current sentence reads:
> "The response to this (if there is no error in the command) will be a
> PGresult object bearing a status code of PGRES_COPY_OUT or PGRES_COPY_IN
> (depending on the specified copy direction)."
>
> Maybe this is taken for granted by others, and thus can be excluded here,
> but I'm trying to specify what happens if there is an error in the
> command...  Apparently one does not get back a PGresult object...

Well, there's the point of confusion, because the function ALWAYS
returns a PGresult, except maybe in some obscure corner case where it
can't allocate one and therefore returns NULL.  What differs is the
result status.

> Ignoring style and such did anything I wrote help to clarify your
> understanding of why pgPutCopyEnd does what it does?  As I say this and
> start to try and read the C code I think that it is a good source for
> understanding novice assumptions but there is a gap between the docs and the
> code - though I'm not sure you've identified the only (or even proper)
> location.

Honestly, not really.  I thought the language I previously discussed
with Tom was adequate for that; and I'm a little confused as to why
we've gotten on what seems to be somewhat of a digression into nearby
but otherwise-unrelated documentation issues.

> Unlike PQputCopyData there is no explicit logic (pqIsnonblocking(conn)" in
> PQputCopyEnd for non-blocking mode (there is an implicit one via pqFlush).
> This does seem like an oversight - if a minor one since the likihood of not
> being able to add the EOF to the connection's buffer seems highly unlikely -
> but I would expect on the basis of symmetry alone - for both of them to have
> buffer filled testing logic.  And, depending on how large *errormsg is, the
> risk of being unable to place the data in the buffer isn't as small and the
> expected EOF case.

Yeah, I think that's a bug in PQputCopyEnd().  It should have the same
kind of pqCheckOutBufferSpace() check that PQputCopyData() does.

> I'm getting way beyond my knowledge level here but my assumption from the
> documentation was that the async mode behavior of returning zero revolved
> around retrying in order to place the data into the buffer - not retrying to
> make sure the buffer is empty.  The caller deals with that on their own
> based upon their blocking mode decision.  Though we would want to call
> pqFlush for blocking mode callers here since this function should block
> until the buffer is clear.

PQputCopyEnd() already does flush; and PQputCopyData() shouldn't flush.

> So, I thought I agreed with your suggestion that if the final pqFlush
> returns a 1 that pqPutCopyEnd should return 0.

Well, Tom set me straight on that; so I don't think we're considering
any such change.  I think you need to make sure you understand the
previous discussion in detail before proposing how to adjust the
documentation (or the code).

> Additionally, if in
> non-blocking mode, and especially if *errormsg is not NULL, the available
> connection buffer length logic in pqPutCopyData should be evaluated here as
> well.

Yes.  See the comment three up from this one.

> The most useful and compatible solution is to make pqPutCopyEnd synchronous
> regardless of the user selected blocking mode - which it mostly is now but
> the final pqFlush should be in a loop - and adjust the documentation in the
> areas noted in my patch-email to accommodate that fact.

Uh, no.  If you do that, you'll break the code that I spent so much
time writing, which led to my original post.  I wouldn't be using
non-blocking mode if it were OK for it to block.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PQputCopyEnd doesn't adhere to its API contract

From
David Johnston
Date:
On Tue, Sep 9, 2014 at 12:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 8, 2014 at 6:20 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> On Mon, Sep 8, 2014 at 11:45 AM, Robert Haas [via PostgreSQL] <[hidden
> email]> wrote:
>>
>> On Thu, Sep 4, 2014 at 6:38 PM, David Johnston
>> <[hidden email]> wrote:
>>


> Ignoring style and such did anything I wrote help to clarify your
> understanding of why pgPutCopyEnd does what it does?  As I say this and
> start to try and read the C code I think that it is a good source for
> understanding novice assumptions but there is a gap between the docs and the
> code - though I'm not sure you've identified the only (or even proper)
> location.

Honestly, not really.  I thought the language I previously discussed
with Tom was adequate for that; and I'm a little confused as to why
we've gotten on what seems to be somewhat of a digression into nearby
but otherwise-unrelated documentation issues.

​My theory here is that if you discover a single point of confusion that cannot by attributed to a typo (or something basic like that) then why not go looking around and see if there are any other issues in nearby code/documentation.  Furthermore, not being the writer of said code/documentation, ​a fresh perspective of the entire body of work - and not just the few lines you quoted - has value.  For me, if nothing else I took it as a chance to learn and evaluate the status quo.  And as noted below there are a couple of items to address in the nearby code even if the documentation itself is accurate - I just took a very indirect route to discover them.

Much of the current documentation is copy-and-pasted directly from the source code - with a few comments tacked on for clarity.  I'm sure some layout and style concerns were present during the copy-and-paste but the user-facing documentation does not, and probably should not ideally, directly emulate the source code comments.  Admittedly, in the libpq section this can be considerably relaxed since the target user is likely a C programmer.
 

> Unlike PQputCopyData there is no explicit logic (pqIsnonblocking(conn)" in
> PQputCopyEnd for non-blocking mode (there is an implicit one via pqFlush).
> This does seem like an oversight - if a minor one since the likihood of not
> being able to add the EOF to the connection's buffer seems highly unlikely -
> but I would expect on the basis of symmetry alone - for both of them to have
> buffer filled testing logic.  And, depending on how large *errormsg is, the
> risk of being unable to place the data in the buffer isn't as small and the
> expected EOF case.

Yeah, I think that's a bug in PQputCopyEnd().  It should have the same
kind of pqCheckOutBufferSpace() check that PQputCopyData() does.

​Anybody disagree here?​

 

> I'm getting way beyond my knowledge level here but my assumption from the
> documentation was that the async mode behavior of returning zero revolved
> around retrying in order to place the data into the buffer - not retrying to
> make sure the buffer is empty.  The caller deals with that on their own
> based upon their blocking mode decision.  Though we would want to call
> pqFlush for blocking mode callers here since this function should block
> until the buffer is clear.

PQputCopyEnd() already does flush; and PQputCopyData() shouldn't flush.

​My point being pgFlush is sensitive to whether the caller is in blocking mode.

Also, pgPutCopyData DOES FLUSH if the buffer is full...then attempts to resize the buffer...then returns 0 (or -1 in blocking mode).  How absolute is Tom's

"Well, we certainly do NOT want a flush in PQputCopyData."

?


> So, I thought I agreed with your suggestion that if the final pqFlush
> returns a 1 that pqPutCopyEnd should return 0.

Well, Tom set me straight on that; so I don't think we're considering
any such change.  I think you need to make sure you understand the
previous discussion in detail before proposing how to adjust the
documentation (or the code). 

​That was largely what this exercise was about for me...having gone through all this and now re-reading Tom's wording I do understand and agree.

For surround documentation ​I think we are good since pqPutCopyData already tests the buffer and correctly returns 0 in non-blocking mode.  The issue with pqPutCopyEnd seems to be a copy/paste error and an incorrect assumption regarding non-blocking mode.


This doesn't address the usage bug we've been propagating where someone very well might use non-blocking mode and, upon seeing a return value of 1 from pqPutCopyEnd, immediately call pqGetResult even if the buffer is not completed flushed.  Our proposed documentation directs people to always pgFlush poll after calling pqPutCopyEnd while the current documentation is not so strict.  Is there anything to say/do about that fact or - more importantly - is there any real risk here?

​David J.

Re: PQputCopyEnd doesn't adhere to its API contract

From
Bruce Momjian
Date:
On Thu, Sep  4, 2014 at 12:52:14PM -0400, Robert Haas wrote:
> On Wed, Sep 3, 2014 at 6:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, May  9, 2014 at 12:03:36PM -0400, Robert Haas wrote:
> >> On Thu, May 8, 2014 at 5:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Perhaps the text should be like this:
> >> >
> >> > The result is 1 if the termination message was sent; or in nonblocking
> >> > mode, this may only indicate that the termination message was successfully
> >> > queued.  (In nonblocking mode, to be certain that the data has been sent,
> >> > you should next wait for write-ready and call <function>PQflush</>,
> >> > repeating until it returns zero.)  Zero indicates that the function could
> >> > not queue the termination message because of full buffers; this will only
> >> > happen in nonblocking mode.  (In this case, wait for write-ready and try
> >> > the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> >> > can use <function>PQerrorMessage</function> to retrieve details.
> >>
> >> That looks pretty good.   However, I'm realizing this isn't the only
> >> place where we probably need to clarify the language.  Just to take
> >> one example near at hand, PQputCopyData may also return 1 when it's
> >> only queued the data; it seems to try even less hard than PQputCopyEnd
> >> to ensure that the data is actually sent.
> >
> > Uh, where are we on this?
>
> I think someone needs to take Tom's proposed language and make it into
> a patch.  And figure out which other functions in the documentation
> need similar updates.

I have developed such a patch --- attached.  I didn't see any other
mentions of blocking in the docs.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: PQputCopyEnd doesn't adhere to its API contract

From
Bruce Momjian
Date:
On Fri, Jan  9, 2015 at 03:04:19PM -0500, Bruce Momjian wrote:
> > > Uh, where are we on this?
> > 
> > I think someone needs to take Tom's proposed language and make it into
> > a patch.  And figure out which other functions in the documentation
> > need similar updates.
> 
> I have developed such a patch --- attached.  I didn't see any other
> mentions of blocking in the docs.

Patch applied.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +