Thread: libpq COPY handling

libpq COPY handling

From
Robert Haas
Date:
Noah Misch pointed out something interesting to me:

/*
 * PQputCopyEnd - send EOF indication to the backend during COPY IN
 *
 * After calling this, use PQgetResult() to check command completion status.
 *
 * Returns 1 if successful, 0 if data could not be sent (only possible
 * in nonblock mode), or -1 if an error occurs.
 */

The comment alleges that 0 is a possible return value, but the only
return statements in the code for that function return literal values
of either 1 or -1.  I'm not sure whether that's a bug in the code or
the documentation.

Also, I noticed that there are a few places in fe-protocol3.c that
seem not to know about COPY-BOTH mode.  I'm not sure that any of these
are actually bugs right now given the current very limited use of
COPY-BOTH mode, but I'm wondering whether it wouldn't be better to
minimize the chance of future surprises.  Patch attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: libpq COPY handling

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Noah Misch pointed out something interesting to me:
> /*
>  * PQputCopyEnd - send EOF indication to the backend during COPY IN
>  *
>  * After calling this, use PQgetResult() to check command completion status.
>  *
>  * Returns 1 if successful, 0 if data could not be sent (only possible
>  * in nonblock mode), or -1 if an error occurs.
>  */

> The comment alleges that 0 is a possible return value, but the only
> return statements in the code for that function return literal values
> of either 1 or -1.  I'm not sure whether that's a bug in the code or
> the documentation.

Hm.  pqFlush has a three-way return value which PQputCopyEnd is only
checking as a boolean.  Probably the intent was to return "data not
sent" if pqFlush reports that.

However, the documentation in libpq.sgml is a bit bogus too, because it
counsels trying the PQputCopyEnd() call again, which will not work
(since we already changed the asyncStatus).  We could make that say "a
zero result is informational, you might want to try PQflush() later".
The trouble with this, though, is that any existing callers that were
coded to the old spec would now be broken.

It might be better to consider that the code is correct and fix the
documentation.  I notice that the other places in fe-exec.c that call
pqFlush() generally say "In nonblock mode, don't complain if we're unable
to send it all", which is pretty much the spirit of what this is doing
though it lacks that comment.

> Also, I noticed that there are a few places in fe-protocol3.c that
> seem not to know about COPY-BOTH mode.  I'm not sure that any of these
> are actually bugs right now given the current very limited use of
> COPY-BOTH mode, but I'm wondering whether it wouldn't be better to
> minimize the chance of future surprises.  Patch attached.

+1 for these changes, anyway.
        regards, tom lane



Re: libpq COPY handling

From
Robert Haas
Date:
On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, the documentation in libpq.sgml is a bit bogus too, because it
> counsels trying the PQputCopyEnd() call again, which will not work
> (since we already changed the asyncStatus).  We could make that say "a
> zero result is informational, you might want to try PQflush() later".
> The trouble with this, though, is that any existing callers that were
> coded to the old spec would now be broken.
>
> It might be better to consider that the code is correct and fix the
> documentation.  I notice that the other places in fe-exec.c that call
> pqFlush() generally say "In nonblock mode, don't complain if we're unable
> to send it all", which is pretty much the spirit of what this is doing
> though it lacks that comment.

Changing the meaning of a 0 return code seems like a bad idea.
However, not ever returning 0 isn't great either: someone could be
forgiven for writing code that calls PQputCopyData/End() until they
get a 0 result, then waits for the socket to become write-OK before
continuing.  They might be surprised to find that this results in
libpq's internal buffer expanding until malloc fails.

I'm gathering that what I should probably do is something like this.
Send some data using PQputCopyData() or an end-of-copy using
PQputCopyEnd().  Then, call PQflush().  If the latter returns 1, then
wait for the socket to become write-OK and try again.  Once it returns
0, send the next hunk of copy-data.  This is a bit of an exasperating
interface because when the socket buffer fills up, PQputCopyData()
will do a write() but we won't know what happened; we'll have to call
PQflush() and try a second write of the same data without an
intervening wait to find out that the buffer is full.  If this worked
as currently documented, that wouldn't be necessary.  But it seems
tolerable.

>> Also, I noticed that there are a few places in fe-protocol3.c that
>> seem not to know about COPY-BOTH mode.  I'm not sure that any of these
>> are actually bugs right now given the current very limited use of
>> COPY-BOTH mode, but I'm wondering whether it wouldn't be better to
>> minimize the chance of future surprises.  Patch attached.
>
> +1 for these changes, anyway.

OK, committed, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq COPY handling

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, the documentation in libpq.sgml is a bit bogus too, because it
>> counsels trying the PQputCopyEnd() call again, which will not work
>> (since we already changed the asyncStatus).  We could make that say "a
>> zero result is informational, you might want to try PQflush() later".
>> The trouble with this, though, is that any existing callers that were
>> coded to the old spec would now be broken.

> Changing the meaning of a 0 return code seems like a bad idea.
> However, not ever returning 0 isn't great either: someone could be
> forgiven for writing code that calls PQputCopyData/End() until they
> get a 0 result, then waits for the socket to become write-OK before
> continuing.

Anybody who tried that would have already discovered that it doesn't
work, so that argument seems pretty hollow.

What I'm suggesting is that we fix the documentation to match what
the code actually does, ie 1 and -1 are the only return codes, but
in nonblock mode it may not actually have flushed all the data.
I do not see how that can break any code that works now.

An alternative possibility is to document the zero return case as
"can't happen now, but defined for possible future expansion", which
I rather suspect was the thought process last time this was looked at.
The trouble with that is that, if we ever did try to actually return
zero, we'd more than likely break some code that should have been
checking for the case and wasn't.

Anyway, in view of the lack of complaints from the field, I think
changing the behavior of this code would be much more likely to cause
problems than fix them.
        regards, tom lane



Re: libpq COPY handling

From
Robert Haas
Date:
On Fri, Apr 26, 2013 at 10:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> However, the documentation in libpq.sgml is a bit bogus too, because it
>>> counsels trying the PQputCopyEnd() call again, which will not work
>>> (since we already changed the asyncStatus).  We could make that say "a
>>> zero result is informational, you might want to try PQflush() later".
>>> The trouble with this, though, is that any existing callers that were
>>> coded to the old spec would now be broken.
>
>> Changing the meaning of a 0 return code seems like a bad idea.
>> However, not ever returning 0 isn't great either: someone could be
>> forgiven for writing code that calls PQputCopyData/End() until they
>> get a 0 result, then waits for the socket to become write-OK before
>> continuing.
>
> Anybody who tried that would have already discovered that it doesn't
> work, so that argument seems pretty hollow.
>
> What I'm suggesting is that we fix the documentation to match what
> the code actually does, ie 1 and -1 are the only return codes, but
> in nonblock mode it may not actually have flushed all the data.
> I do not see how that can break any code that works now.
>
> An alternative possibility is to document the zero return case as
> "can't happen now, but defined for possible future expansion", which
> I rather suspect was the thought process last time this was looked at.
> The trouble with that is that, if we ever did try to actually return
> zero, we'd more than likely break some code that should have been
> checking for the case and wasn't.
>
> Anyway, in view of the lack of complaints from the field, I think
> changing the behavior of this code would be much more likely to cause
> problems than fix them.

Sure, I don't think we can change the behavior now; I just think it's
a shame that the documented behavior was never implemented.  But at
this point we don't have much choice but to live with it.

By the by, might it be time to start thinking about deprecating
protocol version 2, at least for COPY?  It seems that supporting it
requires nontrivial contortions of both the front-end and back-end
code, some of which seem possibly relevant to performance.  Obviously
we're not going to do anything to 9.3 at this point, but maybe for the
next release...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq COPY handling

From
Gavin Flower
Date:
<div class="moz-cite-prefix">On 27/04/13 02:48, Tom Lane wrote:<br /></div><blockquote
cite="mid:11420.1366987729@sss.pgh.pa.us"type="cite"><pre wrap="">Robert Haas <a class="moz-txt-link-rfc2396E"
href="mailto:robertmhaas@gmail.com"><robertmhaas@gmail.com></a>writes: 
</pre><blockquote type="cite"><pre wrap="">On Thu, Apr 25, 2013 at 1:56 PM, Tom Lane <a class="moz-txt-link-rfc2396E"
href="mailto:tgl@sss.pgh.pa.us"><tgl@sss.pgh.pa.us></a>wrote: 
</pre><blockquote type="cite"><pre wrap="">However, the documentation in libpq.sgml is a bit bogus too, because it
counsels trying the PQputCopyEnd() call again, which will not work
(since we already changed the asyncStatus).  We could make that say "a
zero result is informational, you might want to try PQflush() later".
The trouble with this, though, is that any existing callers that were
coded to the old spec would now be broken.
</pre></blockquote></blockquote><pre wrap="">
</pre><blockquote type="cite"><pre wrap="">Changing the meaning of a 0 return code seems like a bad idea.
However, not ever returning 0 isn't great either: someone could be
forgiven for writing code that calls PQputCopyData/End() until they
get a 0 result, then waits for the socket to become write-OK before
continuing.
</pre></blockquote><pre wrap="">
Anybody who tried that would have already discovered that it doesn't
work, so that argument seems pretty hollow.

What I'm suggesting is that we fix the documentation to match what
the code actually does, ie 1 and -1 are the only return codes, but
in nonblock mode it may not actually have flushed all the data.
I do not see how that can break any code that works now.

An alternative possibility is to document the zero return case as
"can't happen now, but defined for possible future expansion", which
I rather suspect was the thought process last time this was looked at.
The trouble with that is that, if we ever did try to actually return
zero, we'd more than likely break some code that should have been
checking for the case and wasn't.

Anyway, in view of the lack of complaints from the field, I think
changing the behavior of this code would be much more likely to cause
problems than fix them.
        regards, tom lane


</pre></blockquote><font size="-1">The original usage of returns codes was a<font size="-1">s</font> an offset for the
OperatingSystem to jump to in a list of addresses to execute, after program co<font size="-1">mp<font
size="-1">letion.</font></font></font> Zero offset was to the first address for normal completion, then 4 for the next
address... Addresses were stored in 4 bytes. Hence an historical tendency to define return codes in multiple of 4. 
Thisdates back to the Mainframe days, pre UNIX, even!<br /><br /> Personally I would prefer zero to indicate an error,
onthe basis that is the default value for memory, but historical usage makes that impractical!  So we are stuck with
zerobeing interpreted as the return code indicating 'Normal' completion - bash, and other Linux shells, are designed on
thisassumption.<br /><br /> I first came across this return code convention when I was programming Mainframes in the
early1970's.<br /><br /><br /> Cheers,<br /> Gavin<br />