Re: pg_dump / copy bugs with "big lines" ? - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: pg_dump / copy bugs with "big lines" ?
Date
Msg-id 9cfb98ae-e4a4-4654-bf1a-d37e7d27c075@manitou-mail.org
Whole thread Raw
In response to Re: pg_dump / copy bugs with "big lines" ?  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: pg_dump / copy bugs with "big lines" ?
Re: pg_dump / copy bugs with "big lines" ?
List pgsql-hackers
    Alvaro Herrera wrote:

> But I realized that doing it this way is simple enough;
> and furthermore, in any platforms where int is 8 bytes (ILP64), this
> would automatically allow for 63-bit-sized stringinfos

On such platforms there's the next problem that we can't
send COPY rows through the wire protocol when they're larger
than 2GB.

Based on the tests with previous iterations of the patch that used
int64 for the StringInfo length, the COPY backend code does not
gracefully fail in that case.

What happened when trying (linux x86_64) with a 2GB-4GB row
is that the size seems to overflow and is sent as a 32-bit integer
with the MSB set, which is confusing for the client side (at least
libpq cannot deal with it).

If we consider what would happen with the latest patch on a platform
with sizeof(int)=8 and a \copy invocation like this:

\copy (select big,big,big,big,big,big,big,big,...... FROM   (select lpad('', 1024*1024*200) as big) s) TO /dev/null

if we put enough copies of "big" in the select-list to grow over 2GB,
and then over 4GB.

On a platform with sizeof(int)=4 this should normally fail over 2GB with
"Cannot enlarge string buffer containing $X bytes by $Y more bytes"

I don't have an ILP64 environment myself to test, but I suspect
it would malfunction instead of cleanly erroring out on such
platforms.

One advantage of hardcoding the StringInfo limit to 2GB independantly
of sizeof(int) is to basically avoid the problem.

Also, without this limit, we can "COPY FROM/TO file" really huge rows, 4GB
and beyond, like I tried successfully during the tests mentioned upthread
(again with len as int64 on x86_64).
So such COPYs would succeed or fail depending on whether they deal with
a file or a network connection.
Do we want this difference in behavior?
(keeping in mind that they will both fail anyway if any individual field
in the row is larger than 1GB)


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
Next
From: Simon Riggs
Date:
Subject: Re: Proposal for changes to recovery.conf API