On 4/7/15 10:29 PM, Michael Paquier wrote:
> On Wed, Apr 8, 2015 at 11:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> In any case, I don't think it would be terribly difficult to allow a bit
>>> more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
>>> some 1GB limits there too.
>>
>> The point is, those limits are there on purpose. Changing things
>> arbitrarily wouldn't be hard, but doing it in a principled way is
>> likely to require some thought. For example, in the COPY OUT case,
>> presumably what's happening is that we palloc a chunk for each
>> individual datum, and then palloc a buffer for the whole row. Now, we
>> could let the whole-row buffer be bigger, but maybe it would be better
>> not to copy all of the (possibly very large) values for the individual
>> columns over into a row buffer before sending it. Some refactoring
>> that avoids the need for a potentially massive (1.6TB?) whole-row
>> buffer would be better than just deciding to allow it.
>
> I think that something to be aware of is that this is as well going to
> require some rethinking of the existing libpq functions that are here
> to fetch a row during COPY with PQgetCopyData, to make them able to
> fetch chunks of data from one row.

The discussion about upping the StringInfo limit was for cases where a
conversion to a different encoding makes the data larger and blows past
the limit. My impression was that those cases don't expand by much, so
we wouldn't be significantly enlarging StringInfo.

I agree that buffering 1.6TB of data would be patently absurd. Handling
the case of COPYing a row that's >1GB clearly needs more work than just
bumping up some size limits. That's why I was wondering whether this was
a real scenario or just hypothetical... I'd be surprised if anyone would
be happy with the performance of 1GB tuples, let alone anything larger
than that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com