Re: pg_dump / copy bugs with "big lines" ? - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: pg_dump / copy bugs with "big lines" ?
Msg-id d3fe524a-1c78-4cb3-9814-849cd4f43fe6@mm
In response to Re: pg_dump / copy bugs with "big lines" ?  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: pg_dump / copy bugs with "big lines" ?
List pgsql-hackers
    Alvaro Herrera wrote:

> >   tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len);
> >
> > which fails because (HEAPTUPLESIZE + len) is again considered
> > an invalid size, the  size being 1468006476 in my test.
>
> Um, it seems reasonable to make this one be a huge-zero-alloc:
>
>     MemoryContextAllocExtended(CurrentMemoryContext,
>                  HEAPTUPLESIZE + len,
>        MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO)

Good, this allows the tests to go to completion! The tests in question
are dump/reload of a row with several fields totalling 1.4GB (deflated),
with COPY TO/FROM file and psql's \copy in both directions, as well as
pg_dump followed by pg_restore|psql.
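
For illustration, here is a minimal sketch of why the plain palloc0() call rejects this size while the MCXT_ALLOC_HUGE variant accepts it. The two limits are meant to mirror MaxAllocSize and MaxAllocHugeSize from memutils.h; the lowercase helper names are hypothetical stand-ins and not the backend's own macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror MaxAllocSize / MaxAllocHugeSize in memutils.h. */
#define MAX_ALLOC_SIZE      ((size_t) 0x3fffffff)   /* 1 GB - 1 */
#define MAX_ALLOC_HUGE_SIZE (SIZE_MAX / 2)

/* Hypothetical helper: the check a regular palloc-style request must pass. */
static bool
alloc_size_is_valid(size_t size)
{
    return size <= MAX_ALLOC_SIZE;
}

/* Hypothetical helper: the check applied when MCXT_ALLOC_HUGE is requested. */
static bool
alloc_huge_size_is_valid(size_t size)
{
    return size <= MAX_ALLOC_HUGE_SIZE;
}
```

With the size reported above, alloc_size_is_valid(1468006476) is false and alloc_huge_size_is_valid(1468006476) is true, which matches the observed failure and the fix.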

The modified patch is attached.

It provides a useful mitigation to dump/reload databases having
rows in the 1GB-2GB range, but it works under these limitations:

- no single field has a text representation exceeding 1GB.
- no row as text exceeds 2GB (\copy from fails beyond that. AFAICS we
  could push this to 4GB with limited changes to libpq, by
  interpreting the Int32 field in the CopyData message as unsigned).
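
To make the signed/unsigned point concrete, here is a rough sketch of decoding the 4-byte length word of a protocol message both ways. The function names are hypothetical and this is not libpq code; it only shows that a length above 2GB reads as negative when treated as Int32 and as a usable value when treated as unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 4-byte big-endian (network order) word. */
static uint32_t
read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) |
           ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) |
           (uint32_t) p[3];
}

/*
 * Length interpreted as a signed Int32. The uint32 -> int32 conversion
 * is implementation-defined for values above INT32_MAX; on the usual
 * two's-complement platforms it wraps to a negative number.
 */
static int64_t
length_as_signed(const unsigned char *p)
{
    return (int64_t) (int32_t) read_be32(p);
}

/* Same bytes interpreted as unsigned: the full 0..4GB range is usable. */
static int64_t
length_as_unsigned(const unsigned char *p)
{
    return (int64_t) read_be32(p);
}
```

For the bytes C0 00 00 00 (a 3GB length), the signed reading gives -1073741824 and the unsigned reading gives 3221225472.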

It's also possible to go beyond 4GB per row with this patch, but
only when the data does not go through the protocol. I've managed to
get a 5.6GB single-row file with COPY TO file. That doesn't help with
pg_dump, but it might be useful in other situations.

In StringInfo, I've changed int64 to Size, because on 32-bit platforms
the downcast from int64 to Size is problematic, and as the rest of the
allocation routines seem to favor Size, it seems more consistent
anyway.
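
As an illustration of the downcast concern, here is a small sketch that models a 32-bit Size with uint32_t. The typedef and function names are hypothetical, and reading the problem as silent truncation is an assumption about what "problematic" refers to here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Size (size_t) on a 32-bit build. */
typedef uint32_t size32;

/* Narrowing conversion: values above 4GB - 1 are reduced modulo 2^32. */
static size32
downcast_len(int64_t len)
{
    return (size32) len;
}

/* True only if the 64-bit length survives the round trip unchanged. */
static bool
downcast_is_lossless(int64_t len)
{
    return len >= 0 && (int64_t) downcast_len(len) == len;
}
```

A 5GB length (5368709120) comes out of downcast_len() as 1073741824, so an allocation sized from the narrowed value would be far too small for the data it is meant to hold.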

I couldn't test on 32-bit though, as I never seem to have enough
free contiguous memory available on a 32-bit VM to handle
that kind of data.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
