PGDump / copy bugs with "big lines" ? - Mailing list pgsql-bugs

From Ronan Dunklau
Subject PGDump / copy bugs with "big lines" ?
Date
Msg-id 3859545.fB9yxixrjG@ronan.dunklau.fr
List pgsql-bugs
Hello.

The documentation mentions the following limits for sizes:

Maximum Field Size      1 GB
Maximum Row Size        1.6 TB

However, it seems that rows bigger than 1 GB can't be COPYed out:

ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1

Then, trying to dump or copy that results in the following error:

ro=# COPY test_text TO '/tmp/test';
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 536870913 bytes by 536870912 more bytes.

In fact, the same thing happens with a plain SELECT:

ro=# select * from test_text ;
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 536870922 bytes by 536870912 more bytes.

In the case of COPY, the server uses a StringInfo buffer to output the row. The problem is that a StringInfo is capped at MaxAllocSize (1 GB - 1), while a row should be able to hold much more than that.

So, is this a bug? Or is there a caveat I missed in the documentation?
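For reference, the numbers in the error message are consistent with that 1 GB ceiling. A quick sanity check of the arithmetic (illustrative only, not PostgreSQL code; MaxAllocSize is the constant from the PostgreSQL sources):

```python
# Sanity check: the COPY error message numbers against the 1 GB
# palloc ceiling (MaxAllocSize = 2^30 - 1 in the PostgreSQL sources).
MAX_ALLOC_SIZE = 2**30 - 1  # 1073741823 bytes

current = 536870913  # "string buffer containing 536870913 bytes"
needed = 536870912   # "... by 536870912 more bytes"

# Each 512 MB text field fits individually under the ceiling ...
assert needed < MAX_ALLOC_SIZE
# ... but one row buffer holding both fields cannot:
assert current + needed > MAX_ALLOC_SIZE
print(current + needed - MAX_ALLOC_SIZE)  # bytes over the limit -> 2
```

So each field is fine on its own; it is the single per-row output buffer that overflows.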

We also hit a second issue, this time related to bytea encoding.

This test case is a bit more complicated, since I had to use an external (client) program to insert my data. It involves inserting one string that fits into 1 GB when encoded in escape format but is larger than that in hex, and another string that fits into 1 GB in hex format but is larger than that in escape:

from psycopg2 import connect
from io import BytesIO

conn = connect(dbname="ro")
cur = conn.cursor()
# Assumes the target table already exists, e.g.:
#   CREATE TABLE test_bytea (c1 bytea);
fullcontent = BytesIO()

# Write a binary string that weighs less
# than 1 GB when escape-encoded, but more than
# that if hex-encoded
for i in range(200):
    fullcontent.write(b"aaa" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

fullcontent.seek(0)
fullcontent.truncate()

# Write another binary string that weighs
# less than 1 GB when hex-encoded, but more than
# that if escape-encoded
cur.execute("SET bytea_output = 'hex'")
fullcontent.write(b"\\\\x")
for i in range(300):
    fullcontent.write(b"00" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

cur.execute("COMMIT;")
cur.close()

I couldn't find an invocation of pg_dump which would allow me to dump both lines:

ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000001
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;
ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump -Fc > /dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;
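For what it's worth, the alloc request sizes above line up with the textual expansion of the two test strings. A back-of-the-envelope check, assuming hex output is "\x" plus 2 characters per byte, escape output is up to 4 characters per byte (a zero byte becomes "\000"), and the server asks for one extra terminator byte (that last point is my guess from the numbers):

```python
# Match the pg_dump alloc request sizes against the textual expansion
# of the two bytea test strings from the repro script.
string1 = 200 * 3_000_000  # 600,000,000 bytes of "a"
string2 = 300 * 1_000_000  # 600M hex digits -> 300,000,000 zero bytes

# bytea_output = escape: string1 is 1 char/byte (fits under 1 GB),
# but string2 is 4 chars/byte ("\000") plus a terminator:
assert 4 * string2 + 1 == 1200000001

# bytea_output = hex: string2 is 2 chars/byte + "\x" (fits),
# but string1 is 2 chars/byte + "\x" prefix plus a terminator:
assert 2 * string1 + 2 + 1 == 1200000003
```

So each string stays under the 1 GB limit in one output format and blows past it in the other, which is why no single bytea_output setting lets pg_dump handle both rows.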


Using a COPY with binary format works:

ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;

I'm really surprised by those results: did I do anything wrong? If it is indeed a bug, I'm surprised it wasn't discovered before.

Thank you.

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org
