PGDump / copy bugs with "big lines" ? - Mailing list pgsql-bugs
From | Ronan Dunklau |
---|---|
Subject | PGDump / copy bugs with "big lines" ? |
Date | |
Msg-id | 3859545.fB9yxixrjG@ronan.dunklau.fr |
List | pgsql-bugs |
Hello.

The documentation mentions the following limits for sizes:

    Maximum Field Size  1 GB
    Maximum Row Size    1.6 TB

However, it seems like rows bigger than 1 GB can't be COPYed out:

    ro=# create table test_text (c1 text, c2 text);
    CREATE TABLE
    ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
    INSERT 0 1
    ro=# update test_text set c2 = c1;
    UPDATE 1

Then, trying to dump or copy that table results in the following error:

    ro=# COPY test_text TO '/tmp/test';
    ERROR:  out of memory
    DETAIL:  Cannot enlarge string buffer containing 536870913 bytes by 536870912 more bytes.

In fact, the same thing happens with a plain SELECT:

    ro=# select * from test_text;
    ERROR:  out of memory
    DETAIL:  Cannot enlarge string buffer containing 536870922 bytes by 536870912 more bytes.

In the case of COPY, the server uses a StringInfo to output the row. The problem is that a StringInfo is capped at MaxAllocSize (1 GB - 1), but a row should be able to hold much more than that.

So, is this a bug? Or is there a caveat I have missed in the documentation?

We also hit a second issue, this time related to bytea encoding. This test case is a bit more involved, since I had to use an external (client) program to insert the data.
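For the first (text) case, the numbers line up with the StringInfo cap. A back-of-the-envelope sketch in Python, assuming MaxAllocSize is PostgreSQL's 1 GB - 1 palloc limit:

```python
# PostgreSQL's MaxAllocSize: the largest single palloc request (1 GB - 1).
MAX_ALLOC_SIZE = 0x3FFFFFFF

field = 536870912                  # 512 MB: the repeat('a', ...) value
assert field <= MAX_ALLOC_SIZE     # a single field fits comfortably

row = 2 * field                    # c1 + c2 together: exactly 1 GB
print(row > MAX_ALLOC_SIZE)        # True: the row's output buffer cannot grow that far
```

This matches the error above: a buffer already holding c1 (536870913 bytes, with overhead) cannot be enlarged by another 536870912 bytes without crossing the cap.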
It involves inserting one string that fits into 1 GB when encoded in escape format but is larger than that in hex, and another string that fits into 1 GB in hex format but is larger than that in escape:

    from psycopg2 import connect
    from io import BytesIO

    conn = connect(dbname="ro")
    cur = conn.cursor()
    fullcontent = BytesIO()

    # Write a binary string that weighs less than 1 GB
    # when escape-encoded, but more than that when hex-encoded
    for i in range(200):
        fullcontent.write(b"aaa" * 1000000)
    fullcontent.seek(0)
    cur.copy_from(fullcontent, "test_bytea")

    fullcontent.seek(0)
    fullcontent.truncate()

    # Write another binary string that weighs less than 1 GB
    # when hex-encoded, but more than that when escape-encoded
    cur.execute("SET bytea_output = 'hex'")
    fullcontent.write(b"\\\\x")
    for i in range(300):
        fullcontent.write(b"00" * 1000000)
    fullcontent.seek(0)
    cur.copy_from(fullcontent, "test_bytea")

    cur.execute("COMMIT;")
    cur.close()

I couldn't find an invocation of pg_dump which would allow me to dump both rows:

    ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump -Fc > /dev/null
    pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
    pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000001
    pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

    ro@ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump -Fc > /dev/null
    pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult() failed.
    pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
    pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

Using COPY with binary format works:

    ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;

I'm really surprised by these results: did I do anything wrong? If it is indeed a bug, I'm surprised it wasn't discovered before.

Thank you.
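The sizes in those error messages line up with the encoding overheads. A rough sketch, assuming (per the bytea output formats) that hex output costs two characters per byte plus a "\x" prefix, while escape output passes printable bytes through unchanged but renders a zero byte as the four-character sequence "\000":

```python
MAX_ALLOC = 1024 ** 3 - 1            # PostgreSQL's MaxAllocSize (1 GB - 1)

a_bytes   = 200 * 3 * 1_000_000      # 600,000,000 'a' bytes (first row)
nul_bytes = 300 * 1_000_000          # 300,000,000 zero bytes (second row)

# escape output: 'a' stays 1 byte, a zero byte becomes "\000" (4 bytes)
escape_a, escape_nul = a_bytes, nul_bytes * 4
# hex output: "\x" prefix plus two hex digits per byte
hex_a, hex_nul = 2 + 2 * a_bytes, 2 + 2 * nul_bytes

print(escape_a <= MAX_ALLOC, escape_nul <= MAX_ALLOC)   # True False
print(hex_a <= MAX_ALLOC, hex_nul <= MAX_ALLOC)         # False True
```

Whichever bytea_output is chosen, one of the two rows encodes to roughly 1.2 GB (compare the 1200000001 / 1200000003 alloc requests above), which exceeds MaxAllocSize; only the binary COPY format, which ships the raw bytes without re-encoding, dodges this.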
-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org