Thread: pg_dump out of memory for large table with LOB

pg_dump out of memory for large table with LOB

From
Jean-Marc Lessard
Date:

I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit

on win2012 with 12Gb RAM

The dumped table is 0.5TB with 17 million rows, and LOBs use about 99% of the space.

pg_dump consumes the entire system memory and swap, then terminates with an out-of-memory error.

Is it a bug or normal behavior?

If I do not include LOBs in the dump, it works fine.

 

Here is the dump output:

C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures -v > d:\postgresql\sig.dmp

pg_dump: last built-in OID is 16383

...

pg_dump: reading row security enabled for table "ibisl1.signatures"

pg_dump: reading policies for table "ibisl1.signatures"

pg_dump: reading large objects

pg_dump: reading dependency data

pg_dump: saving encoding = UTF8

pg_dump: saving standard_conforming_strings = on

out of memory

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

Re: pg_dump out of memory for large table with LOB

From
Ron
Date:
On 11/09/2018 05:49 PM, Jean-Marc Lessard wrote:

I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit

on win2012 with 12Gb RAM

The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of the space.

The pg_dump consumes the entire system memory and swap, then terminates with out of memory error

Is it a bug or normal behavior?

If I do not include LOB in the dump, it works fine.

 

Here is the dump output:

C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures -v > d:\postgresql\sig.dmp

pg_dump: last built-in OID is 16383

...

pg_dump: reading row security enabled for table "ibisl1.signatures"

pg_dump: reading policies for table "ibisl1.signatures"

pg_dump: reading large objects

pg_dump: reading dependency data

pg_dump: saving encoding = UTF8

pg_dump: saving standard_conforming_strings = on

out of memory


This looks similar to the recent thread "Trouble Upgrading Postgres".
https://www.postgresql.org/message-id/flat/CAFw6%3DU2oz9rTF0qa0LFMg91bu%3Dhdisfu2-xXU1%3D%3DD7yBif%2B2uw%40mail.gmail.com

Specifically, message ce239c9c-68f2-43e6-a6b6-81c66d0f46e5@manitou-mail.org

"The hex expansion performed by COPY must allocate twice that size,
plus the rest of the row, and if that resulting size is above 1GB, it
will error out with the message you mentioned upthread:
ERROR: invalid memory alloc request size <some value over 1 billion>.
So there's no way it can deal with the contents over 500MB, and the
ones just under that limit may also be problematic."


And message 89b5b622-4c79-4c95-9ad4-b16d0d0daf9b@manitou-mail.org

"It's undoubtedly very annoying that a database can end up with

non-pg_dump'able contents, but it's not an easy problem to solve. Some time ago, work was done to extend the 1GB limit but eventually it got scratched. The thread in [1] discusses many details of the problem and why the proposed solution were mostly a band aid. Basically, the specs of COPY and other internal aspects of Postgres are from the 32-bit era when putting the size of an entire CDROM in a single row/column was not anticipated as a valid use case. It's still a narrow use case today and applications that need to store big pieces of data like that should slice them in chunks, a bit like in pg_largeobject, except in much larger chunks, like 1MB.

[1] pg_dump / copy bugs with "big lines" ? https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr  "


Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com


--
Angular momentum makes the world go 'round.

Re: pg_dump out of memory for large table with LOB

From
Adrian Klaver
Date:
On 11/10/18 2:46 PM, Ron wrote:
> On 11/09/2018 05:49 PM, Jean-Marc Lessard wrote:
>>
>> I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by 
>> gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
>>
>> on win2012 with 12Gb RAM
>>
>> The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of 
>> the space.
>>
>> The pg_dump consumes the entire system memory and swap, then 
>> terminates with out of memory error
>>
>> Is it a bug or normal behavior?
>>
>> If I do not include LOB in the dump, it works fine.
>>
>> Here is the dump output:
>>
>> C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures 
>> -v > d:\postgresql\sig.dmp
>>
>> pg_dump: last built-in OID is 16383
>>
>> ...
>>
>> pg_dump: reading row security enabled for table "ibisl1.signatures"
>>
>> pg_dump: reading policies for table "ibisl1.signatures"
>>
>> pg_dump: reading large objects
>>
>> pg_dump: reading dependency data
>>
>> pg_dump: saving encoding = UTF8
>>
>> pg_dump: saving standard_conforming_strings = on
>>
>> out of memory
>>
> 
> This looks similar to the recent thread "Trouble Upgrading Postgres".
>
https://www.postgresql.org/message-id/flat/CAFw6%3DU2oz9rTF0qa0LFMg91bu%3Dhdisfu2-xXU1%3D%3DD7yBif%2B2uw%40mail.gmail.com
> 
> Specifically, message 
> ce239c9c-68f2-43e6-a6b6-81c66d0f46e5@manitou-mail.org 
> <https://www.postgresql.org/message-id/ce239c9c-68f2-43e6-a6b6-81c66d0f46e5%40manitou-mail.org>
> 
> "The hex expansion performed by COPY must allocate twice that size,
> plus the rest of the row, and if that resulting size is above 1GB, it
> will error out with the message you mentioned upthread:
> ERROR: invalid memory alloc request size <some value over 1 billion>.
> So there's no way it can deal with the contents over 500MB, and the
> ones just under that limit may also be problematic."

I don't think this is the case. The above is an issue because of the maximum 
length of a string that Postgres can process. LOs are different creatures:

https://www.postgresql.org/docs/11/lo-implementation.html


It would help to see the memory configuration values set for the cluster:

https://www.postgresql.org/docs/11/runtime-config-resource.html

> 
> 
> And message 89b5b622-4c79-4c95-9ad4-b16d0d0daf9b@manitou-mail.org
> 
> "It's undoubtedly very annoying that a database can end up with
> 
> non-pg_dump'able contents, but it's not an easy problem to solve. Some 
> time ago, work was done to extend the 1GB limit but eventually it got 
> scratched. The thread in [1] discusses many details of the problem and 
> why the proposed solution were mostly a band aid. Basically, the specs 
> of COPY and other internal aspects of Postgres are from the 32-bit era 
> when putting the size of an entire CDROM in a single row/column was not 
> anticipated as a valid use case. It's still a narrow use case today and 
> applications that need to store big pieces of data like that should 
> slice them in chunks, a bit like in pg_largeobject, except in much 
> larger chunks, like 1MB.
> 
> [1] pg_dump / copy bugs with "big lines" ? 
> https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr 
> <https://www.postgresql.org/message-id/1836813.YmyOrS99PX@ronan.dunklau.fr>
> 
>   "
> 
> 
>> *Jean-Marc Lessard*
>> Administrateur de base de données / Database Administrator
>> Ultra Electronics Forensic Technology Inc.
>> *T* +1 514 489 4247 x4164
>> www.ultra-forensictechnology.com <http://www.ultra-forensictechnology.com>
>>
> 
> -- 
> Angular momentum makes the world go 'round.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: pg_dump out of memory for large table with LOB

From
Adrien Nayrat
Date:
Hello,

On 11/10/18 12:49 AM, Jean-Marc Lessard wrote:
> The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of the space.
>

If I understand, you have 17 million Large Object?

I do not recall exactly, and maybe I am wrong, but it seems pg_dump has to
allocate memory for each object to dump:
addBoundaryDependencies():

    for (i = 0; i < numObjs; i++)
        [...]
        case DO_BLOB_DATA:
            /* Data objects: must come between the boundaries */
            addObjectDependency(dobj, preDataBound->dumpId);
            addObjectDependency(postDataBound, dobj->dumpId);
            break;
        [...]

addObjectDependency():

    [...]
    pg_malloc(dobj->allocDeps * sizeof(DumpId));
    [...]


With 17 million LOs, that could eat a lot of memory ;)
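
If it helps to confirm the scale, a quick way to check how many large objects the database holds and how much space they take (a simple sketch; pg_largeobject_metadata exists since 9.0):

    -- one row per large object
    SELECT count(*) FROM pg_largeobject_metadata;

    -- total on-disk size of the large object data
    SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));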



RE: pg_dump out of memory for large table with LOB

From
Jean-Marc Lessard
Date:
Adrien Nayrat wrote:
> With 17 million LO, it could eat lot of memory ;)
Yes it does.

I did several tests and here are my observations.

First memory settings are:
shared_buffers = 3GB
work_mem = 32Mb
maintenance_work_mem = 1GB
effective_cache_size = 9MB
bytea_output = 'escape'

The largest LO in the dumped table is 160KB.
The largest LO in the entire DB is 20MB (far from the 1GB limit).

I reduced shared_buffers to 512MB and got the same behavior.
I increased the pagefile from 8GB to 16GB and the dump proceeded; the total memory used by pg_dump reached nearly 20GB.
I killed the dump after 200GB (20 hrs). It was paging aggressively and would have taken about 4 days to dump the 800GB.

Here is the memory pattern:
pg_dump: reading large objects    -> reads pg_largeobject_metadata; memory ramps up gradually from 0 to 4GB over 10 minutes, then jumps to nearly 10GB at the end
pg_dump: reading dependency data  -> drops to 7GB
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on        -> paging begins and total memory reaches 20GB
pg_dump: dumping contents of table "ibisl1.signatures"  -> begins writing to the dump file
pg_dump: saving large objects     -> reads the pg_largeobject file nodes and writes to the dump file

I performed the same test on a smaller DB (60GB).
I dumped a single table with LOBs, and then the entire DB (3 tables have LOBs).
- Single 1GB table: 1GB including LOBs, 80,000 rows, max LOB size 100KB
  pg_dump -h localhost -Fc -a -b -t signatures -v > d:\postgresql\sig.dmp
- Entire 60GB DB:   3 tables have LOBs, 240,000 LOBs, max LOB size 20MB
  pg_dump -h localhost -Fc -v > d:\postgresql\db.dmp
Both dumps used the same amount of memory (160MB) and the dump files are more or less the same size, 53GB!
We can conclude that the single-table dump includes the entire pg_largeobject table, not only the LOBs belonging to that table.
So why read all the large objects and build a huge structure in memory if every large object is going to be dumped anyway?
There is certainly something to rework.
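
To verify that conclusion, one could compare the number of LOBs actually referenced by the table against all LOBs in the database (a rough sketch; sig_lob is a hypothetical name for the OID column referencing the large object):

    -- LOBs referenced by the dumped table (sig_lob is a hypothetical column name)
    SELECT count(DISTINCT sig_lob) FROM signatures;

    -- all LOBs in the database, which is what -b / --blobs actually pulls in
    SELECT count(*) FROM pg_largeobject_metadata;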

I ran another test: I dumped the application schema that contains the data.
  pg_dump -h localhost -Fc -n ibis* -v > d:\postgresql\ibislx.dmp
pg_dump used only a few MB of memory, so the memory issue is clearly due to LOBs.

The relation between the number of large objects and pg_dump memory looks linear, at about 650 bytes per LOB:
Small DB:    240,000 LOBs = 0.160GB
Large DB: 28,500,000 LOBs = 19.5GB
That seems like a lot of memory to allocate just for processing the LOBs later on.

Another area where LOBs hurt is storage. LOBs are broken up and stored in 2KB pieces.
Due to the block and row header overhead, only three 2KB pieces fit in an 8KB block, wasting 25% of the space (in fact pgstattuple reports ~20%).
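
For reference, the free space can be measured directly with the pgstattuple extension (a sketch, assuming the extension can be installed on the server):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- free_percent shows how much of pg_largeobject is unused space
    SELECT table_len, tuple_count, free_space, free_percent
    FROM pgstattuple('pg_largeobject');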

Would you recommend bytea over LOB, considering that the max LOB size is well below 1GB?
Is bytea preferable in terms of community support, performance, features, etc.?

We chose to save the binary data in LOBs rather than bytea because LOBs are stored out of row; in particular, one table is frequently accessed, but its binary data is not.
However, bytea would be TOASTed and stored out of row anyway, as the binary data we save is bigger than 2KB.
Currently the frequently accessed table contains 40 rows per 8KB block (average row length 200 bytes).
Can we just set toast_tuple_target to 256 bytes for that table to get similar results?
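
For what it's worth, toast_tuple_target is only settable per table from PostgreSQL 11 onward (not on 9.6), and the minimum allowed value is 128; the experiment would look something like this sketch:

    -- PostgreSQL 11+: try to keep heap tuples at or below 256 bytes
    -- by compressing/moving wide columns into the TOAST table
    ALTER TABLE signatures SET (toast_tuple_target = 256);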

Will toasted bytea be stored in 2KB pieces, with 25% of the space wasted (or less if toast_tuple_target is changed), like for LOBs?

I also presume that when a row is updated (columns other than the bytea), a new copy of the bytea will be made in the TOAST table?
If so, LOBs would be more suitable for frequently updated tables where the binary data rarely changes.

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

Re: pg_dump out of memory for large table with LOB

From
Tom Lane
Date:
Jean-Marc Lessard <Jean-Marc.Lessard@ultra-ft.com> writes:
> Would you recommend bytea over LOB considering that the max LOB size is well bellow 1GB?

Yes, probably.  The reason that pg_dump has trouble with lots of small
BLOBs is the 9.0-era decision to treat BLOBs as independent objects
having their own owners, privilege attributes, and archive TOC entries
--- it's really the per-BLOB TOC entries that are causing the issue
for you here.  That model is fine as long as BLOBs are, uh, large.
If you're using them as replacements for bytea, the overhead is going
to be prohibitive.

            regards, tom lane


Re: pg_dump out of memory for large table with LOB

From
Ron
Date:
On 11/14/2018 11:14 AM, Jean-Marc Lessard wrote:
Adrien Nayrat wrote:
> With 17 million LO, it could eat lot of memory ;)
Yes it does.

I did several tests and here are my observations.

First memory settings are:
shared_buffers = 3GB
work_mem = 32Mb
maintenance_work_mem = 1GB
effective_cache_size = 9MB
bytea_output = 'escape'

Why escape instead of hex?


--
Angular momentum makes the world go 'round.

RE: pg_dump out of memory for large table with LOB

From
"Daniel Verite"
Date:
    Jean-Marc Lessard wrote:

> Another area where LOB hurts is the storage. LOB are broken and stored in 2K
> pieces.
> Due to the block header, only three 2k pieces fit in an 8k block wasting 25%
> of space (in fact pgstattuple reports ~ 20%).

Yes. bytea stored as TOAST is sliced into pieces of 2000 bytes, versus
2048 bytes for large objects. And that makes a significant difference
when packing these slices, because 2000*4 + (page overhead) +
4*(row overhead) is just under the default size of 8192 bytes per page,
whereas 2048*4 + (page overhead) + 4*(row overhead)
is obviously a bit over 8192, since 2048*4 = 8192.

If the data is compressible, the difference may be less obvious because
the slices in pg_largeobject are compressed individually
(as opposed to bytea, which gets compressed as a whole),
so more than 3 slices can fit in a page inside pg_largeobject.
The post-compression size can be known with pg_column_size(),
versus octet_length(), which gives the pre-compression size.
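
For instance, something like this shows the difference per value (a sketch; the table and column names are hypothetical):

    -- pre-compression size vs size as actually stored (after TOAST compression)
    SELECT octet_length(image) AS raw_bytes,
           pg_column_size(image) AS stored_bytes
    FROM images
    LIMIT 10;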

> Would you recommend bytea over LOB considering that the max LOB size is well
> bellow 1GB?
> Are bytea preferable in terms of support by the community, performance,
> feature, etc?

For the storage and pg_dump issues, bytea seems clearly preferable
in your case.
As for the performance aspect, large objects are excellent because their
API never requires a binary<->text conversion.
This may be different with bytea. The C API provided by libpq allows
retrieving and sending bytea in binary format, for instance through
PQexecParams(), but most drivers implemented on top of libpq use only
the text representation for all datatypes, because it's simpler for them.
So you may want to check the difference between sending and retrieving
your biggest binary objects, stored in a bytea column versus as large
objects, with your particular app/language/framework.
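
If you do decide to move from large objects to bytea, a minimal migration sketch could look like this (sig_lob and sig_data are hypothetical column names; lo_get() exists since 9.4):

    ALTER TABLE signatures ADD COLUMN sig_data bytea;
    -- copy the content of each large object into the new bytea column
    UPDATE signatures SET sig_data = lo_get(sig_lob);
    -- once verified: drop the old OID column and remove the now-orphaned
    -- large objects, e.g. with lo_unlink() or the vacuumlo utility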


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


RE: pg_dump out of memory for large table with LOB

From
Jean-Marc Lessard
Date:
Thanks to Daniel Verite, nice answer, really helpful :)
It summarizes what I have read in the doc and blogs.

What about updates where the bytea does not change? Will a new copy of the bytea be made in the TOAST table, or will the new row point to the original bytea value?
> https://www.postgresql.org/docs/current/storage-toast.html says
> The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress
> and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had. During an UPDATE
> operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.

Does that mean no cost is incurred to regenerate the out-of-line TOAST data, but that a copy of the bytea is still made for the new row?

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

RE: pg_dump out of memory for large table with LOB

From
Jean-Marc Lessard
Date:

JMLessard wrote:

> What about updates where the bytea do not changed. Does a new copy of the bytea will be made in the toast table or new row will point to the original bytea?

> > https://www.postgresql.org/docs/current/storage-toast.html says:

> > The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress

> > and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had. During an UPDATE

> > operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.

> Does it means, no incurs cost to generate the out of line toast, but that a copy of the bytea is still made for the new line?

 

I benchmarked it as follows:

UPDATE table SET mod_tim=mod_tim;

The relpages of the table doubled, but the relpages of the TOAST table did not change.
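
For reference, the page counts can be compared with a query like this, run before and after the UPDATE (a sketch; 'signatures' stands for the benchmarked table, and relpages is only refreshed by VACUUM/ANALYZE):

    SELECT c.relname, c.relpages,
           t.relname AS toast_relname, t.relpages AS toast_relpages
    FROM pg_class c
    LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = 'signatures';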

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

RE: pg_dump out of memory for large table with LOB

From
Jean-Marc Lessard
Date:
Thanks, Tom Lane, for your answer.
The same pg_dump issue with LOBs was also reported in https://postgrespro.com/list/thread-id/2211186

Concerning the 25% waste of space we experienced with LOBs:
We are using LOBs because we save JPEG 2000 images in the DB.
We display them as thumbnails in a 6x10 multi-viewer (60 images displayed at a time) that the user can scroll through.
When retrieving images into the multi-viewer, we only read the first 30% of each JPEG 2000.
Even though the JPEG 2000 images are relatively small (about 20MB), retrieving thousands of images partially greatly speeds up the display.
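
As a side note on the partial reads: large objects can be read partially server-side with lo_get(loid, offset, length) (9.4+), and a bytea column stored uncompressed (STORAGE EXTERNAL) also allows cheap partial reads, since substr() then fetches only the needed TOAST chunks. A rough sketch with hypothetical column names:

    -- large object: read only the first 64kB
    SELECT lo_get(sig_lob, 0, 65536) FROM signatures WHERE sig_id = 1;

    -- bytea kept uncompressed so substr() can read just the leading chunks
    ALTER TABLE signatures ALTER COLUMN sig_data SET STORAGE EXTERNAL;
    SELECT substr(sig_data, 1, 65536) FROM signatures WHERE sig_id = 1;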

src/include/storage/large_object.h:
/*
 * Each "page" (tuple) of a large object can hold this much data
 *
 * We could set this as high as BLCKSZ less some overhead, but it seems
 * better to make it a smaller value, so that not as much space is used
 * up when a page-tuple is updated.  Note that the value is deliberately
 * chosen large enough to trigger the tuple toaster, so that we will
 * attempt to compress page tuples in-line.  (But they won't be moved off
 * unless the user creates a toast-table for pg_largeobject...)
 *
 * Also, it seems to be a smart move to make the page size be a power of 2,
 * since clients will often be written to send data in power-of-2 blocks.
 * This avoids unnecessary tuple updates caused by partial-page writes.
 *
 * NB: Changing LOBLKSIZE requires an initdb.
 */
#define LOBLKSIZE       (BLCKSZ / 4)

Here is my understanding; please correct me if I am wrong.
If the data is compressible, each 2KB (2048-byte) uncompressed piece should compress to less than 2000 bytes, so that 4 or more rows can fit in an 8KB data block.
In our case, as JPEG 2000 images are already compressed, the 2KB pieces remain 2048 bytes after LO compression, and only 3 pieces can fit in a block.
Can we change LOBLKSIZE to 2000 bytes, as follows, to fit 4 rows in a block?
#define LOBLKSIZE       2000
Must the LOBLKSIZE be a power-of-2?
Is there any optimization expecting a power-of-2 value?

Thank you.

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com