Re: pg_upgrade with large pg_largeobject table - Mailing list pgsql-general

From Mate Varga
Subject Re: pg_upgrade with large pg_largeobject table
Date
Msg-id CAK4GaZ782vw5vDSV0BEhGVu62GVyTsnHeRctYNzrtMiBwmJ4Dw@mail.gmail.com
In response to Re: pg_upgrade with large pg_largeobject table  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_upgrade with large pg_largeobject table
List pgsql-general
Thanks.

> You mean 250M rows in pg_largeobject itself, or 250M large objects
> (that is, 250M rows in pg_largeobject_metadata)?

250M large objects.
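
(For reference, if I'm reading the catalogs right, the two numbers can be told apart with queries along these lines; pg_largeobject_metadata has one row per large object, while pg_largeobject has one row per ~2 kB data chunk:

    SELECT count(*) FROM pg_largeobject_metadata;  -- number of large objects
    SELECT count(*) FROM pg_largeobject;           -- number of ~2 kB data pages
)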

> Are you sure you're using a 64-bit build of pg_dump?

 file /usr/lib/postgresql/10/bin/pg_dump
/usr/lib/postgresql/10/bin/pg_dump: ELF 64-bit LSB  shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=4ff6bf0dd628d151481256723994dd50785e54e9, stripped
Also, it really does eat all the memory.

> Using the large-object API for things that tend to not actually be very large (which they aren't, if you've got hundreds of millions of 'em) is an antipattern, I'm afraid.

I know :( So maybe I need to do some refactoring in the application and inline the LOBs. The data is binary with very high entropy (it's encrypted). Would you recommend bytea for that?
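
If bytea is the way to go, I suppose the migration would look roughly like this (app_files / content / content_oid are made-up names, just to sketch the idea):

    ALTER TABLE app_files ADD COLUMN content bytea;
    -- lo_get() returns the whole large object as bytea
    UPDATE app_files SET content = lo_get(content_oid);
    -- remove the now-redundant large objects and the OID reference
    SELECT lo_unlink(content_oid) FROM app_files;
    ALTER TABLE app_files DROP COLUMN content_oid;

(Probably in batches, given the row counts involved.)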

Thanks again.
Mate



On Tue, Aug 14, 2018 at 7:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mate Varga <m@matevarga.net> writes:
> We have a PSQL 9.5 DB with 16G physical RAM and ~ 1 TB data mostly stored
> in the pg_largeobject system table. This table has 250M rows at the moment.

You mean 250M rows in pg_largeobject itself, or 250M large objects
(that is, 250M rows in pg_largeobject_metadata)?

> This command
> fails because of an OOM. Logs say:
> pg_dump: [archiver (db)] query failed: out of memory for query result
> pg_dump: [archiver (db)] query was: SELECT oid, (SELECT rolname FROM
> pg_catalog.pg_roles WHERE oid = lomowner) AS rolname, lomacl, NULL AS
> rlomacl, NULL AS initlomacl, NULL AS initrlomacl  FROM
> pg_largeobject_metadata

The selected rows shouldn't be very wide here, so I'm a bit surprised
you are hitting OOM, even if there are 250M rows to read.  Are you sure
you're using a 64-bit build of pg_dump?

> Would it help if we'd inline these largeobjects as e.g. text cols (instead
> of storing them as lobs)?

Yes, because then they'd not have their own ownership and permissions
for pg_dump to keep track of.  (You might want bytea instead of text,
depending on what the data is.)

Using the large-object API for things that tend to not actually be very
large (which they aren't, if you've got hundreds of millions of 'em)
is an antipattern, I'm afraid.  You could get away with it before we
added per-largeobject permissions, but now it's a problem for pg_dump.

                        regards, tom lane
