Improving pg_dump performance when handling large numbers of LOBs

From: Wyatt Tellis
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB. When using pg_dump we've found that it takes a couple of weeks to dump out this much data. We've tried using the jobs option with the directory format, but that seems to save each LOB as a separate file, which makes moving the resulting dump to another location unwieldy. Has anyone else had to deal with dumping a database with this many LOBs? Are there any suggestions for how to improve performance?
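
For reference, the command we've been running looks roughly like this (job count and paths are illustrative, not our exact values):

    pg_dump --format=directory --jobs=8 \
            --file=/backups/legacy_db.dir legacy_db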

Thanks,

Wyatt

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Ron Johnson
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Adrian Klaver


On 2/5/24 11:35 AM, Ron Johnson wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?


I think it's less about the size of the individual objects than the number of them (150 million).

AFAIK large objects can't be handled by COPY, so they have to be transferred individually.
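
One way to confirm the scale is to count them directly; each large object becomes its own entry in the dump (database name illustrative):

    # Each row here is a large object that pg_dump must fetch individually.
    psql -d legacy_db -Atc "SELECT count(*) FROM pg_largeobject_metadata;"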

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Wyatt Tellis
Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million), which seem to be slowing down pg_dump. Note: we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Wyatt

On Mon, Feb 5, 2024 at 11:36 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Andreas Joseph Krogh
On Monday, 5 February 2024 at 20:55:43, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:

Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million), which seem to be slowing down pg_dump. Note: we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Well, the data is there nonetheless; is it an option to convert it to bytea before migration?
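
As a sketch of what that conversion could look like for a hypothetical table "docs" with an oid column "content_oid" pointing at the large objects (names illustrative; lo_get() requires PostgreSQL 9.4 or later):

    # Hypothetical table/column names; adapt to the real schema.
    psql -d legacy_db -c "ALTER TABLE docs ADD COLUMN content bytea;"
    psql -d legacy_db -c "UPDATE docs SET content = lo_get(content_oid);"
    # Once verified, the now-redundant large objects could be unlinked:
    # psql -d legacy_db -c "SELECT lo_unlink(content_oid) FROM docs;"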

 

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Wyatt Tellis
No, we don't have the ability to make schema changes, and the schema in the PG 15 copy needs to match what's in the PG 12 versions.

Wyatt

On Mon, Feb 5, 2024 at 12:05 PM Andreas Joseph Krogh <andreas@visena.com> wrote:
On Monday, 5 February 2024 at 20:55:43, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:

Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million), which seem to be slowing down pg_dump. Note: we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Well, the data is there nonetheless; is it an option to convert it to bytea before migration?

 

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Ron Johnson
Just checking... and I sympathize with your situation.

On Mon, Feb 5, 2024 at 2:56 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Yes, the LOBs themselves are tiny, but there are a lot of them (~150 million), which seem to be slowing down pg_dump. Note: we did not design/build this system and agree that use of LOBs for this purpose was not necessary.

Wyatt

On Mon, Feb 5, 2024 at 11:36 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Andreas Joseph Krogh
On Monday, 5 February 2024 at 21:10:53, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:

No, we don't have the ability to make schema changes, and the schema in the PG 15 copy needs to match what's in the PG 12 versions.

Well then, I guess it boils down to how badly the ones in charge want this migration…

 

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Ken Marshall
On Mon, Feb 05, 2024 at 09:17:53PM +0100, Andreas Joseph Krogh wrote:
> 
> On Monday, 5 February 2024 at 21:10:53, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
> 
> No, we don't have the ability to make schema changes, and the schema in the
> PG 15 copy needs to match what's in the PG 12 versions.
> 
> Well then, I guess it boils down to how badly the ones in charge want this
> migration…
> 
> --
> Andreas Joseph Krogh
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andreas@visena.com
> www.visena.com

Hi,

Can you use logical replication? Then you aren't under the gun to make
a fast copy.
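
For what it's worth, a minimal sketch of that kind of setup (host and object names illustrative; see the restriction noted in the next message):

    # On the PG 12 source:
    psql -h old-host -d legacy_db -c "CREATE PUBLICATION mig_pub FOR ALL TABLES;"
    # On the PG 15 target, where a matching schema must already exist:
    psql -h new-host -d legacy_db -c "CREATE SUBSCRIPTION mig_sub CONNECTION 'host=old-host dbname=legacy_db' PUBLICATION mig_pub;"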

Regards,
Ken

Re: Improving pg_dump performance when handling large numbers of LOBs

From: Tom Lane

Wyatt Tellis <wyatt.tellis@gmail.com> writes:
> Yes, the LOBs themselves are tiny, but there are a lot of them (~150
> million) which seem to be slowing down pg_dump.  Note, we did not
> design/build this system and agree that use of LOBs for this purpose was
> not necessary.

I don't know of anything much you can do at the user level.  But there
is a patchset under development to improve pg_dump's behavior with
tons of blobs:

https://www.postgresql.org/message-id/842242.1706287466@sss.pgh.pa.us

That's intended for v17, and I'm not sure how tough it might be to
adapt to v15 pg_dump, but maybe you could make that happen.  In any
case, more eyeballs reviewing that patchset would be welcome.

            regards, tom lane



Re: Improving pg_dump performance when handling large numbers of LOBs

From: Adrian Klaver
On 2/5/24 12:32, Ken Marshall wrote:
> On Mon, Feb 05, 2024 at 09:17:53PM +0100, Andreas Joseph Krogh wrote:
>>
>> On Monday, 5 February 2024 at 21:10:53, Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
>>
>> No, we don't have the ability to make schema changes, and the schema in the
>> PG 15 copy needs to match what's in the PG 12 versions.
>>
>> Well then, I guess it boils down to how badly the ones in charge want this
>> migration…
>>
>> --
>> Andreas Joseph Krogh
>> CTO / Partner - Visena AS
>> Mobile: +47 909 56 963
>> andreas@visena.com
>> www.visena.com
> 
> Hi,
> 
> Can you use logical replication? Then you aren't under the gun to make
> a fast copy.

https://www.postgresql.org/docs/current/logical-replication-restrictions.html

"Large objects (see Chapter 35) are not replicated. There is no 
workaround for that, other than storing data in normal tables."


> 
> Regards,
> Ken
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Improving pg_dump performance when handling large numbers of LOBs

From: Shaheed Haque
Might it be worth a modest amount of time using some basic profiling to see where the time is going? A week is a looonnngg time, even for 150e6 operations. For example, if there's an unexpectedly high IO load, some temporary M.2 storage might help?
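
For example, something along these lines while the dump runs (a rough sketch; assumes iostat is installed and that pg_dump is using its default application_name):

    # Watch block-device I/O in 5-second samples.
    iostat -xm 5 &
    # Sample what the dump's backend is waiting on, server-side.
    psql -d legacy_db -c "SELECT wait_event_type, wait_event, count(*) FROM pg_stat_activity WHERE application_name = 'pg_dump' GROUP BY 1, 2;"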

On Tue, 6 Feb 2024, 01:36 Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 5, 2024 at 2:01 PM Wyatt Tellis <wyatt.tellis@gmail.com> wrote:
Hi,

We've inherited a series of legacy PG 12 clusters that each contain a database that we need to migrate to a PG 15 cluster. Each database contains about 150 million large objects totaling about 250GB.

250*10^9 / (150*10^6) = 1667 bytes.  That's tiny.

Am I misunderstanding you?