Re: Inefficiency in parallel pg_restore with many tables - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Inefficiency in parallel pg_restore with many tables
Date 2023-07-15
Msg-id 20230715181916.qluqwiuh5ckg4f75@awork3.anarazel.de
In response to Inefficiency in parallel pg_restore with many tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2023-07-15 13:47:12 -0400, Tom Lane wrote:
> I wonder if we could replace the sorted ready-list with a priority heap,
> although that might be complicated by the fact that pop_next_work_item
> has to be capable of popping something that's not necessarily the
> largest remaining job.  Another idea could be to be a little less eager
> to sort the list every time; I think in practice scheduling wouldn't
> get much worse if we only re-sorted every so often.

Perhaps we could keep track of where the newly inserted items are, and use
insertion sort or the like when the number of new elements is much smaller
than the number of already-sorted elements?
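
Roughly what I have in mind, as a sketch only (made-up names, an int
priority standing in for a TocEntry, array kept ascending so the largest
entry pops off the end):

#include <stddef.h>

/*
 * Hypothetical ready-list layout: a sorted prefix of nsorted entries,
 * followed by newly added, not-yet-sorted entries at the tail.
 */
typedef struct ReadyList
{
	int	   *items;
	size_t	nitems;		/* sorted prefix plus unsorted tail */
	size_t	nsorted;	/* items[0 .. nsorted-1] are sorted */
} ReadyList;

/*
 * Fold the unsorted tail into the sorted prefix with insertion sort.
 * Each new element is shifted left past the larger sorted entries,
 * which is typically much cheaper than re-sorting the whole array
 * while the tail stays small relative to the prefix.
 */
static void
ready_list_resort(ReadyList *rl)
{
	for (size_t i = rl->nsorted; i < rl->nitems; i++)
	{
		int		v = rl->items[i];
		size_t	j = i;

		while (j > 0 && rl->items[j - 1] > v)
		{
			rl->items[j] = rl->items[j - 1];
			j--;
		}
		rl->items[j] = v;
	}
	rl->nsorted = rl->nitems;
}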

As you say, a straight priority heap might not be easy. But we could just
open-code this with two sorted arrays: one large, and a small one for recent
additions that still needs sorting. We'd occasionally merge the small array
into the big one, once the small array has grown large enough that sorting it
becomes expensive.  We could generalize to N>2 such arrays, but I doubt it
would be worth much.
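
Something like the below, again purely as an illustration (made-up names,
int priorities instead of TocEntrys, error handling elided, an assumed
threshold that would need tuning, and ignoring that pop_next_work_item may
have to skip over entries due to lock conflicts):

#include <stdbool.h>
#include <stdlib.h>

#define SMALL_LIMIT 64			/* assumed merge threshold */

/* Both arrays are kept ascending, largest entry at the end. */
typedef struct TwoLevelList
{
	int	   *big;				/* bulk of the ready list, sorted */
	size_t	nbig;
	int		small[SMALL_LIMIT];	/* recent additions, sorted */
	size_t	nsmall;
} TwoLevelList;

/* Linear merge of small[] into a reallocated big[]. */
static void
merge_small_into_big(TwoLevelList *l)
{
	size_t	n = l->nbig + l->nsmall;
	int	   *merged = malloc(n * sizeof(int));
	size_t	i = 0, j = 0, k = 0;

	while (i < l->nbig && j < l->nsmall)
		merged[k++] = (l->big[i] <= l->small[j]) ? l->big[i++] : l->small[j++];
	while (i < l->nbig)
		merged[k++] = l->big[i++];
	while (j < l->nsmall)
		merged[k++] = l->small[j++];

	free(l->big);
	l->big = merged;
	l->nbig = n;
	l->nsmall = 0;
}

/* Add an entry: insertion-sort it into small[], merging when full. */
static void
push_item(TwoLevelList *l, int v)
{
	size_t	j = l->nsmall++;

	while (j > 0 && l->small[j - 1] > v)
	{
		l->small[j] = l->small[j - 1];
		j--;
	}
	l->small[j] = v;

	if (l->nsmall == SMALL_LIMIT)
		merge_small_into_big(l);
}

/* Pop the overall largest entry: compare the tails of both arrays. */
static bool
pop_largest(TwoLevelList *l, int *out)
{
	if (l->nbig == 0 && l->nsmall == 0)
		return false;
	if (l->nsmall == 0 ||
		(l->nbig > 0 && l->big[l->nbig - 1] >= l->small[l->nsmall - 1]))
		*out = l->big[--l->nbig];
	else
		*out = l->small[--l->nsmall];
	return true;
}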

Greetings,

Andres Freund


