Home > mailing lists

Re: Improve the performance of nested loop join in the case of partitioned inner table - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Improve the performance of nested loop join in the case of partitioned inner table
Date	March 23, 2023 13:05:41
Msg-id	CAApHDvpNrrjs6pJa=YU1rnqMe=HW_EN_EL5JBGPxYi11+-76bA@mail.gmail.com Whole thread Raw
In response to	Improve the performance of nested loop join in the case of partitioned inner table (Alexandr Nikulin <alexandr.s.nikulin@gmail.com>)
Responses	Re: Improve the performance of nested loop join in the case of partitioned inner table (Alexandr Nikulin <alexandr.s.nikulin@gmail.com>)
List	pgsql-hackers

Tree view

On Thu, 23 Mar 2023 at 19:46, Alexandr Nikulin
<alexandr.s.nikulin@gmail.com> wrote:
> I propose to slightly improve the performance of nested loop join in the case of partitioned inner table.
> As I see in the code, the backend looks for the partition of the inner table each time after fetch a new row from the
outertable.

> These searches can take a significant amount of time.
> But we can skip this step if the nested loop parameter(s) was(re) not changed since the previous row fetched from the
outertable

I think if we were to do something like that, then it should be done
in nodeAppend.c and nodeMergeAppend.c. That way you can limit it to
only checking parameters that partition pruning needs to care about.
That does mean you'd need to find somewhere to cache the parameter
values, however. Doing it in nodeNestloop.c means you're doing it when
the inner subplan is something that does not suffer from the
additional overhead you want to avoid, e.g an Index Scan.

Also, generally, if you want to get anywhere with a performance patch,
you should post performance results from before and after your change.
Also include your benchmark setup and relevant settings for how you
got those results.  For this case, you'll want a best case (parameter
value stays the same) and a worst case, where the parameter value
changes on each outer row.  I expect you're going to add overhead to
this case as your additional checks will always detect the parameter
has changed as that'll always require partition pruning to be executed
again. We'll want to know if that overhead is high enough for us not
to want to do this.

I'll be interested to see a test that as some varlena parameter of say
a few hundred bytes to see how much overhead testing if that parameter
has changed when the pruning is being done on a HASH partitioned
table.  HASH partitioning should prune quite a bit faster than both
LIST and RANGE as the hashing is effectively O(1) vs O(log2 N) (N
being the number of Datums in the partition bounds). I'd expect a
measurable additional overhead with the patch when the parameter
changes on each outer row.

David

pgsql-hackers by date:

From: Daniel Gustafsson
Date: 23 March 2023, 12:55:30
Subject: Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch

From: Alvaro Herrera
Date: 23 March 2023, 13:07:26
Subject: Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

Re: Improve the performance of nested loop join in the case of partitioned inner table - Mailing list pgsql-hackers

Previous

Next