Re: partition routing layering in nodeModifyTable.c - Mailing list pgsql-hackers

From Amit Langote
Subject Re: partition routing layering in nodeModifyTable.c
Date
Msg-id CA+HiwqH+5EYSTZzOQ8FXhySYzQgUb5efqeuh0EeyugAe2=132A@mail.gmail.com
Whole thread Raw
In response to Re: partition routing layering in nodeModifyTable.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: partition routing layering in nodeModifyTable.c
List pgsql-hackers
On Fri, Oct 23, 2020 at 4:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 22/10/2020 16:49, Amit Langote wrote:
> > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>> It's probably true that there's no performance gain from initializing
> >>> them more lazily. But the reasoning and logic around the initialization
> >>> is complicated. After tracing through various path through the code, I'm
> >>> convinced enough that it's correct, or at least these patches didn't
> >>> break it, but I still think some sort of lazy initialization on first
> >>> use would make it more readable. Or perhaps there's some other
> >>> refactoring we could do.
> >>
> >> So the other patch I have mentioned is about lazy initialization of
> >> the ResultRelInfo itself, not the individual fields, but maybe with
> >> enough refactoring we can get the latter too.
> >
> > So, I tried implementing a lazy-initialization-on-first-access
> > approach for both the ResultRelInfos themselves and some of the
> > individual fields of ResultRelInfo that don't need to be set right
> > away.  You can see the end result in the attached 0003 patch.  This
> > slims down ExecInitModifyTable() significantly, both in terms of code
> > footprint and the amount of work that it does.
>
> Have you done any performance testing? I'd like to know how much of a
> difference this makes in practice.

I have shown some numbers here:

https://www.postgresql.org/message-id/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com

To reiterate, if you apply the following patch:

> Does this patch become moot if we do the "Overhaul UPDATE/DELETE
> processing"?
> (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com)?

...and run this benchmark with plan_cache_mode = force_generic_plan

pgbench -i -s 10 --partitions={0, 10, 100, 1000}
pgbench -T120 -f test.sql -M prepared

test.sql:
\set aid random(1, 1000000)
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;

you may see roughly the following results:

HEAD:

0 tps = 13045.485121 (excluding connections establishing)
10 tps = 9358.157433 (excluding connections establishing)
100 tps = 1878.274500 (excluding connections establishing)
1000 tps = 84.684695 (excluding connections establishing)

Patched (overhaul update/delete processing):

0 tps = 12743.487196 (excluding connections establishing)
10 tps = 12644.240748 (excluding connections establishing)
100 tps = 4158.123345 (excluding connections establishing)
1000 tps = 391.248067 (excluding connections establishing)

And if you apply the patch being discussed here, TPS shoots up a bit,
especially for higher partition counts:

Patched (lazy-ResultRelInfo-initialization)

0 tps = 13419.283168 (excluding connections establishing)
10 tps = 12588.016095 (excluding connections establishing)
100 tps = 8560.824225 (excluding connections establishing)
1000 tps = 1926.553901 (excluding connections establishing)

To explain these numbers a bit, "overheaul update/delete processing"
patch improves the performance of that benchmark by allowing the
updates to use run-time pruning when executing generic plans, which
they can't today.

However without "lazy-ResultRelInfo-initialization" patch,
ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can
be seen to be spending time initializing all of those result
relations, whereas only one of those will actually be used.

As mentioned further in that email, it's really the locking of all
relations by AcquireExecutorLocks() that occurs even before we enter
the executor that's a much thornier bottleneck for this benchmark.
But the ResultRelInfo initialization bottleneck sounded like it could
get alleviated in a relatively straightforward manner.  The patches
that were developed for attacking the locking bottleneck would require
further reflection on whether they are correct.

(Note: I've just copy pasted the numbers I reported in that email.  To
reproduce, I'll have to rebase the "overhaul update/delete processing"
patch on this one, which I haven't yet done.)

> Another alternative is to continue to create the ResultRelInfos in
> ExecInitModify(), but initialize the individual fields in them lazily.

If you consider the above, maybe you can see how that will not really
eliminate the bottleneck I'm aiming to fix here.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Online checksums verification in the backend
Next
From: John Naylor
Date:
Subject: Re: speed up unicode decomposition and recomposition