Re: speeding up planning with partitions - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: speeding up planning with partitions
Msg-id: 3529.1554051867@sss.pgh.pa.us
In response to: Re: speeding up planning with partitions (Amit Langote <amitlangote09@gmail.com>)
List: pgsql-hackers
Amit Langote <amitlangote09@gmail.com> writes:
> On Sun, Mar 31, 2019 at 11:45 AM Imai Yoshikazu <yoshikazu_i443@live.jp> wrote:
>> Certainly, using a bitmapset contributes to the performance when
>> scanning one partition (or a few partitions) out of many partitions.

> Thanks Imai-san for testing.

I tried to replicate these numbers with the code as-committed, and
could not.  What I get, using the same table-creation code as you
posted and a pgbench script file like

\set param random(1, :N)
select * from rt where a = :param;

is scaling like this:

N    tps, range    tps, hash

2    10520.519932    10415.230400
8    10443.361457    10480.987665
32    10341.196768    10462.551167
128    10370.953849    10383.885128
512    10207.578413    10214.049394
1024    10042.794340    10121.683993
4096    8937.561825    9214.993778
8192    8247.614040    8486.728918
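
For concreteness, runs of this shape can be driven with a command along
these lines (the 30-second duration and the database name are
illustrative assumptions, not necessarily what was used; -D supplies
the script's :N variable):

pgbench -n -T 30 -D N=1024 -f select.sql test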

If I use "-M prepared" the numbers go up a bit for lower N, but
drop at high N:

N    tps, range    tps, hash

2    11449.920527    11462.253871
8    11530.513146    11470.812476
32    11372.412999    11450.213753
128    11289.351596    11322.698856
512    11095.428451    11200.683771
1024    10757.646108    10805.052480
4096    8689.165875    8930.690887
8192    7301.609147    7502.806455

Digging into that, it seems like the degradation with -M prepared is
mostly in LockReleaseAll's hash_seq_search over the locallock hash table.
What I think must be happening is that with -M prepared, at some point the
plancache decides to try a generic plan, which causes opening/locking all
the partitions, resulting in permanent bloat in the locallock hash table.
We immediately go back to using custom plans, but hash_seq_search has
more buckets to look through for the remainder of the process' lifetime.
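
If you want to watch that happen, it's easy to reproduce by hand.  This
is a sketch assuming the same rt table as above, relying on the
plancache's rule of trying a generic plan once five custom plans have
been built:

PREPARE p AS SELECT * FROM rt WHERE a = $1;
EXECUTE p(1);      -- executions 1 through 5 get custom plans
EXECUTE p(1);
EXECUTE p(1);
EXECUTE p(1);
EXECUTE p(1);
EXECUTE p(1);      -- 6th execution: a generic plan is built,
                   -- locking every partition along the way

The generic plan loses on cost and is discarded, but the locallock
hash table never shrinks back.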

I do see some cycles getting spent in apply_scanjoin_target_to_paths
that look to be due to scanning over the long part_rels array,
which your proposal would ameliorate.  But (a) that's pretty small
compared to other effects, and (b) IMO, apply_scanjoin_target_to_paths
is a remarkable display of brute-force inefficiency to begin with.
I think we should see if we can't nuke that function altogether in
favor of generating the paths with the right target the first time.

BTW, the real elephant in the room is the O(N^2) cost of creating
these tables in the first place.  The runtime for the table-creation
scripts looks like

N    range        hash

2    0m0.011s    0m0.011s
8    0m0.015s    0m0.014s
32    0m0.032s    0m0.030s
128    0m0.132s    0m0.099s
512    0m0.969s    0m0.524s
1024    0m3.306s    0m1.442s
4096    0m46.058s    0m15.522s
8192    3m11.995s    0m58.720s
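
For reference, scripts of roughly this shape reproduce the effect (a
sketch, since the real column list and partition bounds are from
Imai-san's post; run through psql with, say, -v N=1024):

CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
SELECT format('CREATE TABLE rt%s PARTITION OF rt FOR VALUES FROM (%s) TO (%s)',
              i, i, i + 1)
FROM generate_series(1, :N) i \gexec

and likewise for hash partitioning:

CREATE TABLE ht (a int, b int) PARTITION BY HASH (a);
SELECT format('CREATE TABLE ht%s PARTITION OF ht FOR VALUES WITH (MODULUS %s, REMAINDER %s)',
              i, :N, i - 1)
FROM generate_series(1, :N) i \gexec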

This seems to be down to the expense of doing RelationBuildPartitionDesc
to rebuild the parent's relcache entry for each child CREATE TABLE.
Not sure we can avoid that, but maybe we should consider adopting a
cheaper-to-read representation of partition descriptors.  The fact that
range-style entries seem to be 3X more expensive to load than hash-style
entries is strange.

            regards, tom lane


