RE: speeding up planning with partitions - Mailing list pgsql-hackers

From Kato, Sho
Subject RE: speeding up planning with partitions
Date
Msg-id 25C1C6B2E7BE044889E4FE8643A58BA963AF7962@G01JPEXMBKW03
Whole thread Raw
In response to Re: speeding up planning with partitions  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
Hi, Amit. Thank you for your reply.

> I didn't load any data into tables when running the tests, because these patches are meant for reducing planner
latency. More specifically, they're addressed to fix the current planner behavior that its latency increases with
increasingnumber of partitions, with focus on the common case where only a single partition will need to be scanned by
agiven query.
 

Thank you for telling me details. It is very helpful.

> No need to try to measure the INSERT latency, because its planning latency is not affected by the number of
partitions. Moreover, I'd rather suggest you take out the INSERT statement from the benchmark for now, because its
executiontime does vary unfavorably with increasing number of partitions.
 

Thank you for your advice.

> In fact, just measure non-prepared tps and latency, because we're only interested in planning time here.

I got it.

> Hmm, since 0001 is meant to improve update planning time, it seems odd that you'd get poorer results compared to base
source. But, it seems base source is actually master + the patch to improve the execution time of update, so maybe that
patchis playing a part here, although I'm not sure why even that's making this much of a difference.
 
> I suggest that you use un-patched master as base source, that is, leave out any patches to improve execution time.

> By the way, as you may have noticed, I posted a version 2 of the patches on this thread.  If you apply them, you will
bebe able to see almost same TPS for any number of partitions with master + 0001 + 0002 + 0003.
 

> I guess if you had applied the latest version of "Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch" (which is
v8posted at [1]) on top of master + 0001 + 0002 + 0003, you'd get better performance too.
 

Thank you. I will try out these cases.

Thanks,

--
Sho Kato
-----Original Message-----
From: Amit Langote [mailto:Langote_Amit_f8@lab.ntt.co.jp] 
Sent: Monday, September 3, 2018 2:12 PM
To: Kato, Sho/加藤 翔 <kato-sho@jp.fujitsu.com>; Pg Hackers <pgsql-hackers@postgresql.org>
Subject: Re: speeding up planning with partitions

Thank you Kato-san for testing.

On 2018/08/31 19:48, Kato, Sho wrote:
> Hi, Amit
> 
> Great!
> With the total number of records being 6400, I benchmarked while increasing the number of partitions from 100 to
6400.
> Applying three all patches, 20% performance improved for 100 partitions.
> 
> I have the same number of records for each partition, do you do the same?

I didn't load any data into tables when running the tests, because these patches are meant for reducing planner
latency. More specifically, they're addressed to fix the current planner behavior that its latency increases with
increasingnumber of partitions, with focus on the common case where only a single partition will need to be scanned by
agiven query.
 

I'd try to avoid using a benchmark whose results is affected by anything other than the planning latency.  It will be a
goodidea if you leave the tables empty and just vary the number of partitions and nothing else.
 

Also, we're interested in planning latency, so using just SELECT and UPDATE in your benchmark script will be enough,
becausetheir planning time is affected by the number of partitions.  No need to try to measure the INSERT latency,
becauseits planning latency is not affected by the number of partitions.  Moreover, I'd rather suggest you take out the
INSERTstatement from the benchmark for now, because its execution time does vary unfavorably with increasing number of
partitions. Sure, there are other patches to address that, but it's better not to mix the patches to avoid confusion.
 

> Also, in my case, performance was better when not prepare.

Patches in this thread do nothing for the execution, so, there is no need to compare prepared vs non-prepared.  In
fact,just measure non-prepared tps and latency, because we're only interested in planning time here.
 

> I think these patches do not improve execute case, so we need faster runtime pruning patch[1], right?

We already have run-time pruning code (that is the code in the patch you
linked) committed into the tree in PG 11, so the master tree also has it.
But since we're not interested in execution time, no need to worry about run-time pruning.

> Details of measurement conditions and results are as follows.
> - base source
>   master(@777e6ddf17) + Speeding up Insert v8 patch[1]
>
> - table definition(e.g. 100 partition)
>   create table test.accounts(aid serial, abalance int)
>   partition by range(aid);
>   create table test.accounts_history(id serial, aid int, delta int,
>   mtime timestamp without time zone)
>   partition by range(aid);
>
> - command option
>   pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -M prepare
>   pgbench -d testdb -f benchmark.pgbench -T 180 -r -n
>   
> -results
>   base source no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 | 662.414805 |          0.357 |          0.265 |          0.421
>         200 | 494.478431 |          0.439 |          0.349 |          0.579
>         400 | 307.982089 |          0.651 |          0.558 |          0.927
>         800 | 191.360676 |          0.979 |          0.876 |          1.548
>        1600 |  75.344947 |          2.253 |          2.003 |          4.301
>        3200 |  30.643902 |          5.716 |          4.955 |         10.118
>        6400 |  16.726056 |         12.512 |          8.582 |         18.054
> 
>   0001 no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 | 429.816329 |          0.811 |           0.75 |          0.365
>         200 | 275.211531 |          1.333 |          1.248 |          0.501
>         400 | 160.499833 |          2.384 |          2.252 |          0.754
>         800 |  79.387776 |          4.935 |          4.698 |          1.468
>        1600 |  24.787377 |         16.593 |         15.954 |          4.302
>        3200 |   9.846421 |          42.96 |         42.139 |          8.848
>        6400 |   4.919772 |          87.43 |         83.109 |          16.56

Hmm, since 0001 is meant to improve update planning time, it seems odd that you'd get poorer results compared to base
source. But, it seems base source is actually master + the patch to improve the execution time of update, so maybe that
patchis playing a part here, although I'm not sure why even that's making this much of a difference.
 

I suggest that you use un-patched master as base source, that is, leave out any patches to improve execution time.

[ ... ]

>   0001 + 0002 no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 |  682.53091 |          0.388 |           0.35 |           0.35
>         200 | 469.906601 |          0.543 |          0.496 |           0.51
>         400 | 321.915349 |           0.78 |          0.721 |          0.752
>         800 | 201.620975 |          1.246 |          1.156 |          1.236
>        1600 |  94.438204 |          2.612 |          2.335 |          2.745
>        3200 |  38.292922 |          6.657 |          5.579 |          6.808
>        6400 |   21.48462 |         11.989 |         10.104 |         12.601
> 

[ ... ]

>        
>   0001 + 0002 + 0003 no prepared
>    part_num |   tps_ex   | update_latency | select_latency | insert_latency 
>   ----------+------------+----------------+----------------+----------------
>         100 | 798.962029 |          0.304 |          0.267 |          0.339
>         200 | 577.893396 |          0.384 |          0.346 |          0.487
>         400 | 426.542177 |          0.472 |          0.435 |          0.717
>         800 | 288.616213 |           0.63 |          0.591 |          1.162
>        1600 | 154.247034 |          1.056 |          0.987 |          2.384
>        3200 |  59.711446 |          2.416 |          2.233 |          6.514
>        6400 |  37.109761 |          3.387 |          3.099 |         11.762
>        

[ ... ]

By the way, as you may have noticed, I posted a version 2 of the patches on this thread.  If you apply them, you will
bebe able to see almost same TPS for any number of partitions with master + 0001 + 0002 + 0003.
 

>   Although it may not be related to this, when measured with pg11 beta2, somehow the performance was better.
> 
>   11beta2 + v1-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch[3] prepared
>    part_num |   tps_ex    | update_latency | select_latency | insert_latency 
>   ----------+-------------+----------------+----------------+----------------
>      100    | 756.07228   |          0.942 |          0.091 |          0.123

I guess if you had applied the latest version of "Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch" (which is v8
postedat [1]) on top of master + 0001 + 0002 + 0003, you'd get better performance too.  But, as mentioned above, we're
interestedin measuring planning latency, not execution latency, so we should leave out any patches that are meant
towardimproving execution latency to avoid confusion.
 


Thanks again.

Regards,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f9T_32Xpb-p8cWwo5ezSfVhXviUW8QTWncP8ksPHDRK8g%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: pg_verify_checksums failure with hash indexes
Next
From: Amit Langote
Date:
Subject: Re: pointless check in RelationBuildPartitionDesc