RE: speeding up planning with partitions - Mailing list pgsql-hackers
From | Kato, Sho |
---|---|
Subject | RE: speeding up planning with partitions |
Date | |
Msg-id | 25C1C6B2E7BE044889E4FE8643A58BA963AF581E@G01JPEXMBKW03 Whole thread Raw |
In response to | speeding up planning with partitions (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: speeding up planning with partitions
|
List | pgsql-hackers |
Hi, Amit Great! With the total number of records being 6400, I benchmarked while increasing the number of partitions from 100 to 6400. Applying three all patches, 20% performance improved for 100 partitions. I have the same number of records for each partition, do you do the same? Also, in my case, performance was better when not prepare. I think these patches do not improve execute case, so we need faster runtime pruning patch[1], right? Details of measurement conditions and results are as follows. - base source master(@777e6ddf17) + Speeding up Insert v8 patch[1] - table definition(e.g. 100 partition) create table test.accounts(aid serial, abalance int) partition by range(aid); create table test.accounts_history(id serial, aid int, delta int, mtime timestamp without time zone) partition by range(aid); create table test.account_part_1 partition of test.accounts for values from (1) to (65); create table test.account_part_2 partition of test.accounts for values from (65) to (129); ... create table test.account_part_100 partition of test.accounts for values from (6337) to (6400); create table test.ah_part_1 partition of test.accounts_history for values from (1) to (65); create table test.ah_part_2 partition of test.accounts_history for values from (65) to (129); ... create table test.ah_part_100 partition of test.accounts_history for values from (6337) to (6400); - benchmark script \set aid random(1, 6400) \set delta random(-5000, 5000) BEGIN; UPDATE test.accounts SET abalance = abalance + :delta WHERE aid = :aid; SELECT abalance FROM test.accounts WHERE aid = :aid; INSERT INTO test.accounts_history (aid, delta, mtime) VALUES (:aid, :delta, CURRENT_TIMESTAMP); END; - command option pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -M prepare pgbench -d testdb -f benchmark.pgbench -T 180 -r -n -results base source no prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 662.414805 | 0.357 | 0.265 | 0.421 200 | 494.478431 | 0.439 | 0.349 | 0.579 400 | 307.982089 | 0.651 | 0.558 | 0.927 800 | 191.360676 | 0.979 | 0.876 | 1.548 1600 | 75.344947 | 2.253 | 2.003 | 4.301 3200 | 30.643902 | 5.716 | 4.955 | 10.118 6400 | 16.726056 | 12.512 | 8.582 | 18.054 0001 no prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 429.816329 | 0.811 | 0.75 | 0.365 200 | 275.211531 | 1.333 | 1.248 | 0.501 400 | 160.499833 | 2.384 | 2.252 | 0.754 800 | 79.387776 | 4.935 | 4.698 | 1.468 1600 | 24.787377 | 16.593 | 15.954 | 4.302 3200 | 9.846421 | 42.96 | 42.139 | 8.848 6400 | 4.919772 | 87.43 | 83.109 | 16.56 0001 prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 245.100776 | 2.728 | 0.374 | 0.476 200 | 140.249283 | 5.116 | 0.603 | 0.686 400 | 67.608559 | 11.383 | 1.055 | 1.179 800 | 23.085806 | 35.781 | 2.585 | 2.677 1600 | 6.211247 | 141.093 | 7.096 | 6.785 3200 | 1.808214 | 508.045 | 15.741 | 13.243 6400 | 0.495635 | 1919.362 | 37.691 | 28.177 0001 + 0002 no prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 682.53091 | 0.388 | 0.35 | 0.35 200 | 469.906601 | 0.543 | 0.496 | 0.51 400 | 321.915349 | 0.78 | 0.721 | 0.752 800 | 201.620975 | 1.246 | 1.156 | 1.236 1600 | 94.438204 | 2.612 | 2.335 | 2.745 3200 | 38.292922 | 6.657 | 5.579 | 6.808 6400 | 21.48462 | 11.989 | 10.104 | 12.601 0001 + 0002 prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 591.10863 | 0.433 | 0.342 | 0.422 200 | 393.223638 | 0.625 | 0.522 | 0.614 400 | 253.672736 | 0.946 | 0.828 | 0.928 800 | 146.840262 | 1.615 | 1.448 | 1.604 1600 | 52.805593 | 4.656 | 3.811 | 4.473 3200 | 21.461606 | 11.48 | 9.56 | 10.661 6400 | 11.888232 | 22.869 | 16.841 | 18.871 0001 + 0002 + 0003 no prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 798.962029 | 0.304 | 0.267 | 0.339 200 | 577.893396 | 0.384 | 0.346 | 0.487 400 | 426.542177 | 0.472 | 0.435 | 0.717 800 | 288.616213 | 0.63 | 0.591 | 1.162 1600 | 154.247034 | 1.056 | 0.987 | 2.384 3200 | 59.711446 | 2.416 | 2.233 | 6.514 6400 | 37.109761 | 3.387 | 3.099 | 11.762 0001 + 0002 + 0003 prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+------------+----------------+----------------+---------------- 100 | 662.414805 | 0.357 | 0.265 | 0.421 200 | 494.478431 | 0.439 | 0.349 | 0.579 400 | 307.982089 | 0.651 | 0.558 | 0.927 800 | 191.360676 | 0.979 | 0.876 | 1.548 1600 | 75.344947 | 2.253 | 2.003 | 4.301 3200 | 30.643902 | 5.716 | 4.955 | 10.118 6400 | 16.726056 | 12.512 | 8.582 | 18.054 Although it may not be related to this, when measured with pg11 beta2, somehow the performance was better. 11beta2 + v1-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch[3] prepared part_num | tps_ex | update_latency | select_latency | insert_latency ----------+-------------+----------------+----------------+---------------- 100 | 756.07228 | 0.942 | 0.091 | 0.123 [1] https://www.postgresql.org/message-id/CAKJS1f_QN-nmf6jCQ4gRU_8ab0zrd0ms-U%3D_Dj0KUARJiuGpOA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAKJS1f9T_32Xpb-p8cWwo5ezSfVhXviUW8QTWncP8ksPHDRK8g%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz%3DGVBwvGh4a6xA%40mail.gmail.com regards, Sho Kato -----Original Message----- From: Amit Langote [mailto:Langote_Amit_f8@lab.ntt.co.jp] Sent: Wednesday, August 29, 2018 9:06 PM To: Pg Hackers <pgsql-hackers@postgresql.org> Subject: speeding up planning with partitions It is more or less well known that the planner doesn't perform well with more than a few hundred partitions even when onlya handful of partitions are ultimately included in the plan. Situation has improved a bit in PG 11 where we replaced the older method of pruning partitions one-by-one using constraint exclusion with a much faster methodthat finds relevant partitions by using partitioning metadata. However, we could only use it for SELECT queries, becauseUPDATE/DELETE are handled by a completely different code path, whose structure doesn't allow it to call the new pruningmodule's functionality. Actually, not being able to use the new pruning is not the only problem for UPDATE/DELETE,more on which further below. While situation improved with new pruning where it could, there are still overheads in the way planner handles partitions. As things stand today, it will spend cycles and allocate memory for partitions even before pruning is performed,meaning most of that effort could be for partitions that were better left untouched. Currently, planner will lock,heap_open *all* partitions, create range table entries and AppendRelInfos for them, and finally initialize RelOptInfos for them, eventouching the disk file of each partition in the process, in an earlier phase of planning. All of that processing isvain for partitions that are pruned, because they won't be included in the final plan. This problem grows worse as thenumber of partitions grows beyond thousands, because range table grows too big. That could be fixed by delaying all that per-partition activity to a point where pruning has already been performed, so thatwe know the partitions to open and create planning data structures for, such as somewhere downstream to query_planner. But before we can do that we must do something about the fact that UPDATE/DELETE won't be able to cope withthat because the code path that currently handles the planning of UPDATE/DELETE on partitioned tables (inheritance_plannercalled from subquery_planner) relies on AppendRelInfos for all partitions having been initialized by an earlier planning phase. Delayingit to query_planner would be too late, because inheritance_planner calls query_planner for each partition, not forthe parent. That is, if query_planner, which is downstream to inheritance_planner, was in the charge of determining whichpartitions to open, the latter wouldn't know which partitions to call the former for. :) That would be fixed if there is no longer this ordering dependency, which is what I propose to do with the attached patch0001. I've tried to describe how the patch manages to do that in its commit message, but I'll summarize here. As thingsstand today, inheritance_planner modifies the query for each leaf partition to make the partition act as the query'sresult relation instead of the original partitioned table and calls grouping_planner on the query. That means anythingthat's joined to partitioned table looks to instead be joined to the partition and join paths are generated likewise. Also, the resulting path's targetlist is adjusted to be suitable for the result partition. Upon studying how thisworks, I concluded that the same result can be achieved if we call grouping_planner only once and repeat the portionsof query_planner's and grouping_planner's processing that generate the join paths and appropriate target list, respectively,for each partition. That way, we can rely on query_planner determining result partitions for us, which in turnrelies on the faster partprune.c based method of pruning. That speeds things up in two ways. Faster pruning and weno longer repeat common processing for each partition. With 0001 in place, there is nothing that requires that partitions be opened by an earlier planning phase, so, I proposepatch 0002, which refactors the opening and creation of planner data structures for partitions such that it is nowperformed after pruning. However, it doesn't do anything about the fact that partitions are all still locked in the earlierphase. With various overheads gone thanks to 0001 and 0002, locking of all partitions via find_all_inheritos can be seen as thesingle largest bottleneck, which 0003 tries to address. I've kept it a separate patch, because I'll need to think a bitmore to say that it's actually to safe to defer locking to late planning, due mainly to the concern about the change inthe order of locking from the current method. I'm attaching it here, because I also want to show the performance improvementwe can expect with it. I measured the gain in performance due to each patch on a modest virtual machine. Details of the measurement and resultsfollow. * Benchmark scripts update.sql update ht set a = 0 where b = 1; select.sql select * from ht where b = 1; * Table: create table ht (a int, b int) partition by hash (b) create table ht_1 partition of ht for values with (modulus N, remainder0) .. create table ht_N partition of ht for values with (modulus N, remainder N-1) * Rounded tps with update.sql and select.sql against regular table (nparts = 0) and partitioned table with various partitioncounts: pgbench -n -T 60 -f update.sql nparts master 0001 0002 0003 ====== ====== ==== ==== ==== 0 2856 2893 2862 2816 8 507 1115 1447 1872 16 260 765 1173 1892 32 119 483 922 1884 64 59 282 615 1881 128 29 153 378 1835 256 14 79 210 1803 512 5 40 113 1728 1024 2 17 57 1616 2048 0* 9 30 1471 4096 0+ 4 15 1236 8192 0= 2 7 975 * 0.46 + 0.0064 = 0 (OOM on a virtual machine with 4GB RAM) As can be seen here, 0001 is a big help for update queries. pgbench -n -T 60 -f select.sql For a select query that doesn't contain join and needs to scan only one partition: nparts master 0001 0002 0003 ====== ====== ==== ==== ==== 0 2290 2329 2319 2268 8 1058 1077 1414 1788 16 711 729 1124 1789 32 450 475 879 1773 64 265 272 603 1765 128 146 149 371 1685 256 76 77 214 1678 512 39 39 112 1636 1024 16 17 59 1525 2048 8 9 29 1416 4096 4 4 15 1195 8192 2 2 7 932 Actually, here we get almost same numbers with 0001 as with master, because 0001 changes nothing for SELECT queries. Westart seeing improvement with 0002, the patch to delay opening partitions. Thanks, Amit
pgsql-hackers by date: