Thread: [HACKERS] Declarative partitioning optimization for large amount of partitions
[HACKERS] Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hello.

I decided to figure out whether the current implementation of declarative partitioning has any bottlenecks when there are a lot of partitions. Here is what I did [1].

```
-- init schema

\timing on

CREATE TABLE part_test (pk int not null, k int, v varchar(128))
  PARTITION BY RANGE(pk);

do $$
declare
    i integer;
begin
    for i in 1 .. 10000
    loop
        raise notice 'i = %', i;
        execute ('CREATE TABLE part_test_' || i ||
                 ' PARTITION OF part_test FOR VALUES FROM (' ||
                 (1 + (i-1)*1000) || ') to (' || ( (i * 1000) + 1) || ');');
    end loop;
end $$;

-- fill tables with some data

do $$
declare
    i integer;
begin
    for i in 1 .. 100*1000
    loop
        raise notice 'i = %', i;
        execute ('insert into part_test values ( ceil(random()*(10000-1)*1000), ceil(random()*10000*1000), '''' || ceil(random()*10000*1000) );');
    end loop;
end $$;
```

Then:

```
# 2580 is some pk that exists
echo 'select * from part_test where pk = 2580;' > t.sql
pgbench -j 7 -c 7 -f t.sql -P 1 -T 300 eax
```

`perf top` showed two bottlenecks [2]. A stack trace for the first one looks like this [3]:

```
0x00000000007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at pgstat.c:1689
1689            if (entry->t_id == rel_id)
#0  0x00000000007a42e2 in get_tabstat_entry (rel_id=25696, isshared=0 '\000') at pgstat.c:1689
#1  0x00000000007a4275 in pgstat_initstats (rel=0x7f4af3fd41f8) at pgstat.c:1666
#2  0x00000000004c7090 in relation_open (relationId=25696, lockmode=0) at heapam.c:1137
#3  0x00000000004c72c9 in heap_open (relationId=25696, lockmode=0) at heapam.c:1291
(skipped)
```

And here is a stack trace for the second bottleneck [4]:

```
0x0000000000584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, numparents=0x0) at pg_inherits.c:199
199             forboth(lo, rels_list, li, rel_numparents)
#0  0x0000000000584fb1 in find_all_inheritors (parentrelId=16393, lockmode=1, numparents=0x0) at pg_inherits.c:199
#1  0x000000000077fc9f in expand_inherited_rtentry (root=0x1badcb8, rte=0x1b630b8, rti=1) at prepunion.c:1408
#2  0x000000000077fb67 in expand_inherited_tables (root=0x1badcb8) at prepunion.c:1335
#3  0x0000000000767526 in subquery_planner (glob=0x1b63cc0, parse=0x1b62fa0, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at planner.c:568
(skipped)
```

The first one could easily be fixed by introducing a hash table (rel_id -> pgStatList entry). Perhaps the hash table should be used only after some threshold. Unless there are any objections I will send a corresponding patch shortly.

I haven't explored the second bottleneck closely yet, but at first glance it doesn't look much more complicated.

Please don't hesitate to share your thoughts regarding this matter.

[1] http://afiskon.ru/s/e3/5f47af9102_benchmark.txt
[2] http://afiskon.ru/s/00/2008c4ae66_temp.png
[3] http://afiskon.ru/s/23/650f0afc89_stack.txt
[4] http://afiskon.ru/s/03/a7e685a4db_stack2.txt

--
Best regards,
Aleksander Alekseev
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Amit Langote
Date:
Hi,

On 2017/02/28 23:25, Aleksander Alekseev wrote:
> Hello.
>
> I decided to figure out whether the current implementation of declarative
> partitioning has any bottlenecks when there are a lot of partitions. Here
> is what I did [1].

Thanks for sharing.

> Then:
>
> ```
> # 2580 is some pk that exists
> echo 'select * from part_test where pk = 2580;' > t.sql
> pgbench -j 7 -c 7 -f t.sql -P 1 -T 300 eax
> ```
>
> `perf top` showed two bottlenecks [2]. A stack trace for the first one
> looks like this [3]:
>
> [...]
>
> The first one could easily be fixed by introducing a hash table
> (rel_id -> pgStatList entry). Perhaps the hash table should be used only
> after some threshold. Unless there are any objections I will send a
> corresponding patch shortly.

I have never thought about this one really.

> I haven't explored the second bottleneck closely yet, but at first glance
> it doesn't look much more complicated.

I don't know which way you're thinking of fixing this, but a planner patch
to implement faster partition-pruning will have taken care of this, I
think. As you may know, even declarative partitioned tables currently
depend on constraint exclusion for partition-pruning, and the planner's
current approach to handling inheritance requires opening all the child
tables (partitions), whereas the new approach hopefully shouldn't need to
do that. I am not sure whether looking for a more localized fix for this
would be worthwhile, although I may be wrong.

Thanks,
Amit
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hello.

OK, here is a patch.

Benchmark, before:

```
number of transactions actually processed: 1823
latency average = 1153.495 ms
latency stddev = 154.366 ms
tps = 6.061104 (including connections establishing)
tps = 6.061211 (excluding connections establishing)
```

Benchmark, after:

```
number of transactions actually processed: 2462
latency average = 853.862 ms
latency stddev = 112.270 ms
tps = 8.191861 (including connections establishing)
tps = 8.192028 (excluding connections establishing)
```

+35% TPS, just as expected. Feel free to run your own benchmarks on
different datasets and workloads. `perf top` shows that the first
bottleneck is completely eliminated.

I did nothing about the second bottleneck since, as Amit mentioned,
partition-pruning should solve it anyway, and apparently any
micro-optimizations aren't worth the effort.

Also I checked test coverage using lcov to make sure that this code is
covered by tests. The exact script I'm using can be found here [1].

[1] https://github.com/afiskon/pgscripts/blob/master/code-coverage.sh

On Wed, Mar 01, 2017 at 10:36:24AM +0900, Amit Langote wrote:
> [...]

--
Best regards,
Aleksander Alekseev
Attachment
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Andres Freund
Date:
Hi,

This issue has bothered me in non-partitioned use-cases recently, so
thanks for taking it up.

On 2017-03-06 18:22:17 +0300, Aleksander Alekseev wrote:
> diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
> index 2fb9a8bf58..fa906e7930 100644
> --- a/src/backend/postmaster/pgstat.c
> +++ b/src/backend/postmaster/pgstat.c
> @@ -160,6 +160,16 @@ typedef struct TabStatusArray
>
>  static TabStatusArray *pgStatTabList = NULL;

Why are we keeping that list / the "batch" allocation pattern? It
doesn't actually seem useful to me after these changes. Given that we
don't shrink hash-tables, we could just use the hash-table memory for
this, no?

I think a separate list that only contains things that are "active" in
the current transaction makes sense, because the current logic requires
iterating over a potentially very large array at transaction commit.

> +/* pgStatTabHash entry */
> +typedef struct TabStatHashEntry
> +{
> +    Oid t_id;
> +    PgStat_TableStatus* tsa_entry;
> +} TabStatHashEntry;
> +
> +/* Hash table for faster t_id -> tsa_entry lookup */
> +static HTAB *pgStatTabHash = NULL;
> +

'faster ... lookup' doesn't strike me as a useful comment, because it's
only useful in relation of the current code to the new code.

>  /*
>   * Backends store per-function info that's waiting to be sent to the collector
>   * in this hash table (indexed by function OID).
> @@ -815,7 +825,13 @@ pgstat_report_stat(bool force)
>              pgstat_send_tabstat(this_msg);
>              this_msg->m_nentries = 0;
>          }
> +
> +        /*
> +         * Entry will be freed soon so we need to remove it from the lookup table.
> +         */
> +        hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, NULL);
>      }

It's not really freed...

>  static PgStat_TableStatus *
>  get_tabstat_entry(Oid rel_id, bool isshared)
>  {
> +    TabStatHashEntry* hash_entry;
>      PgStat_TableStatus *entry;
>      TabStatusArray *tsa;
> -    TabStatusArray *prev_tsa;
> -    int            i;
> +
> +    /* Try to find an entry */
> +    entry = find_tabstat_entry(rel_id);
> +    if(entry != NULL)
> +        return entry;

Arguably it'd be better to HASH_ENTER directly here, instead of doing
two lookups.

>      /*
> -     * Search the already-used tabstat slots for this relation.
> +     * Entry doesn't exist - creating one.
> +     * First make sure there is a free space in a last element of pgStatTabList.
>      */
> -    prev_tsa = NULL;
> -    for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = tsa->tsa_next)
> +    if (!pgStatTabList)
>      {
> -        for (i = 0; i < tsa->tsa_used; i++)
> -        {
> -            entry = &tsa->tsa_entries[i];
> -            if (entry->t_id == rel_id)
> -                return entry;
> -        }
> +        HASHCTL ctl;
>
> -        if (tsa->tsa_used < TABSTAT_QUANTUM)
> +        Assert(!pgStatTabHash);
> +
> +        memset(&ctl, 0, sizeof(ctl));
> +        ctl.keysize = sizeof(Oid);
> +        ctl.entrysize = sizeof(TabStatHashEntry);
> +        ctl.hcxt = TopMemoryContext;
> +
> +        pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup table",
> +            TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

Think it'd be better to move the hash creation into its own function.

- Andres
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Amit Langote
Date:
Hi Aleksander,

On 2017/03/07 0:22, Aleksander Alekseev wrote:
> Hello.
>
> OK, here is a patch.
>
> Benchmark, before:
>
> ```
> number of transactions actually processed: 1823
> latency average = 1153.495 ms
> latency stddev = 154.366 ms
> tps = 6.061104 (including connections establishing)
> tps = 6.061211 (excluding connections establishing)
> ```
>
> Benchmark, after:
>
> ```
> number of transactions actually processed: 2462
> latency average = 853.862 ms
> latency stddev = 112.270 ms
> tps = 8.191861 (including connections establishing)
> tps = 8.192028 (excluding connections establishing)
> ```
>
> +35% TPS, just as expected. Feel free to run your own benchmarks on
> different datasets and workloads. `perf top` shows that the first
> bottleneck is completely eliminated.

That seems like a good gain.

> I did nothing about the second bottleneck since, as Amit mentioned,
> partition-pruning should solve it anyway, and apparently any
> micro-optimizations aren't worth the effort.

Sorry, I didn't mean to dissuade you from trying those
micro-optimizations. If general inheritance cases can benefit from it
(which, until we have a different method, will be used by partitioned
tables as well), I think we should try it.

Thanks,
Amit
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hi Andres,

Thanks a lot for the review!

> Why are we keeping that list / the "batch" allocation pattern? It
> doesn't actually seem useful to me after these changes. Given that we
> don't shrink hash-tables, we could just use the hash-table memory for
> this, no?

I don't think we can do that. According to the comments:

```
 * NOTE: once allocated, TabStatusArray structures are never moved or deleted
[...]
 * This allows relcache pgstat_info pointers to be treated as long-lived data,
 * avoiding repeated searches in pgstat_initstats() when a relation is
 * repeatedly opened during a transaction.
```

It is my understanding that dynahash can't give the same guarantees.

> 'faster ... lookup' doesn't strike me as a useful comment, because it's
> only useful in relation of the current code to the new code.

Agree, fixed to 'O(1) ... lookup'.

> It's not really freed...

Right, it's actually zeroed. Fixed.

> Arguably it'd be better to HASH_ENTER directly here, instead of doing
> two lookups.

Good point. Fixed.

> Think it'd be better to move the hash creation into its own function.

Agree, fixed.

On Mon, Mar 06, 2017 at 12:01:50PM -0800, Andres Freund wrote:
> [...]

--
Best regards,
Aleksander Alekseev
Attachment
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hi Amit,

> Sorry, I didn't mean to dissuade you from trying those
> micro-optimizations. If general inheritance cases can benefit from it
> (which, until we have a different method, will be used by partitioned
> tables as well), I think we should try it.

OK, I'll see what could be done here as well then.

On Tue, Mar 07, 2017 at 10:55:12AM +0900, Amit Langote wrote:
> [...]

--
Best regards,
Aleksander Alekseev
Hi Aleksander,

noticed this in your patch:

> * Add a corespinding entry to pgStatTabHash.

"corresponding"

Also a question: Some one-line comments are

/* Comment. */

while others are

/*
 * Comment.
 */

Why the difference?

Hope this helps,

Tels
Re: [HACKERS] Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hi Tels,

Thanks a lot for the review!

> "corresponding"

Fixed.

> Also a question: Some one-line comments are
>
> /* Comment. */
>
> while others are
>
> /*
>  * Comment.
>  */
>
> Why the difference?

I'm trying to follow the code style used in the code I'm modifying. In
this case I got the impression that the second style of one-line
comments is used more often, and I tried to follow it, but apparently I
didn't succeed :) This is fixed as well.

On Thu, Mar 09, 2017 at 06:40:39PM -0500, Tels wrote:
> [...]

--
Best regards,
Aleksander Alekseev
Attachment
[HACKERS] Re: Declarative partitioning optimization for large amount of partitions
From
David Steele
Date:
On 3/10/17 7:16 AM, Aleksander Alekseev wrote:
>
> Thanks a lot for the review!

Anyone want to take a crack at reviewing this new version?

Thanks,
--
-David
david@pgmasters.net
[HACKERS] Re: Declarative partitioning optimization for large amount of partitions
From
Anastasia Lubennikova
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

The patch looks good to me.
It applies cleanly, passes all the tests, and eliminates the bottleneck
described in the first message. And as I can see from the thread, there
were no objections from others, except a few minor comments about code
style, which are fixed in the last version of the patch.
So I mark it "Ready for Committer".

The new status of this patch is: Ready for Committer
Re: Re: Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hi Anastasia,

Thanks a lot for the review!

As was mentioned above, there is also a bottleneck in the
find_all_inheritors procedure. It turned out the problem there is
similar, and it could easily be fixed as well. The corresponding patch
is attached to this message. To keep things in order I'm attaching the
previous patch as well.

On Wed, Mar 22, 2017 at 11:53:45AM +0000, Anastasia Lubennikova wrote:
> [...]

--
Best regards,
Aleksander Alekseev
Attachment
Re: Re: Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Also I would like to share some benchmark results.

Before applying any patches (copied from one of the previous messages):

```
latency average = 1153.495 ms
latency stddev = 154.366 ms
tps = 6.061104 (including connections establishing)
tps = 6.061211 (excluding connections establishing)
```

After applying the first patch (copied as well):

```
latency average = 853.862 ms
latency stddev = 112.270 ms
tps = 8.191861 (including connections establishing)
tps = 8.192028 (excluding connections establishing)
```

After applying both patches:

```
latency average = 533.646 ms
latency stddev = 86.311 ms
tps = 13.110328 (including connections establishing)
tps = 13.110608 (excluding connections establishing)
```

Small number of partitions (2, to be exact), no patches:

```
latency average = 0.928 ms
latency stddev = 0.296 ms
tps = 7535.224897 (including connections establishing)
tps = 7536.145457 (excluding connections establishing)
```

Small number of partitions, both patches applied:

```
latency average = 0.915 ms
latency stddev = 0.269 ms
tps = 7638.344922 (including connections establishing)
tps = 7639.164769 (excluding connections establishing)
```

As you can see, these patches don't make things worse in any regard.
Perf shows that both bottlenecks are gone. Before [1] and after [2].

[1] http://afiskon.ru/s/00/2008c4ae66_temp.png
[2] http://afiskon.ru/s/a5/fd81628a3a_temp.png

On Fri, Mar 24, 2017 at 03:17:44PM +0300, Aleksander Alekseev wrote:
> [...]

--
Best regards,
Aleksander Alekseev
On 1 March 2017 at 01:36, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

> I don't know which way you're thinking of fixing this, but a planner patch
> to implement faster partition-pruning will have taken care of this, I
> think. As you may know, even declarative partitioned tables currently
> depend on constraint exclusion for partition-pruning and planner's current
> approach of handling inheritance requires to open all the child tables
> (partitions), whereas the new approach hopefully shouldn't need to do
> that. I am not sure if looking for a more localized fix for this would be
> worthwhile, although I may be wrong.

What "new approach" are we discussing?

Is there a patch or design discussion?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hi Simon,

> > I don't know which way you're thinking of fixing this, but a planner patch
> > to implement faster partition-pruning will have taken care of this, I
> > think. [...]
>
> What "new approach" are we discussing?
> Is there a patch or design discussion?

I think what was meant were the plans of my colleague Dmitry Ivanov to
implement partition-pruning. I've just spoken with Dmitry about this
matter. Unless there is anyone else who is already working on this
optimization, we would like to work on it together. Unfortunately there
is no patch or design discussion of partition-pruning in this
commitfest.

--
Best regards,
Aleksander Alekseev
Re: Re: Declarative partitioning optimization for large amount of partitions
From
Teodor Sigaev
Date:
> things in order I'm attaching the previous patch as well.

Patches look good, but I have some notices:

1) step 1: Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id
   is never used for read, so the entry for the hash could be just a
   pointer to PgStat_TableStatus.

2) step 1: In pgstat_report_stat() you remove entries from the hash one by
   one until they are all removed. Isn't it better to
   hash_destroy/hash_create, or even let the hash live in a separate
   memory context and just reset it?

3) step 1: Again, in pgstat_report_stat(), all-zero entries aren't deleted
   from the hash although they will be free from the point of view of
   pgStatTabList.

4) step 2: The same as 1) about SeenRelsEntry->rel_id, but it even isn't
   initialized anywhere.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
Re: Re: Declarative partitioning optimization for large amount of partitions
From
Teodor Sigaev
Date:
Sorry, 1) and 4) are my fault; comment in hsearch.h:

 * ... The hash key
 * is expected to be at the start of the caller's hash entry data structure.

Oops, forgot that.

Teodor Sigaev wrote:
> [...]

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
Re: Re: Declarative partitioning optimization for large amount of partitions
From
Aleksander Alekseev
Date:
Hi Teodor,

Thanks a lot for the review!

> > step 1: In pgstat_report_stat() you remove entries from the hash one by
> > one until they are all removed. Isn't it better to hash_destroy/hash_create,
> > or even let the hash live in a separate memory context and just reset it?

Agree, fixed.

> > step 1: Again, in pgstat_report_stat(), all-zero entries aren't deleted
> > from the hash although they will be free from the point of view of
> > pgStatTabList.

Good point! Fixed.

--
Best regards,
Aleksander Alekseev
Attachment
On Fri, Mar 24, 2017 at 11:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 1 March 2017 at 01:36, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
>> I don't know which way you're thinking of fixing this, but a planner patch
>> to implement faster partition-pruning will have taken care of this, I
>> think. [...]
>
> What "new approach" are we discussing?
>
> Is there a patch or design discussion?

Neither at the moment. As Aleksander said in his reply, I was referring
to Dmitry Ivanov's plan of porting pg_pathman's planner functionality to
core, which he mentioned on the declarative partitioning thread back in
December [1].

Thanks,
Amit

[1] https://www.postgresql.org/message-id/426b2b01-61e0-43aa-bd84-c6fcf516f1c3%40postgrespro.ru