Thread: Generating partitioning tuple conversion maps faster
Hi,

The code in convert_tuples_by_name_map() could be made a bit smarter to speed up the search for the column with the same name, by first looking at the same attnum in the other relation rather than starting the search from the first attnum each time. I imagine most people are going to create partitions with columns in the same order as they are in the partitioned table. This will happen automatically if the CREATE TABLE .. PARTITION OF syntax is used, so it makes sense to exploit that knowledge.

I've attached a patch which modifies convert_tuples_by_name_map() to have it just start the inner loop search in the same position as we are currently in the outer loop. Best case, this takes the function from being O(N^2) to O(N).

It's quite possible that the partition or, more likely, the partitioned table has had some columns dropped (it likely lives longer than a partition), in which case we don't want to miss the target column by 1, so I've coded the patch to count the non-dropped columns and start the search after having ignored dropped columns.

This patch is just a few lines of code. I do think there's much more room to improve this whole area. For example, we could build the child->parent map from the parent->child map instead of searching again later with the inputs reversed. Although, that's more than a handful of lines of code. The change I'm proposing here seems worthwhile to me.

FWIW, something similar is done in make_inh_translation_list(), although I think my way might have a slightly better chance of not hitting the worst-case search.

Benchmark: (Uses tables with 1000 columns, each with a name 11 chars in length.)
Setup:

select 'create table listp (' || string_agg('c' || left(md5(x::Text),10) || ' int',',') || ') partition by list (c' || left(md5('1'),10) || ');' from generate_Series(1,1000) x;
\gexec
create table listp0 partition of listp for values in(0);
create table listp1 partition of listp for values in(1);
select 'create table listpd (' || string_agg('c' || left(md5(x::Text),10) || ' int',',') || ') partition by list (c' || left(md5('1'),10) || ');' from generate_Series(0,1000) x;
\gexec
select 'alter table listpd drop column c' || left(md5(0::text),10) || ';';
\gexec
create table listpd0 partition of listpd for values in(0);
create table listpd1 partition of listpd for values in(1);
select 'create table list (' || string_agg('c' || left(md5(x::Text),10) || ' int',',') || ');' from generate_Series(1,1000) x;
\gexec
\q

psql -t -c "select 'insert into listp values(' || string_Agg('1',',') || ');' from generate_Series(1,1000) x;" postgres > insertp.sql
psql -t -c "select 'insert into listpd values(' || string_Agg('1',',') || ');' from generate_Series(1,1000) x;" postgres > insertpd.sql
psql -t -c "select 'insert into list values(' || string_Agg('1',',') || ');' from generate_Series(1,1000) x;" postgres > insert.sql
psql -t -c "select 'update list set c' || left(md5('1'),10) || ' = (c' || left(md5('1'),10) || ' + 1) & 1;'" postgres > update.sql
psql -t -c "select 'update listp set c' || left(md5('1'),10) || ' = (c' || left(md5('1'),10) || ' + 1) & 1;'" postgres > updatep.sql
psql -t -c "select 'update listpd set c' || left(md5('1'),10) || ' = (c' || left(md5('1'),10) || ' + 1) & 1;'" postgres > updatepd.sql

Tests:

# Test 1: INSERT control test for non-partitioned table.
pgbench -n -T 60 -f insert.sql postgres

# Test 2: INSERT perfect match
pgbench -n -T 60 -f insertp.sql postgres

# Test 3: INSERT dropped first column from parent
pgbench -n -T 60 -f insertpd.sql postgres

# Test 4: UPDATE control test for non-partitioned table.
psql -c "truncate list;" postgres
psql -f insert.sql postgres
pgbench -n -T 60 -f update.sql postgres

# Test 5: UPDATE perfect match
psql -c "truncate listp;" postgres
psql -f insertp.sql postgres
pgbench -n -T 60 -f updatep.sql postgres

# Test 6: UPDATE dropped first column from parent
psql -c "truncate listpd;" postgres
psql -f insertpd.sql postgres
pgbench -n -T 60 -f updatepd.sql postgres

Results: AWS m5d (fsync off)

Unpatched:
Test 1: tps = 1055.527253 (excluding connections establishing)
Test 2: tps = 355.308869 (excluding connections establishing)
Test 3: tps = 361.465022 (excluding connections establishing)
Test 4: tps = 1107.483236 (excluding connections establishing)
Test 5: tps = 132.805775 (excluding connections establishing)
Test 6: tps = 86.303093 (excluding connections establishing)

Patched:
Test 1: tps = 1055.831144 (excluding connections establishing)
Test 2: tps = 1014.282634 (excluding connections establishing)
Test 3: tps = 982.258274 (excluding connections establishing)
Test 4: tps = 625.429778 (excluding connections establishing)
Test 5: tps = 525.667826 (excluding connections establishing)
Test 6: tps = 173.529296 (excluding connections establishing)

I'd have expected Test 6 to do about 480-500 tps. Adding some debug output to check why revealed that the search is doing what's expected; I'm unsure why it has not improved more.

Given the small size of this patch, I think it's a good candidate for the July 'fest.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
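[For illustration, the "start the inner loop where the outer loop is" search described above can be sketched standalone. The structs and the build_name_map helper below are toy stand-ins, not the real TupleDesc or convert_tuples_by_name_map code: a persistent cursor into the inner relation's attributes resumes the name search where the previous match left off, wrapping around at most once, so identically ordered descriptors match in O(N) total comparisons.]

```c
#include <string.h>

/* Toy stand-in for a relation's attributes: a name plus a dropped flag. */
typedef struct
{
	const char *name;
	int			dropped;
} Attr;

/*
 * For each non-dropped column in "out", store the 1-based position of the
 * same-named, non-dropped column in "in" (0 if absent).  The inner search
 * resumes from where the previous match left off instead of restarting at
 * the first attribute each time, which is the core of the speedup.
 */
static void
build_name_map(const Attr *in, int nin, const Attr *out, int nout, int *map)
{
	int			j = 0;			/* persistent inner-loop cursor */
	int			i,
				k;

	for (i = 0; i < nout; i++)
	{
		map[i] = 0;
		if (out[i].dropped)
			continue;
		/* scan at most nin entries, wrapping around once */
		for (k = 0; k < nin; k++, j = (j + 1) % nin)
		{
			if (!in[j].dropped && strcmp(in[j].name, out[i].name) == 0)
			{
				map[i] = j + 1;		/* attnums are 1-based */
				j = (j + 1) % nin;	/* next search starts after the hit */
				break;
			}
		}
	}
}
```

[When the input descriptor has a dropped leading column, as in the listpd tests, the cursor simply skips past it once and stays aligned for the remaining columns.]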
Attachment
On Mon, Jun 25, 2018 at 10:30 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
> [...]
> Given the small size of this patch. I think it's a good candidate for
> the July 'fest.

Just a different thought: how about having a flag array, something like needed_tuple_conv?

While loading the partdesc, we could calculate whether the columns of the partition are aligned with the parent or not.

Regards,
Amul
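[The check behind such a precomputed flag could amount to a single pass comparing the two descriptors. A toy standalone sketch — hypothetical structs and function name, not the real partdesc loading code — assuming the flag would record whether every attribute matches by name, type, and position:]

```c
#include <string.h>

/* Toy attribute descriptor: name, type OID, and a dropped flag. */
typedef struct
{
	const char *name;
	unsigned	typid;
	int			dropped;
} Attr;

/*
 * Return 1 if the partition's attributes line up with the parent's
 * position for position (same names and types, no dropped columns),
 * i.e. no tuple conversion map would be needed for this partition.
 */
static int
no_conversion_needed(const Attr *parent, const Attr *part, int natts)
{
	int			i;

	for (i = 0; i < natts; i++)
	{
		if (parent[i].dropped || part[i].dropped)
			return 0;
		if (parent[i].typid != part[i].typid)
			return 0;
		if (strcmp(parent[i].name, part[i].name) != 0)
			return 0;
	}
	return 1;
}
```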
On 2018/06/25 14:12, amul sul wrote:
> On Mon, Jun 25, 2018 at 10:30 AM David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> [...]
>> Given the small size of this patch. I think it's a good candidate for
>> the July 'fest.
>
> Just a different thought, how about having flag array something like
> needed_tuple_conv?
>
> While loading partdesc, we could calculate that the columns of partition table
> are aligned with the parent or not?

Yeah maybe, but ISTM that could be implemented *in addition to* the tweaks to convert_tuples_by_name_map that David's proposing, which seem helpful in their own right.

Thanks,
Amit
On Mon, Jun 25, 2018 at 10:57 AM Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> On 2018/06/25 14:12, amul sul wrote:
> > On Mon, Jun 25, 2018 at 10:30 AM David Rowley
> > <david.rowley@2ndquadrant.com> wrote:
> >> [...]
> >> Given the small size of this patch. I think it's a good candidate for
> >> the July 'fest.
> >
> > Just a different thought, how about having flag array something like
> > needed_tuple_conv?
> >
> > While loading partdesc, we could calculate that the columns of partition table
> > are aligned with the parent or not?
>
> Yeah maybe, but ISTM, that could be implemented *in addition to* the
> tweaks to convert_tuples_by_name_map that David's proposing, which seem
> helpful in their own right.

Yes, I agree.

Regards,
Amul
On 06/25/2018 08:00 AM, David Rowley wrote:
> I'd have expected Test 6 to do about 480-500tps. Adding debug to check
> why it's not revealed that the search is doing what's expected. I'm
> unsure why it has not improved more.

I think I found one possible reason for this slowness. Your patch behaves as expected when there is a dropped output column, but does one extra comparison when there is a dropped input column. This backwards conversion is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent inner loop counter (see the attached patch).

Still, fixing this doesn't improve the performance. According to perf report, updatepd.sql spends 25% of its time in strcmp, and updatep.sql about 0.5%, but they are comparing the same column names, even with the same alignment and relative offsets. I'm completely puzzled by this.

As a side thought, we wouldn't have to go through this if we had a hash table that is easy to use, or perhaps string interning in catcache.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 29 June 2018 at 02:23, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> I think I found one possible reason for this slowness. Your patch behaves as
> expected when there is a dropped output column, but does one extra
> comparison when there is a dropped input column. This backwards conversion
> is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent
> inner loop counter (see the attached patch).

It's just 2000 comparisons vs 1000.

> Still, fixing this doesn't improve the performance. According to perf
> report, updatepd.sql spends 25% of time in strcmp, and updatep.sql about
> 0.5%, but they are comparing the same column names, even with the same
> alignment and relative offsets. I'm completely puzzled by this.

On further inspection, the slowdown is coming from the planner, in make_inh_translation_list(). The INSERT path does not hit that since the planner's job is pretty simple for simple INSERTs.

> As a side thought, we wouldn't have to go through this if we had a hash
> table that is easy to use, or perhaps string interning in catcache.

Maybe it's better to try the direct lookup and fall back on SearchSysCacheAttName() if the same attnum does not have the same name.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 29 June 2018 at 11:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On further inspection, the slowdown is coming from the planner in
> make_inh_translation_list(). The INSERT path does not hit that since
> the planner's job is pretty simple for simple INSERTs.

I've attached a patch that uses SearchSysCacheAttName to speed up these translations in the planner. This puts test 6 more at the level I'd have expected.

Here are fresh benchmark results taken with the attached, again on an AWS m5d instance, though probably not the same one as before (fsync=off).

Unpatched:
Test 1 tps = 922.479156 (excluding connections establishing)
Test 2 tps = 334.701555 (excluding connections establishing)
Test 3 tps = 327.547316 (excluding connections establishing)
Test 4 tps = 1198.924131 (excluding connections establishing)
Test 5 tps = 125.130723 (excluding connections establishing)
Test 6 tps = 81.511072 (excluding connections establishing)

Patched:
Test 1 tps = 918.105382 (excluding connections establishing)
Test 2 tps = 913.315387 (excluding connections establishing)
Test 3 tps = 893.578988 (excluding connections establishing)
Test 4 tps = 1213.238177 (excluding connections establishing)
Test 5 tps = 459.022550 (excluding connections establishing)
Test 6 tps = 416.835747 (excluding connections establishing)

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 06/29/2018 03:25 AM, David Rowley wrote:
> I've attached a patch that uses SearchSysCacheAttName to speed up
> these translations in the planner.

Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for updates). If you combine it with the persistent inner loop index, it's probably going to be even faster, because it will only require one catalog access for each index shift. Right now it looks like it goes to the catalog for every column after the dropped one.

What about convert_tuples_by_name_map, do you plan to switch it to catalog lookups as well?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 30 June 2018 at 00:22, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> On 06/29/2018 03:25 AM, David Rowley wrote:
>> I've attached a patch that uses SearchSysCacheAttName to speed up
>> these translations in the planner.
>
> Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for
> updates). If you combine it with persistent inner loop index, it's probably
> going to be even faster, because it will only require one catalog access for
> each index shift. Now it looks like it goes to catalog for every column
> after the dropped one.

Thanks for testing.

> What about convert_tuples_by_name_map, do you plan to switch it to catalog
> lookups as well?

Syscache? I didn't really see an obvious way to get the relids down to the function, e.g. the call through ExecEvalConvertRowtype() -> convert_tuples_by_name() does not have a Relation to work with, just a TupleDesc. I think further work there might be diminishing returns.

I think your idea to reduce the loops in test 6 from 2000 down to 1001 should be worth it. I'll try the idea out next week.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 30 June 2018 at 05:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I think your idea
> to reduce the loops in test 6 from 2000 down to 1001 should be worth
> it. I'll try the idea out next week.

The attached changes things to use your way of picking up the search for the next column at the column after the last match was found. I don't think performance will have changed much from the last version, but here are the performance results (in tps) from my laptop this time, fsync=off.

Test     Unpatched   Patched     Gain
Test 1     873.837   912.981  104.48%
Test 2     213.004   895.268  420.31%
Test 3     224.810   887.099  394.60%
Test 4    1177.533  1107.767   94.08%
Test 5      97.707   402.258  411.70%
Test 6      72.025   360.702  500.80%

Tests are all the same as the tests done in the initial email on this thread. Tests 1 and 4 are non-partitioned tests; any variance there should just be noise, since no code was changed in either of those tests.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 07/05/2018 02:52 PM, David Rowley wrote:
> On 30 June 2018 at 05:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I think your idea
>> to reduce the loops in test 6 from 2000 down to 1001 should be worth
>> it. I'll try the idea out next week.
> The attached changes things to use your way of picking up the search
> for the next column at the column after the last match was found.

Great. I think we can use the same approach for make_inh_translation_list, as in the attached patch. It shows some improvement on test 6. I got the following tps, median of 11 runs (forgot to turn off fsync though):

test   master    v3    v4
1         414   416   408
2         259   409   404
3         263   400   405
4         417   416   404
5         118   311   305
6          85   280   303

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 7 July 2018 at 01:08, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> Great. I think we can use the same approach for make_inh_translation_list,
> as in the attached patch. It show some improvement on test 6. I got the
> following tps, median of 11 runs (forgot to turn off fsync though):
>
> test master v3 v4
> 1 414 416 408
> 2 259 409 404
> 3 263 400 405
> 4 417 416 404
> 5 118 311 305
> 6 85 280 303

Nice. I think that's a good idea. Although, I didn't really like the use of the comma operator you added. I think since TupleDescAttr can't be NULL, we can just do:

(att = TupleDescAttr(new_tupdesc, new_attno))->attisdropped

There's no shortage of other places that do TupleDescAttr(...)-> so I think that's perfectly fine. I also did a slight rewording of the comment above that new code.

I've attached v5. Please feel free to add yourself as an author of this patch in the commitfest app. You've probably contributed about as much as I have to this.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
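[The idiom being discussed — embedding the assignment inside the condition so the attribute is fetched and tested in one expression — can be shown with a toy stand-in (hypothetical struct and helper, not the real TupleDescAttr/pg_attribute types):]

```c
#include <stddef.h>

/* Toy stand-in for Form_pg_attribute: only the field tested here. */
typedef struct
{
	int			attisdropped;
} Attr;

/*
 * Return the index of the first non-dropped attribute, or natts if all
 * are dropped.  The assignment inside the loop condition mirrors the
 * (att = TupleDescAttr(...))->attisdropped pattern: no comma operator
 * needed, and "att" is left pointing at the attribute just examined.
 */
static int
first_live_attno(const Attr *attrs, int natts)
{
	const Attr *att = NULL;
	int			attno = 0;

	while (attno < natts && (att = &attrs[attno])->attisdropped)
		attno++;

	(void) att;					/* a real caller would go on to use att */
	return attno;
}
```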
Attachment
On 07/09/2018 10:13 AM, David Rowley wrote:
> I've attached v5.

v5 looks good to me, I've changed the status to ready.

> Please feel free to add yourself as an author of this patch in the
> commitfest app. You've probably contributed about as much as I have to
> this.

Thanks, I'm fine with being credited as a reviewer.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 9 July 2018 at 23:28, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> On 07/09/2018 10:13 AM, David Rowley wrote:
>> I've attached v5.
>
> v5 looks good to me, I've changed the status to ready.

Many thanks for reviewing this.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/07/18 22:58, David Rowley wrote:
> On 9 July 2018 at 23:28, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
>> On 07/09/2018 10:13 AM, David Rowley wrote:
>>> I've attached v5.
>>
>> v5 looks good to me, I've changed the status to ready.
>
> Many thanks for reviewing this.

Pushed, thanks!

- Heikki
On 14 July 2018 at 04:57, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Pushed, thanks!

Thanks for pushing, and thanks again for reviewing it, Alexander.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

Any chance of a backport to PG 11? The cherry-pick applies cleanly, and with a 100-column table it improves performance nicely (20%).

Regards,
Didier

On Sat, Jul 14, 2018 at 1:25 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 14 July 2018 at 04:57, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Pushed, thanks!
>
> Thanks for pushing, and thanks again for reviewing it, Alexander.
>
> --
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 17, 2019 at 12:03:03PM +0200, didier wrote:
> cherry-pick apply cleanly and with a 100 columns table it improves
> performance nicely (20%).

42f70cd is a performance improvement, and not a bug fix.

--
Michael