Thread: Generating partitioning tuple conversion maps faster
Hi,

The code in convert_tuples_by_name_map() could be made a bit smarter to speed up the search for the column with the same name, by first looking at the same attnum in the other relation rather than starting the search from the first attnum each time. I imagine most people are going to create partitions with columns in the same order as they are in the partitioned table. This will happen automatically if the CREATE TABLE .. PARTITION OF syntax is used, so it makes sense to exploit that knowledge.

I've attached a patch which modifies convert_tuples_by_name_map() to have it just start the inner loop search in the same position as we are currently in the outer loop. Best case, this takes the function from being O(N^2) to O(N).

It's quite possible that the partition or, more likely, the partitioned table has had some columns dropped (it likely lives longer than a partition), in which case we don't want to miss the target column by 1, so I've coded the patch to count the non-dropped columns and start the search after having ignored dropped columns.

This patch is just a few lines of code. I do think there's much more room to improve this whole area. For example, we could build the child->parent map from the parent->child map instead of searching again later with the inputs reversed. Although, that's more than a handful of lines of code. The change I'm proposing here seems worthwhile to me.

FWIW, something similar is done in make_inh_translation_list(), although I think my way might have a slightly better chance of not hitting the worst-case search.

Benchmark: (Uses tables with 1000 columns, each with a name 11 chars in length.)
Setup:

select 'create table listp (' || string_agg('c' || left(md5(x::Text),10) || ' int',',') || ') partition by list (c' || left(md5('1'),10) || ');' from generate_Series(1,1000) x;
\gexec
create table listp0 partition of listp for values in(0);
create table listp1 partition of listp for values in(1);
select 'create table listpd (' || string_agg('c' || left(md5(x::Text),10) || ' int',',') || ') partition by list (c' || left(md5('1'),10) || ');' from generate_Series(0,1000) x;
\gexec
select 'alter table listpd drop column c' || left(md5(0::text),10) || ';';
\gexec
create table listpd0 partition of listpd for values in(0);
create table listpd1 partition of listpd for values in(1);
select 'create table list (' || string_agg('c' || left(md5(x::Text),10) || ' int',',') || ');' from generate_Series(1,1000) x;
\gexec
\q

psql -t -c "select 'insert into listp values(' || string_Agg('1',',') || ');' from generate_Series(1,1000) x;" postgres > insertp.sql
psql -t -c "select 'insert into listpd values(' || string_Agg('1',',') || ');' from generate_Series(1,1000) x;" postgres > insertpd.sql
psql -t -c "select 'insert into list values(' || string_Agg('1',',') || ');' from generate_Series(1,1000) x;" postgres > insert.sql
psql -t -c "select 'update list set c' || left(md5('1'),10) || ' = (c' || left(md5('1'),10) || ' + 1) & 1;'" postgres > update.sql
psql -t -c "select 'update listp set c' || left(md5('1'),10) || ' = (c' || left(md5('1'),10) || ' + 1) & 1;'" postgres > updatep.sql
psql -t -c "select 'update listpd set c' || left(md5('1'),10) || ' = (c' || left(md5('1'),10) || ' + 1) & 1;'" postgres > updatepd.sql

Tests:

# Test 1: INSERT control test for non-partitioned table.
pgbench -n -T 60 -f insert.sql postgres

# Test 2: INSERT perfect match
pgbench -n -T 60 -f insertp.sql postgres

# Test 3: INSERT dropped first column from parent
pgbench -n -T 60 -f insertpd.sql postgres

# Test 4: UPDATE control test for non-partitioned table.
psql -c "truncate list;" postgres
psql -f insert.sql postgres
pgbench -n -T 60 -f update.sql postgres

# Test 5: UPDATE perfect match
psql -c "truncate listp;" postgres
psql -f insertp.sql postgres
pgbench -n -T 60 -f updatep.sql postgres

# Test 6: UPDATE dropped first column from parent
psql -c "truncate listpd;" postgres
psql -f insertpd.sql postgres
pgbench -n -T 60 -f updatepd.sql postgres

Results: AWS m5d (fsync off)

Unpatched:
Test 1: tps = 1055.527253 (excluding connections establishing)
Test 2: tps = 355.308869 (excluding connections establishing)
Test 3: tps = 361.465022 (excluding connections establishing)
Test 4: tps = 1107.483236 (excluding connections establishing)
Test 5: tps = 132.805775 (excluding connections establishing)
Test 6: tps = 86.303093 (excluding connections establishing)

Patched:
Test 1: tps = 1055.831144 (excluding connections establishing)
Test 2: tps = 1014.282634 (excluding connections establishing)
Test 3: tps = 982.258274 (excluding connections establishing)
Test 4: tps = 625.429778 (excluding connections establishing)
Test 5: tps = 525.667826 (excluding connections establishing)
Test 6: tps = 173.529296 (excluding connections establishing)

I'd have expected Test 6 to do about 480-500 tps. Adding some debug output to check why revealed that the search is doing what's expected; I'm unsure why it has not improved more.

Given the small size of this patch, I think it's a good candidate for the July 'fest.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
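[For illustration, the "start the inner loop where the outer loop is" search described above can be sketched standalone. The structs and the build_name_map helper below are toy stand-ins, not the real TupleDesc or convert_tuples_by_name_map code: a persistent cursor into the inner relation's attributes resumes the name search where the previous match left off, wrapping around at most once, so identically ordered descriptors match in O(N) total comparisons.]

```c
#include <string.h>

/* Toy stand-in for a relation's attributes: a name plus a dropped flag. */
typedef struct
{
	const char *name;
	int			dropped;
} Attr;

/*
 * For each non-dropped column in "out", store the 1-based position of the
 * same-named, non-dropped column in "in" (0 if absent).  The inner search
 * resumes from where the previous match left off instead of restarting at
 * the first attribute each time, which is the core of the speedup.
 */
static void
build_name_map(const Attr *in, int nin, const Attr *out, int nout, int *map)
{
	int			j = 0;			/* persistent inner-loop cursor */
	int			i,
				k;

	for (i = 0; i < nout; i++)
	{
		map[i] = 0;
		if (out[i].dropped)
			continue;
		/* scan at most nin entries, wrapping around once */
		for (k = 0; k < nin; k++, j = (j + 1) % nin)
		{
			if (!in[j].dropped && strcmp(in[j].name, out[i].name) == 0)
			{
				map[i] = j + 1;		/* attnums are 1-based */
				j = (j + 1) % nin;	/* next search starts after the hit */
				break;
			}
		}
	}
}
```

[When the input descriptor has a dropped leading column, as in the listpd tests, the cursor simply skips past it once and stays aligned for the remaining columns.]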
Attachment
On Mon, Jun 25, 2018 at 10:30 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
> [...]
> Given the small size of this patch. I think it's a good candidate for
> the July 'fest.

Just a different thought: how about having a flag array, something like needed_tuple_conv?

While loading the partdesc, we could calculate whether the columns of the partition are aligned with the parent or not.

Regards,
Amul
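[The check behind such a precomputed flag could amount to a single pass comparing the two descriptors. A toy standalone sketch — hypothetical structs and function name, not the real partdesc loading code — assuming the flag would record whether every attribute matches by name, type, and position:]

```c
#include <string.h>

/* Toy attribute descriptor: name, type OID, and a dropped flag. */
typedef struct
{
	const char *name;
	unsigned	typid;
	int			dropped;
} Attr;

/*
 * Return 1 if the partition's attributes line up with the parent's
 * position for position (same names and types, no dropped columns),
 * i.e. no tuple conversion map would be needed for this partition.
 */
static int
no_conversion_needed(const Attr *parent, const Attr *part, int natts)
{
	int			i;

	for (i = 0; i < natts; i++)
	{
		if (parent[i].dropped || part[i].dropped)
			return 0;
		if (parent[i].typid != part[i].typid)
			return 0;
		if (strcmp(parent[i].name, part[i].name) != 0)
			return 0;
	}
	return 1;
}
```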
On 2018/06/25 14:12, amul sul wrote:
> On Mon, Jun 25, 2018 at 10:30 AM David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> [...]
>> Given the small size of this patch. I think it's a good candidate for
>> the July 'fest.
>
> Just a different thought, how about having flag array something like
> needed_tuple_conv?
>
> While loading partdesc, we could calculate that the columns of partition table
> are aligned with the parent or not?

Yeah maybe, but ISTM that could be implemented *in addition to* the tweaks to convert_tuples_by_name_map that David's proposing, which seem helpful in their own right.

Thanks,
Amit
On Mon, Jun 25, 2018 at 10:57 AM Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> On 2018/06/25 14:12, amul sul wrote:
> > On Mon, Jun 25, 2018 at 10:30 AM David Rowley
> > <david.rowley@2ndquadrant.com> wrote:
> >> [...]
> >> Given the small size of this patch. I think it's a good candidate for
> >> the July 'fest.
> >
> > Just a different thought, how about having flag array something like
> > needed_tuple_conv?
> >
> > While loading partdesc, we could calculate that the columns of partition table
> > are aligned with the parent or not?
>
> Yeah maybe, but ISTM, that could be implemented *in addition to* the
> tweaks to convert_tuples_by_name_map that David's proposing, which seem
> helpful in their own right.

Yes, I agree.

Regards,
Amul
On 06/25/2018 08:00 AM, David Rowley wrote:
> I'd have expected Test 6 to do about 480-500tps. Adding debug to check
> why it's not revealed that the search is doing what's expected. I'm
> unsure why it has not improved more.

I think I found one possible reason for this slowness. Your patch behaves as expected when there is a dropped output column, but does one extra comparison when there is a dropped input column. This backwards conversion is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent inner loop counter (see the attached patch).

Still, fixing this doesn't improve the performance. According to perf report, updatepd.sql spends 25% of its time in strcmp, and updatep.sql about 0.5%, but they are comparing the same column names, even with the same alignment and relative offsets. I'm completely puzzled by this.

As a side thought, we wouldn't have to go through this if we had a hash table that is easy to use, or perhaps string interning in catcache.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 29 June 2018 at 02:23, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> I think I found one possible reason for this slowness. Your patch behaves as
> expected when there is a dropped output column, but does one extra
> comparison when there is a dropped input column. This backwards conversion
> is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent
> inner loop counter (see the attached patch).

It's just 2000 comparisons vs 1000.

> Still, fixing this doesn't improve the performance. According to perf
> report, updatepd.sql spends 25% of time in strcmp, and updatep.sql about
> 0.5%, but they are comparing the same column names, even with the same
> alignment and relative offsets. I'm completely puzzled by this.

On further inspection, the slowdown is coming from the planner, in make_inh_translation_list(). The INSERT path does not hit that since the planner's job is pretty simple for simple INSERTs.

> As a side thought, we wouldn't have to go through this if we had a hash
> table that is easy to use, or perhaps string interning in catcache.

Maybe it's better to try the direct lookup and fall back on SearchSysCacheAttName() if the same attnum does not have the same name.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 29 June 2018 at 11:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On further inspection, the slowdown is coming from the planner in
> make_inh_translation_list(). The INSERT path does not hit that since
> the planner's job is pretty simple for simple INSERTs.

I've attached a patch that uses SearchSysCacheAttName to speed up these translations in the planner. This puts test 6 more at the level I'd have expected.

Here are fresh benchmark results taken with the attached, again on an AWS m5d instance, though probably not the same one as before (fsync=off).

Unpatched:
Test 1 tps = 922.479156 (excluding connections establishing)
Test 2 tps = 334.701555 (excluding connections establishing)
Test 3 tps = 327.547316 (excluding connections establishing)
Test 4 tps = 1198.924131 (excluding connections establishing)
Test 5 tps = 125.130723 (excluding connections establishing)
Test 6 tps = 81.511072 (excluding connections establishing)

Patched:
Test 1 tps = 918.105382 (excluding connections establishing)
Test 2 tps = 913.315387 (excluding connections establishing)
Test 3 tps = 893.578988 (excluding connections establishing)
Test 4 tps = 1213.238177 (excluding connections establishing)
Test 5 tps = 459.022550 (excluding connections establishing)
Test 6 tps = 416.835747 (excluding connections establishing)

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 06/29/2018 03:25 AM, David Rowley wrote:
> I've attached a patch that uses SearchSysCacheAttName to speed up
> these translations in the planner.

Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for updates). If you combine it with the persistent inner loop index, it's probably going to be even faster, because it will only require one catalog access for each index shift. Right now it looks like it goes to the catalog for every column after the dropped one.

What about convert_tuples_by_name_map, do you plan to switch it to catalog lookups as well?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 30 June 2018 at 00:22, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> On 06/29/2018 03:25 AM, David Rowley wrote:
>> I've attached a patch that uses SearchSysCacheAttName to speed up
>> these translations in the planner.
>
> Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for
> updates). If you combine it with persistent inner loop index, it's probably
> going to be even faster, because it will only require one catalog access for
> each index shift. Now it looks like it goes to catalog for every column
> after the dropped one.

Thanks for testing.

> What about convert_tuples_by_name_map, do you plan to switch it to catalog
> lookups as well?

Syscache? I didn't really see an obvious way to get the relids down to the function, e.g. the call through ExecEvalConvertRowtype() -> convert_tuples_by_name() does not have a Relation to work with, just a TupleDesc. I think further work there might be diminishing returns.

I think your idea to reduce the loops in test 6 from 2000 down to 1001 should be worth it. I'll try the idea out next week.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 30 June 2018 at 05:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I think your idea
> to reduce the loops in test 6 from 2000 down to 1001 should be worth
> it. I'll try the idea out next week.

The attached changes things to use your way of picking up the search for the next column at the column after the last match was found. I don't think performance will have changed much from the last version, but here are the performance results (in tps) from my laptop this time, fsync=off.

Test     Unpatched   Patched     Gain
Test 1     873.837   912.981  104.48%
Test 2     213.004   895.268  420.31%
Test 3     224.810   887.099  394.60%
Test 4    1177.533  1107.767   94.08%
Test 5      97.707   402.258  411.70%
Test 6      72.025   360.702  500.80%

Tests are all the same as the tests done in the initial email on this thread. Tests 1 and 4 are non-partitioned tests; any variance there should just be noise, since no code was changed in either of those tests.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 07/05/2018 02:52 PM, David Rowley wrote:
> On 30 June 2018 at 05:40, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I think your idea
>> to reduce the loops in test 6 from 2000 down to 1001 should be worth
>> it. I'll try the idea out next week.
> The attached changes things to use your way of picking up the search
> for the next column at the column after the last match was found.

Great. I think we can use the same approach for make_inh_translation_list, as in the attached patch. It shows some improvement on test 6. I got the following tps, median of 11 runs (forgot to turn off fsync though):

test   master    v3    v4
1         414   416   408
2         259   409   404
3         263   400   405
4         417   416   404
5         118   311   305
6          85   280   303

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 7 July 2018 at 01:08, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> Great. I think we can use the same approach for make_inh_translation_list,
> as in the attached patch. It show some improvement on test 6. I got the
> following tps, median of 11 runs (forgot to turn off fsync though):
>
> test master v3 v4
> 1 414 416 408
> 2 259 409 404
> 3 263 400 405
> 4 417 416 404
> 5 118 311 305
> 6 85 280 303

Nice. I think that's a good idea. Although, I didn't really like the use of the comma operator you added. I think since TupleDescAttr can't be NULL, we can just do:

(att = TupleDescAttr(new_tupdesc, new_attno))->attisdropped

There's no shortage of other places that do TupleDescAttr(...)-> so I think that's perfectly fine. I also did a slight rewording of the comment above that new code.

I've attached v5. Please feel free to add yourself as an author of this patch in the commitfest app. You've probably contributed about as much as I have to this.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
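[The idiom being discussed — embedding the assignment inside the condition so the attribute is fetched and tested in one expression — can be shown with a toy stand-in (hypothetical struct and helper, not the real TupleDescAttr/pg_attribute types):]

```c
#include <stddef.h>

/* Toy stand-in for Form_pg_attribute: only the field tested here. */
typedef struct
{
	int			attisdropped;
} Attr;

/*
 * Return the index of the first non-dropped attribute, or natts if all
 * are dropped.  The assignment inside the loop condition mirrors the
 * (att = TupleDescAttr(...))->attisdropped pattern: no comma operator
 * needed, and "att" is left pointing at the attribute just examined.
 */
static int
first_live_attno(const Attr *attrs, int natts)
{
	const Attr *att = NULL;
	int			attno = 0;

	while (attno < natts && (att = &attrs[attno])->attisdropped)
		attno++;

	(void) att;					/* a real caller would go on to use att */
	return attno;
}
```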
Attachment
On 07/09/2018 10:13 AM, David Rowley wrote:
> I've attached v5.

v5 looks good to me, I've changed the status to ready.

> Please feel free to add yourself as an author of this patch in the
> commitfest app. You've probably contributed about as much as I have to
> this.

Thanks, I'm fine with being credited as a reviewer.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 9 July 2018 at 23:28, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> On 07/09/2018 10:13 AM, David Rowley wrote:
>> I've attached v5.
>
> v5 looks good to me, I've changed the status to ready.

Many thanks for reviewing this.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/07/18 22:58, David Rowley wrote:
> On 9 July 2018 at 23:28, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
>> On 07/09/2018 10:13 AM, David Rowley wrote:
>>> I've attached v5.
>>
>> v5 looks good to me, I've changed the status to ready.
>
> Many thanks for reviewing this.

Pushed, thanks!

- Heikki
On 14 July 2018 at 04:57, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Pushed, thanks!

Thanks for pushing, and thanks again for reviewing it, Alexander.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

Any chance of a backport to PG 11? The cherry-pick applies cleanly, and with a 100-column table it improves performance nicely (20%).

Regards,
Didier

On Sat, Jul 14, 2018 at 1:25 AM David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 14 July 2018 at 04:57, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Pushed, thanks!
>
> Thanks for pushing, and thanks again for reviewing it, Alexander.
>
> --
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 17, 2019 at 12:03:03PM +0200, didier wrote:
> cherry-pick apply cleanly and with a 100 columns table it improves
> performance nicely (20%).

42f70cd is a performance improvement, and not a bug fix.

--
Michael