Thread: Benchmarking partitioning triggers and rules

Benchmarking partitioning triggers and rules

From
Tim Uckun
Date:
I wrote a quick benchmarking script to test various partitioning strategies. The code is here.

https://gist.github.com/timuckun/954ab6bdce36fa14bc1c

I was astonished at the variability of the timings between the different variations. 

The test data contained 270K records.  I did a simple insert into without any triggers, with three different trigger variations and with a rule.  The results were like this

clean  0.000000   0.000000   0.000000 (  3.119498)
func_1  0.000000   0.000000   0.000000 (  7.435094)
func_2  0.000000   0.000000   0.000000 ( 28.427617)
func_3  0.000000   0.000000   0.000000 ( 18.348554)
Rule   0.000000   0.000000   0.000000 (  2.901931)

A clean insert 3.1 seconds,  putting a rule took less time!
 
A simple insert into table_name values (NEW.*) doubled the time it takes to insert the records.  Using an EXECUTE with an interpolated table name took TEN TIMES LONGER!. Making a slight change to the EXECUTE took a third off the execution time WTF???

This has left me both baffled and curious.  If changing little things like this makes a huge difference what else can I do to make this even faster?

Would using a different language help? Is Javasscript, Python or Perl faster?  Is there some other syntax I can use?  I tried this

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES ('  || NEW.* || ')' but that gave me an error.


Re: Benchmarking partitioning triggers and rules

From
Tim Uckun
Date:
Hey I hate to bump my post but I would really appreciate some input on this benchmark. I am very alarmed that adding a very simple partitioning trigger slows the insert speed by an order of magnitude.  Is there any way to speed this up?

On Mon, Mar 9, 2015 at 11:44 AM, Tim Uckun <timuckun@gmail.com> wrote:
I wrote a quick benchmarking script to test various partitioning strategies. The code is here.

https://gist.github.com/timuckun/954ab6bdce36fa14bc1c

I was astonished at the variability of the timings between the different variations. 

The test data contained 270K records.  I did a simple insert into without any triggers, with three different trigger variations and with a rule.  The results were like this

clean  0.000000   0.000000   0.000000 (  3.119498)
func_1  0.000000   0.000000   0.000000 (  7.435094)
func_2  0.000000   0.000000   0.000000 ( 28.427617)
func_3  0.000000   0.000000   0.000000 ( 18.348554)
Rule   0.000000   0.000000   0.000000 (  2.901931)

A clean insert 3.1 seconds,  putting a rule took less time!
 
A simple insert into table_name values (NEW.*) doubled the time it takes to insert the records.  Using an EXECUTE with an interpolated table name took TEN TIMES LONGER!. Making a slight change to the EXECUTE took a third off the execution time WTF???

This has left me both baffled and curious.  If changing little things like this makes a huge difference what else can I do to make this even faster?

Would using a different language help? Is Javasscript, Python or Perl faster?  Is there some other syntax I can use?  I tried this

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES ('  || NEW.* || ')' but that gave me an error.



Re: Benchmarking partitioning triggers and rules

From
Tomas Vondra
Date:
On 11.3.2015 21:43, Tim Uckun wrote:
> Hey I hate to bump my post but I would really appreciate some input
> on this benchmark. I am very alarmed that adding a very simple
> partitioning trigger slows the insert speed by an order of magnitude.
> Is there any way to speed this up?

I think to actually give you a meaningful response, we really need more
details about your configuration - what PostgreSQL version are you
using, what configuration have you changed from the defaults and such.

Anyway, you're right that triggers are not cheap. The numbers I get on
the development version with a bit of tuning look like this:

INSERT (direct)    1.5 sec
INSERT             4.0 sec
INSERT (EXECUTE)  11.5 sec

So it's ~ what you've measured. Rules have the lowest overhead, but also
there's a lot of tricky places.

There's not much you can do, except for inserting the data directly into
the right partition (without any triggers).


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Benchmarking partitioning triggers and rules

From
Tim Uckun
Date:
I am using postgres 9.4, the default install with "brew install postgres, no tuning at all.  BTW if I use postgres.app application the benchmarks run twice as slow!

Why do you think there is such dramatic difference between

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT ($1).*' USING NEW ;

and

 EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES( ($1).*)' USING NEW ;

One is thirty percent faster than the other.  Also is there an even better way that I don't know about.


On Thu, Mar 12, 2015 at 10:42 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 11.3.2015 21:43, Tim Uckun wrote:
> Hey I hate to bump my post but I would really appreciate some input
> on this benchmark. I am very alarmed that adding a very simple
> partitioning trigger slows the insert speed by an order of magnitude.
> Is there any way to speed this up?

I think to actually give you a meaningful response, we really need more
details about your configuration - what PostgreSQL version are you
using, what configuration have you changed from the defaults and such.

Anyway, you're right that triggers are not cheap. The numbers I get on
the development version with a bit of tuning look like this:

INSERT (direct)    1.5 sec
INSERT             4.0 sec
INSERT (EXECUTE)  11.5 sec

So it's ~ what you've measured. Rules have the lowest overhead, but also
there's a lot of tricky places.

There's not much you can do, except for inserting the data directly into
the right partition (without any triggers).


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Benchmarking partitioning triggers and rules

From
Tomas Vondra
Date:
On 12.3.2015 04:57, Tim Uckun wrote:
> I am using postgres 9.4, the default install with "brew install
> postgres, no tuning at all.  BTW if I use postgres.app application the
> benchmarks run twice as slow!

I have no idea what brew or postgres.app is. But I strongly recommend
you to do some tuning.

  https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

>
> Why do you think there is such dramatic difference between
>
> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
> ($1).*' USING NEW ;
>
> and
>
>  EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES(
> ($1).*)' USING NEW ;
>
> One is thirty percent faster than the other.  Also is there an even
> better way that I don't know about.

Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
more expensive, as it needs to do more stuff (on every execution). There
are reasons for that, but you may think of it as regular queries vs.
prepared statements.

Prepared statements are parsed and planned once, regular query needs to
be parsed and planned over and over again.


--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Benchmarking partitioning triggers and rules

From
Jim Nasby
Date:
On 3/12/15 8:15 AM, Tomas Vondra wrote:
> On 12.3.2015 04:57, Tim Uckun wrote:
>> I am using postgres 9.4, the default install with "brew install
>> postgres, no tuning at all.  BTW if I use postgres.app application the
>> benchmarks run twice as slow!
>
> I have no idea what brew or postgres.app is. But I strongly recommend
> you to do some tuning.
>
>    https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
>>
>> Why do you think there is such dramatic difference between
>>
>> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
>> ($1).*' USING NEW ;
>>
>> and
>>
>>   EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES(
>> ($1).*)' USING NEW ;
>>
>> One is thirty percent faster than the other.  Also is there an even
>> better way that I don't know about.
>
> Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
> more expensive, as it needs to do more stuff (on every execution). There
> are reasons for that, but you may think of it as regular queries vs.
> prepared statements.
>
> Prepared statements are parsed and planned once, regular query needs to
> be parsed and planned over and over again.

BTW, if you're that concerned about performance you could probably do a
lot better than a plpgsql trigger by creating one in C. There's an
enormous amount of code involved just in parsing and starting a plpgsql
trigger, and then it's going to have to re-parse the dynamic SQL for
every single row, whereas a C trigger could avoid almost all of that.

Rules are likely to be even faster (at least until you get to a fairly
large number of partitions), but as Thomas mentioned they're very tricky
to use. The critical thing to remember with them is they're essentially
hacking on the original query itself.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Benchmarking partitioning triggers and rules

From
Tim Uckun
Date:
I understand that there is overhead involved in parsing the strings and such.  The amount of overhead was surprising to me but that's another matter.  What I am really puzzled about is the difference between the statements

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT ($1).*' USING NEW ;

and

  EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES (($1).*)' USING NEW ;

They both do string interpolation but one is significantly faster than the other.   Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.

Re: Benchmarking partitioning triggers and rules

From
Adrian Klaver
Date:
On 04/07/2015 07:49 PM, Tim Uckun wrote:
> I understand that there is overhead involved in parsing the strings and
> such.  The amount of overhead was surprising to me but that's another
> matter.  What I am really puzzled about is the difference between the
> statements
>
> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
> ($1).*' USING NEW ;
>
> and
>
>    EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES
> (($1).*)' USING NEW ;


Offhand I would say because in the first case you are doing a SELECT and
in the second you are just doing a substitution.

>
> They both do string interpolation but one is significantly faster than
> the other.   Is there a third and even faster way?
>
> I am using RDS so I can't really do stored procs in C.
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Benchmarking partitioning triggers and rules

From
Tim Uckun
Date:
So is there a third and even faster way of doing this?

On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 04/07/2015 07:49 PM, Tim Uckun wrote:
I understand that there is overhead involved in parsing the strings and
such.  The amount of overhead was surprising to me but that's another
matter.  What I am really puzzled about is the difference between the
statements

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
($1).*' USING NEW ;

and

   EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES
(($1).*)' USING NEW ;


Offhand I would say because in the first case you are doing a SELECT and in the second you are just doing a substitution.



They both do string interpolation but one is significantly faster than
the other.   Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Benchmarking partitioning triggers and rules

From
Jim Nasby
Date:
On 4/8/15 4:58 PM, Tim Uckun wrote:
> So is there a third and even faster way of doing this?

Please don't top-post.

> On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 04/07/2015 07:49 PM, Tim Uckun wrote:
>
>         I understand that there is overhead involved in parsing the
>         strings and
>         such.  The amount of overhead was surprising to me but that's
>         another
>         matter.  What I am really puzzled about is the difference
>         between the
>         statements
>
>         EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
>         ($1).*' USING NEW ;
>
>         and
>
>             EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||
>         ' VALUES
>         (($1).*)' USING NEW ;
>
>
>
>     Offhand I would say because in the first case you are doing a SELECT
>     and in the second you are just doing a substitution.
>
>
>
>         They both do string interpolation but one is significantly
>         faster than
>         the other.   Is there a third and even faster way?
>
>         I am using RDS so I can't really do stored procs in C.

There are only 3 ways you can "steer" data into the correct partition:

1) Outside the database (ie: the application itself knows what partition
table to use)
2) Rules
3) Triggers

#1 will almost always be faster (I say almost because if something has
to decide row-by-row it may be faster for the database to do it).

#2 and #3 depend on the workload. Rules are parsed once PER STATEMENT,
so if you're using a single INSERT to load a lot of rows they might well
be faster than triggers. But as mentioned, they're a real PITA to use
and they don't work at all with COPY.

When it comes to triggers, you will get different performance depending
on the trigger language used as well as how you write the trigger
itself. Presumably a trigger function written in C will be faster than
anything else. I would expect plpgsql to be the next fastest after that,
but that's just a guess.

As for the difference between SELECT and VALUES above, that's probably
due to a difference in parsing or in the number of transforms the NEW
has to go through in the executor. My guess is that because SELECT is
much more general purpose than VALUES it's both more expensive to parse
as well as more expensive to execute. If you really want to know for
certain, connect gdb to a database backend on your laptop/desktop, fire
off an insert (to load plpgsql into backend memory), set a gdb
breakpoint on exec_stmt_block(), and see what's different between the
two use cases.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com