Thread: Benchmarking partitioning triggers and rules
I wrote a quick benchmarking script to test various partitioning strategies. The code is here.

https://gist.github.com/timuckun/954ab6bdce36fa14bc1c

I was astonished at the variability of the timings between the different variations.

The test data contained 270K records. I did a simple insert with no triggers, with three different trigger variations, and with a rule. The results were like this:

clean  0.000000 0.000000 0.000000 ( 3.119498)
func_1 0.000000 0.000000 0.000000 ( 7.435094)
func_2 0.000000 0.000000 0.000000 ( 28.427617)
func_3 0.000000 0.000000 0.000000 ( 18.348554)
Rule   0.000000 0.000000 0.000000 ( 2.901931)

A clean insert took 3.1 seconds; going through a rule took even less time!

A simple INSERT INTO table_name VALUES (NEW.*) doubled the time it takes to insert the records. Using an EXECUTE with an interpolated table name took TEN TIMES LONGER! Making a slight change to the EXECUTE took a third off the execution time. WTF???

This has left me both baffled and curious. If changing little things like this makes a huge difference, what else can I do to make this even faster?

Would using a different language help? Is JavaScript, Python or Perl faster? Is there some other syntax I can use? I tried this:

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (' || NEW.* || ')'

but that gave me an error.
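The error is expected: NEW is a composite row value, so it presumably cannot be concatenated into the query string with ||. A minimal sketch of a working trigger that passes the row as a parameter with USING instead; the events table, its created_at column, and the monthly naming scheme are hypothetical stand-ins for whatever the gist actually uses:

CREATE OR REPLACE FUNCTION partition_insert() RETURNS trigger AS $$
DECLARE
    -- hypothetical naming scheme: events_YYYY_MM
    partition_name text := 'events_' || to_char(NEW.created_at, 'YYYY_MM');
BEGIN
    -- Pass NEW as a query parameter instead of concatenating the
    -- composite value into the SQL string.
    EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT ($1).*'
    USING NEW;
    RETURN NULL;  -- suppress the insert into the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_partition_trg
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE partition_insert();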
Hey I hate to bump my post but I would really appreciate some input on this benchmark. I am very alarmed that adding a very simple partitioning trigger slows the insert speed by an order of magnitude. Is there any way to speed this up?
On Mon, Mar 9, 2015 at 11:44 AM, Tim Uckun <timuckun@gmail.com> wrote:
I wrote a quick benchmarking script to test various partitioning strategies. The code is here.

https://gist.github.com/timuckun/954ab6bdce36fa14bc1c

I was astonished at the variability of the timings between the different variations.

The test data contained 270K records. I did a simple insert with no triggers, with three different trigger variations, and with a rule. The results were like this:

clean  0.000000 0.000000 0.000000 ( 3.119498)
func_1 0.000000 0.000000 0.000000 ( 7.435094)
func_2 0.000000 0.000000 0.000000 ( 28.427617)
func_3 0.000000 0.000000 0.000000 ( 18.348554)
Rule   0.000000 0.000000 0.000000 ( 2.901931)

A clean insert took 3.1 seconds; going through a rule took even less time!

A simple INSERT INTO table_name VALUES (NEW.*) doubled the time it takes to insert the records. Using an EXECUTE with an interpolated table name took TEN TIMES LONGER! Making a slight change to the EXECUTE took a third off the execution time. WTF???

This has left me both baffled and curious. If changing little things like this makes a huge difference, what else can I do to make this even faster?

Would using a different language help? Is JavaScript, Python or Perl faster? Is there some other syntax I can use? I tried this:

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (' || NEW.* || ')'

but that gave me an error.
On 11.3.2015 21:43, Tim Uckun wrote:
> Hey I hate to bump my post but I would really appreciate some input
> on this benchmark. I am very alarmed that adding a very simple
> partitioning trigger slows the insert speed by an order of magnitude.
> Is there any way to speed this up?

I think to actually give you a meaningful response, we really need more
details about your configuration - what PostgreSQL version are you
using, what configuration have you changed from the defaults and such.

Anyway, you're right that triggers are not cheap. The numbers I get on
the development version with a bit of tuning look like this:

INSERT (direct)   1.5 sec
INSERT            4.0 sec
INSERT (EXECUTE) 11.5 sec

So it's ~ what you've measured. Rules have the lowest overhead, but also
there's a lot of tricky places.

There's not much you can do, except for inserting the data directly into
the right partition (without any triggers).

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
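His last suggestion amounts to skipping the parent table entirely, so no trigger or rule fires at all. A minimal sketch, assuming the same hypothetical monthly partitions and columns as above:

-- The client picks the partition itself and inserts into it directly.
INSERT INTO events_2015_03 (id, created_at, payload)
VALUES (1, '2015-03-11 21:43:00', 'routed by the application');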
I am using postgres 9.4, the default install with "brew install postgres", no tuning at all. BTW if I use the Postgres.app application the benchmarks run twice as slow!
Why do you think there is such a dramatic difference between

EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT ($1).*' USING NEW ;
and
EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES( ($1).*)' USING NEW ;

One is thirty percent faster than the other. Also is there an even better way that I don't know about.
On Thu, Mar 12, 2015 at 10:42 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 11.3.2015 21:43, Tim Uckun wrote:
> Hey I hate to bump my post but I would really appreciate some input
> on this benchmark. I am very alarmed that adding a very simple
> partitioning trigger slows the insert speed by an order of magnitude.
> Is there any way to speed this up?
I think to actually give you a meaningful response, we really need more
details about your configuration - what PostgreSQL version are you
using, what configuration have you changed from the defaults and such.
Anyway, you're right that triggers are not cheap. The numbers I get on
the development version with a bit of tuning look like this:
INSERT (direct) 1.5 sec
INSERT 4.0 sec
INSERT (EXECUTE) 11.5 sec
So it's ~ what you've measured. Rules have the lowest overhead, but also
there's a lot of tricky places.
There's not much you can do, except for inserting the data directly into
the right partition (without any triggers).
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12.3.2015 04:57, Tim Uckun wrote:
> I am using postgres 9.4, the default install with "brew install
> postgres", no tuning at all. BTW if I use the Postgres.app application
> the benchmarks run twice as slow!

I have no idea what brew or postgres.app is. But I strongly recommend
you to do some tuning.

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

> Why do you think there is such a dramatic difference between
>
> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT
> ($1).*' USING NEW ;
>
> and
>
> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES(
> ($1).*)' USING NEW ;
>
> One is thirty percent faster than the other. Also is there an even
> better way that I don't know about.

Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
more expensive, as it needs to do more stuff (on every execution). There
are reasons for that, but you may think of it as regular queries vs.
prepared statements.

Prepared statements are parsed and planned once; a regular query needs
to be parsed and planned over and over again.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
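To make the prepared-statement analogy concrete: static SQL inside a plpgsql function body is parsed and planned once, and the cached plan is reused on later calls, while EXECUTE rebuilds, re-parses and re-plans its string for every row. A hedged sketch of the two styles, with hypothetical partition names:

-- Static SQL: planned once, plan cached across calls.
CREATE OR REPLACE FUNCTION route_static() RETURNS trigger AS $$
BEGIN
    IF NEW.created_at >= DATE '2015-03-01'
       AND NEW.created_at < DATE '2015-04-01' THEN
        INSERT INTO events_2015_03 VALUES (NEW.*);
    ELSE
        INSERT INTO events_overflow VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Dynamic SQL: the string is rebuilt, parsed and planned per row.
CREATE OR REPLACE FUNCTION route_dynamic() RETURNS trigger AS $$
BEGIN
    EXECUTE 'INSERT INTO '
        || quote_ident('events_' || to_char(NEW.created_at, 'YYYY_MM'))
        || ' SELECT ($1).*'
    USING NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;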
On 3/12/15 8:15 AM, Tomas Vondra wrote:
> On 12.3.2015 04:57, Tim Uckun wrote:
>> I am using postgres 9.4, the default install with "brew install
>> postgres", no tuning at all. BTW if I use the Postgres.app application
>> the benchmarks run twice as slow!
>
> I have no idea what brew or postgres.app is. But I strongly recommend
> you to do some tuning.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
>> Why do you think there is such a dramatic difference between
>>
>> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT
>> ($1).*' USING NEW ;
>>
>> and
>>
>> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES(
>> ($1).*)' USING NEW ;
>>
>> One is thirty percent faster than the other. Also is there an even
>> better way that I don't know about.
>
> Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
> more expensive, as it needs to do more stuff (on every execution). There
> are reasons for that, but you may think of it as regular queries vs.
> prepared statements.
>
> Prepared statements are parsed and planned once; a regular query needs
> to be parsed and planned over and over again.

BTW, if you're that concerned about performance you could probably do a
lot better than a plpgsql trigger by creating one in C. There's an
enormous amount of code involved just in parsing and starting a plpgsql
trigger, and then it's going to have to re-parse the dynamic SQL for
every single row, whereas a C trigger could avoid almost all of that.

Rules are likely to be even faster (at least until you get to a fairly
large number of partitions), but as Tomas mentioned they're very tricky
to use. The critical thing to remember with them is they're essentially
hacking on the original query itself.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
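A sketch of the rule approach Jim and Tomas mention, again with hypothetical names. The rule rewrites the INSERT against the parent table itself, which is why it is cheap per row but needs care:

CREATE RULE events_insert_2015_03 AS
ON INSERT TO events
WHERE created_at >= DATE '2015-03-01' AND created_at < DATE '2015-04-01'
DO INSTEAD
    INSERT INTO events_2015_03 VALUES (NEW.*);

One such rule is needed per partition, rows that match no rule fall through to the parent table, and COPY bypasses rules entirely (a point Jim returns to later in the thread).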
I understand that there is overhead involved in parsing the strings and such. The amount of overhead was surprising to me but that's another matter. What I am really puzzled about is the difference between the statements
EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT ($1).*' USING NEW ;
and
EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES (($1).*)' USING NEW ;
They both do string interpolation but one is significantly faster than the other. Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.
On 04/07/2015 07:49 PM, Tim Uckun wrote:
> I understand that there is overhead involved in parsing the strings and
> such. The amount of overhead was surprising to me but that's another
> matter. What I am really puzzled about is the difference between the
> statements
>
> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT
> ($1).*' USING NEW ;
>
> and
>
> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES
> (($1).*)' USING NEW ;

Offhand I would say because in the first case you are doing a SELECT and
in the second you are just doing a substitution.

> They both do string interpolation but one is significantly faster than
> the other. Is there a third and even faster way?
>
> I am using RDS so I can't really do stored procs in C.

--
Adrian Klaver
adrian.klaver@aklaver.com
So is there a third and even faster way of doing this?
On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 04/07/2015 07:49 PM, Tim Uckun wrote:

I understand that there is overhead involved in parsing the strings and
such. The amount of overhead was surprising to me but that's another
matter. What I am really puzzled about is the difference between the
statements
EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT
($1).*' USING NEW ;
and
EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES
(($1).*)' USING NEW ;
Offhand I would say because in the first case you are doing a SELECT and in the second you are just doing a substitution.
They both do string interpolation but one is significantly faster than
the other. Is there a third and even faster way?
I am using RDS so I can't really do stored procs in C.
--
Adrian Klaver
adrian.klaver@aklaver.com
On 4/8/15 4:58 PM, Tim Uckun wrote:
> So is there a third and even faster way of doing this?

Please don't top-post.

> On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>> On 04/07/2015 07:49 PM, Tim Uckun wrote:
>>
>>> I understand that there is overhead involved in parsing the strings
>>> and such. The amount of overhead was surprising to me but that's
>>> another matter. What I am really puzzled about is the difference
>>> between the statements
>>>
>>> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' SELECT
>>> ($1).*' USING NEW ;
>>>
>>> and
>>>
>>> EXECUTE 'INSERT INTO ' || quote_ident(partition_name) || ' VALUES
>>> (($1).*)' USING NEW ;
>>
>> Offhand I would say because in the first case you are doing a SELECT
>> and in the second you are just doing a substitution.
>>
>>> They both do string interpolation but one is significantly faster
>>> than the other. Is there a third and even faster way?
>>>
>>> I am using RDS so I can't really do stored procs in C.

There are only 3 ways you can "steer" data into the correct partition:

1) Outside the database (ie: the application itself knows what partition
   table to use)
2) Rules
3) Triggers

#1 will almost always be faster (I say almost because if something has to
decide row-by-row it may be faster for the database to do it).

#2 and #3 depend on the workload. Rules are parsed once PER STATEMENT, so
if you're using a single INSERT to load a lot of rows they might well be
faster than triggers. But as mentioned, they're a real PITA to use and
they don't work at all with COPY.

When it comes to triggers, you will get different performance depending
on the trigger language used as well as how you write the trigger itself.
Presumably a trigger function written in C will be faster than anything
else. I would expect plpgsql to be the next fastest after that, but
that's just a guess.

As for the difference between SELECT and VALUES above, that's probably
due to a difference in parsing or in the number of transforms the NEW has
to go through in the executor. My guess is that because SELECT is much
more general purpose than VALUES it's both more expensive to parse as
well as more expensive to execute.

If you really want to know for certain, connect gdb to a database backend
on your laptop/desktop, fire off an insert (to load plpgsql into backend
memory), set a gdb breakpoint on exec_stmt_block(), and see what's
different between the two use cases.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
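To illustrate #1 and the per-statement point, a hedged sketch with the same hypothetical names as above: the application targets the partition directly, and when rules are in play, batching many rows into one statement pays the rewrite cost only once:

-- Option 1: no rule or trigger runs at all.
INSERT INTO events_2015_03 (id, created_at, payload)
VALUES (42, '2015-03-15 10:00:00', 'direct insert');

-- With rules, a single multi-row INSERT is rewritten once, instead of
-- a trigger firing 270K times:
INSERT INTO events (id, created_at, payload)
SELECT g,
       TIMESTAMP '2015-03-01' + (g % 28) * INTERVAL '1 day',
       'bulk row'
FROM generate_series(1, 270000) AS g;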