Thread: rules *very* slow?
In testing my database, I've encountered what appears to be a serious performance problem using a rule (ON INSERT). Here's the situation: every time a row is inserted into a table (called 'messages'), I want to increment a counter in a different table (called 'users'). The best way I could think of to implement this was a Postgres rule which increments the appropriate counter when an INSERT is performed.

The database schema (lots of extraneous stuff removed):

    CREATE TABLE users (
        id          serial PRIMARY KEY,
        num_posts   int4 DEFAULT 0
    );

    CREATE TABLE messages (
        id      serial,
        poster  int4 NOT NULL REFERENCES users MATCH FULL
    );

    CREATE RULE update_posts_total AS
        ON INSERT TO messages
        DO UPDATE users SET num_posts = num_posts + 1
            WHERE users.id = new.poster;

To test performance, I wrote a simple Perl script which inserts 3000 rows into the 'messages' table (inside a single transaction).

With the rule:

    Adding 3000 messages. 364 wallclock secs ( 1.04 usr + 0.22 sys = 1.26 CPU)

Without the rule:

    Adding 3000 messages.   7 wallclock secs ( 0.83 usr + 0.19 sys = 1.02 CPU)

While the test is running, postgres uses about 98% of the available CPU time (perl uses the rest). I would expect the rule to cause a bit of overhead (maybe taking twice or three times as long as w/o the rule), but it's taking ~52x longer. I've tried creating an index on messages.poster, but it has no effect (performance is the same). I guessed that Postgres was ignoring the index, so I disabled seqscan, but that had no effect either.

A couple of questions:

1) Are rules really this slow?

2) Have I done something wrong? Is there a more efficient way to implement this?

3) Will this translate into an equivalent real-world performance hit? I wrote the Perl script in a few minutes, so it's not particularly accurate. In 'production', my application will only insert 1 row per transaction, with perhaps 1 or 2 inserts per second (the majority of queries will probably be SELECTs).
What exactly is causing the benchmark to be *so* slow, and will it affect my application to the same degree?

4) The current rule performs only 1 action, but when this app is finished the rule will probably be performing 3 or more UPDATEs for every INSERT. Will this bring correspondingly poor performance (i.e. 364x3 seconds per 3000 inserts), or does the performance problem lie somewhere else?

I'm running Postgres 7.1-devel on FreeBSD 4.1-STABLE. The tests were run on my development box - a P2 350 with 128 MB of RAM. Feel free to ask me for more info.

Thanks in advance,

Neil

-- 
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Don't worry about people stealing your ideas. If your ideas
are any good, you'll have to ram them down people's throats.
        -- Howard Aiken
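[Editorial note: the Perl driver wasn't posted; the benchmark boils down to something like the following plain-SQL sketch. generate_series() is a modern addition (8.0+) and did not exist in 7.1, and the poster id of 1 is an assumption, so treat this as an illustration rather than the actual script.]

    -- Hypothetical reduction of the benchmark: 3000 inserts in one transaction.
    BEGIN;
    -- at least one user to satisfy the REFERENCES constraint
    INSERT INTO users (num_posts) VALUES (0);
    INSERT INTO messages (poster)
        SELECT 1 FROM generate_series(1, 3000);  -- generate_series is 8.0+
    COMMIT;

With the ON INSERT rule in place, each of those 3000 inserts is rewritten into an INSERT plus an UPDATE on users, which is where the extra work comes from.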
Neil Conway <nconway@klamath.dyndns.org> writes:
> I would expect the rule to cause a bit of overhead (maybe
> taking twice or three times as long as w/o the rule), but
> it's taking ~52x longer.

Ouch.

> I've tried creating an index on messages.poster, but it has
> no effect (performance is the same). I guessed that Postgres
> was ignoring the index so I disabled seqscan, but that had
> no effect.

An index on messages.poster wouldn't help here, AFAICS. The update generated by the rule should be using an indexscan on the users.id index (check this by doing an EXPLAIN on one of your insert commands).

> 1) Are rules really this slow?

Seems like they shouldn't be. Could you recompile with PROFILE=-pg and get a gprof profile on the 3000-inserts test case?

> 2) Have I done something wrong? Is there a more efficient way to
> implement this?

Actually, I'd have gone for a trigger implemented by a plpgsql function, so that you only pay the overhead of planning the UPDATE query once, not every time. But I don't think that should explain 52x, either. As long as you've uncovered a problem, let's see if we can fix it before you go off in a different direction.

			regards, tom lane
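[Editorial note: a minimal sketch of the trigger approach Tom suggests. The syntax below is modern PL/pgSQL — dollar quoting and RETURNS trigger postdate 7.1, which used RETURNS opaque — and the function name is illustrative.]

    -- Hypothetical trigger-based replacement for the update_posts_total rule.
    -- The function's UPDATE plan is cached per session, instead of the rule's
    -- query rewriting being re-planned for every INSERT.
    CREATE FUNCTION bump_num_posts() RETURNS trigger AS $$
    BEGIN
        UPDATE users SET num_posts = num_posts + 1
         WHERE users.id = NEW.poster;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER update_posts_total
        AFTER INSERT ON messages
        FOR EACH ROW EXECUTE PROCEDURE bump_num_posts();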
After my last email, I added the rest of the rule actions. So the relevant part of the schema now looks like this:

    CREATE RULE update_msg_stats AS ON INSERT TO messages DO (
        UPDATE users SET num_posts = num_posts + 1
            WHERE users.id = new.poster;
        UPDATE threads SET num_posts = num_posts + 1
            WHERE threads.id = new.thread;
        UPDATE forums SET num_posts = forums.num_posts + 1
            FROM threads t
            WHERE t.id = new.thread AND forums.id = t.forum;
        UPDATE threads SET last_post = timestamp('now')
            WHERE threads.id = new.thread;
        UPDATE forums SET last_post = timestamp('now')
            FROM threads t
            WHERE t.id = new.thread AND forums.id = t.forum;
    );

(I'll just provide the definition of the rule -- if you need all the tables it affects, just tell me.)

I reran the 3000-inserts test case with the more complex rule in place, and canceled it after 1 hour. It had inserted about 2000 of the 3000 rows (judging by the value of messages_id_seq). As before, inserting these rows took < 10 seconds without the rules.

On Wed, Oct 18, 2000 at 10:50:43PM -0400, Tom Lane wrote:
> > 1) Are rules really this slow?
>
> Seems like they shouldn't be. Could you recompile with PROFILE=-pg
> and get a gprof profile on the 3000-inserts test case?
I tried rebuilding Postgres, but 'gmake' fails after it has been compiling for a while:

    gmake[3]: Leaving directory `/usr/home/nconway/pgsql-20000924/src/backend/utils'
    gcc -I../../src/include -O2 -m486 -pipe -Wall -Wmissing-prototypes
        -Wmissing-declarations -pg -o postgres access/SUBSYS.o bootstrap/SUBSYS.o
        catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o
        lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o
        optimizer/SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o
        rewrite/SUBSYS.o storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o
        -lz -lcrypt -lcompat -lln -lm -lutil -lreadline -ltermcap -lncurses
        -export-dynamic -pg
    /usr/libexec/elf/ld: cannot find -lc_p
    gmake[2]: *** [postgres] Error 1
    gmake[2]: Leaving directory `/usr/home/nconway/pgsql-20000924/src/backend'
    gmake[1]: *** [all] Error 2
    gmake[1]: Leaving directory `/usr/home/nconway/pgsql-20000924/src'
    gmake: *** [all] Error 2

I'm running FreeBSD 4.1-STABLE on x86 with gcc 2.95.2. I tried compiling 3 nightly snapshots (Oct 18, Oct 19, and Sept 24), and none of them worked. They all compile properly without profiling.

-- 
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Four stages of acceptance:
  i) this is worthless nonsense;
 ii) this is an interesting, but perverse, point of view;
iii) this is true, but quite unimportant;
 iv) I always said so.
        -- J. B. S. Haldane in Journal of Genetics 58:464 (1963).
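[Editorial note: following Tom's earlier trigger suggestion, the five-action rule above could be folded into one trigger function. This is a hypothetical sketch in modern PL/pgSQL (dollar quoting and now() in place of timestamp('now') are anachronisms relative to 7.1); note it also collapses the paired thread/forum updates into single statements, three UPDATEs instead of five.]

    -- Hypothetical trigger folding all five rule actions into one function.
    CREATE FUNCTION update_msg_stats() RETURNS trigger AS $$
    BEGIN
        UPDATE users   SET num_posts = num_posts + 1
                       WHERE id = NEW.poster;
        UPDATE threads SET num_posts = num_posts + 1,
                           last_post = now()
                       WHERE id = NEW.thread;
        UPDATE forums  SET num_posts = num_posts + 1,
                           last_post = now()
                       FROM threads t
                       WHERE t.id = NEW.thread AND forums.id = t.forum;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER update_msg_stats AFTER INSERT ON messages
        FOR EACH ROW EXECUTE PROCEDURE update_msg_stats();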
Neil Conway <nconway@klamath.dyndns.org> writes:
> /usr/libexec/elf/ld: cannot find -lc_p

Ugh. Looks like you don't have a complete installation of profiling support code. I suspect you'll find that compiling "hello world" with -pg doesn't work either.

Profiling works OK for me, but I don't have time right now to reverse-engineer all the details of your test case. If you'll send me a complete script to reproduce what you're doing, I'll try to take a look.

			regards, tom lane
Tom Lane wrote:
> Neil Conway <nconway@klamath.dyndns.org> writes:
> > I would expect the rule to cause a bit of overhead (maybe
> > taking twice or three times as long as w/o the rule), but
> > it's taking ~52x longer.
>
> Ouch.

I cannot recreate such a big runtime difference here. With the given example, the test with the rule runs ~4 times slower than without it (26 secs vs. 8 secs, using a Tcl script as driver), and that is IMHO not too bad. Having the rule in place means that the rewriter has to create one UPDATE per INSERT, which must be executed. This UPDATE then invokes a referential integrity trigger to check whether the KEY of the users row has changed (in which case it'd need to check if there are references). So there is more overhead than just one more UPDATE.

> > I've tried creating an index on messages.poster, but it has
> > no effect (performance is the same). I guessed that Postgres
> > was ignoring the index so I disabled seqscan, but that had
> > no effect.
>
> An index on messages.poster wouldn't help here, AFAICS. The update
> generated by the rule should be using an indexscan on the users.id
> index (check this by doing an EXPLAIN on one of your insert commands).

It shouldn't help here. But it will help in the case of deleting users, to speed up the referential action lookup for existing messages.

> > 1) Are rules really this slow?

Not AFAICS. But to be sure, could you please give me more information? What is the number and average size of rows in the users table? Are the 3000 messages distributed over all users or just a few?

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #