Thread: rules *very* slow?

rules *very* slow?

From: Neil Conway
Date:
In testing my database, I've encountered what appears to be a
concerning performance problem using a rule (ON INSERT). Here's
the situation:

Every time a row is inserted into a table (called 'messages'), I want
to increment a counter in a different table (called 'users'). The
best way I could think up to implement this was to use a Postgres
rule which incremented the appropriate data when an INSERT is
performed.

The database schema (lots of extraneous stuff removed):

CREATE TABLE users (
    id              serial PRIMARY KEY,
    num_posts       int4 DEFAULT 0
);

CREATE TABLE messages (
    id              serial,
    poster          int4 NOT NULL REFERENCES users MATCH FULL
);

CREATE RULE update_posts_total
    AS ON INSERT TO messages DO
    UPDATE users SET num_posts = num_posts + 1 WHERE users.id = new.poster;

To test performance, I wrote a simple Perl script which inserts
3000 rows into the 'messages' table (inside a single transaction).
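
In SQL terms, the script does the equivalent of this (a sketch --
the real driver is Perl/DBI, and the poster ids here are invented):

BEGIN;
INSERT INTO messages (poster) VALUES (1);
INSERT INTO messages (poster) VALUES (2);
-- ...3000 INSERTs in all...
COMMIT;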

With the rule:

Adding 3000 messages.
    364 wallclock secs ( 1.04 usr +  0.22 sys =  1.26 CPU)

Without the rule:

Adding 3000 messages.
     7 wallclock secs ( 0.83 usr +  0.19 sys =  1.02 CPU)

While the test was running, postgres used about 98% of the
available CPU time (perl used the rest).

I would expect the rule to cause a bit of overhead (maybe
taking two or three times as long as without the rule), but
it's taking ~52x longer.

I've tried creating an index on messages.poster, but it has
no effect (performance is the same). I guessed that Postgres
was ignoring the index, so I disabled seqscan, but that had
no effect either.

A couple questions:

1) Are rules really this slow?

2) Have I done something wrong? Is there a more efficient way to
implement this?

3) Will this translate into an equivalent real-world performance hit?
I wrote the Perl script in a few minutes, so it's not particularly
accurate. In 'production', my application will only insert 1 row
per transaction, with perhaps 1 or 2 inserts per second (the
majority of queries will probably be SELECTs). What exactly is
causing the benchmark to be *so* slow, and will it affect my
application to the same degree?

4) The current rule only performs 1 action, but when this app
is finished this rule will probably be performing 3 or more
UPDATEs for every INSERT. Will this bring correspondingly poor
performance (i.e. 364x3 seconds per 3000 inserts), or does
the performance problem lie somewhere else?

I'm running Postgres 7.1-devel on FreeBSD 4.1-STABLE. The tests
were run on my development box - a P2 350 with 128 MB of RAM.
Feel free to ask me for more info.

Thanks in advance,

Neil

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Don't worry about people stealing your ideas.  If your ideas are any
good, you'll have to ram them down people's throats.
        -- Howard Aiken

Re: rules *very* slow?

From: Tom Lane
Date: Wed, 18 Oct 2000 22:50:43 -0400
Neil Conway <nconway@klamath.dyndns.org> writes:
> I would expect the rule to cause a bit of overhead (maybe
> taking two or three times as long as without the rule), but
> it's taking ~52x longer.

Ouch.

> I've tried creating an index on messages.poster, but it has
> no effect (performance is the same). I guessed that Postgres
> was ignoring the index, so I disabled seqscan, but that had
> no effect either.

An index on messages.poster wouldn't help here, AFAICS.  The update
generated by the rule should be using an indexscan on the users.id
index (check this by doing an EXPLAIN on one of your insert commands).
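
For instance (a sketch -- exact plan output varies):

EXPLAIN INSERT INTO messages (poster) VALUES (1);

The plan printed for the rule-generated UPDATE should show an index
scan on the users primary-key index, not a sequential scan.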

> 1) Are rules really this slow?

Seems like they shouldn't be.  Could you recompile with PROFILE=-pg
and get a gprof profile on the 3000-inserts test case?

> 2) Have I done something wrong? Is there a more efficient way to
> implement this?

Actually, I'd have gone for a trigger implemented by a plpgsql function,
so that you only pay the overhead of planning the UPDATE query once, not
every time.  But I don't think that should explain 52x, either.
As long as you've uncovered a problem, let's see if we can fix it before
you go off in a different direction.
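
For illustration, a trigger version of the single-counter case might
look something like this (a minimal, untested sketch against your
schema; it assumes plpgsql is installed in the database, and the
function name is made up):

CREATE FUNCTION bump_post_count() RETURNS opaque AS '
BEGIN
    -- same bookkeeping as the rule, but the UPDATE plan is prepared
    -- once and reused, instead of being rewritten into every INSERT
    UPDATE users SET num_posts = num_posts + 1
        WHERE id = NEW.poster;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER update_posts_total AFTER INSERT ON messages
    FOR EACH ROW EXECUTE PROCEDURE bump_post_count();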

            regards, tom lane

Re: rules *very* slow?

From: Neil Conway
Date:
After my last email, I added the rest of the rule actions. So the
relevant part of the schema now looks like this:

CREATE RULE update_msg_stats
    AS ON INSERT TO messages DO (
        UPDATE users SET num_posts = num_posts + 1
            WHERE users.id = new.poster;
        UPDATE threads SET num_posts = num_posts + 1
            WHERE threads.id = new.thread;
        UPDATE forums SET num_posts = forums.num_posts + 1 FROM threads t
            WHERE t.id = new.thread AND forums.id = t.forum;

        UPDATE threads SET last_post = timestamp('now')
            WHERE threads.id = new.thread;
        UPDATE forums SET last_post = timestamp('now') FROM threads t
            WHERE t.id = new.thread AND forums.id = t.forum;
    );

(I'll just provide the definition of the rule -- if you need all the
tables it affects, just tell me.)
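
One thing I may try later: since threads and forums are each updated
twice, the two UPDATEs per table could presumably be folded into one,
halving the number of rule actions. An untested sketch:

UPDATE threads SET num_posts = num_posts + 1,
                   last_post = timestamp('now')
    WHERE threads.id = new.thread;
UPDATE forums SET num_posts = forums.num_posts + 1,
                  last_post = timestamp('now')
    FROM threads t
    WHERE t.id = new.thread AND forums.id = t.forum;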

I reran the 3000-inserts test case with the more complex rule in
place, and canceled it after 1 hour. It had inserted about 2000 of
the 3000 rows (judging by the value of messages_id_seq). As before,
inserting these rows took < 10 seconds without the rules.

On Wed, Oct 18, 2000 at 10:50:43PM -0400, Tom Lane wrote:
> > 1) Are rules really this slow?
>
> Seems like they shouldn't be.  Could you recompile with PROFILE=-pg
> and get a gprof profile on the 3000-inserts test case?

I tried rebuilding Postgres, but 'gmake' fails with:

gmake[3]: Leaving directory `/usr/home/nconway/pgsql-20000924/src/backend/utils'
gcc -I../../src/include  -O2 -m486 -pipe -Wall -Wmissing-prototypes -Wmissing-declarations -pg -o postgres
access/SUBSYS.o bootstrap/SUBSYS.o catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o lib/SUBSYS.o
libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o optimizer/SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o
rewrite/SUBSYS.o storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz -lcrypt -lcompat -lln -lm -lutil -lreadline -ltermcap
-lncurses -export-dynamic -pg 
/usr/libexec/elf/ld: cannot find -lc_p
gmake[2]: *** [postgres] Error 1
gmake[2]: Leaving directory `/usr/home/nconway/pgsql-20000924/src/backend'
gmake[1]: *** [all] Error 2
gmake[1]: Leaving directory `/usr/home/nconway/pgsql-20000924/src'
gmake: *** [all] Error 2

This happens after it has been compiling for a while. I'm running
FreeBSD 4.1-STABLE on x86 with gcc 2.95.2. I tried compiling 3 nightly
snapshots (Oct 18, Oct 19, and Sept. 24), and none of them worked.
They all compile properly without profiling.

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Four stages of acceptance:
    i) this is worthless nonsense;
    ii) this is an interesting, but perverse, point of view;
    iii) this is true, but quite unimportant;
    iv) I always said so.
        -- J. B. S. Haldane in Journal of Genetics 58:464 (1963).

Re: rules *very* slow?

From: Tom Lane
Date:
Neil Conway <nconway@klamath.dyndns.org> writes:
> /usr/libexec/elf/ld: cannot find -lc_p

Ugh.  Looks like you don't have a complete installation of profiling
support code.  I suspect you'll find that compiling "hello world" with
-pg doesn't work either.

Profiling works OK for me, but I don't have time right now to
reverse-engineer all the details of your test case.  If you'll send me a
complete script to reproduce what you're doing, I'll try to take a look.

            regards, tom lane

Re: rules *very* slow?

From: Jan Wieck
Date:
Tom Lane wrote:
> Neil Conway <nconway@klamath.dyndns.org> writes:
> > I would expect the rule to cause a bit of overhead (maybe
> > taking two or three times as long as without the rule), but
> > it's taking ~52x longer.
>
> Ouch.

    Cannot recreate such a big runtime difference here. With the
    given example, the test with the rule takes roughly four times
    as long as without it (26 secs vs. 8 secs, using a Tcl script
    as driver).

    And that is IMHO not too bad. Having the rule in place means
    that the rewriter has to create one UPDATE per INSERT, which
    must be executed. This UPDATE then invokes a referential
    integrity trigger to check whether the KEY of the users row
    has changed (in which case it'd need to check if there are
    references). So there is more overhead than just one more
    UPDATE.
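
    A quick way to measure how much of that extra overhead is the
    referential integrity trigger (a hypothetical experiment, not
    something I have run): recreate messages without the REFERENCES
    clause, keep the rule, and rerun the 3000-insert test.

    -- hypothetical variant: no FK on poster, so the rule's UPDATE
    -- on users fires no referential integrity trigger
    CREATE TABLE messages (
        id      serial,
        poster  int4 NOT NULL
    );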

>
> > I've tried creating an index on messages.poster, but it has
> > no effect (performance is the same). I guessed that Postgres
> > was ignoring the index, so I disabled seqscan, but that had
> > no effect either.
>
> An index on messages.poster wouldn't help here, AFAICS.  The update
> generated by the rule should be using an indexscan on the users.id
> index (check this by doing an EXPLAIN on one of your insert commands).

    It shouldn't help here. But it will help in the case of
    deleting users, to speed up the referential-action lookup
    for existing messages.

>
> > 1) Are rules really this slow?

    Not AFAICS. But to be sure, could you please give me more
    information? What is the number and average size of rows in
    the users table? Are the 3000 messages distributed over all
    users or just a few?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #