Thread: performance - triggers, row existence etc.

performance - triggers, row existence etc.

From
tv@fuzzy.cz
Date:
Hello,

I'm just in the middle of performance tunning of our database running
on PostgreSQL, and I've several questions (I've searched the online
docs, but without success).

1) When I first use the EXPLAIN ANALYZE command, the time is much
   larger than in case of subsequent invocations of EXPLAIN ANALYZE.
   I suppose the plan prepared during the first invocation is cached
   somewhere, but I'm not sure where and for how long.

   I suppose the execution plans are connection specific, but
   I'm not sure whether this holds for the sql queries inside the
   triggers too. I've done some testing but the things are somehow
   more difficult thanks to persistent links (the commands will
   be executed from PHP).

2) Is there some (performance) difference between BEFORE and AFTER
   triggers? I believe there's no measurable difference.

3) Vast majority of SQL commands inside the trigger checks whether there
   exists a row that suits some conditions (same IP, visitor ID etc.)
   Currently I do this by

   SELECT INTO tmp id FROM ... JOIN ... WHERE ... LIMIT 1
   IF NOT FOUND THEN
    ....
   END IF;

   and so on. I believe this is fast and low-cost solution (compared
   to the COUNT(*) way I've used before), but is there some even better
   (faster) way to check row existence?

Thanks
t.v.

Re: performance - triggers, row existence etc.

From
John A Meinel
Date:
tv@fuzzy.cz wrote:

>Hello,
>
>I'm just in the middle of performance tunning of our database running
>on PostgreSQL, and I've several questions (I've searched the online
>docs, but without success).
>
>1) When I first use the EXPLAIN ANALYZE command, the time is much
>   larger than in case of subsequent invocations of EXPLAIN ANALYZE.
>   I suppose the plan prepared during the first invocation is cached
>   somewhere, but I'm not sure where and for how long.
>
>
>
This is actually true for any command. If you just use \timing and not
explain analyze, you will see that the first time is usually
significantly longer than the rest.

It's because the tables you are using are being cached in RAM (by the OS
& by postgres).
It's not a planning difference, it's a bulk data cache difference.

When and how long is dependent on how much RAM you have, and how much of
the database you are using.

>   I suppose the execution plans are connection specific, but
>   I'm not sure whether this holds for the sql queries inside the
>   triggers too. I've done some testing but the things are somehow
>   more difficult thanks to persistent links (the commands will
>   be executed from PHP).
>
>
Connection specific????
If you were doing PREPARE myquery AS SELECT ...; Then myquery would only
exist for that connection. And cursors & temp tables are only for the
given connection.
But otherwise I don't think the connection matters.

>2) Is there some (performance) difference between BEFORE and AFTER
>   triggers? I believe there's no measurable difference.
>
>
I don't know that there is a performance difference, but there is a
semantic one. If you are trying to (potentially) prevent the row from
being inserted you must do that BEFORE, since the row doesn't exist yet.
If you are trying to update a foreign key reference to the new object,
you must do that AFTER, so that the row exists to reference.

>3) Vast majority of SQL commands inside the trigger checks whether there
>   exists a row that suits some conditions (same IP, visitor ID etc.)
>   Currently I do this by
>
>   SELECT INTO tmp id FROM ... JOIN ... WHERE ... LIMIT 1
>   IF NOT FOUND THEN
>    ....
>   END IF;
>
>   and so on. I believe this is fast and low-cost solution (compared
>   to the COUNT(*) way I've used before), but is there some even better
>   (faster) way to check row existence?
>
>
>
SELECT ... WHERE EXISTS ...;
I'm not sure what you are trying to do, but this makes a good joined
command.

SELECT what_I_want FROM table WHERE EXISTS (SELECT what_I_need FROM
othertable);

In general, though, SELECT  WHERE LIMIT 1 is about as fast as you can get.

>Thanks
>t.v.
>
>
John
=:->


Attachment

Re: performance - triggers, row existence etc.

From
"Tambet Matiisen"
Date:
...
>
> 2) Is there some (performance) difference between BEFORE and AFTER
>    triggers? I believe there's no measurable difference.
>

BEFORE triggers might be faster, because you get a chance to reject the
record before it is inserted into table. Common practice is to put
validity checks into BEFORE triggers and updates of other tables into
AFTER triggers. See also
http://archives.postgresql.org/pgsql-sql/2005-04/msg00088.php.

> 3) Vast majority of SQL commands inside the trigger checks
> whether there
>    exists a row that suits some conditions (same IP, visitor ID etc.)
>    Currently I do this by
>
>    SELECT INTO tmp id FROM ... JOIN ... WHERE ... LIMIT 1
>    IF NOT FOUND THEN
>     ....
>    END IF;
>
>    and so on. I believe this is fast and low-cost solution (compared
>    to the COUNT(*) way I've used before), but is there some
> even better
>    (faster) way to check row existence?
>

You could save one temporary variable by using PERFORM:

PERFORM 1 FROM ... JOIN ... WHERE ... LIMIT 1;
IF NOT FOUND THEN
...
END IF;

You might want to consider, if you need FOR UPDATE in those queries, so
that the referenced row maintains it's state until the end of
transaction. BTW, foreign keys weren't enough?

  Tambet