Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From Dennis Bjorklund
Subject Re: Performance problems testing with Spamassassin 3.1.0
Date
Msg-id Pine.LNX.4.44.0507290835100.7470-100000@zigo.dhs.org
Whole thread Raw
In response to Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher <matt.s@aptalaska.net>)
Responses Re: Performance problems testing with Spamassassin 3.1.0
List pgsql-performance
On Wed, 27 Jul 2005, Matthew Schumacher wrote:

> Then they do this to insert the token:
>
> INSERT INTO bayes_token (
>   id,
>   token,
>   spam_count,
>   ham_count,
>   atime
> ) VALUES (
>   ?,
>   ?,
>   ?,
>   ?,
>   ?
> ) ON DUPLICATE KEY
>   UPDATE
>     spam_count = GREATEST(spam_count + ?, 0),
>     ham_count = GREATEST(ham_count + ?, 0),
>     atime = GREATEST(atime, ?)
>
> Or update the token:
>
> UPDATE bayes_vars SET
>   $token_count_update
>   newest_token_age = GREATEST(newest_token_age, ?),
>   oldest_token_age = LEAST(oldest_token_age, ?)
>   WHERE id = ?
>
>
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are 2
functions that work for the above cases of greatest:

CREATE FUNCTION greatest_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
 LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
 LANGUAGE SQL;

and these should be inlined by pg and very fast to execute.

I wrote a function that should do what the insert above does. The update
I've not looked at (I don't know what $token_count_update is) but the
update looks simple enough to just implement the same way in pg as in
mysql.

For the insert or replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token (xid INTEGER,
                                        xtoken BYTEA,
                                        xspam_count INTEGER,
                                        xham_count INTEGER,
                                        xatime INTEGER)
RETURNS VOID AS
$$
BEGIN
   LOOP
     UPDATE bayes_token
        SET spam_count = greatest_int (spam_count + xspam_count, 0),
            ham_count  = greatest_int (ham_count + xham_count, 0),
            atime      = greatest_int (atime, xatime)
      WHERE id = xid
        AND token = xtoken;

      IF found THEN
        RETURN;
      END IF;

      BEGIN
        INSERT INTO bayes_token VALUES (xid,
                                        xtoken,
                                        xspam_count,
                                        xham_count,
                                        xatime);
        RETURN;
      EXCEPTION WHEN unique_violation THEN
        -- do nothing
      END;
   END LOOP;
END;
$$
LANGUAGE plpgsql;

It's not really tested so I can't tell if it's faster then what you have.
What it does do is mimic the way you insert values in mysql. It only work
on pg 8.0 and later however since the exception handling was added in 8.0.

--
/Dennis Björklund


pgsql-performance by date:

Previous
From: Gavin Sherry
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Next
From: "Luke Lonergan"
Date:
Subject: Re: Performance problems testing with Spamassassin