Re: Performance problems testing with Spamassassin 3.1.0

From: Dennis Bjorklund
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: ,
Msg-id: Pine.LNX.4.44.0507290835100.7470-100000@zigo.dhs.org
(view: Whole thread, Raw)
In response to: Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher)
Responses: Re: Performance problems testing with Spamassassin 3.1.0  (Josh Berkus)
List: pgsql-performance

Tree view

Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Matthew Schumacher, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus, )
  Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Josh Berkus, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Dennis Bjorklund, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Josh Berkus, )
     Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  (Tom Lane, )
  Re: Performance problems testing with Spamassassin  ("Luke Lonergan", )
 Re: Performance problems testing with Spamassassin 3.1.0  (Karim Nassar, )
  Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Gavin Sherry, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
     Re: Performance problems testing with Spamassassin 3.1.0  (Gavin Sherry, )
   Re: Performance problems testing with Spamassassin 3.1.0  (Andrew McMillan, )
    Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
     Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
     Re: Performance problems testing with Spamassassin 3.1.0  (Andrew McMillan, )
      Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
      Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
       Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
        Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
          Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Michael Parker, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
           Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
         Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
          Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
           Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
            Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby", )
             Re: Performance problems testing with Spamassassin 3.1.0  (Michael Parker, )
              Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
               Re: Performance problems testing with Spamassassin 3.1.0  (PFC, )
               Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
               Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
                Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                 Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane, )
                Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                 Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                  Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
                   Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
                    Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel, )
        Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
         Re: Performance problems testing with Spamassassin 3.1.0  (John Arbash Meinel, )
          Re: Performance problems testing with Spamassassin 3.1.0  (Matthew Schumacher, )
           Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby", )
            Re: Performance problems testing with Spamassassin 3.1.0  (Andreas Pflug, )
 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.  ("Merlin Moncure", )

On Wed, 27 Jul 2005, Matthew Schumacher wrote:

> Then they do this to insert the token:
>
> INSERT INTO bayes_token (
>   id,
>   token,
>   spam_count,
>   ham_count,
>   atime
> ) VALUES (
>   ?,
>   ?,
>   ?,
>   ?,
>   ?
> ) ON DUPLICATE KEY
>   UPDATE
>     spam_count = GREATEST(spam_count + ?, 0),
>     ham_count = GREATEST(ham_count + ?, 0),
>     atime = GREATEST(atime, ?)
>
> Or update the token:
>
> UPDATE bayes_vars SET
>   $token_count_update
>   newest_token_age = GREATEST(newest_token_age, ?),
>   oldest_token_age = LEAST(oldest_token_age, ?)
>   WHERE id = ?
>
>
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are 2
functions that work for the above cases of greatest:

CREATE FUNCTION greatest_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
 LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
 LANGUAGE SQL;

and these should be inlined by pg and very fast to execute.

I wrote a function that should do what the insert above does. The update
I've not looked at (I don't know what $token_count_update is) but the
update looks simple enough to just implement the same way in pg as in
mysql.

For the insert or replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token (xid INTEGER,
                                        xtoken BYTEA,
                                        xspam_count INTEGER,
                                        xham_count INTEGER,
                                        xatime INTEGER)
RETURNS VOID AS
$$
BEGIN
   LOOP
     UPDATE bayes_token
        SET spam_count = greatest_int (spam_count + xspam_count, 0),
            ham_count  = greatest_int (ham_count + xham_count, 0),
            atime      = greatest_int (atime, xatime)
      WHERE id = xid
        AND token = xtoken;

      IF found THEN
        RETURN;
      END IF;

      BEGIN
        INSERT INTO bayes_token VALUES (xid,
                                        xtoken,
                                        xspam_count,
                                        xham_count,
                                        xatime);
        RETURN;
      EXCEPTION WHEN unique_violation THEN
        -- do nothing
      END;
   END LOOP;
END;
$$
LANGUAGE plpgsql;

It's not really tested so I can't tell if it's faster then what you have.
What it does do is mimic the way you insert values in mysql. It only work
on pg 8.0 and later however since the exception handling was added in 8.0.

--
/Dennis Björklund



pgsql-performance by date:

From: Bruce Momjian
Date:
Subject: Re: [PATCHES] COPY FROM performance improvements
From: Alvaro Herrera
Date:
Subject: Re: Performance problems testing with Spamassassin