Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From Matthew Schumacher
Subject Re: Performance problems testing with Spamassassin 3.1.0
Date
Msg-id 42EE8043.1050207@aptalaska.net
Whole thread Raw
In response to Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance problems testing with Spamassassin 3.1.0
List pgsql-performance
Tom Lane wrote:
>
> Revised insertion procedure:
>
>
> CREATE or replace FUNCTION put_tokens (_id INTEGER,
>                               _tokens BYTEA[],
>                               _spam_count INTEGER,
>                               _ham_count INTEGER,
>                               _atime INTEGER)
> RETURNS VOID AS
> $$
> declare _token bytea;
>         new_tokens integer := 0;
> BEGIN
>   for i in array_lower(_tokens,1) .. array_upper(_tokens,1)
>   LOOP
>     _token := _tokens[i];
>     UPDATE bayes_token
>       SET spam_count = spam_count + _spam_count,
>           ham_count  = ham_count + _ham_count,
>           atime      = _atime
>       WHERE id = _id
>       AND token = _token;
>
>     IF not found THEN
>       INSERT INTO bayes_token VALUES (_id, _token, _spam_count,
>                                       _ham_count, _atime);
>       new_tokens := new_tokens + 1;
>     END IF;
>   END LOOP;
>   if new_tokens > 0 THEN
>       UPDATE bayes_vars SET token_count = token_count + new_tokens
>       WHERE id = _id;
>       IF NOT FOUND THEN
>         RAISE EXCEPTION 'unable to update token_count in bayes_vars';
>       END IF;
>   END IF;
>   RETURN;
> END;
> $$
> LANGUAGE plpgsql;
>

Tom, thanks for all your help on this, I think we are fairly close to
having this done in a proc.  The biggest problem we are running into now
is that the data gets inserted as an int.  Even though your proc defines
_token as byeta, I get numbers in the table.  For example:

select put_tokens2(1, '{"\\246\\323\\061\\332\\277"}', 1, 1, 10000);

produces this:

 id |      token      | spam_count | ham_count | atime
----+-----------------+------------+-----------+-------
  1 | 246323061332277 |          1 |         1 | 10000

I'm not sure why this is happening, perhaps the problem is obvious to you?

Thanks,
schu

pgsql-performance by date:

Previous
From: "Alon Goldshuv"
Date:
Subject: Re: [PATCHES] COPY FROM performance improvements
Next
From: PFC
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0