Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance
From | Matthew Schumacher |
---|---|
Subject | Re: Performance problems testing with Spamassassin 3.1.0 |
Date | |
Msg-id | 42F1CE43.8040801@aptalaska.net Whole thread Raw |
In response to | Re: Performance problems testing with Spamassassin 3.1.0 (Michael Parker <parkerm@pobox.com>) |
Responses |
Re: Performance problems testing with Spamassassin 3.1.0
Re: Performance problems testing with Spamassassin 3.1.0 Re: Performance problems testing with Spamassassin 3.1.0 |
List | pgsql-performance |
Okay, Here is the status of the SA updates and a question: Michael got SA changed to pass an array of tokens to the proc so right there we gained a ton of performance due to connections and transactions being grouped into one per email instead of one per token. Now I am working on making the proc even faster. Since we have all of the tokens coming in as an array, it should be possible to get this down to just a couple of queries. I have the proc using IN and NOT IN statements to update everything at once from a temp table, but it progressively gets slower because the temp table is growing between vacuums. At this point it's slightly slower than the old update or else insert on every token. What I really want to do is have the token array available as a record so that I can query against it, but not have it take up the resources of a real table. If I could copy from an array into a record then I can even get rid of the loop. Anyone have any thoughts on how to do this? CREATE OR REPLACE FUNCTION put_tokens(inuserid INTEGER, intokenary BYTEA[], inspam_count INTEGER, inham_count INTEGER, inatime INTEGER) RETURNS VOID AS ' DECLARE _token BYTEA; BEGIN for i in array_lower(intokenary, 1) .. array_upper(intokenary, 1) LOOP _token := intokenary[i]; INSERT INTO bayes_token_tmp VALUES (_token); END LOOP; UPDATE bayes_token SET spam_count = greatest_int(spam_count + inspam_count, 0), ham_count = greatest_int(ham_count + inham_count , 0), atime = greatest_int(atime, 1000) WHERE id = inuserid AND (token) IN (SELECT intoken FROM bayes_token_tmp); UPDATE bayes_vars SET token_count = token_count + (SELECT count(intoken) FROM bayes_token_tmp WHERE intoken NOT IN (SELECT token FROM bayes_token)), newest_token_age = greatest_int(newest_token_age, inatime), oldest_token_age = least_int(oldest_token_age, inatime) WHERE id = inuserid; INSERT INTO bayes_token SELECT inuserid, intoken, inspam_count, inham_count, inatime FROM bayes_token_tmp WHERE (inspam_count > 0 OR inham_count > 0) AND (intoken) NOT IN (SELECT token FROM bayes_token); delete from bayes_token_tmp; RETURN; END; ' LANGUAGE 'plpgsql'; CREATE OR REPLACE FUNCTION greatest_int (integer, integer) RETURNS INTEGER IMMUTABLE STRICT AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;' LANGUAGE SQL; CREATE OR REPLACE FUNCTION least_int (integer, integer) RETURNS INTEGER IMMUTABLE STRICT AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;' LANGUAGE SQL;
pgsql-performance by date: