Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance
From | John A Meinel |
---|---|
Subject | Re: Performance problems testing with Spamassassin 3.1.0 |
Date | |
Msg-id | 42F26E2C.8010607@arbash-meinel.com Whole thread Raw |
In response to | Re: Performance problems testing with Spamassassin 3.1.0 (Matthew Schumacher <matt.s@aptalaska.net>) |
Responses |
Re: Performance problems testing with Spamassassin 3.1.0
|
List | pgsql-performance |
Matthew Schumacher wrote: > Matthew Schumacher wrote: > >>Tom Lane wrote: >> >> >> >>>I don't really see why you think that this path is going to lead to >>>better performance than where you were before. Manipulation of the >>>temp table is never going to be free, and IN (sub-select) is always >>>inherently not fast, and NOT IN (sub-select) is always inherently >>>awful. Throwing a pile of simple queries at the problem is not >>>necessarily the wrong way ... especially when you are doing it in >>>plpgsql, because you've already eliminated the overhead of network >>>round trips and repeated planning of the queries. >>> >>> regards, tom lane >> >> >>The reason why I think this may be faster is because I would avoid >>running an update on data that needs to be inserted which saves >>searching though the table for a matching token. >> >>Perhaps I should do the insert first, then drop those tokens from the >>temp table, then do my updates in a loop. >> >>I'll have to do some benchmarking... >> >>schu > > > Tom, I think your right, whenever I do a NOT IN it does a full table > scan against bayes_token and since that table is going to get very big > doing the simple query in a loop that uses an index seems a bit faster. > > John, thanks for your help, it was worth a try, but it looks like the > looping is just faster. > > Here is what I have so far in case anyone else has ideas before I > abandon it: Surely this isn't what you have. You have *no* loop here, and you have stuff like: AND (bayes_token_tmp) NOT IN (SELECT token FROM bayes_token); I'm guessing this isn't your last version of the function. As far as putting the CREATE TEMP TABLE inside the function, I think the problem is that the first time it runs, it compiles the function, and when it gets to the UPDATE/INSERT with the temporary table name, at compile time it hard-codes that table id. I tried getting around it by using "EXECUTE" which worked, but it made the function horribly slow. So I don't recommend it. Anyway, if you want us to evaluate it, you really need to send us the real final function. John =:-> > > CREATE OR REPLACE FUNCTION put_tokens(inuserid INTEGER, > intokenary BYTEA[], > inspam_count INTEGER, > inham_count INTEGER, > inatime INTEGER) > RETURNS VOID AS ' > DECLARE > _token BYTEA; > BEGIN > > UPDATE > bayes_token > SET > spam_count = greatest_int(spam_count + inspam_count, 0), > ham_count = greatest_int(ham_count + inham_count , 0), > atime = greatest_int(atime, inatime) > WHERE > id = inuserid > AND > (token) IN (SELECT bayes_token_tmp FROM bayes_token_tmp(intokenary)); > > UPDATE > bayes_vars > SET > token_count = token_count + ( > SELECT > count(bayes_token_tmp) > FROM > bayes_token_tmp(intokenary) > WHERE > bayes_token_tmp NOT IN (SELECT token FROM bayes_token)), > newest_token_age = greatest_int(newest_token_age, inatime), > oldest_token_age = least_int(oldest_token_age, inatime) > WHERE > id = inuserid; > > INSERT INTO > bayes_token > SELECT > inuserid, > bayes_token_tmp, > inspam_count, > inham_count, > inatime > FROM > bayes_token_tmp(intokenary) > WHERE > (inspam_count > 0 OR inham_count > 0) > AND > (bayes_token_tmp) NOT IN (SELECT token FROM bayes_token); > > RETURN; > END; > ' LANGUAGE 'plpgsql'; > > CREATE OR REPLACE FUNCTION bayes_token_tmp(intokenary BYTEA[]) RETURNS > SETOF bytea AS > ' > BEGIN > for i in array_lower(intokenary, 1) .. array_upper(intokenary, 1) > LOOP > return next intokenary[i]; > END LOOP; > RETURN; > end > ' > language 'plpgsql'; > > CREATE OR REPLACE FUNCTION greatest_int (integer, integer) > RETURNS INTEGER > IMMUTABLE STRICT > AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;' > LANGUAGE SQL; > > CREATE OR REPLACE FUNCTION least_int (integer, integer) > RETURNS INTEGER > IMMUTABLE STRICT > AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;' > LANGUAGE SQL; > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
Attachment
pgsql-performance by date: