Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From Andreas Pflug
Subject Re: Performance problems testing with Spamassassin 3.1.0
Date
Msg-id 42ED161F.1020408@pse-consulting.de
Whole thread Raw
In response to Re: Performance problems testing with Spamassassin 3.1.0  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-performance
Jim C. Nasby wrote:
> On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
>
>>Ok, here is the current plan.
>>
>>Change the spamassassin API to pass a hash of tokens into the storage
>>module, pass the tokens to the proc as an array, start a transaction,
>>load the tokens into a temp table using copy, select the tokens distinct
>>into the token table for new tokens, update the token table for known
>>tokens, then commit.
>
>
> You might consider:
> UPDATE tokens
>     FROM temp_table (this updates existing records)
>
> INSERT INTO tokens
>     SELECT ...
>     FROM temp_table
>     WHERE NOT IN (SELECT ... FROM tokens)
>
> This way you don't do an update to newly inserted tokens, which helps
> keep vacuuming needs in check.

The subselect might be quite a big set, so avoiding a full table scan
and materialization by

DELETE temp_table
   WHERE key IN (select key FROM tokens JOIN temp_table);
INSERT INTO TOKENS SELECT * FROM temp_table;

or

INSERT INTO TOKENS
SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL

might be an additional win, assuming that only a small fraction of
tokens is inserted and updated.

Regards,
Andreas

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Next
From: William Yu
Date:
Subject: Re: Performance problems on 4/8way Opteron (dualcore)