Jim C. Nasby wrote:
> On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
>
>>Ok, here is the current plan.
>>
>>Change the spamassassin API to pass a hash of tokens into the storage
>>module, pass the tokens to the proc as an array, start a transaction,
>>load the tokens into a temp table using copy, select the tokens distinct
>>into the token table for new tokens, update the token table for known
>>tokens, then commit.
>
>
> You might consider:
> UPDATE tokens
> FROM temp_table (this updates existing records)
>
> INSERT INTO tokens
> SELECT ...
> FROM temp_table
> WHERE NOT IN (SELECT ... FROM tokens)
>
> This way you don't do an update to newly inserted tokens, which helps
> keep vacuuming needs in check.
The subselect might be quite a big set, so avoiding a full table scan
and materialization by
DELETE temp_table
WHERE key IN (select key FROM tokens JOIN temp_table);
INSERT INTO TOKENS SELECT * FROM temp_table;
or
INSERT INTO TOKENS
SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL
might be an additional win, assuming that only a small fraction of
tokens is inserted and updated.
Regards,
Andreas