Home > mailing lists

Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: Performance problems testing with Spamassassin 3.1.0
Date	August 4, 2005 13:16:24
Msg-id	16345.1123172170@sss.pgh.pa.us Whole thread Raw
In response to	Re: Performance problems testing with Spamassassin 3.1.0 (John A Meinel <john@arbash-meinel.com>)
List	pgsql-performance

Tree view

John A Meinel <john@arbash-meinel.com> writes:
> Tom Lane wrote:
>> I don't really see why you think that this path is going to lead to
>> better performance than where you were before.

> So for an IN (sub-select), does it actually pull all of the rows from
> the other table, or is the planner smart enough to stop once it finds
> something?

It stops when it finds something --- but it's still a join operation
in essence.  I don't see that putting the values one by one into a table
and then joining is going to be a win compared to just processing the
values one at a time against the main table.

> Is IN (sub-select) about the same as EXISTS (sub-select WHERE x=y)?
> What about NOT IN (sub-select) versus NOT EXISTS (sub-select WHERE x=y)

The EXISTS variants are actually worse, because we've not spent as much
time teaching the planner how to optimize them.  There's effectively
only one decent plan for an EXISTS, which is that the subselect's "x" is
indexed and we do an indexscan probe using the outer "y" for each outer
row.  IN and NOT IN can do that, or several alternative plans that might
be better depending on data statistics.

However, that's cold comfort for Matthew's application -- the only way
he'd get any benefit from all those planner smarts is if he ANALYZEs
the temp table after loading it and then EXECUTEs the main query (so
that it gets re-planned every time).  Plus, at least some of those
alternative plans would require an index on the temp table, which is
unlikely to be worth the cost of setting up.  And finally, this
formulation requires separate IN and NOT IN tests that are necessarily
going to do a lot of redundant work.

There's enough overhead here that I find it highly doubtful that it'll
be a win compared to the original approach of retail queries against the
main table.

            regards, tom lane

pgsql-performance by date:

From: Matthew Schumacher
Date: 04 August 2005, 13:16:00
Subject: Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher
Date: 04 August 2005, 13:32:08
Subject: Re: Performance problems testing with Spamassassin 3.1.0

Re: Performance problems testing with Spamassassin 3.1.0 - Mailing list pgsql-performance

Previous

Next