Re: Performance problems testing with Spamassassin 3.1.0

From: Tom Lane
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: ,
Msg-id: 16345.1123172170@sss.pgh.pa.us
In response to: Re: Performance problems testing with Spamassassin 3.1.0  (John A Meinel)
List: pgsql-performance

John A Meinel writes:
> Tom Lane wrote:
>> I don't really see why you think that this path is going to lead to
>> better performance than where you were before.

> So for an IN (sub-select), does it actually pull all of the rows from
> the other table, or is the planner smart enough to stop once it finds
> something?

It stops when it finds something --- but it's still a join operation
in essence.  I don't see that putting the values one by one into a table
and then joining is going to be a win compared to just processing the
values one at a time against the main table.

> Is IN (sub-select) about the same as EXISTS (sub-select WHERE x=y)?
> What about NOT IN (sub-select) versus NOT EXISTS (sub-select WHERE x=y)?

The EXISTS variants are actually worse, because we've not spent as much
time teaching the planner how to optimize them.  There's effectively
only one decent plan for an EXISTS, which is that the subselect's "x" is
indexed and we do an indexscan probe using the outer "y" for each outer
row.  IN and NOT IN can do that, or several alternative plans that might
be better depending on data statistics.
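
For readers who want to see the query shapes being compared, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module purely so that it is self-contained; it shows only that the IN and EXISTS spellings return the same rows, and says nothing about which plans the PostgreSQL planner picks. The table names main_tokens and batch, and the hits column, are hypothetical and do not come from the thread. The columns are declared NOT NULL, since NOT IN behaves differently once the subselect can return NULLs.

```python
import sqlite3

# Hypothetical schema: "main_tokens" stands in for the main table,
# "batch" for the table of incoming values. Neither name is from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_tokens (token TEXT NOT NULL PRIMARY KEY,
                              hits  INTEGER NOT NULL);
    CREATE TABLE batch (token TEXT NOT NULL);
    INSERT INTO main_tokens VALUES ('a', 1), ('b', 2), ('c', 3);
    INSERT INTO batch VALUES ('b'), ('c'), ('d');
""")


def tokens(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# IN (sub-select) versus EXISTS (sub-select WHERE x = y)
in_rows = tokens(
    "SELECT token FROM main_tokens "
    "WHERE token IN (SELECT token FROM batch)")
exists_rows = tokens(
    "SELECT token FROM main_tokens m "
    "WHERE EXISTS (SELECT 1 FROM batch b WHERE b.token = m.token)")

# NOT IN (sub-select) versus NOT EXISTS (sub-select WHERE x = y)
not_in_rows = tokens(
    "SELECT token FROM batch "
    "WHERE token NOT IN (SELECT token FROM main_tokens)")
not_exists_rows = tokens(
    "SELECT token FROM batch b "
    "WHERE NOT EXISTS (SELECT 1 FROM main_tokens m WHERE m.token = b.token)")

print(in_rows, exists_rows)
print(not_in_rows, not_exists_rows)
```

The two spellings agree on the result set here; the point made above is about how many plan choices the planner has for each, which this sketch does not measure.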

However, that's cold comfort for Matthew's application -- the only way
he'd get any benefit from all those planner smarts is if he ANALYZEs
the temp table after loading it and then EXECUTEs the main query (so
that it gets re-planned every time).  Plus, at least some of those
alternative plans would require an index on the temp table, which is
unlikely to be worth the cost of setting up.  And finally, this
formulation requires separate IN and NOT IN tests that are necessarily
going to do a lot of redundant work.

There's enough overhead here that I find it highly doubtful that it'll
be a win compared to the original approach of retail queries against the
main table.
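
To make the two approaches concrete, the sketch below puts them side by side. It is an assumption-laden illustration, not Matthew's actual code: the table main_tokens, the temp table batch, the hits column, and both function names are hypothetical, and it runs on SQLite via Python's sqlite3 only to stay self-contained. It assumes the incoming values are distinct. It shows that both formulations reach the same final state, and that the temp-table one needs a load step plus separate IN and NOT IN passes; it does not measure PostgreSQL performance.

```python
import sqlite3


def fresh_db():
    """Create an in-memory database with a small hypothetical main table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE main_tokens (token TEXT NOT NULL PRIMARY KEY,
                                  hits  INTEGER NOT NULL);
        INSERT INTO main_tokens VALUES ('a', 1), ('b', 2);
    """)
    return conn


def retail(conn, incoming):
    """Process values one at a time against the main table."""
    for tok in incoming:
        cur = conn.execute(
            "UPDATE main_tokens SET hits = hits + 1 WHERE token = ?", (tok,))
        if cur.rowcount == 0:  # no existing row was updated, so add one
            conn.execute("INSERT INTO main_tokens VALUES (?, 1)", (tok,))


def via_temp_table(conn, incoming):
    """Load values into a temp table, then run separate IN and NOT IN passes."""
    conn.execute("CREATE TEMP TABLE batch (token TEXT NOT NULL)")
    conn.executemany("INSERT INTO batch VALUES (?)", [(t,) for t in incoming])
    # Pass 1: the IN test updates rows that already exist.
    conn.execute(
        "UPDATE main_tokens SET hits = hits + 1 "
        "WHERE token IN (SELECT token FROM batch)")
    # Pass 2: the NOT IN test inserts the values that were missing.
    conn.execute(
        "INSERT INTO main_tokens "
        "SELECT token, 1 FROM batch "
        "WHERE token NOT IN (SELECT token FROM main_tokens)")
    conn.execute("DROP TABLE batch")


def state(conn):
    """Return the main table's contents in a stable order."""
    return conn.execute(
        "SELECT token, hits FROM main_tokens ORDER BY token").fetchall()


incoming = ["b", "c"]  # distinct values, by assumption

db1 = fresh_db()
retail(db1, incoming)
retail_state = state(db1)

db2 = fresh_db()
via_temp_table(db2, incoming)
temp_state = state(db2)

print(retail_state)
print(temp_state)
```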

            regards, tom lane

