Re: Cost estimation problem on seq scan in a loop - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Cost estimation problem on seq scan in a loop
Date	December 17, 2013 00:10:06
Msg-id	CAM-w4HNaGJboH_xMsv=1xR3RTw8fJ6TE13E0uN_JcrEMa2J+2Q@mail.gmail.com
In response to	Cost estimation problem on seq scan in a loop (Jeff Janes <jeff.janes@gmail.com>)
List	pgsql-hackers

On Mon, Dec 16, 2013 at 11:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Is there some principled way to go about teaching the planner that hashing
> smallish_table on the join filter key is a cheap insurance policy against
> underestimating the row count of the outer loop?

The problem is that cheap protection can end up being very expensive
when it's the most expensive part of the query and it's repeated many
times. In this query it's expecting thousands of rows. The executor
goes to some effort to avoid unnecessary copies of data so that it can
use tuples straight out of the disk buffers, so having to copy them one
extra time into a hash table would be annoying.

What's more likely, I think, is having plan nodes that make decisions
at run time. There's been some movement in this direction already and
lots of discussion about it. Having a join type that retrieves the
first few rows from the lhs and then decides whether to do a hash or
nested loop on the rhs based on how many it finds might be more
tractable than most other strategies.
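To make the idea concrete, here is a toy sketch in Python of such a run-time-deciding join. It is not PostgreSQL executor code, and no such node is implied to exist; the names `adaptive_join`, `inner_scan` and `threshold` are hypothetical. It buffers up to `threshold` rows from the outer (lhs) input. If the outer side runs out before reaching that count, it does a plain nested loop that rescans the inner (rhs) side per outer row, avoiding the copy into a hash table. Otherwise it scans the inner side once, builds a hash table, and probes it for the buffered rows and the rest of the outer input.

```python
from collections import defaultdict
from itertools import chain, islice
from typing import Callable, Hashable, Iterable, Iterator, Tuple

Row = Tuple


def adaptive_join(
    outer: Iterable[Row],
    inner_scan: Callable[[], Iterable[Row]],  # hypothetical: returns a fresh scan of the inner side
    outer_key: Callable[[Row], Hashable],
    inner_key: Callable[[Row], Hashable],
    threshold: int = 8,  # hypothetical cutoff; a real planner would derive this from costs
) -> Iterator[Tuple[Row, Row]]:
    """Equi-join that picks nested loop or hash join after peeking at the outer input."""
    outer_iter = iter(outer)
    # Look at up to `threshold` outer rows before committing to a strategy.
    prefix = list(islice(outer_iter, threshold))

    if len(prefix) < threshold:
        # Outer side turned out small: rescan the inner side once per outer row
        # (nested loop), so inner rows are never copied into a hash table.
        for o in prefix:
            k = outer_key(o)
            for i in inner_scan():
                if inner_key(i) == k:
                    yield (o, i)
        return

    # Outer side has at least `threshold` rows: scan the inner side once,
    # copy it into a hash table keyed on the join key, then probe it.
    table = defaultdict(list)
    for i in inner_scan():
        table[inner_key(i)].append(i)
    for o in chain(prefix, outer_iter):
        for i in table.get(outer_key(o), ()):
            yield (o, i)
```

The trade-off sits entirely in `threshold`: below it, the join pays for repeated inner scans but never copies tuples; above it, it pays for one copy into the hash table in exchange for cheap probes. Here the threshold is a fixed number, whereas a real implementation would presumably derive the cutoff from cost estimates.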

-- 
greg