Re: [HACKERS] Idea for speeding up uncorrelated subqueries - Mailing list pgsql-hackers

From Vadim Mikheev
Subject Re: [HACKERS] Idea for speeding up uncorrelated subqueries
Date
Msg-id 37AA50C6.D43FC84E@krs.ru
Whole thread Raw
In response to Re: [HACKERS] Idea for speeding up uncorrelated subqueries  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] Idea for speeding up uncorrelated subqueries
List pgsql-hackers
Bruce Momjian wrote:
> 
> Suppose you have a subquery that returns 1000 rows.  There is no code so
> lookups of the inner table are indexed:
> 
>         select *
>         from tab
>         where col in (select col2 from tab2)
> 
> In this case, a sequential scan of the subquery results are required.  I
> didn't think the subquery was executed every time it needed to see if
> col1 was in the subquery.

Oh, well. These are cases when query may be rewritten with EXISTS.
This is possible when there are no aggregates in subquery.

> > After looking at subselect.c I think I understand why --- InitPlans are
> > only for subqueries that are known to return a *single* reslt.  When you
> > have a subquery that might potentially return many, many tuples, you
> > need to scan through those tuples, so we use SubPlan tactics even if
> > there's not a query correlation.

Yes. But as I said already, you can use InitPlan (i.e. execute subquery
first) after removing duplicates from subquery results.

> > But this neglects the cost of re-executing the subplan over and over.
> > Materializing the result should help, no?  (Of course there are cases

We could not only cache subquery results (materialization) but also
hash them. 

> > where it won't, such as when the subplan is just an unqualified select,
> > but most of the time it should be a win, I think...)

In such cases, if there are no aggregates in subquery then EXISTS
could be used else materialization will still help. 

> No what Vadim is done MVCC, I would like to bug him to improve subquery
> performance.  We are tweeking the optimizer, but we have this huge
> subquery performance problem here.

No, Bruce. I'm in WAL now. I think that we need in recovery
(remember that you'll lose indices being updated when some
crash took place), fast backup (it's easy to copy part of log 
than dump 1Gb table), fast commits (<= 1 fsync per commit
using group commit, instead of >= 2 fsyncs now), savepoints 
AND buffered logging, which you, Bruce, want so much, 
and so long -:).

Vadim


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries