Re: slow IN() clause for many cases - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: slow IN() clause for many cases
Date
Msg-id 1133304818.2906.487.camel@localhost.localdomain
Whole thread Raw
In response to Re: slow IN() clause for many cases  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: slow IN() clause for many cases  (Martijn van Oosterhout <kleptog@svana.org>)
Re: slow IN() clause for many cases  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Tue, 2005-11-29 at 17:21 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Do you think we'll be able to generate a single ScalarArrayOpExpr from a
> > small subselect and pass it through as an indexable expression?
> 
> If you don't mind spelling it with the ARRAY(sub-select) syntax, which
> I think is a Postgres-ism (though it's possible Joe got it from
> SQL2003).
> 
> regression=# explain select * from tenk1 where unique1 = any (array(select f1 from int4_tbl));
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Bitmap Heap Scan on tenk1  (cost=3.09..37.86 rows=10 width=244)
>    Recheck Cond: (unique1 = ANY ($0))
>    InitPlan
>      ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
>    ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.04 rows=10 width=0)
>          Index Cond: (unique1 = ANY ($0))
> (6 rows)
> 
> Of course the planner is just guessing about how many rows this will
> produce.

So we could teach the planner to transform:

IN (subselect) 

into

= ANY(array(subselect))

if we had the planner think the subselect had say < 1000 rows?

> > e.g.
> > Select * From Sales where month IN (
> > select month from time_dimension where FinYear = 2005 and Quarter = 3)
> 
> > Having taught predtest.c about ScalarArrayOpExpr means that would allow
> > this to work with constraint exclusion.
> 
> Not hardly, unless you want to play fast and loose with semantics by
> evaluating subselects at plan time instead of run time.  You could
> persuade that to happen by wrapping the ARRAY(sub-select) into a
> function mis-declared as IMMUTABLE, but I'd be pretty resistant to
> having the planner assume any such thing by default.

Man, thats a horrible thought. I must be dragging you down :-)

IMHO the only way to do joins that access partitions is to do the
constraint exclusion at run time, but I can see thats a longer
conversation than I can start right now.

Best Regards, Simon Riggs




pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: slow IN() clause for many cases
Next
From: Bruce Momjian
Date:
Subject: Re: Using multi-row technique with COPY