Thread: Using IN with subselect

Using IN with subselect

From

Dave Smith

Date:

25 November 2004, 16:49:15

I have a query with an in subquery like

where x in (select x from y);

Now the subquery is not related to the outer query so it always returns
the same set. Is this subselect executed each time or just once? If it
is executed each time, if I create a function would that then be only
executed once?

--
Dave Smith
CANdata Systems Ltd
416-493-9020

Re: Using IN with subselect

From

Martijn van Oosterhout

Date:

25 November 2004, 17:12:21

Running EXPLAIN over the query will tell you...

On Thu, Nov 25, 2004 at 11:49:06AM -0500, Dave Smith wrote:
> I have a query with an in subquery like
>
> where x in (select x from y);
>
> Now the subquery is not related to the outer query so it always returns
> the same set. Is this subselect executed each time or just once? If it
> is executed each time, if I create a function would that then be only
> executed once?
>
> --
> Dave Smith
> CANdata Systems Ltd
> 416-493-9020
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

msg-32091-88432.dat

Re: Using IN with subselect

From

Dave Smith

Date:

25 November 2004, 17:19:28

Well here is explain. I would guess that it is executed each time ..
function any different?

HashAggregate  (cost=288.32..288.32 rows=1 width=32)
   ->  Hash IN Join  (cost=288.18..288.31 rows=1 width=32)
         Hash Cond: (("outer".gl_num)::text =
lpad(ltrim(("inner".account_num)::text, '0'::text), 9, ' '::text))
         ->  Subquery Scan journal_all  (cost=282.36..282.45 rows=2
width=64)
               ->  Unique  (cost=282.36..282.43 rows=2 width=159)
                     ->  Sort  (cost=282.36..282.36 rows=2 width=159)
                           Sort Key: objectid, owner_oid, source_code,
posting_date, control_num, reference, gl_num, gl_amt, distributed_amt,
"comment", operator_id, branch_id, company_id
                           ->  Append  (cost=0.00..282.35 rows=2
width=159)
                                 ->  Subquery Scan "*SELECT* 1"
(cost=0.00..265.24 rows=1 width=159)
                                       ->  Index Scan using journal_9 on
journal  (cost=0.00..265.23 rows=1 width=159)
                                             Index Cond: (company_id =
1000)
                                             Filter: ((posting_date >=
'2004-01-01'::date) AND (posting_date <= '2004-01-31'::date))
                                 ->  Subquery Scan "*SELECT* 2"
(cost=0.00..17.10 rows=1 width=159)
                                       ->  Index Scan using
journal_hist_7 on journal_hist  (cost=0.00..17.09 rows=1 width=159)
                                             Index Cond: (company_id =
1000)
                                             Filter: ((posting_date >=
'2004-01-01'::date) AND (posting_date <= '2004-01-31'::date))
         ->  Hash  (cost=5.83..5.83 rows=1 width=13)
               ->  Index Scan using glmast_index3 on glmast
(cost=0.00..5.83 rows=1 width=13)
                     Index Cond: ((company_id = 1000) AND
((control_type)::text = 'F'::text))

On Thu, 2004-11-25 at 12:11, Martijn van Oosterhout wrote:
> Running EXPLAIN over the query will tell you...
>
> On Thu, Nov 25, 2004 at 11:49:06AM -0500, Dave Smith wrote:
> > I have a query with an in subquery like
> >
> > where x in (select x from y);
> >
> > Now the subquery is not related to the outer query so it always returns
> > the same set. Is this subselect executed each time or just once? If it
> > is executed each time, if I create a function would that then be only
> > executed once?
> >
> > --
> > Dave Smith
> > CANdata Systems Ltd
> > 416-493-9020
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
--
Dave Smith
CANdata Systems Ltd
416-493-9020

Re: Using IN with subselect

From

Martijn van Oosterhout

Date:

25 November 2004, 17:26:12

On Thu, Nov 25, 2004 at 12:19:23PM -0500, Dave Smith wrote:
> Well here is explain. I would guess that it is executed each time ..
> function any different?

Hmm, if it's only executed once I would expect it to be an InitPlan.
Are you sure it's not correlated?

If you want to make sure, put the subquery in the FROM clause, that it
definitly will only be run once.

<snip>

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

msg-32091-88436.dat

Re: Using IN with subselect

From

Tom Lane

Date:

25 November 2004, 17:57:22

Dave Smith <dave.smith@candata.com> writes:
> Well here is explain. I would guess that it is executed each time ..
> function any different?

> HashAggregate  (cost=288.32..288.32 rows=1 width=32)
>    ->  Hash IN Join  (cost=288.18..288.31 rows=1 width=32)
>          ->  Subquery Scan journal_all  (cost=282.36..282.45 rows=2 width=64)
>          ->  Hash  (cost=5.83..5.83 rows=1 width=13)
>                ->  Index Scan using glmast_index3 on glmast (cost=0.00..5.83 rows=1 width=13)

No ... this plan says to scan glmast once, load the selected rows into
an in-memory hash table, then scan journal_all once and probe the hash
table for matches.  It looks like a pretty decent choice of plan to me.

            regards, tom lane

Re: Using IN with subselect

From

Dave Smith

Date:

25 November 2004, 19:09:57

That's what I wanted it to do I just did not understand how to read the
explain. So is it HashAggregate that means this already loaded?

On Thu, 2004-11-25 at 12:57, Tom Lane wrote:
> Dave Smith <dave.smith@candata.com> writes:
> > Well here is explain. I would guess that it is executed each time ..
> > function any different?
>
> > HashAggregate  (cost=288.32..288.32 rows=1 width=32)
> >    ->  Hash IN Join  (cost=288.18..288.31 rows=1 width=32)
> >          ->  Subquery Scan journal_all  (cost=282.36..282.45 rows=2 width=64)
> >          ->  Hash  (cost=5.83..5.83 rows=1 width=13)
> >                ->  Index Scan using glmast_index3 on glmast (cost=0.00..5.83 rows=1 width=13)
>
> No ... this plan says to scan glmast once, load the selected rows into
> an in-memory hash table, then scan journal_all once and probe the hash
> table for matches.  It looks like a pretty decent choice of plan to me.
>
>             regards, tom lane
--
Dave Smith
CANdata Systems Ltd
416-493-9020

Re: Using IN with subselect

From

Greg Stark

Date:

25 November 2004, 20:07:41

Dave Smith <dave.smith@candata.com> writes:

> That's what I wanted it to do I just did not understand how to read the
> explain. So is it HashAggregate that means this already loaded?

The HashAggregate Node is doing a GROUP BY command. (or the implicit GROUP BY
if you used an aggregate function without one.)

The "Hash IN Join" is a particular type of join that behaves the way Tom
describes. Each type of join behaves differently. A Nested Loop join would
execute one side once and the other many times. A Merge join executes both
sides once.

You could also run "explain analyze" instead of just "explain" and look at the
"loops" number that would tell you how many times the node actually was
executed.

> > Dave Smith <dave.smith@candata.com> writes:
> > > Well here is explain. I would guess that it is executed each time ..
> > > function any different?
> >
> > > HashAggregate  (cost=288.32..288.32 rows=1 width=32)
> > >    ->  Hash IN Join  (cost=288.18..288.31 rows=1 width=32)
> > >          ->  Subquery Scan journal_all  (cost=282.36..282.45 rows=2 width=64)
> > >          ->  Hash  (cost=5.83..5.83 rows=1 width=13)
> > >                ->  Index Scan using glmast_index3 on glmast (cost=0.00..5.83 rows=1 width=13)
> >
> > No ... this plan says to scan glmast once, load the selected rows into
> > an in-memory hash table, then scan journal_all once and probe the hash
> > table for matches.  It looks like a pretty decent choice of plan to me.

--
greg