Re: Query plan question, and a memory leak - Mailing list pgsql-general

From Greg Stark
Subject Re: Query plan question, and a memory leak
Date
Msg-id 87lm0y9hxz.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: Query plan question, and a memory leak  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Query plan question, and a memory leak
List pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > 1) The following query has an odd plan that I can't figure out how to read. It
> >    seems to include the subplan twice, does that mean it's executing it twice?
>
> If you had done EXPLAIN ANALYZE, you would know ;-)

Well, not till later this week I wouldn't :)

> >    Even twice doesn't explain the cost which is much higher than similar plans
> >    that don't trigger the duplicate subplan. What am I doing wrong to trigger
> >    this behaviour?
>
> The inner nested loop's cost looks in line to me for one execution of
> the subplan and one execution of the inner indexscan for each tuple of
> the outer table (ad).  Given that the subplan depends on ad.ad_id
> there's really not any way to avoid re-executing it for each row of ad.
>
> The reason the outer nested loop looks so bad is you've got an
> unconstrained join to gg...

That's intentional, this is the same query I was trying to run earlier this
weekend and ran out of disk space for the results.

Even so the numbers don't seem right. Something really quite strange is going
on. Look at these plans. Adding the clause to actually exclude records where
the subquery don't find any records causes the cost to go through the roof.

I wouldn't expect it to actually take any more time. In fact I would expect it
to take a lot less time since it takes time to handle the resulting data too.

slo=> explain select * from (select a, (select b from b where b=a) as x from a) as z ;
                           QUERY PLAN
----------------------------------------------------------------
 Subquery Scan z  (cost=0.00..20.00 rows=1000 width=4)
   ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=4)
         SubPlan
           ->  Seq Scan on b  (cost=0.00..22.50 rows=5 width=4)
                 Filter: (b = $0)
(5 rows)

Time: 1.34 ms

slo=> explain select * from (select a, (select b from b where b=a) as x from a) as z where x is not null;
                           QUERY PLAN
----------------------------------------------------------------
 Subquery Scan z  (cost=0.00..22520.00 rows=995 width=4)
   ->  Seq Scan on a  (cost=0.00..22520.00 rows=995 width=4)
         Filter: ((subplan) IS NOT NULL)
         SubPlan
           ->  Seq Scan on b  (cost=0.00..22.50 rows=5 width=4)
                 Filter: (b = $0)
           ->  Seq Scan on b  (cost=0.00..22.50 rows=5 width=4)
                 Filter: (b = $0)
(8 rows)

Time: 1.68 ms


For this test I created these tables, there are no records in them:

slo=> \d a
       Table "public.a"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |

slo=> \d b
       Table "public.b"
 Column |  Type   | Modifiers
--------+---------+-----------
 b      | integer |

slo=> \d c
       Table "public.c"
 Column |  Type   | Modifiers
--------+---------+-----------
 c      | integer |





> > 2) The version of the query at the bottom appears to trigger a big memory
> >    leak. The only difference is the addition of a "WHERE geom2 @ make_box()"
> >    clause. (make_box returns a box, the definition is below).
>
> What datatype is geom2?

It was a box as well. I've eliminated that column from the queries at least
for now though.

--
greg

pgsql-general by date:

Previous
From: "Nigel J. Andrews"
Date:
Subject: DBI driver and transactions
Next
From: Lincoln Yeoh
Date:
Subject: Re: DBI driver and transactions