Thread: Query plan question, and a memory leak

Query plan question, and a memory leak

From
Greg Stark
Date:
One question, and one possible bug report:

1) The following query has an odd plan that I can't figure out how to read. It
   seems to include the subplan twice, does that mean it's executing it twice?
   Even twice doesn't explain the cost which is much higher than similar plans
   that don't trigger the duplicate subplan. What am I doing wrong to trigger
   this behaviour?

2) The version of the query at the bottom appears to trigger a big memory
   leak. The only difference is the addition of a "WHERE geom2 @ make_box()"
   clause. (make_box returns a box, the definition is below). That version
   grows continuously, quickly reaching 200M before I kill it.

The queries are simplified versions of the actual query I'm working with, so
they might not make much logical sense, but they cause the same problems.


This is the query with the strange plan:

slo=> explain SELECT 1
                         FROM gg, ad, store_location
                        WHERE store_location_id = (
                               SELECT store_location_id
                                 FROM ad_store_location JOIN store_location USING (store_location_id)
                                WHERE ad_id = ad.ad_id
                                LIMIT 1
                               ) ;

slo-> slo-> slo(> slo(> slo(> slo(> slo(>                                                             QUERY PLAN
                                                     

-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..12974034.71 rows=375971060 width=8)
   ->  Nested Loop  (cost=0.00..624633.51 rows=45140 width=8)
         ->  Seq Scan on ad  (cost=0.00..2597.40 rows=45140 width=4)
         ->  Index Scan using store_location_pkey on store_location  (cost=0.00..8.39 rows=1 width=4)
               Index Cond: (store_location.store_location_id = (subplan))
               SubPlan
                 ->  Limit  (cost=0.00..5.37 rows=1 width=8)
                       ->  Nested Loop  (cost=0.00..24.32 rows=5 width=8)
                             ->  Index Scan using idx_ad_store_location_ad on ad_store_location  (cost=0.00..10.60
rows=5width=4) 
                                   Index Cond: (ad_id = $0)
                             ->  Index Scan using store_location_pkey on store_location  (cost=0.00..3.02 rows=1
width=4)
                                   Index Cond: ("outer".store_location_id = store_location.store_location_id)
                 ->  Limit  (cost=0.00..5.37 rows=1 width=8)
                       ->  Nested Loop  (cost=0.00..24.32 rows=5 width=8)
                             ->  Index Scan using idx_ad_store_location_ad on ad_store_location  (cost=0.00..10.60
rows=5width=4) 
                                   Index Cond: (ad_id = $0)
                             ->  Index Scan using store_location_pkey on store_location  (cost=0.00..3.02 rows=1
width=4)
                                   Index Cond: ("outer".store_location_id = store_location.store_location_id)
   ->  Seq Scan on gg  (cost=0.00..190.29 rows=8329 width=0)





This is the query that triggers the memory leak:

slo=> explain SELECT 1
                         FROM gg, ad, store_location
                        WHERE store_location_id = (
                               SELECT store_location_id
                                 FROM ad_store_location JOIN store_location USING (store_location_id)
                                WHERE ad_id = ad.ad_id
                                  AND store_location.geom2 @ make_box(gg.longitude,gg.latitude,65)
                                LIMIT 1
                               ) ;

slo-> slo-> slo(> slo(> slo(> slo(> slo(> slo(>                                                          QUERY PLAN
                                                     

-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..19453708582.74 rows=375971060 width=24)
   ->  Nested Loop  (cost=0.00..12351998.60 rows=375971060 width=20)
         ->  Seq Scan on ad  (cost=0.00..2597.40 rows=45140 width=4)
         ->  Seq Scan on gg  (cost=0.00..190.29 rows=8329 width=16)
   ->  Index Scan using store_location_pkey on store_location  (cost=0.00..27.36 rows=1 width=4)
         Index Cond: (store_location.store_location_id = (subplan))
         SubPlan
           ->  Limit  (cost=0.00..24.34 rows=1 width=8)
                 ->  Nested Loop  (cost=0.00..24.34 rows=1 width=8)
                       ->  Index Scan using idx_ad_store_location_ad on ad_store_location  (cost=0.00..10.60 rows=5
width=4)
                             Index Cond: (ad_id = $0)
                       ->  Index Scan using store_location_pkey on store_location  (cost=0.00..3.02 rows=1 width=4)
                             Index Cond: ("outer".store_location_id = store_location.store_location_id)
                             Filter: (geom2 @ make_box($1, $2, 65::double precision))
           ->  Limit  (cost=0.00..24.34 rows=1 width=8)
                 ->  Nested Loop  (cost=0.00..24.34 rows=1 width=8)
                       ->  Index Scan using idx_ad_store_location_ad on ad_store_location  (cost=0.00..10.60 rows=5
width=4)
                             Index Cond: (ad_id = $0)
                       ->  Index Scan using store_location_pkey on store_location  (cost=0.00..3.02 rows=1 width=4)
                             Index Cond: ("outer".store_location_id = store_location.store_location_id)
                             Filter: (geom2 @ make_box($1, $2, 65::double precision))




This is the definition of make_box:

-- make_box(longitude, latitude, distance) --
CREATE OR REPLACE FUNCTION make_box(float,float,float) RETURNS box AS
'SELECT box(point(long-d_long,lat-d_lat),point(long+d_long,lat+d_lat))
   FROM (SELECT $1 AS long, $2 AS lat,
         $3*1000::float/1852::float/60::float as d_lat,
         $3*1000::float/1852::float/60::float/cos(radians($2)) as d_long
        ) as x'
LANGUAGE SQL
STRICT IMMUTABLE;


--
greg

Re: Query plan question, and a memory leak

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> 1) The following query has an odd plan that I can't figure out how to read. It
>    seems to include the subplan twice, does that mean it's executing it twice?

If you had done EXPLAIN ANALYZE, you would know ;-)

My guess is that because the subplan is part of an indexscan qual, it
appears in both the indexqual and 'indexqualorig' lists of the indexscan
node.  The copy in 'indexqualorig' will never be executed in this
example, though it could be if there were a multiple-index-scan plan
involved.

>    Even twice doesn't explain the cost which is much higher than similar plans
>    that don't trigger the duplicate subplan. What am I doing wrong to trigger
>    this behaviour?

The inner nested loop's cost looks in line to me for one execution of
the subplan and one execution of the inner indexscan for each tuple of
the outer table (ad).  Given that the subplan depends on ad.ad_id
there's really not any way to avoid re-executing it for each row of ad.

The reason the outer nested loop looks so bad is you've got an
unconstrained join to gg...

> 2) The version of the query at the bottom appears to trigger a big memory
>    leak. The only difference is the addition of a "WHERE geom2 @ make_box()"
>    clause. (make_box returns a box, the definition is below).

What datatype is geom2?

            regards, tom lane

Re: Query plan question, and a memory leak

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> 2) The version of the query at the bottom appears to trigger a big memory
>    leak.

> -- make_box(longitude, latitude, distance) --
> CREATE OR REPLACE FUNCTION make_box(float,float,float) RETURNS box AS
> 'SELECT box(point(long-d_long,lat-d_lat),point(long+d_long,lat+d_lat))
>    FROM (SELECT $1 AS long, $2 AS lat,
>          $3*1000::float/1852::float/60::float as d_lat,
>          $3*1000::float/1852::float/60::float/cos(radians($2)) as d_long
>         ) as x'
> LANGUAGE SQL
> STRICT IMMUTABLE;

I fooled around with this, and was able to verify a per-row memory leak
in 7.3 --- but it's gone in CVS tip, probably because of the work I did
recently to prevent memory leaks in SQL functions.

My guess is that you could work around the leak in 7.3 by flattening the
query in make_box into a single SELECT, instead of being cute with a
sub-select.

            regards, tom lane

Re: Query plan question, and a memory leak

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > 1) The following query has an odd plan that I can't figure out how to read. It
> >    seems to include the subplan twice, does that mean it's executing it twice?
>
> If you had done EXPLAIN ANALYZE, you would know ;-)

Well, not till later this week I wouldn't :)

> >    Even twice doesn't explain the cost which is much higher than similar plans
> >    that don't trigger the duplicate subplan. What am I doing wrong to trigger
> >    this behaviour?
>
> The inner nested loop's cost looks in line to me for one execution of
> the subplan and one execution of the inner indexscan for each tuple of
> the outer table (ad).  Given that the subplan depends on ad.ad_id
> there's really not any way to avoid re-executing it for each row of ad.
>
> The reason the outer nested loop looks so bad is you've got an
> unconstrained join to gg...

That's intentional, this is the same query I was trying to run earlier this
weekend and ran out of disk space for the results.

Even so the numbers don't seem right. Something really quite strange is going
on. Look at these plans. Adding the clause to actually exclude records where
the subquery don't find any records causes the cost to go through the roof.

I wouldn't expect it to actually take any more time. In fact I would expect it
to take a lot less time since it takes time to handle the resulting data too.

slo=> explain select * from (select a, (select b from b where b=a) as x from a) as z ;
                           QUERY PLAN
----------------------------------------------------------------
 Subquery Scan z  (cost=0.00..20.00 rows=1000 width=4)
   ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=4)
         SubPlan
           ->  Seq Scan on b  (cost=0.00..22.50 rows=5 width=4)
                 Filter: (b = $0)
(5 rows)

Time: 1.34 ms

slo=> explain select * from (select a, (select b from b where b=a) as x from a) as z where x is not null;
                           QUERY PLAN
----------------------------------------------------------------
 Subquery Scan z  (cost=0.00..22520.00 rows=995 width=4)
   ->  Seq Scan on a  (cost=0.00..22520.00 rows=995 width=4)
         Filter: ((subplan) IS NOT NULL)
         SubPlan
           ->  Seq Scan on b  (cost=0.00..22.50 rows=5 width=4)
                 Filter: (b = $0)
           ->  Seq Scan on b  (cost=0.00..22.50 rows=5 width=4)
                 Filter: (b = $0)
(8 rows)

Time: 1.68 ms


For this test I created these tables, there are no records in them:

slo=> \d a
       Table "public.a"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |

slo=> \d b
       Table "public.b"
 Column |  Type   | Modifiers
--------+---------+-----------
 b      | integer |

slo=> \d c
       Table "public.c"
 Column |  Type   | Modifiers
--------+---------+-----------
 c      | integer |





> > 2) The version of the query at the bottom appears to trigger a big memory
> >    leak. The only difference is the addition of a "WHERE geom2 @ make_box()"
> >    clause. (make_box returns a box, the definition is below).
>
> What datatype is geom2?

It was a box as well. I've eliminated that column from the queries at least
for now though.

--
greg

Re: Query plan question, and a memory leak

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I wouldn't expect it to actually take any more time. In fact I would expect it
> to take a lot less time since it takes time to handle the resulting data too.

You're mistaking planner estimate time for reality ;-).

IIRC, the planner doesn't bother to account for evaluation time of
select-list values in its estimates.  At least in simple cases, there's
no point in doing that math because the cost will be the same no matter
what plan is chosen.

            regards, tom lane

Re: Query plan question, and a memory leak

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > I wouldn't expect it to actually take any more time. In fact I would expect it
> > to take a lot less time since it takes time to handle the resulting data too.
>
> You're mistaking planner estimate time for reality ;-).
>
> IIRC, the planner doesn't bother to account for evaluation time of
> select-list values in its estimates.  At least in simple cases, there's
> no point in doing that math because the cost will be the same no matter
> what plan is chosen.

Yeah after further thought I realized it makes sense for the optimizer not to
bother taking into account the result set since in theory the result set
should be the same regardless of the plan.

However I tested those queries with some data and things really do seem to be
behaving oddly. It takes nearly twice as long to run the version with the
where clause and duplicate subplan. And the analyze output seems to indicate
that it is in fact being executed.

Even then, the cost is way more than twice the cost without the where clause:

slo=> explain analyze select * from (select id, (select id from words where id=w.id) as x from words as w) as z ;
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 Subquery Scan z  (cost=0.00..983.92 rows=45392 width=4) (actual time=14.02..1988.66 rows=45392 loops=1)
   ->  Seq Scan on words w  (cost=0.00..983.92 rows=45392 width=4) (actual time=14.01..1796.42 rows=45392 loops=1)
         SubPlan
           ->  Index Scan using idx on words  (cost=0.00..3.02 rows=1 width=4) (actual time=0.02..0.03 rows=1
loops=45392)
                 Index Cond: (id = $0)
 Total runtime: 2049.16 msec
(6 rows)

Time: 2050.95 ms
slo=> explain analyze select * from (select id, (select id from words where id=w.id) as x from words as w) as z where x
isnot null; 
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 Subquery Scan z  (cost=0.00..138006.65 rows=45165 width=4) (actual time=2.19..3599.57 rows=45392 loops=1)
   ->  Seq Scan on words w  (cost=0.00..138006.65 rows=45165 width=4) (actual time=2.18..3417.73 rows=45392 loops=1)
         Filter: ((subplan) IS NOT NULL)
         SubPlan
           ->  Index Scan using idx on words  (cost=0.00..3.02 rows=1 width=4) (actual time=0.02..0.03 rows=1
loops=45392)
                 Index Cond: (id = $0)
           ->  Index Scan using idx on words  (cost=0.00..3.02 rows=1 width=4) (actual time=0.02..0.03 rows=1
loops=45392)
                 Index Cond: (id = $0)
 Total runtime: 3662.43 msec
(9 rows)

Time: 3664.63 ms



--
greg

Re: Query plan question, and a memory leak

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> However I tested those queries with some data and things really do seem to be
> behaving oddly. It takes nearly twice as long to run the version with the
> where clause and duplicate subplan.

Yes, they will both be executed --- there's no
duplicate-subexpression-elimination logic...

            regards, tom lane

Re: Query plan question, and a memory leak

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > However I tested those queries with some data and things really do seem to be
> > behaving oddly. It takes nearly twice as long to run the version with the
> > where clause and duplicate subplan.
>
> Yes, they will both be executed --- there's no
> duplicate-subexpression-elimination logic...

But, but, there was no common subexpression in the original query. The only
common subexpression was introduced by the optimizer itself at some point.

The query was of the form
 "select * from (select <subquery> as foo) where foo is not null".

Is there some way to force the optimizer not to substitute the subquery in the
where clause? I mean, I can't do a materialize, I don't have the temporary
space for all the records, but is there some way to get it to assemble the
data for the single record and then apply the where clause on that?

Thanks again for all the help. I'm sorry if I'm becoming a pest but this query
may be really important to my project.


--
greg

Re: Query plan question, and a memory leak

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Is there some way to force the optimizer not to substitute the subquery in the
> where clause?

You could try tinkering with the rules for invoking subquery_push_qual
in src/backend/optimizer/path/allpaths.c.  This might be a case that
would fall under the note there wondering if pushing down can ever
result in a worse plan.  I'm not sure though that we can tell the
difference reliably...

            regards, tom lane

Re: Query plan question, and a memory leak

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Is there some way to force the optimizer not to substitute the subquery in the
> > where clause?
>
> You could try tinkering with the rules for invoking subquery_push_qual
> in src/backend/optimizer/path/allpaths.c.  This might be a case that
> would fall under the note there wondering if pushing down can ever
> result in a worse plan.  I'm not sure though that we can tell the
> difference reliably...

Indeed changing
 select * from (select <subquery> as foo) where foo is not null
into
 select * from (select <subquery> as foo) where (select foo) is not null

causes that code path to give up on inlining the subplan.

Thanks for the pointer to the part of the code involved.

Perhaps it should check not just whether the where clause involves a subplan
but also whether expression it's substituting involves a subplan? There may be
cases where it would be advantageous to inline a sub plan though, it just
seems like it wouldn't be the majority.

I guess in an ideal world the optimizer would consider both possibilities and
choose based on the cost. Does the optimizer only use the costs for choosing
join orders and methods and not for deciding whether to make other
transformations?

--
greg

Re: Query plan question, and a memory leak

From
Greg Stark
Date:
From the 7.3 documentation:

> Currently, functions returning sets may also be called in the target list of a
> SELECT query. For each row that the SELECT generates by itself, the function
> returning set is invoked, and an output row is generated for each element of
> the function's result set. Note, however, that this capability is deprecated
> and may be removed in future releases.

Is this still considered true? It turns out this may be precisely the
functionality I need to implement something. But I wouldn't want to use it if
it's likely to disappear in 7.4 or soon. The only alternative I see is to
write a PL/pgSQL loop myself.

--
greg