why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? - Mailing list pgsql-performance

From Frank van Vugt
Subject why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??
Date
Msg-id 200406150849.29470.ftm.van.vugt@foxi.nl
Whole thread Raw
Responses Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??
List pgsql-performance
Hi all,

I noticed the following. Given two tables, just simply articles and their
packages:

    article(id int)
    package( id int, article_id int, amount)

When taking the minimum package for articles given some constraint on the
article table

    select
        article.id,
        p_min
    from
        article,
        (select
            article_id,
            min(amount) as p_min
        from
            package
        group by
            article_id
        ) as foo
    where
        article.id = foo.article_id and
        article.id < 50;

The query plan shows

                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Hash Join  (cost=730.11..808.02 rows=1 width=8)
   Hash Cond: ("outer".article_id = "inner".id)
   ->  Subquery Scan foo  (cost=726.10..781.74 rows=4451 width=8)
         ->  HashAggregate  (cost=726.10..737.23 rows=4451 width=8)
               ->  Seq Scan on package  (cost=0.00..635.40 rows=18140 width=8)
   ->  Hash  (cost=4.01..4.01 rows=1 width=4)
         ->  Index Scan using article_pkey on article  (cost=0.00..4.01 rows=1
width=4)
               Index Cond: (id < 50)

So, a seq scan on the complete package table is done, although it seems the
planner could have deduced that only the part that has 'article_id < 50'
would be relevant, since 'article.id < 50' and 'article.id = foo.article_id'

If I ditch the group by, then this contraint does get pushed into the
subselect :

    select
        article.id,
        p_min
    from
        article,
        (select
            article_id,
            amount as p_min
        from
            package
        ) as foo
    where
        article.id = foo.article_id and
        article.id < 50;


                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14.22 rows=5 width=8)
   ->  Index Scan using article_pkey on article  (cost=0.00..4.01 rows=1
width=4)
         Index Cond: (id < 50)
   ->  Index Scan using package_idx3 on package  (cost=0.00..10.16 rows=4
width=8)
         Index Cond: ("outer".id = package.article_id)


So it seems the math-knowledge is there ;)
I'm wondering about what I'm overlooking here. When I take the first query and
add the article constraint 'manually', I get:

    select
        article.id,
        p_min
    from
        article,
        (select
            article_id,
            min(amount) as p_min
        from
            package
        where
            article_id < 50
        group by
            article_id
        ) as foo
    where
        article.id = foo.article_id and
        article.id < 50;

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..9.65 rows=1 width=8)
   Join Filter: ("outer".id = "inner".article_id)
   ->  Index Scan using article_pkey on article  (cost=0.00..4.01 rows=1
width=4)
         Index Cond: (id < 50)
   ->  Subquery Scan foo  (cost=0.00..5.62 rows=1 width=8)
         ->  GroupAggregate  (cost=0.00..5.61 rows=1 width=8)
               ->  Index Scan using package_idx3 on package  (cost=0.00..5.60
rows=2 width=8)
                     Index Cond: (article_id < 50)


which would have been a nice plan for the first query ;)


Could someone shine a light on this, please?



--
Best,




Frank.


pgsql-performance by date:

Previous
From: "Neeraj "
Date:
Subject: Interpreting OSDB Results
Next
From: "Jeremy Dunn"
Date:
Subject: Re: 50 000 000 Table entries and I have to do a keyword search HELP NEEDED