Thread: Pushing IN (subquery) down through UNION ALL?

Pushing IN (subquery) down through UNION ALL?

From
Dave Johansen
Date:
I'm using PostgreSQL 8.3.3 and I have a view that does a UNION ALL on two joins and it doesn't seem to want to push the IN (subquery) optimization down into the plan for the two queries being unioned. Is there something I can do to fix this? Or is it just a limitation of the planner/optimizer?

I also tried this with 8.4.7 and it seemed to exhibit the same behaviour, so here's an example of what I'm talking about (obviously in a real system I'd have indexes and all that other fun stuff):

CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
CREATE VIEW addressesall AS SELECT u.id, u.name, a.value FROM addresses1 AS a JOIN users AS u ON a.userid=u.id UNION ALL SELECT u.id, u.name, a.value FROM addresses2 AS a JOIN users AS u ON a.userid=u.id;


Here's the EXPLAIN output for two example queries:

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT id FROM users WHERE name='A');
                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=2.15..5.58 rows=1 width=40) (actual time=0.144..0.340 rows=3 loops=1)
  Hash Cond: (u.id = users.id)
  ->  Append  (cost=1.09..4.48 rows=9 width=40) (actual time=0.059..0.239 rows=9 loops=1)
        ->  Hash Join  (cost=1.09..2.19 rows=4 width=10) (actual time=0.055..0.075 rows=4 loops=1)
              Hash Cond: (a.userid = u.id)
              ->  Seq Scan on addresses1 a  (cost=0.00..1.04 rows=4 width=8) (actual time=0.006..0.013 rows=4 loops=1)
              ->  Hash  (cost=1.04..1.04 rows=4 width=6) (actual time=0.019..0.019 rows=4 loops=1)
                    ->  Seq Scan on users u  (cost=0.00..1.04 rows=4 width=6) (actual time=0.003..0.008 rows=4 loops=1)
        ->  Hash Join  (cost=1.09..2.21 rows=5 width=10) (actual time=0.109..0.133 rows=5 loops=1)
              Hash Cond: (a.userid = u.id)
              ->  Seq Scan on addresses2 a  (cost=0.00..1.05 rows=5 width=8) (actual time=0.004..0.012 rows=5 loops=1)
              ->  Hash  (cost=1.04..1.04 rows=4 width=6) (actual time=0.020..0.020 rows=4 loops=1)
                    ->  Seq Scan on users u  (cost=0.00..1.04 rows=4 width=6) (actual time=0.004..0.010 rows=4 loops=1)
  ->  Hash  (cost=1.05..1.05 rows=1 width=4) (actual time=0.053..0.053 rows=1 loops=1)
        ->  Seq Scan on users  (cost=0.00..1.05 rows=1 width=4) (actual time=0.032..0.040 rows=1 loops=1)
              Filter: (name = 'A'::text)
 Total runtime: 0.519 ms
(17 rows)

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (1);
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..4.27 rows=3 width=40) (actual time=0.053..0.114 rows=3 loops=1)
  ->  Append  (cost=0.00..4.27 rows=3 width=40) (actual time=0.049..0.101 rows=3 loops=1)
        ->  Nested Loop  (cost=0.00..2.12 rows=2 width=10) (actual time=0.046..0.063 rows=2 loops=1)
              ->  Seq Scan on users u  (cost=0.00..1.05 rows=1 width=6) (actual time=0.025..0.028 rows=1 loops=1)
                    Filter: (id = 1)
              ->  Seq Scan on addresses1 a  (cost=0.00..1.05 rows=2 width=8) (actual time=0.009..0.017 rows=2 loops=1)
                    Filter: (a.userid = 1)
        ->  Nested Loop  (cost=0.00..2.12 rows=1 width=10) (actual time=0.015..0.025 rows=1 loops=1)
              ->  Seq Scan on addresses2 a  (cost=0.00..1.06 rows=1 width=8) (actual time=0.005..0.008 rows=1 loops=1)
                    Filter: (userid = 1)
              ->  Seq Scan on users u  (cost=0.00..1.05 rows=1 width=6) (actual time=0.004..0.007 rows=1 loops=1)
                    Filter: (u.id = 1)
 Total runtime: 0.251 ms
(13 rows)

You'll notice that the subquery version is doing the full join and then the filtering, but the explicitly listed version pushing the filtering into the plan before the join. Is there a way to make the subquery version perform the same optimization?

Thanks,
Dave

Re: Pushing IN (subquery) down through UNION ALL?

From
Dave Johansen
Date:
On Thu, Feb 24, 2011 at 8:14 AM, Dave Johansen <davejohansen@gmail.com> wrote:
I'm using PostgreSQL 8.3.3 and I have a view that does a UNION ALL on two joins and it doesn't seem to want to push the IN (subquery) optimization down into the plan for the two queries being unioned. Is there something I can do to fix this? Or is it just a limitation of the planner/optimizer?

I also tried this with 8.4.7 and it seemed to exhibit the same behaviour, so here's an example of what I'm talking about (obviously in a real system I'd have indexes and all that other fun stuff):

CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
CREATE VIEW addressesall AS SELECT u.id, u.name, a.value FROM addresses1 AS a JOIN users AS u ON a.userid=u.id UNION ALL SELECT u.id, u.name, a.value FROM addresses2 AS a JOIN users AS u ON a.userid=u.id;


Here's the EXPLAIN output for two example queries:

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT id FROM users WHERE name='A');
                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
  Hash Semi Join  (cost=2.15..5.58 rows=1 width=40) (actual time=0.144..0.340 rows=3 loops=1)
  Hash Cond: (u.id = users.id)
  ->  Append  (cost=1.09..4.48 rows=9 width=40) (actual time=0.059..0.239 rows=9 loops=1)
        ->  Hash Join  (cost=1.09..2.19 rows=4 width=10) (actual time=0.055..0.075 rows=4 loops=1)
              Hash Cond: (a.userid = u.id)
              ->  Seq Scan on addresses1 a  (cost=0.00..1.04 rows=4 width=8) (actual time=0.006..0.013 rows=4 loops=1)
              ->  Hash  (cost=1.04..1.04 rows=4 width=6) (actual time=0.019..0.019 rows=4 loops=1)
                    ->  Seq Scan on users u  (cost=0.00..1.04 rows=4 width=6) (actual time=0.003..0.008 rows=4 loops=1)
        ->  Hash Join  (cost=1.09..2.21 rows=5 width=10) (actual time=0.109..0.133 rows=5 loops=1)
              Hash Cond: (a.userid = u.id)
              ->  Seq Scan on addresses2 a  (cost=0.00..1.05 rows=5 width=8) (actual time=0.004..0.012 rows=5 loops=1)
              ->  Hash  (cost=1.04..1.04 rows=4 width=6) (actual time=0.020..0.020 rows=4 loops=1)
                    ->  Seq Scan on users u  (cost=0.00..1.04 rows=4 width=6) (actual time=0.004..0.010 rows=4 loops=1)
  ->  Hash  (cost=1.05..1.05 rows=1 width=4) (actual time=0.053..0.053 rows=1 loops=1)
        ->  Seq Scan on users  (cost=0.00..1.05 rows=1 width=4) (actual time=0.032..0.040 rows=1 loops=1)
              Filter: (name = 'A'::text)
 Total runtime: 0.519 ms
(17 rows)

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (1);
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..4.27 rows=3 width=40) (actual time=0.053..0.114 rows=3 loops=1)
  ->  Append  (cost=0.00..4.27 rows=3 width=40) (actual time=0.049..0.101 rows=3 loops=1)
        ->  Nested Loop  (cost=0.00..2.12 rows=2 width=10) (actual time=0.046..0.063 rows=2 loops=1)
              ->  Seq Scan on users u  (cost=0.00..1.05 rows=1 width=6) (actual time=0.025..0.028 rows=1 loops=1)
                    Filter: (id = 1)
              ->  Seq Scan on addresses1 a  (cost=0.00..1.05 rows=2 width=8) (actual time=0.009..0.017 rows=2 loops=1)
                    Filter: (a.userid = 1)
        ->  Nested Loop  (cost=0.00..2.12 rows=1 width=10) (actual time=0.015..0.025 rows=1 loops=1)
              ->  Seq Scan on addresses2 a  (cost=0.00..1.06 rows=1 width=8) (actual time=0.005..0.008 rows=1 loops=1)
                    Filter: (userid = 1)
              ->  Seq Scan on users u  (cost=0.00..1.05 rows=1 width=6) (actual time=0.004..0.007 rows=1 loops=1)
                    Filter: (u.id = 1)
 Total runtime: 0.251 ms
(13 rows)

You'll notice that the subquery version is doing the full join and then the filtering, but the explicitly listed version pushing the filtering into the plan before the join. Is there a way to make the subquery version perform the same optimization?

Thanks,
Dave

I also just noticed that an ORDER BY x LIMIT n optimization is not pushed down through the UNION ALL as well. I understand that this may be a little trickier because the ORDER BY and LIMIT would need to be applied to the subqueries and then re-applied after the APPEND, but is there some way to get either the previous issue or this issue to optimize as desired? Or do I just need to change my schema to not use two separate tables with a VIEW and a UNION ALL?

Thanks again,
Dave

Re: Pushing IN (subquery) down through UNION ALL?

From
Vik Reykja
Date:
On Thu, Feb 24, 2011 at 16:14, Dave Johansen <davejohansen@gmail.com> wrote:
You'll notice that the subquery version is doing the full join and then the filtering, but the explicitly listed version pushing the filtering into the plan before the join. Is there a way to make the subquery version perform the same optimization?

EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id = ANY (array(SELECT id FROM users WHERE name='A'));

(Tested on 9.0.3)

Re: Pushing IN (subquery) down through UNION ALL?

From
Dave Johansen
Date:
On Thu, Feb 24, 2011 at 12:33 PM, Vik Reykja <vikreykja@gmail.com> wrote:
On Thu, Feb 24, 2011 at 16:14, Dave Johansen <davejohansen@gmail.com> wrote:
You'll notice that the subquery version is doing the full join and then the filtering, but the explicitly listed version pushing the filtering into the plan before the join. Is there a way to make the subquery version perform the same optimization?

EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id = ANY (array(SELECT id FROM users WHERE name='A'));

(Tested on 9.0.3)

I just tested that on 8.3.3 and it performed quickly like I expected the other query to, so that did the trick.
Thanks a ton,
Dave

Re: Pushing IN (subquery) down through UNION ALL?

From
Vik Reykja
Date:
On Thu, Feb 24, 2011 at 20:56, Dave Johansen <davejohansen@gmail.com> wrote:
On Thu, Feb 24, 2011 at 12:33 PM, Vik Reykja <vikreykja@gmail.com> wrote:
On Thu, Feb 24, 2011 at 16:14, Dave Johansen <davejohansen@gmail.com> wrote:
You'll notice that the subquery version is doing the full join and then the filtering, but the explicitly listed version pushing the filtering into the plan before the join. Is there a way to make the subquery version perform the same optimization?

EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id = ANY (array(SELECT id FROM users WHERE name='A'));

(Tested on 9.0.3)

I just tested that on 8.3.3 and it performed quickly like I expected the other query to, so that did the trick.

Is there any good reason you're not using 8.3.14?
 
Thanks a ton,

You're welcome.

Re: Pushing IN (subquery) down through UNION ALL?

From
Dave Johansen
Date:
On Thu, Feb 24, 2011 at 12:59 PM, Vik Reykja <vikreykja@gmail.com> wrote:
On Thu, Feb 24, 2011 at 20:56, Dave Johansen <davejohansen@gmail.com> wrote:
On Thu, Feb 24, 2011 at 12:33 PM, Vik Reykja <vikreykja@gmail.com> wrote:
On Thu, Feb 24, 2011 at 16:14, Dave Johansen <davejohansen@gmail.com> wrote:
You'll notice that the subquery version is doing the full join and then the filtering, but the explicitly listed version pushing the filtering into the plan before the join. Is there a way to make the subquery version perform the same optimization?

EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id = ANY (array(SELECT id FROM users WHERE name='A'));

(Tested on 9.0.3)

I just tested that on 8.3.3 and it performed quickly like I expected the other query to, so that did the trick.

Is there any good reason you're not using 8.3.14?

No, I just haven't taken the time to do the upgrade on all of our systems. It is definitely something that I have started to consider more strongly though.

Dave

Re: Pushing IN (subquery) down through UNION ALL?

From
Robert Haas
Date:
On Thu, Feb 24, 2011 at 11:38 AM, Dave Johansen <davejohansen@gmail.com> wrote:
> I also just noticed that an ORDER BY x LIMIT n optimization is not pushed
> down through the UNION ALL as well. I understand that this may be a little
> trickier because the ORDER BY and LIMIT would need to be applied to the
> subqueries and then re-applied after the APPEND,

PostgreSQL 9.1 will know how to do this, FWIW.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pushing IN (subquery) down through UNION ALL?

From
Thom Brown
Date:
On 2 March 2011 19:38, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 24, 2011 at 11:38 AM, Dave Johansen <davejohansen@gmail.com> wrote:
>> I also just noticed that an ORDER BY x LIMIT n optimization is not pushed
>> down through the UNION ALL as well. I understand that this may be a little
>> trickier because the ORDER BY and LIMIT would need to be applied to the
>> subqueries and then re-applied after the APPEND,
>
> PostgreSQL 9.1 will know how to do this, FWIW.

Out of curiosity, what was the commit for this?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: Pushing IN (subquery) down through UNION ALL?

From
Robert Haas
Date:
On Wed, Mar 2, 2011 at 9:11 AM, Thom Brown <thom@linux.com> wrote:
> On 2 March 2011 19:38, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Feb 24, 2011 at 11:38 AM, Dave Johansen <davejohansen@gmail.com> wrote:
>>> I also just noticed that an ORDER BY x LIMIT n optimization is not pushed
>>> down through the UNION ALL as well. I understand that this may be a little
>>> trickier because the ORDER BY and LIMIT would need to be applied to the
>>> subqueries and then re-applied after the APPEND,
>>
>> PostgreSQL 9.1 will know how to do this, FWIW.
>
> Out of curiosity, what was the commit for this?

11cad29c91524aac1d0b61e0ea0357398ab79bf8 Support MergeAppend plans, to
allow sorted output from append relations.
034967bdcbb0c7be61d0500955226e1234ec5f04 Reimplement planner's
handling of MIN/MAX aggregate optimization.
947d0c862c895618a874344322e7b07c9df05cb2 Use appendrel planning logic
for top-level UNION ALL structures.
6fbc323c8042303a737028f9da7616896bccc517 Further fallout from the
MergeAppend patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pushing IN (subquery) down through UNION ALL?

From
Thom Brown
Date:
On 2 March 2011 19:52, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 2, 2011 at 9:11 AM, Thom Brown <thom@linux.com> wrote:
>> On 2 March 2011 19:38, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Thu, Feb 24, 2011 at 11:38 AM, Dave Johansen <davejohansen@gmail.com> wrote:
>>>> I also just noticed that an ORDER BY x LIMIT n optimization is not pushed
>>>> down through the UNION ALL as well. I understand that this may be a little
>>>> trickier because the ORDER BY and LIMIT would need to be applied to the
>>>> subqueries and then re-applied after the APPEND,
>>>
>>> PostgreSQL 9.1 will know how to do this, FWIW.
>>
>> Out of curiosity, what was the commit for this?
>
> 11cad29c91524aac1d0b61e0ea0357398ab79bf8 Support MergeAppend plans, to
> allow sorted output from append relations.
> 034967bdcbb0c7be61d0500955226e1234ec5f04 Reimplement planner's
> handling of MIN/MAX aggregate optimization.
> 947d0c862c895618a874344322e7b07c9df05cb2 Use appendrel planning logic
> for top-level UNION ALL structures.
> 6fbc323c8042303a737028f9da7616896bccc517 Further fallout from the
> MergeAppend patch.

Erk.. I see.  Thanks :)

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935