Re: LIMIT and UNION ALL - Mailing list pgsql-performance

From Dave Johansen
Subject Re: LIMIT and UNION ALL
Date
Msg-id BANLkTiki32HzEYh8f36p8MB5K3jTSnf5BA@mail.gmail.com
Whole thread Raw
In response to Re: LIMIT and UNION ALL  (Robert Klemme <shortcutter@googlemail.com>)
Responses Re: LIMIT and UNION ALL
List pgsql-performance
On Wed, May 18, 2011 at 8:54 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
On Wed, May 18, 2011 at 5:26 PM, Dave Johansen <davejohansen@gmail.com> wrote:
> I am using Postgres 8.3.3 and I have a VIEW which is a UNION ALL of two
> tables but when I do a select on the view using a LIMIT, it scans the entire
> tables and takes significantly longer than writing out the query with the
> LIMITs in the sub-queries themselves. Is there a solution to get the view to
> perform like the query with the LIMIT explicitly placed in the sub-queries?

Can you show DDL and queries?

The query with the LIMIT on the subqueries and the one with the LIMIT
on the overall query are not semantically equivalent.  Since you can
have an ORDER BY before the LIMIT on the query with the limit on the
view the database must have all the rows before it can apply the
ordering and properly determine the limit.  Although it might be
possible to determine under particular circumstances that only one of
the tables needs to be queried or tables need only be queried
partially I deem that quite complex. I do not know whether Postgres
can do such optimizations but for that we would certainly need to see
the concrete example (including constraint and indexes).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Yes, there is an order by an index involved. Here's a simplified version of the schema and queries that demonstrates the same behaviour.

                                 Table "public.message1"
 Column |           Type           |                       Modifiers                       
--------+--------------------------+--------------------------------------------------------
 rid    | integer                  | not null default nextval('message1_rid_seq'::regclass)
 data   | integer                  |
 tlocal | timestamp with time zone |
Indexes:
    "message1_pkey" PRIMARY KEY, btree (rid)
Referenced by:
    TABLE "parsed1" CONSTRAINT "parsed1_msgid_fkey" FOREIGN KEY (msgid) REFERENCES message1(rid)

                                  Table "public.parsed1"
 Column |           Type           |                       Modifiers                       
--------+--------------------------+--------------------------------------------------------
 rid    | integer                  | not null default nextval('parsed1_rid_seq'::regclass)
 msgid  | integer                  |
 data   | integer                  |
 tlocal | timestamp with time zone |
Indexes:
    "parsed1_pkey" PRIMARY KEY, btree (rid)
Foreign-key constraints:
    "parsed1_msgid_fkey" FOREIGN KEY (msgid) REFERENCES message1(rid) ON DELETE CASCADE

For this example, message2 has the same structure/definition and message1 and parsed2 has the same structure/definition as parsed1.

           View "public.parsed_all"
 Column |           Type           | Modifiers
--------+--------------------------+-----------
 rid    | integer                  |
 msgid  | integer                  |
 data   | integer                  |
 tlocal | timestamp with time zone |
View definition:
         SELECT parsed1.rid, parsed1.msgid, parsed1.data, parsed1.tlocal
           FROM parsed1
UNION ALL
         SELECT parsed2.rid, parsed2.msgid, parsed2.data, parsed2.tlocal
           FROM parsed2;




Slow version using the view:

EXPLAIN ANALYZE SELECT * FROM parsed_all ORDER BY tlocal DESC LIMIT 10;
                                                                 QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=74985.28..74985.31 rows=10 width=20) (actual time=6224.229..6224.244 rows=10 loops=1)
   ->  Sort  (cost=74985.28..79985.28 rows=2000000 width=20) (actual time=6224.226..6224.230 rows=10 loops=1)
         Sort Key: parsed1.tlocal
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  Result  (cost=0.00..31766.00 rows=2000000 width=20) (actual time=0.026..4933.210 rows=2000000 loops=1)
               ->  Append  (cost=0.00..31766.00 rows=2000000 width=20) (actual time=0.024..2880.868 rows=2000000 loops=1)
                     ->  Seq Scan on parsed1  (cost=0.00..15883.00 rows=1000000 width=20) (actual time=0.023..551.870 rows=1000000 loops=1)
                     ->  Seq Scan on parsed2  (cost=0.00..15883.00 rows=1000000 width=20) (actual time=0.027..549.465 rows=1000000 loops=1)
 Total runtime: 6224.337 ms
(9 rows)

Fast version using a direct query with limits in the sub-queries:

EXPLAIN ANALYZE SELECT * FROM (SELECT * FROM (SELECT * FROM parsed1 ORDER BY tlocal DESC LIMIT 10) AS a UNION ALL SELECT * FROM (SELECT * FROM parsed2 ORDER BY tlocal DESC LIMIT 10) AS b) AS c ORDER BY tlocal DESC LIMIT 10;
                                                                               QUERY PLAN                                                         
                    
---------------------------------------------------------------------------------------------------------------------------------------------------
---------------------
 Limit  (cost=1.33..1.35 rows=10 width=20) (actual time=0.131..0.145 rows=10 loops=1)
   ->  Sort  (cost=1.33..1.38 rows=20 width=20) (actual time=0.129..0.132 rows=10 loops=1)
         Sort Key: parsed1.tlocal
         Sort Method:  quicksort  Memory: 17kB
         ->  Result  (cost=0.00..0.90 rows=20 width=20) (actual time=0.023..0.100 rows=20 loops=1)
               ->  Append  (cost=0.00..0.90 rows=20 width=20) (actual time=0.020..0.078 rows=20 loops=1)
                     ->  Limit  (cost=0.00..0.35 rows=10 width=20) (actual time=0.020..0.035 rows=10 loops=1)
                           ->  Index Scan using parsed1_tlocal_index on parsed1  (cost=0.00..34790.39 rows=1000000 width=20) (actual time=0.018..0.
025 rows=10 loops=1)
                     ->  Limit  (cost=0.00..0.35 rows=10 width=20) (actual time=0.010..0.024 rows=10 loops=1)
                           ->  Index Scan using parsed2_tlocal_index on parsed2  (cost=0.00..34758.39 rows=1000000 width=20) (actual time=0.009..0.
015 rows=10 loops=1)
 Total runtime: 0.187 ms
(11 rows)


Basically, the second query is giving the same result as the version using the view but is able to use the indexes because it the order by and limit are explicitly placed in the sub-queries. So is there a way to make the planner perform the same sort of operation and push those same constraints into the sub-queries on its own?

Thanks,
Dave

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: The shared buffers challenge
Next
From: Tom Lane
Date:
Subject: Re: LIMIT and UNION ALL