Thread: optimization with limit and order by in a view

optimization with limit and order by in a view

From
elein
Date:
Yes, I vacuumed.  Reproduced on both 7.3.2 and 7.5.

Brain dead java beans want order by clauses in views
that they use.

my view is:
    select ....
    from bigtable b left join lookuptable l
    order by bigkey desc;

The usual query was to limit by 1 or 20.

What I found was that if I moved the order by outside
of the view definition, the query went from 5000-7000ms
down to 70-1.5ms.

The explain shows that when the order by is in the view
a subquery is executed.  When it is not, it goes directly
into the nested loop join.

I do not see how the sort occurs faster in the query where the
sort is on the outside of the view definition.

Is there something here that we are not optimizing?  The
queries should be equivalent, no?

Here are the four explains.  One each for limit 1 and 20
and against the sorted view and unsorted view.


elein=#  explain analyze SELECT xact_id, msg_request_code,
elein-# netinfo_code, response_code, create_timestamp, pan, tran_amount,
elein-# netinfo_short_desc  FROM network_transactions
elein-# limit 1;
                                                                     QUERY PLAN
                            

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=56475.42..56475.43 rows=1 width=133) (actual time=5747.123..5747.124 rows=1 loops=1)
   ->  Subquery Scan network_transactions  (cost=56475.42..59408.02 rows=234608 width=133) (actual
time=5747.118..5747.118rows=1 loops=1) 
         ->  Sort  (cost=56475.42..57061.94 rows=234608 width=78) (actual time=5747.088..5747.088 rows=1 loops=1)
               Sort Key: t.xact_id
               ->  Hash Left Join  (cost=1.09..9758.51 rows=234608 width=78) (actual time=17.907..2326.073 rows=234608
loops=1)
                     Hash Cond: ("outer".netinfo_code = ("inner".netinfo_code)::text)
                     ->  Seq Scan on transaction_log t  (cost=0.00..6813.08 rows=234608 width=68) (actual
time=17.749..616.371rows=234608 loops=1) 
                     ->  Hash  (cost=1.07..1.07 rows=7 width=17) (actual time=0.063..0.063 rows=0 loops=1)
                           ->  Seq Scan on netinfo_codes n  (cost=0.00..1.07 rows=7 width=17) (actual time=0.021..0.043
rows=7loops=1) 
 Total runtime: 5764.097 ms
(10 rows)
elein=# explain analyze SELECT xact_id, msg_request_code,
elein-# netinfo_code, response_code, create_timestamp, pan, tran_amount,
elein-# netinfo_short_desc  FROM network_transactions
elein-# limit 20;
                                                                    QUERY PLAN
                           

---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=56475.42..56475.67 rows=20 width=133) (actual time=5052.540..5052.639 rows=20 loops=1)
   ->  Subquery Scan network_transactions  (cost=56475.42..59408.02 rows=234608 width=133) (actual
time=5052.536..5052.621rows=20 loops=1) 
         ->  Sort  (cost=56475.42..57061.94 rows=234608 width=78) (actual time=5052.504..5052.523 rows=20 loops=1)
               Sort Key: t.xact_id
               ->  Hash Left Join  (cost=1.09..9758.51 rows=234608 width=78) (actual time=0.224..2183.144 rows=234608
loops=1)
                     Hash Cond: ("outer".netinfo_code = ("inner".netinfo_code)::text)
                     ->  Seq Scan on transaction_log t  (cost=0.00..6813.08 rows=234608 width=68) (actual
time=0.043..448.986rows=234608 loops=1) 
                     ->  Hash  (cost=1.07..1.07 rows=7 width=17) (actual time=0.130..0.130 rows=0 loops=1)
                           ->  Seq Scan on netinfo_codes n  (cost=0.00..1.07 rows=7 width=17) (actual time=0.089..0.106
rows=7loops=1) 
 Total runtime: 5069.562 ms
(10 rows)
elein=# explain analyze SELECT xact_id, msg_request_code,
elein-# netinfo_code, response_code, create_timestamp, pan, tran_amount,
elein-# netinfo_short_desc  FROM net_trans_noorder order by xact_id desc
elein-# limit 1;
                                                                               QUERY PLAN
                                                

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.04 rows=1 width=78) (actual time=0.339..0.340 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..477486.69 rows=234608 width=78) (actual time=0.336..0.336 rows=1 loops=1)
         Join Filter: ("outer".netinfo_code = ("inner".netinfo_code)::text)
         ->  Index Scan Backward using transaction_log_pkey on transaction_log t  (cost=0.00..205927.93 rows=234608
width=68)(actual time=0.243..0.243 rows=1 loops=1) 
         ->  Seq Scan on netinfo_codes n  (cost=0.00..1.07 rows=7 width=17) (actual time=0.046..0.051 rows=7 loops=1)
 Total runtime: 0.595 ms
(6 rows)

elein=#
elein=# explain analyze SELECT xact_id, msg_request_code,
elein-# netinfo_code, response_code, create_timestamp, pan, tran_amount,
elein-# netinfo_short_desc  FROM net_trans_noorder order by xact_id desc
elein-# limit 20;
                                                                               QUERY PLAN
                                                 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.07..21.78 rows=20 width=78) (actual time=0.138..1.117 rows=20 loops=1)
   ->  Nested Loop Left Join  (cost=1.07..242879.76 rows=234608 width=78) (actual time=0.134..1.075 rows=20 loops=1)
         Join Filter: ("outer".netinfo_code = ("inner".netinfo_code)::text)
         ->  Index Scan Backward using transaction_log_pkey on transaction_log t  (cost=0.00..205927.93 rows=234608
width=68)(actual time=0.044..0.647 rows=20 loops=1) 
         ->  Materialize  (cost=1.07..1.14 rows=7 width=17) (actual time=0.002..0.006 rows=7 loops=20)
               ->  Seq Scan on netinfo_codes n  (cost=0.00..1.07 rows=7 width=17) (actual time=0.029..0.049 rows=7
loops=1)
 Total runtime: 1.378 ms
(7 rows)


Re: optimization with limit and order by in a view

From
Tom Lane
Date:
elein <elein@varlena.com> writes:
> Brain dead java beans want order by clauses in views
> that they use.

That's *quite* brain dead, considering that standard SQL doesn't allow
ORDER BY in view definitions at all.  Sure you can't fix it on the
client side?

> What I found was that if I moved the order by outside
> of the view definition, the query went from 5000-7000ms
> down to 70-1.5ms.

Yeah.  The planner can't flatten a subquery that contains ORDER BY into
the parent query, because there'd be no place to put the ORDER BY.  So
when you write it that way, the subquery is planned independently and
it doesn't realize that it should use a fast-start plan instead of a
minimum-total-time plan.

I can think of various possible kluges to get around this in simple
cases, but nothing I like much...

            regards, tom lane

Re: optimization with limit and order by in a view

From
elein
Date:
Thank you.  I already removed the order by in the view
definition which was put in against my recommendation...

But it is an interesting case.

If I understand you, a subquery with an order by cannot
be flattened.

cheers,

elein


On Tue, Jul 13, 2004 at 05:55:50PM -0400, Tom Lane wrote:
> elein <elein@varlena.com> writes:
> > Brain dead java beans want order by clauses in views
> > that they use.
>
> That's *quite* brain dead, considering that standard SQL doesn't allow
> ORDER BY in view definitions at all.  Sure you can't fix it on the
> client side?
>
> > What I found was that if I moved the order by outside
> > of the view definition, the query went from 5000-7000ms
> > down to 70-1.5ms.
>
> Yeah.  The planner can't flatten a subquery that contains ORDER BY into
> the parent query, because there'd be no place to put the ORDER BY.  So
> when you write it that way, the subquery is planned independently and
> it doesn't realize that it should use a fast-start plan instead of a
> minimum-total-time plan.
>
> I can think of various possible kluges to get around this in simple
> cases, but nothing I like much...
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html