Re: Poor performance when using a window function in a view - Mailing list pgsql-general

From Chris Hanks
Subject Re: Poor performance when using a window function in a view
Date
Msg-id CAK7KUdDywcUv5AyGCV3bvNzeAx6SpsnJH-G66Z+==LmLTzDCcQ@mail.gmail.com
Whole thread Raw
In response to Re: Poor performance when using a window function in a view  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
On Fri, Mar 1, 2013 at 6:37 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Fri, Mar 1, 2013 at 3:59 AM, Chris Hanks
<christopher.m.hanks@gmail.com> wrote:
> On Fri, Mar 1, 2013 at 1:21 AM, Raymond O'Donnell <rod@iol.ie> wrote:
>>
>> On 01/03/2013 00:19, Chris Hanks wrote:
>> > On Thu, Feb 28, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us
>> > <mailto:tgl@sss.pgh.pa.us>> wrote:
>> >
>> >     Chris Hanks <christopher.m.hanks@gmail.com
>> >     <mailto:christopher.m.hanks@gmail.com>> writes:
>> >     > create or replace view values_view as
>> >     > select fkey1, fkey3,
>> >     >   (derived1 / max(derived1) over (partition by fkey1)) as
>> > derived1,
>> >     >   (derived2 / sum(derived1) over (partition by fkey1)) as derived2
>> >     > from (
>> >     >   select fkey1, fkey3,
>> >     >     cast(sum((case when (value > 0.0) then 4 else 1 end)) as
>> > double
>> >     > precision) as derived1,
>> >     >     sum((case when (value > 0.0) then (value * 4) else (value + 1)
>> >     end)) as
>> >     > derived2
>> >     >   from values
>> >     >   group by fkey1, fkey3
>> >     > ) as t1;
>> >
>> >     > -- This query requires a sequential scan on values, though all the
>> >     data it
>> >     > needs could be found much more efficiently with an index scan.
>> >     > explain analyze select * from values_view where fkey1 = 1263;
>> >
>> >     To use the outer WHERE clause as an index constraint, postgres would
>> >     have to prove that scanning only the rows with fkey1 = 1263 would
>> > still
>> >     find all the rows that would get examined by the window functions
>> > ---
>> >     and in this case, it's not only the window functions that make that
>> > less
>> >     than obvious, but the grouped aggregates in the sub-select below
>> > them.
>> >     There's not nearly that amount of intelligence in the system about
>> >     window functions, as yet.  So you'll have to write out the query
>> >     longhand and put the WHERE clause at the lower level, if you want
>> > this
>> >     optimization to happen.
>> >
>> >                             regards, tom lane
>> >
>> >
>> > Ok, that makes sense, thanks.
>> >
>> > Can anyone point me to an example of wrapping a function in a view, like
>> > Merlin suggested? I'm not sure how that would work.
>>
>> Off the top of my head, I'd imagine it's as simple as:
>>
>>   create view ... as
>>   select * from my_function(...);
>>
>> :-)
>>
>> Ray.
>>
>>
>> --
>> Raymond O'Donnell :: Galway :: Ireland
>> rod@iol.ie
>
>
>
> Sorry, I don't understand. I'm able to make a function that takes an integer
> and uses it in the subselect as "WHERE fkey1 = arg", and that works as I
> expect it to and it's plenty fast. But I don't see how to write a view to
> take advantage of this function - what arguments would go in
> my_function(...) when I'm declaring the view?

First let's clearly state the problem - create some test data:
postgres=# create table foo as select v1, v2 from (select
generate_series(1,10) v1) q1, (select generate_series(1,10000) v2) q2;
postgres=# create table bar as select v1 from generate_series(1,10) v1;

foo is data table, bar defines the key around which we have the data.

create an index:
postgres=# create index on foo(v1);

simple lookup is optimized obviously:
postgres=# explain select * from foo where v1 = 7;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using foo_v1_idx on foo  (cost=0.00..4.38 rows=1 width=8)
   Index Cond: (v1 = 50)

get indexed lookup for simple window function:
postgres=# explain select v1, sum(v2) over(partition by v1 order by
v2) from foo where v1 = 7;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 WindowAgg  (cost=4.39..4.41 rows=1 width=8)
   ->  Sort  (cost=4.39..4.39 rows=1 width=8)
         Sort Key: v2
         ->  Index Scan using foo_v1_idx on foo  (cost=0.00..4.38
rows=1 width=8)
               Index Cond: (v1 = 50)

push to subquery (essentially what view does) and we lose the optimization:
postgres=# explain select * from (select v1, sum(v2) over(partition by
v1 order by v2) from foo) q  where v1 = 7;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Subquery Scan on q  (cost=9747.82..12997.82 rows=1 width=12)
   Filter: (q.v1 = 50)
   ->  WindowAgg  (cost=9747.82..11747.82 rows=100000 width=8)
         ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)
               Sort Key: foo.v1, foo.v2
               ->  Seq Scan on foo  (cost=0.00..1443.00 rows=100000 width=8)

let's work around it!
postgres=# CREATE OR REPLACE FUNCTION getfoovals(
  v1 INOUT int,
  sumv2 OUT BIGINT) RETURNS SETOF RECORD AS
$$
  select
    v1,
    sum(v2)
    over(partition by v1 order by v2)
  from foo where foo.v1 = getfoovals.v1;
$$ LANGUAGE SQL STABLE;

select * from getfoovals(7);
 v1 |  sumv2
----+----------
  7 |        1
  7 |        3
  7 |        6
  7 |       10
  7 |       15
  7 |       21
<snip>

abstract to view:
postgres=# CREATE OR REPLACE VIEW foovals AS
  select
    v1,
    (getfoovals(v1)).sumv2
  from bar;

postgres=#  select * from foovals where v1 = 7;

Main problem with this technique is awkwardness around using column
lest set returning function if it returns > 1 column...if you'r not
careful you can get extra invocations of function.   you work hack
around this to some degree via:
*) offset 0 hacks
*) return type coersion hacks
*) 9.3 LATERAL feature completely nails it

Can't use WITH because it fences of the optimization.

This (window function optimization fencing) is probably #1 performance
gotcha I hit in everyday coding after LATERAL and lack of better
ability to inline simple SQL functions.

merlin




Hi - thanks everyone for your input. I bit the bullet and just made a materialized view for now. I got Merlin's suggestion working, though - I'm going to revisit it when 9.3 comes out and I can use LATERAL.

Thanks again!
Chris

pgsql-general by date:

Previous
From: Michael Best
Date:
Subject: Re: broke postgres, how to fix??
Next
From: James Cloos
Date:
Subject: Re: Floating point error