Thread: Windowing Function Patch Review -> Performance Comparison.

Windowing Function Patch Review -> Performance Comparison.

From
"David Rowley"
Date:
All,

This is my first patch review for PostgreSQL. I did submit a patch last
commit fest (Boyer-Moore) so I feel I should review one this commit fest.
I'm quite new to PostgreSQL so please don't rely on me totally. I'll do my
best. Heikki is also reviewing this patch which makes me feel better.

My aim is to get the author has much feed back as quickly as possible. For
this reason I'm going to be breaking down my reviews into the following
topics.

1. Does patch apply cleanly?

2. Any compiler warnings?

3. Do the results follow the SQL standard?

4. Performance Comparison, does it perform better than alternate ways of
doing things. Self joins, sub queries etc.

5. Performance, no comparison. How does it perform with larger tables?


Things I probably won't attempt to review:

Source code; best practises, making use of existing APIs etc. I'd rather
leave that for Heikki and possibly others that join in reviewing this patch.

It's not that I'm too lazy, just that I don't feel that I know the source
well enough. Plus it's a complex patch.

Really I should follow my list in order but I'm going to do number 4 first
in order to get some quick feedback to the author.

I've created some "real world" tests where windowing functions will be
useful. I created some tables then populated with data. I then wrote 2
queries; 1 to make use of windowing functions, the other that uses a method
without windowing functions.

Test Results:

Test        Normal    Windowing    UOM        Increase %
Test 1    498.00    578.00    Trans/Sec    16.06%
Test 2     336.00    481.00    Trans/Sec    43.15%
Test 3    1.30        8.45        Trans/Sec    550.00%
Test 4    424.00    629.00    Trans/Sec    48.35%
Test 5    8.89        31052.69    Trans/Hour    349114.85%
Test 6    253.00    305.00    Trans/Sec    20.55%

(Please see attached document for the actual tests)

Note: The above results will much depend on the set of data. Most of my
tests use a very small volume of data. Test 3 and 5 use more data that the
other tests. It's quite obvious that the more data there is in my tests the
bigger the margin between the two methods becomes. I originally ran test 3
with 40000 rows to simulate a large marathon but the "normal query" was
going to take hours... I reduced the rows to 10000.

Obervations:

Test 3 and 5 did not seem to make use of an index to get a sorted list of
results. I disabled enable_seqscan but the planner still failed to choose
index_scan. Is there any reason for this? Perhaps I'm missing something.
Hitoshi, can you take a look at this?

Tests:

Please see attached file. Perhaps there were more efficient ways for certain
queries, I just couldn't think of them...

Please let me know if you feel I should be conducting the review in another
way.

David.










Attachment

Re: Windowing Function Patch Review -> Performance Comparison.

From
"Vladimir Sitnikov"
Date:
Here is another way to solve "big marathon" without window functions (and many other kinds of "windowing" queries,
especiallythose that do not specify "rows preceeding" etc.).<br /><br />It could be considered as a very dirty hack,
howeverit could give you an insight on the performance of the "windowed" query with indexscan instead of seqscan.<br
/><br/><pre class="src"><font color="blue">create</font> <font color="blue">function</font> var_set<span
style="background-color:rgb(255, 249, 140);"></span> (<font color="blue">text</font>,text) <font
color="blue">returns</font><font color="blue">text</font><br />
 
<font color="blue">as</font><br /><font color="red">'<br />  select set_config ('</font><font
color="red">'public.'</font><fontcolor="red">'||$2||pg_backend_pid(), $1, false);<br />'</font> LANGUAGE <font
color="red">'sql'</font>;<br/>
 
<br /><font color="blue">create</font> <font color="blue">function</font> var_get (<font color="blue">text</font>)
<fontcolor="blue">returns</font> <font color="blue">text</font><br /><font color="blue">as</font><br /><font
color="red">'<br/> select current_setting('</font><font color="red">'public.'</font><font
color="red">'||$1||pg_backend_pid());<br/>'</font> LANGUAGE <font color="red">'sql'</font>;<br /><br /><font
color="blue">create</font>operator >>> (<font color="blue">procedure</font> = var_set, leftarg = <font
color="blue">text</font>,rightarg = <font color="blue">text</font>);<br />
 
<font color="blue">create</font> operator <<< (<font color="blue">procedure</font> = var_get, rightarg = <font
color="blue">text</font>);<br/></pre><br /><pre class="src"><font color="teal">-- init values</font><br />
 
<font color="blue">select</font> <font color="red">''</font>>>><font color="red">'prev_time'</font>, <font
color="red">'0'</font>>>><fontcolor="red">'dense_rank'</font>;<br />
 
<br /><font color="teal">-- marathon query</font><br /><font color="blue">select</font> *<br />  <font
color="blue">from</font>(<br />     <font color="blue">select</font> (((<font color="magenta">case</font> <font
color="blue">when</font><font color="blue">time</font>::<font color="blue">text</font> = <<<<font
color="red">'prev_time'</font><font color="blue">then</font> <font color="darkblue"><b>0</b></font> <font
color="blue">else</font><font color="darkblue"><b>1</b></font> <font color="blue">end</font>)+(<<<<font
color="red">'dense_rank'</font>)::int4)::<fontcolor="blue">text</font>>>><font
color="red">'dense_rank'</font>)::int4<font color="blue">as</font> position, runnerid, <font
color="blue">time</font><br/>      <font color="blue">from</font> big_marathon<br />      <font
color="blue">order</font><font color="blue">by</font> <font color="blue">time</font><br />  ) results<br /> <font
color="blue">where</font>position=<font color="darkblue"><b>2</b></font><br />
 
</pre>Best regards,<br />Vladimir Sitnikov<br />

Re: Windowing Function Patch Review -> Performance Comparison.

From
"Vladimir Sitnikov"
Date:
Just a small correction: there should be <font color="blue">time</font>::<font
color="blue">text</font>>>><fontcolor="red">'prev_time'</font> for the calculations to be correct:<br /><pre
class="src"><fontcolor="blue"><br />
 
select</font> *<br />  <font color="blue">from</font> (<br />     <font color="blue">select</font> (((<font
color="magenta">case</font><font color="blue">when</font> <font color="blue">time</font>::<font
color="blue">text</font>= <<<<font color="red">'prev_time'</font> <font color="blue">then</font> <font
color="darkblue"><b>0</b></font><font color="blue">else</font> <font color="darkblue"><b>1</b></font> <font
color="blue">end</font>)+(<<<<fontcolor="red">'dense_rank'</font>)::int4)::<font
color="blue">text</font>>>><fontcolor="red">'dense_rank'</font>)::int4 <font color="blue">as</font> position,
runnerid,<font color="blue">time</font>, <font color="blue">time</font>::<font
color="blue">text</font>>>><fontcolor="red">'prev_time'</font><br />      <font color="blue">from</font>
big_marathon<br/>      <font color="blue">order</font> <font color="blue">by</font> <font color="blue">time</font><br
/> ) results<br /> <font color="blue">where</font> position=<font color="darkblue"><b>2</b></font><br />
 
</pre><pre class="src"><font color="teal">-- meter_readings</font><br /><font color="blue">select</font> <font
color="red">''</font>>>> <font color="red">'lag'</font>;<br /><br /><font color="blue">select</font> <font
color="blue">date</font>,reading::<font color="blue">numeric</font>-(<font color="magenta">case</font> lag <font
color="blue">when</font><font color="red">''</font> <font color="blue">then</font> <font color="#777777">null</font>
<fontcolor="blue">else</font> lag <font color="blue">end</font>)::<font color="blue">numeric</font> <font
color="blue">as</font>used<br /> <font color="blue">from</font> (<br />    <font color="blue">select</font> <font
color="blue">date</font>,<<<<font color="red">'lag'</font> <font color="blue">as</font> lag, reading::<font
color="blue">text</font>>>><font color="red">'lag'</font> <font color="blue">as</font> reading<br />     <font
color="blue">from</font>meter_readings<br />     <font color="blue">order</font> <font color="blue">by</font> <font
color="blue">date</font><br/>  ) <font color="blue">as</font> t<br /> <font color="blue">order</font> <font
color="blue">by</font>used <font color="blue">asc</font> nulls <font color="blue">last</font> limit <font
color="darkblue"><b>1</b></font><br/>
 
</pre><br />Best regards,<br />Vladimir Sitnikov<br />

Re: Windowing Function Patch Review -> Performance Comparison.

From
"Hitoshi Harada"
Date:
2008/11/2 David Rowley <dgrowley@gmail.com>:
> Obervations:
>
> Test 3 and 5 did not seem to make use of an index to get a sorted list of
> results. I disabled enable_seqscan but the planner still failed to choose
> index_scan. Is there any reason for this? Perhaps I'm missing something.
> Hitoshi, can you take a look at this?

Ah, good point. Maybe it's because I haven't paid attention to choose
index_scan for upper sort node. I just put the sort node whatever the
downer node is, so it might be needed to sink the information down to
scan choice process that we use sort node upper. Could someone point
me out how to do it, or which part of the existing code would be a
good guide?

> Tests:
>
> Please see attached file. Perhaps there were more efficient ways for certain
> queries, I just couldn't think of them...
>
> Please let me know if you feel I should be conducting the review in another
> way.

Thanks for your test. Didn't post publicly, I've also tested real
problems and performed better than I thought. If you can afford it,
could you add selfjoin cases? It's like:

-- normal
SELECT t1.id, t1.grp, count(t2) + 1 AS row_number
FROM t t1
INNER JOIN t t2 ON t1.grp = t2.grp AND t1.id > t2.id;

-- windowing
SELECT id, grp, row_number() OVER (PARTITION grp ORDER BY id)
FROM t;


Regards,

-- 
Hitoshi Harada


Re: Windowing Function Patch Review -> Performance Comparison.

From
"David Rowley"
Date:
Hitoshi Harada Wrote:
> Thanks for your test. Didn't post publicly, I've also tested real
> problems and performed better than I thought. If you can afford it,
> could you add selfjoin cases? It's like:

Ok, did self joins with some. I don't know if it's possible with all.

Test   Sub query Self join Vladimir Windowing UOM        Increase %
Test 1 498.00    N/A       N/A      578.00    Trans/Sec  16.06%
Test 2 336.00    424.00    182.78    481.00    Trans/Sec  13.44%
Test 3 1.30      7.59      1.90     8.45      Trans/Sec  11.33%
Test 4 424.00    361.00    182.00   629.00    Trans/Sec  48.35%
Test 5 8.89      N/A       5844.16  31052.69  Trans/Hour 431.35%
Test 6 253.00    N/A       N/A      305.00    Trans/Sec  20.55%

See attached for details.
The increase % column is now:

Window / max ( Sub query, self join, Vladimir ) - 1

Vladimir, I've included your tests too. I understand that people will
probably use this method as sometimes there is little choice to get the
performance that is required.

Hitoshi Harada Wrote:
> 2008/11/2 David Rowley <dgrowley@gmail.com>:
> > Obervations:
> >
> > Test 3 and 5 did not seem to make use of an index to get a sorted list
> of
> > results. I disabled enable_seqscan but the planner still failed to
> choose
> > index_scan. Is there any reason for this? Perhaps I'm missing something.
> > Hitoshi, can you take a look at this?

> Ah, good point. Maybe it's because I haven't paid attention to choose
> index_scan for upper sort node. I just put the sort node whatever the
> downer node is, so it might be needed to sink the information down to
> scan choice process that we use sort node upper. Could someone point
> me out how to do it, or which part of the existing code would be a
> good guide?

I know you need to wait for an answer about this, so I'd like to delay any
further performance tests until that's sorted out as it should affect
performance of larger tables quite a bit.

David.


Attachment

Re: Windowing Function Patch Review -> Performance Comparison.

From
"Hitoshi Harada"
Date:
2008/11/2 David Rowley <dgrowley@gmail.com>:
> Hitoshi Harada Wrote:
>> 2008/11/2 David Rowley <dgrowley@gmail.com>:
>> > Obervations:
>> >
>> > Test 3 and 5 did not seem to make use of an index to get a sorted list
>> of
>> > results. I disabled enable_seqscan but the planner still failed to
>> choose
>> > index_scan. Is there any reason for this? Perhaps I'm missing something.
>> > Hitoshi, can you take a look at this?
>
>> Ah, good point. Maybe it's because I haven't paid attention to choose
>> index_scan for upper sort node. I just put the sort node whatever the
>> downer node is, so it might be needed to sink the information down to
>> scan choice process that we use sort node upper. Could someone point
>> me out how to do it, or which part of the existing code would be a
>> good guide?
>
> I know you need to wait for an answer about this, so I'd like to delay any
> further performance tests until that's sorted out as it should affect
> performance of larger tables quite a bit.
>

I found how to do it, though it's only on the case you gave. Thinking
about the planner optimization of the Window nodes (and its attached
Sort nodes), we must consider the execution order of more than one
node. In the test case we only take care of only one window, but there
may be more window/sort node sets, which is too difficult to choose
the best execution order including the downer indexscan, mergejoin in
subquery and sort-based GROUP BY. So I didn't touch the complicated
planner jungle. I rewrote the patch so that only the given bottom
window's sort can consider indexscan. Deeper optimizations are over my
capability.

Attach is a delta patch against the last one. Also see the git diff:

http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=bbba638f721a7e1d11cb3ee6af3bc1d7d3c11aa8;hp=48b73ee574779a14a3c36d373d8544d59a5b8b46

Regards,



--
Hitoshi Harada

Attachment

Re: Windowing Function Patch Review -> Performance Comparison.

From
"David Rowley"
Date:
Hitoshi Harada wrote:
> >> > Test 3 and 5 did not seem to make use of an index to get a sorted
> list
> >> of
> >> > results. I disabled enable_seqscan but the planner still failed to
> >> choose
> >> > index_scan. Is there any reason for this? Perhaps I'm missing
> something.
> >> > Hitoshi, can you take a look at this?
> >
> >> Ah, good point. Maybe it's because I haven't paid attention to choose
> >> index_scan for upper sort node. I just put the sort node whatever the
> >> downer node is, so it might be needed to sink the information down to
> >> scan choice process that we use sort node upper. Could someone point
> >> me out how to do it, or which part of the existing code would be a
> >> good guide?
> >
> > I know you need to wait for an answer about this, so I'd like to delay
> any
> > further performance tests until that's sorted out as it should affect
> > performance of larger tables quite a bit.
> >
>
> I found how to do it, though it's only on the case you gave. Thinking
> about the planner optimization of the Window nodes (and its attached
> Sort nodes), we must consider the execution order of more than one
> node. In the test case we only take care of only one window, but there
> may be more window/sort node sets, which is too difficult to choose
> the best execution order including the downer indexscan, mergejoin in
> subquery and sort-based GROUP BY. So I didn't touch the complicated
> planner jungle. I rewrote the patch so that only the given bottom
> window's sort can consider indexscan. Deeper optimizations are over my
> capability.

I've just looked into what some other implementations do. Sybase seems to do
exactly what you've done. It only looks at the first window clause in the
query. Oracle seems to use the index regardless to the position of the
window clause. To me personally what you've done seems fine for now. Perhaps
something could be done later to improve on this. Maybe someone else has
ideas about how to do it?

It seems quite similar to SELECT MAX(idxcol),MAX(idxcol2) where the planner
often makes use of 2 indexes when available, yet this case is probably far
more simple as there is always just 1 row. Costing would likely be more
complex with the windowing functions version.

Good work.

I'll continue with more benchmarks soon.

David.




Re: Windowing Function Patch Review -> Performance Comparison.

From
"David Rowley"
Date:
Hitoshi Harada wrote:
> I found how to do it, though it's only on the case you gave. Thinking
> about the planner optimization of the Window nodes (and its attached
> Sort nodes), we must consider the execution order of more than one
> node. In the test case we only take care of only one window, but there
> may be more window/sort node sets, which is too difficult to choose
> the best execution order including the downer indexscan, mergejoin in
> subquery and sort-based GROUP BY. So I didn't touch the complicated
> planner jungle. I rewrote the patch so that only the given bottom
> window's sort can consider indexscan. Deeper optimizations are over my
> capability.

Sorry for the delay on this. I've updated the benchmark results using the
new patch you sent today. I did a dump and re-load of the tables, since some
of the numbers are randomly generated I wouldn't want to compare them to the
old results for any of the tests. This is a complete new list with the CVS
head as of this morning.

Test   Sub Query Self Join Vladimir    Windowing UOM Window over Best alt
Test 1 504       N/A       N/A      568       TPS 12.7%
Test 2 340.9     425       182    450.38    TPS 6.0%
Test 3 1.304     8.12      1.963    7.52      TPS -7.4%
Test 4 422       365       195    630        TPS 49.3%
Test 5 8.874     N/A       5825     31203     TPH 435.6%
Test 6 251       N/A       N/A      300       TPS 19.5%

Only test 3 and 5 made use of the index scan, performance dropped slightly
on test 3 but there's not much point in paying much attention to that since
we're probably close to the cross over point between a seqscan and indexscan
where the planner's decision is not as critical.

Certain Self join methods I used don't implement the exact requirements I've
stated at the top of the test. For example the meter reading for self join
requires no days to be missed.

Maybe multi window optimisation is one for 8.5's TODO

I've attached the test scripts.

David.

Attachment

Re: Windowing Function Patch Review -> Performance Comparison.

From
"David Rowley"
Date:
Hitoshi Harada wrote:
> I found how to do it, though it's only on the case you gave. Thinking
> about the planner optimization of the Window nodes (and its attached
> Sort nodes), we must consider the execution order of more than one
> node. In the test case we only take care of only one window, but there
> may be more window/sort node sets, which is too difficult to choose
> the best execution order including the downer indexscan, mergejoin in
> subquery and sort-based GROUP BY. So I didn't touch the complicated
> planner jungle. I rewrote the patch so that only the given bottom
> window's sort can consider indexscan. Deeper optimizations are over my
> capability.

After more playing around with a few queries and testing some performance of
larger tables. I discovered something strange in the plan for this query.


david=# explain select date,lag(date,1) over (order by date) from
meter_Readings order by date;                                                QUERY PLAN
----------------------------------------------------------------------------
--------------------------------Sort  (cost=1038.73..1063.74 rows=10001 width=4)  Sort Key: date  ->  Window
(cost=0.00..374.27rows=10001 width=4)        ->  Index Scan using meter_readings_pkey on meter_readings 
(cost=0.00..299.27 rows=10001 width=4)
(4 rows)

Is the final sort still required? Is it not already sorted in the window?

The table I was testing on split the sort to disk, I would probably not have
noticed otherwise.

David.



Re: Windowing Function Patch Review -> Performance Comparison.

From
"Hitoshi Harada"
Date:
2008/11/10 David Rowley <dgrowley@gmail.com>:
> Hitoshi Harada wrote:
>> I found how to do it, though it's only on the case you gave. Thinking
>> about the planner optimization of the Window nodes (and its attached
>> Sort nodes), we must consider the execution order of more than one
>> node. In the test case we only take care of only one window, but there
>> may be more window/sort node sets, which is too difficult to choose
>> the best execution order including the downer indexscan, mergejoin in
>> subquery and sort-based GROUP BY. So I didn't touch the complicated
>> planner jungle. I rewrote the patch so that only the given bottom
>> window's sort can consider indexscan. Deeper optimizations are over my
>> capability.
>
> After more playing around with a few queries and testing some performance of
> larger tables. I discovered something strange in the plan for this query.
>
>
> david=# explain select date,lag(date,1) over (order by date) from
> meter_Readings order by date;
>                                                 QUERY PLAN
> ----------------------------------------------------------------------------
> --------------------------------
>  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
>   Sort Key: date
>   ->  Window  (cost=0.00..374.27 rows=10001 width=4)
>         ->  Index Scan using meter_readings_pkey on meter_readings
> (cost=0.00..299.27 rows=10001 width=4)
> (4 rows)
>
> Is the final sort still required? Is it not already sorted in the window?
>

Oh, I forgot to mention about it. This behavior is also fixed and it
works without sort on the window now. I don't remember at all why I
did so and there's no comment around that but regression tests showed
there is no preblem without it.

Regards,

-- 
Hitoshi Harada


Re: Windowing Function Patch Review -> Performance Comparison.

From
"David Rowley"
Date:
Hitoshi Harada wrote:
> > david=# explain select date,lag(date,1) over (order by date) from
> > meter_Readings order by date;
> >                                                 QUERY PLAN
> > ------------------------------------------------------------------------
> ----
> > --------------------------------
> >  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
> >   Sort Key: date
> >   ->  Window  (cost=0.00..374.27 rows=10001 width=4)
> >         ->  Index Scan using meter_readings_pkey on meter_readings
> > (cost=0.00..299.27 rows=10001 width=4)
> > (4 rows)
> >
> > Is the final sort still required? Is it not already sorted in the
> window?
> >
>
> Oh, I forgot to mention about it. This behavior is also fixed and it
> works without sort on the window now. I don't remember at all why I
> did so and there's no comment around that but regression tests showed
> there is no preblem without it.
>
> Regards,
>
> --
> Hitoshi Harada

Fantastic news! That will speed up the few test queries I have quite a bit
the sort was splitting to disk so performance was dropping quite a bit. This
might be a good time to say that the hardware that I'm testing on is ultra
slow. 600 Mhz Pentium III with only 28MB shared buffers.

I've done some more performance tests with the patch. Realising that I
really need to be comparing the performance with something else I decided to
have a query process a large table with then without a windowing function
just to see how much the query slows. Of course there is going to be an
overhead using a tuple store. I just wanted to see how much. My results are
not really very interesting, so I'll just include them in the bottom of the
email for those who want to see.

Casting my mind back to when the patch was always doing a sort before a
window with an order by even when a btree index was there, it's really come
a long way. I've little idea how difficult it would be to implement better
planning for the following. I suppose if it's difficult then it's probably
better to wait for the patch to be commited first, or maybe it's something
for 8.5.

SELECT department,      SUM(Salary),      ROW_NUMBER() OVER (ORDER BY department),      SUM(SUM(salary)) OVER (ORDER BY
departmentDESC) 
FROM employees
GROUP BY department ORDER BY department;

This query performs more sorts than really is needed. I suppose the most
efficient way to process it would be to process the 2nd window first then
the 1st before returning the results in the same order as the 1st window.

Currently the plan looks like:
                                        QUERY PLAN
----------------------------------------------------------------------------
-----------------Sort  (cost=1.33..1.34 rows=3 width=9)  Sort Key: department  ->  Window  (cost=1.25..1.31 rows=3
width=9)       ->  Sort  (cost=1.25..1.26 rows=3 width=9)              Sort Key: department              ->  Window
(cost=1.17..1.23rows=3 width=9)                    ->  Sort  (cost=1.17..1.18 rows=3 width=9)
SortKey: department                          ->  HashAggregate  (cost=1.10..1.15 rows=3 
width=9)                                ->  Seq Scan on employees  (cost=0.00..1.06
rows=6 width=9)


Maybe it's possible to sort the processing order of the windows based on the
ORDER BY clauses putting any that match the ORDER BY of the final results
last. I've not looked into this in much detail. Currently I cannot see any
scenario where it would be bad.

What do you think?

David


CREATE TABLE bigtable ( id SERIAL NOT NULL PRIMARY KEY, timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,10000000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);

VACUUM ANALYZE bigtable;

-- base line test
david=# SELECT COUNT(*) FROM (select id,timestamp from bigtable order by id)
t; count
----------10000000
(1 row)

Time: 72862.238 ms

-- lag test
david=# SELECT COUNT(*) FROM (select id,LAG(timestamp,1) OVER (order by id)
from bigtable order by id) t; count
----------10000000
(1 row)

Time: 257713.350 ms

david=# SELECT COUNT(*) FROM (select id,NTILE(10) OVER (order by id) from
bigtable order by id) t; count
----------10000000
(1 row)

Time: 183131.425 ms





Re: Windowing Function Patch Review -> Performance Comparison.

From
"Hitoshi Harada"
Date:
2008/11/15 David Rowley <dgrowley@gmail.com>:
> Hitoshi Harada wrote:
>> > david=# explain select date,lag(date,1) over (order by date) from
>> > meter_Readings order by date;
>> >                                                 QUERY PLAN
>> > ------------------------------------------------------------------------
>> ----
>> > --------------------------------
>> >  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
>> >   Sort Key: date
>> >   ->  Window  (cost=0.00..374.27 rows=10001 width=4)
>> >         ->  Index Scan using meter_readings_pkey on meter_readings
>> > (cost=0.00..299.27 rows=10001 width=4)
>> > (4 rows)
>> >
>> > Is the final sort still required? Is it not already sorted in the
>> window?
>> >
>>
>> Oh, I forgot to mention about it. This behavior is also fixed and it
>> works without sort on the window now. I don't remember at all why I
>> did so and there's no comment around that but regression tests showed
>> there is no preblem without it.
>>
>> Regards,
>>
>> --
>> Hitoshi Harada
>
> Fantastic news! That will speed up the few test queries I have quite a bit
> the sort was splitting to disk so performance was dropping quite a bit. This
> might be a good time to say that the hardware that I'm testing on is ultra
> slow. 600 Mhz Pentium III with only 28MB shared buffers.
>
> I've done some more performance tests with the patch. Realising that I
> really need to be comparing the performance with something else I decided to
> have a query process a large table with then without a windowing function
> just to see how much the query slows. Of course there is going to be an
> overhead using a tuple store. I just wanted to see how much. My results are
> not really very interesting, so I'll just include them in the bottom of the
> email for those who want to see.

Thanks for your test code. I attach the result of your test to which
another query is added. The added row_number() query has row buffer
strategy that doesn't hold tuplestore but only scans and returns rows.
So the difference between row_number(), 44 sec, and ntile(), 61 sec,
cases roughly shows how much tuplestore adds overhead. I had supposed
the row_number() case would be more efficient but still we can see it
works as an optimization.

> Casting my mind back to when the patch was always doing a sort before a
> window with an order by even when a btree index was there, it's really come
> a long way. I've little idea how difficult it would be to implement better
> planning for the following. I suppose if it's difficult then it's probably
> better to wait for the patch to be commited first, or maybe it's something
> for 8.5.

Yeah, the planner around group by, order, distinct, indexscan, and
window is quite complicated. Though I think I can manage to do it if
there left enough time, but at first the basic design should be
qualified by core hackers. I am waiting for subsequent review and
responses from Heikki and others.

> SELECT department,
>       SUM(Salary),
>       ROW_NUMBER() OVER (ORDER BY department),
>       SUM(SUM(salary)) OVER (ORDER BY department DESC)
> FROM employees
> GROUP BY department ORDER BY department;
>
> This query performs more sorts than really is needed. I suppose the most
> efficient way to process it would be to process the 2nd window first then
> the 1st before returning the results in the same order as the 1st window.
>
> Currently the plan looks like:
>
>                                         QUERY PLAN
> ----------------------------------------------------------------------------
> -----------------
>  Sort  (cost=1.33..1.34 rows=3 width=9)
>   Sort Key: department
>   ->  Window  (cost=1.25..1.31 rows=3 width=9)
>         ->  Sort  (cost=1.25..1.26 rows=3 width=9)
>               Sort Key: department
>               ->  Window  (cost=1.17..1.23 rows=3 width=9)
>                     ->  Sort  (cost=1.17..1.18 rows=3 width=9)
>                           Sort Key: department
>                           ->  HashAggregate  (cost=1.10..1.15 rows=3
> width=9)
>                                 ->  Seq Scan on employees  (cost=0.00..1.06
> rows=6 width=9)
>
>
> Maybe it's possible to sort the processing order of the windows based on the
> ORDER BY clauses putting any that match the ORDER BY of the final results
> last. I've not looked into this in much detail. Currently I cannot see any
> scenario where it would be bad.
>
> What do you think?
>

The patch doesn't have infrastracture to relocate each window node
yet. When relocating them we must re-number wfunc->winref so it's a
bit work, though possible. Also, I can imagine to fix only above case
but I'm afraid without having great overall sketch of planner
side-effect would come up...  For intance, what if the window query is
a subquery of group with sort strategy that assumes its subplan
doesn't sort. Maybe in that case upper group by doesn't notice the
sort key change done by window node relocating in subplan.

Regards,


-- 
Hitoshi Harada

$ uname -srpio
Linux 2.6.9-55.0.9.ELsmp i686 i386 GNU/Linux

# cpuinfo
Intel(R) Xeon(R) CPU           X3210  @ 2.13GHz



sample=# explain analyze select id, timestamp from bigtable order by id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------Index
Scanusing bigtable_pkey on bigtable  (cost=0.00..286808.13
 
rows=10000000 width=12) (actual time=0.021..11996.336 rows=10000000
loops=1)Total runtime: 20924.200 ms
(2 rows)


sample=# explain analyze select id, row_number() OVER (order by id)
from bigtable order by id;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------Window
(cost=0.00..361808.13 rows=10000000 width=4) (actual
 
time=0.080..35265.786 rows=10000000 loops=1)  ->  Index Scan using bigtable_pkey on bigtable
(cost=0.00..286808.13 rows=10000000 width=4) (actual
time=0.075..13279.216 rows=10000000 loops=1)Total runtime: 44067.815 ms
(3 rows)


sample=# explain analyze select id,LAG(timestamp,1) over (order by id)
from bigtable order by id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------Window
(cost=0.00..411808.13 rows=10000000 width=12) (actual
 
time=30240.715..70439.057 rows=10000000 loops=1)  ->  Index Scan using bigtable_pkey on bigtable
(cost=0.00..286808.13 rows=10000000 width=12) (actual
time=0.077..15302.919 rows=10000000 loops=1)Total runtime: 79658.314 ms
(3 rows)


sample=# explain analyze select id, ntile(10) OVER (order by id) from
bigtable order by id;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------Window
(cost=0.00..386808.13 rows=10000000 width=4) (actual
 
time=25158.467..52250.418 rows=10000000 loops=1)  ->  Index Scan using bigtable_pkey on bigtable
(cost=0.00..286808.13 rows=10000000 width=4) (actual
time=0.075..13088.061 rows=10000000 loops=1)Total runtime: 61275.279 ms
(3 rows)