Re: Windowing Function Patch Review -> Performance Comparison. - Mailing list pgsql-hackers

From David Rowley
Subject Re: Windowing Function Patch Review -> Performance Comparison.
Date
Msg-id 0DB02AB845AF4BCC85DBCBA2F187E364@amd64
Whole thread Raw
In response to Re: Windowing Function Patch Review -> Performance Comparison.  ("Hitoshi Harada" <umi.tanuki@gmail.com>)
Responses Re: Windowing Function Patch Review -> Performance Comparison.
List pgsql-hackers
Hitoshi Harada wrote:
> > david=# explain select date,lag(date,1) over (order by date) from
> > meter_Readings order by date;
> >                                                 QUERY PLAN
> > ------------------------------------------------------------------------
> ----
> > --------------------------------
> >  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
> >   Sort Key: date
> >   ->  Window  (cost=0.00..374.27 rows=10001 width=4)
> >         ->  Index Scan using meter_readings_pkey on meter_readings
> > (cost=0.00..299.27 rows=10001 width=4)
> > (4 rows)
> >
> > Is the final sort still required? Is it not already sorted in the
> window?
> >
>
> Oh, I forgot to mention about it. This behavior is also fixed and it
> works without sort on the window now. I don't remember at all why I
> did so and there's no comment around that but regression tests showed
> there is no preblem without it.
>
> Regards,
>
> --
> Hitoshi Harada

Fantastic news! That will speed up the few test queries I have quite a bit
the sort was splitting to disk so performance was dropping quite a bit. This
might be a good time to say that the hardware that I'm testing on is ultra
slow. 600 Mhz Pentium III with only 28MB shared buffers.

I've done some more performance tests with the patch. Realising that I
really need to be comparing the performance with something else I decided to
have a query process a large table with then without a windowing function
just to see how much the query slows. Of course there is going to be an
overhead using a tuple store. I just wanted to see how much. My results are
not really very interesting, so I'll just include them in the bottom of the
email for those who want to see.

Casting my mind back to when the patch was always doing a sort before a
window with an order by even when a btree index was there, it's really come
a long way. I've little idea how difficult it would be to implement better
planning for the following. I suppose if it's difficult then it's probably
better to wait for the patch to be commited first, or maybe it's something
for 8.5.

SELECT department,      SUM(Salary),      ROW_NUMBER() OVER (ORDER BY department),      SUM(SUM(salary)) OVER (ORDER BY
departmentDESC) 
FROM employees
GROUP BY department ORDER BY department;

This query performs more sorts than really is needed. I suppose the most
efficient way to process it would be to process the 2nd window first then
the 1st before returning the results in the same order as the 1st window.

Currently the plan looks like:
                                        QUERY PLAN
----------------------------------------------------------------------------
-----------------Sort  (cost=1.33..1.34 rows=3 width=9)  Sort Key: department  ->  Window  (cost=1.25..1.31 rows=3
width=9)       ->  Sort  (cost=1.25..1.26 rows=3 width=9)              Sort Key: department              ->  Window
(cost=1.17..1.23rows=3 width=9)                    ->  Sort  (cost=1.17..1.18 rows=3 width=9)
SortKey: department                          ->  HashAggregate  (cost=1.10..1.15 rows=3 
width=9)                                ->  Seq Scan on employees  (cost=0.00..1.06
rows=6 width=9)


Maybe it's possible to sort the processing order of the windows based on the
ORDER BY clauses putting any that match the ORDER BY of the final results
last. I've not looked into this in much detail. Currently I cannot see any
scenario where it would be bad.

What do you think?

David


CREATE TABLE bigtable ( id SERIAL NOT NULL PRIMARY KEY, timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,10000000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);

VACUUM ANALYZE bigtable;

-- base line test
david=# SELECT COUNT(*) FROM (select id,timestamp from bigtable order by id)
t; count
----------10000000
(1 row)

Time: 72862.238 ms

-- lag test
david=# SELECT COUNT(*) FROM (select id,LAG(timestamp,1) OVER (order by id)
from bigtable order by id) t; count
----------10000000
(1 row)

Time: 257713.350 ms

david=# SELECT COUNT(*) FROM (select id,NTILE(10) OVER (order by id) from
bigtable order by id) t; count
----------10000000
(1 row)

Time: 183131.425 ms





pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Next
From: Tom Lane
Date:
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"