Re: Windowing Function Patch Review -> Performance Comparison. - Mailing list pgsql-hackers
From: David Rowley
Subject: Re: Windowing Function Patch Review -> Performance Comparison.
Msg-id: 0DB02AB845AF4BCC85DBCBA2F187E364@amd64
In response to: Re: Windowing Function Patch Review -> Performance Comparison. ("Hitoshi Harada" <umi.tanuki@gmail.com>)
Responses: Re: Windowing Function Patch Review -> Performance Comparison.
List: pgsql-hackers
Hitoshi Harada wrote:
> > david=# explain select date,lag(date,1) over (order by date) from
> > meter_Readings order by date;
> >                                QUERY PLAN
> > ----------------------------------------------------------------------------
> >  Sort  (cost=1038.73..1063.74 rows=10001 width=4)
> >    Sort Key: date
> >    ->  Window  (cost=0.00..374.27 rows=10001 width=4)
> >          ->  Index Scan using meter_readings_pkey on meter_readings
> > (cost=0.00..299.27 rows=10001 width=4)
> > (4 rows)
> >
> > Is the final sort still required? Is it not already sorted in the
> > window?
>
> Oh, I forgot to mention it. This behavior is also fixed and it now works
> without a sort on the window. I don't remember at all why I did it that
> way, and there's no comment around that code, but the regression tests
> showed there is no problem without it.
>
> Regards,
>
> --
> Hitoshi Harada

Fantastic news! That will speed up the few test queries I have quite a bit; the sort was spilling to disk, so performance was dropping considerably. This might be a good time to say that the hardware I'm testing on is ultra slow: a 600 MHz Pentium III with only 28MB of shared buffers.

I've done some more performance tests with the patch. Realising that I really need to compare the performance against something else, I decided to have a query process a large table with, and then without, a windowing function, just to see how much the query slows down. Of course there is going to be some overhead in using a tuple store; I just wanted to see how much. My results are not really very interesting, so I'll include them at the bottom of this email for those who want to see them.

Casting my mind back to when the patch always did a sort before a window with an ORDER BY, even when a btree index was there, it's really come a long way. I've little idea how difficult it would be to implement better planning for the following.
I suppose if it's difficult then it's probably better to wait for the patch to be committed first, or maybe it's something for 8.5.

SELECT department,
       SUM(salary),
       ROW_NUMBER() OVER (ORDER BY department),
       SUM(SUM(salary)) OVER (ORDER BY department DESC)
FROM employees
GROUP BY department
ORDER BY department;

This query performs more sorts than are really needed. I suppose the most efficient way to process it would be to process the 2nd window first, then the 1st, before returning the results in the same order as the 1st window. Currently the plan looks like:

                              QUERY PLAN
---------------------------------------------------------------------------------
 Sort  (cost=1.33..1.34 rows=3 width=9)
   Sort Key: department
   ->  Window  (cost=1.25..1.31 rows=3 width=9)
         ->  Sort  (cost=1.25..1.26 rows=3 width=9)
               Sort Key: department
               ->  Window  (cost=1.17..1.23 rows=3 width=9)
                     ->  Sort  (cost=1.17..1.18 rows=3 width=9)
                           Sort Key: department
                           ->  HashAggregate  (cost=1.10..1.15 rows=3 width=9)
                                 ->  Seq Scan on employees  (cost=0.00..1.06 rows=6 width=9)

Maybe it's possible to sort the processing order of the windows based on their ORDER BY clauses, putting any that match the ORDER BY of the final results last. I've not looked into this in much detail, and currently I cannot see any scenario where it would be bad. What do you think?
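For what it's worth, the ordering I have in mind can be mimicked by hand today with a subquery: compute the DESC-ordered window in the inner query, and the window whose ORDER BY matches the final ORDER BY in the outer one, so the last sort performed is the one the result needs anyway. A sketch only; the aliases (sum_salary, running_total, row_no) are mine:

```sql
-- Hand-rewritten form of the query above: the DESC window runs in the
-- subquery, so the outer window's sort on department is the last sort
-- performed and already matches the final ORDER BY.
SELECT department,
       sum_salary,
       ROW_NUMBER() OVER (ORDER BY department) AS row_no,
       running_total
FROM (SELECT department,
             SUM(salary) AS sum_salary,
             SUM(SUM(salary)) OVER (ORDER BY department DESC) AS running_total
      FROM employees
      GROUP BY department) AS s
ORDER BY department;
```

Ideally the planner would discover this ordering itself rather than relying on the user to phrase the query this way.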
David

CREATE TABLE bigtable (
  id SERIAL NOT NULL PRIMARY KEY,
  timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,10000000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);
VACUUM ANALYZE bigtable;

-- base line test
david=# SELECT COUNT(*) FROM (select id,timestamp from bigtable order by id) t;
  count
----------
 10000000
(1 row)

Time: 72862.238 ms

-- lag test
david=# SELECT COUNT(*) FROM (select id,LAG(timestamp,1) OVER (order by id)
from bigtable order by id) t;
  count
----------
 10000000
(1 row)

Time: 257713.350 ms

-- ntile test
david=# SELECT COUNT(*) FROM (select id,NTILE(10) OVER (order by id) from
bigtable order by id) t;
  count
----------
 10000000
(1 row)

Time: 183131.425 ms
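Putting those timings in relative terms (just arithmetic on the numbers above):

```sql
-- Slowdown factors versus the 72862.238 ms baseline
SELECT 257713.350 / 72862.238 AS lag_slowdown,   -- roughly 3.5x
       183131.425 / 72862.238 AS ntile_slowdown; -- roughly 2.5x
```

So on this (very slow) box the tuple store costs us roughly 2.5x to 3.5x over the plain sorted scan.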