Thread: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Hi,
I was dealing with windowing function recently. I feel they are pretty useful and quite handy in lot of operations.
I am not sure why but my PostgreSQL does not seem to be using indexes for ORDER BY clause or PARTITION BY CLAUSE which I use with windowing function. I have tried ORDER BY and GROUP BY clauses in a normal sql statement and they seem to use indexes nicely.
Is this being already considered for development?
Best Regards,
Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.
Sameer Kumar <sameer.kumar@ashnik.com> writes: > I am not sure why but my PostgreSQL does not seem to be using indexes for > ORDER BY clause or PARTITION BY CLAUSE which I use with windowing function. When the entire contents of the table have to be read, a seqscan-and-sort will frequently be estimated as cheaper than an indexscan. If you think this is not true on your hardware, you might need to adjust random_page_cost. regards, tom lane
Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Sameer Kumar
Date:
<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><divclass="gmail_extra"><br /><div class="gmail_quote">On Wed, Oct 23, 2013 at 10:58 PM, Tom Lane <span dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">SameerKumar <<a href="mailto:sameer.kumar@ashnik.com">sameer.kumar@ashnik.com</a>> writes:<br /> > Iam not sure why but my PostgreSQL does not seem to be using indexes for<br /> > ORDER BY clause or PARTITION BY CLAUSEwhich I use with windowing function.<br /><br /></div>When the entire contents of the table have to be read, a seqscan-and-sort<br/> will frequently be estimated as cheaper than an indexscan. If you think<br /> this is not true onyour hardware, you might need to adjust<br /> random_page_cost.<br /><br /> regards, tom lane<br/></blockquote></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Mymistake. I had understood the issue wrongly. </div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Actually when I usefunctions like max to find the maximum value grouped by another column I get a better performance when I try to do thesame operation using max() over().</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Takea look at below plan:</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">edb=# \x</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">Expanded display is on.</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# \dS= student_score;</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"> Table "enterprisedb.student_score"</font></div><div class="gmail_default"><font color="#339999" face="arial,helvetica, sans-serif" size="1"> Column | Type | Modifiers</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">--------------+-------------------------+-----------</font></div><divclass="gmail_default"><font color="#339999"face="arial, helvetica, sans-serif" size="1"> id | integer | not null</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"> student_name| character varying(1000) |</font></div><div class="gmail_default"><font color="#339999" face="arial,helvetica, sans-serif" size="1"> score | integer |</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1"> course | character varying(100) |</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">Indexes:</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"> "student_score_pkey" PRIMARY KEY, btree (id)</font></div><div class="gmail_default"><font color="#339999" face="arial,helvetica, sans-serif" size="1"> "idx_course" btree (course)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1"> "idx_score" btree (score)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"><br /></font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# selectcount(*) from student_score ;</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">-[ RECORD 1 ]-</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">count | 122880</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1"><br /></font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">edb=# explain analyze select max(score) from student_score group by course;</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD1 ]-------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | HashAggregate (cost=3198.20..3198.26rows=6 width=9) (actual time=110.792..110.793 rows=6 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan onstudent_score (cost=0.00..2583.80 rows=122880 width=9) (actual time=0.011..23.055 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD3 ]-------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 110.862ms</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"><br /></font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explainanalyze select max(score) over(partition by course) from student_score ;</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | WindowAgg (cost=0.00..10324.65rows=122880 width=9) (actual time=36.145..224.504 rows=122880 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Index Scanusing idx_course on student_score (cost=0.00..8481.45 rows=122880 width=9) (actual time=0.037..85.283 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD3 ]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 242.949ms</font></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif">ASyou can see there is a difference of twice. On similarlines, when I have to find students who "topped" (had highest score) per course, I will fire something like below:</span><br/></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explain analyze select student_namefrom student_score where (course,score)in (select course,max(score) from student_score group by course);</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Hash Semi Join (cost=3198.41..6516.76rows=7300 width=43) (actual time=113.727..181.045 rows=555 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Hash Cond: (((enterprisedb.student_score.course)::text= (enterprisedb.student_score.course)::text) AND (enterprisedb.student_score.score= (max(enterprisedb.student_score.score))))</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan onstudent_score (cost=0.00..2583.80 rows=122880 width=52) (actual time=0.009..22.702 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD4 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Hash (cost=3198.32..3198.32rows=6 width=9) (actual time=111.521..111.521 rows=6 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Buckets: 1024 Batches: 1 Memory Usage: 1kB</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"size="1">-[ RECORD 6 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> HashAggregate (cost=3198.20..3198.26 rows=6 width=9) (actual time=111.506..111.507 rows=6 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 7 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan on student_score (cost=0.00..2583.80 rows=122880 width=9) (actual time=0.002..23.303 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD8 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 181.284ms</font></div><div class="gmail_default" style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></div></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Analternative way of doing this could be:</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explain analyzeselect student_name from (select student_name, dense_rank() over(partition by course order by score)rn from student_score)where rn=1 ;</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"size="1">-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Subquery Scan on __unnamed_subquery_0 (cost=12971.39..16964.99 rows=614 width=43) (actual time=2606.075..2953.937 rows=558 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD2 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Filter: (__unnamed_subquery_0.rn= 1)</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"size="1">-[ RECORD 3 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> WindowAgg (cost=12971.39..15428.99rows=122880 width=52) (actual time=2606.063..2928.061 rows=122880 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 4 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Sort (cost=12971.39..13278.59 rows=122880 width=52) (actual time=2606.020..2733.677 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD5 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Sort Key:student_score.course, student_score.score</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">-[ RECORD 6 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Sort Method:external merge Disk: 7576kB</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">-[ RECORD 7 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan on student_score (cost=0.00..2583.80 rows=122880 width=52) (actual time=0.009..49.026 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD8 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 2958.653ms</font></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><fontsize="1"><br /></font></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">The second format of query couldbe more useful in DW and Data mining operations where I might not be always looking highest scorer. I may have to lookfor 2nd highest scorer. I can make this 2nd query parameterized and filter on rn could be 2 or 3 based on my interest. </div><br/></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Anotherthing, (I may be stupid and naive here) doesPostgreSQL re-uses the hash which has been already created for sort. In this case the inner query must have created ahash for windoing aggregate. Can't we use that same one while applying the the filter "rn=1" ?</div><br /></div></div>
Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
David Johnston
Date:
Sameer Kumar wrote > edb=# explain analyze select max(score) from student_score group by > course; This query returns 6 records. The window one returns 123,000. Why do you expect these to have anywhere near the same performance or plan? You can enable/disable indexes/scans to see what alternatives plans may provide but nothing here stands out as being obviously incorrect. I'm not really clear on what your question is. Generally it sounds as if you are wondering if there are any plans to I prove the algorithms behind window function processing. Are you just looking at symptoms and thus possibly have unreasonable expectations or do you actually see an avenue for improvement in the engine? > QUERY PLAN | Sort Method: external merge Disk: 7576kB Work memory; I/O is killing your performance on this query. It is more flexible but you pay a price for that. > Another thing, (I may be stupid and naive here) does PostgreSQL re-uses > the > hash which has been already created for sort. In this case the inner query > must have created a hash for windoing aggregate. Can't we use that same > one > while applying the the filter "rn=1" ? Probably but others more knowledgable will need to answer authoritatively. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Using-indexes-for-ORDER-BY-and-PARTITION-BY-clause-in-windowing-functions-tp5775605p5775708.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Sameer Kumar
Date:
<p dir="ltr">Agree that windowing function will return all the rows compared to max and group by returing only max rows pergroup. But even while arriving at the aggregate/sorting windowing function seems to spend more effort than group by/orderby. <p dir="ltr">I am just trying to see if we could somehow optimize the way windowing operations are performed.(May be in query rewrite). Datawarehouses could use that improvement.<br /> Its not my production box, so I canlive with disk sort. I have tried with huge sorting memory but still I see a similar difference in cost of sorting forgrouping/ordering Vs windowing function.<br /> Another thing regarding work_memory, I have generally seen that windowingfunctions expect more amount of memory for sorting compared to grouping/ordering clauses.<div class="gmail_quote">On24 Oct 2013 10:54, "David Johnston" <<a href="mailto:polobo@yahoo.com">polobo@yahoo.com</a>>wrote:<br type="attribution" /><blockquote class="gmail_quote" style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Sameer Kumar wrote<br /> > edb=# explain analyzeselect max(score) from student_score group by<br /> > course;<br /><br /> This query returns 6 records. The windowone returns 123,000. Why do you<br /> expect these to have anywhere near the same performance or plan?<br /><br />You can enable/disable indexes/scans to see what alternatives plans may<br /> provide but nothing here stands out as beingobviously incorrect.<br /><br /> I'm not really clear on what your question is. Generally it sounds as if<br /> youare wondering if there are any plans to I prove the algorithms behind<br /> window function processing. Are you justlooking at symptoms and thus<br /> possibly have unreasonable expectations or do you actually see an avenue for<br />improvement in the engine?<br /><br /><br /> > QUERY PLAN | Sort Method: external merge Disk: 7576kB<br/><br /> Work memory; I/O is killing your performance on this query. It is more<br /> flexible but you pay a pricefor that.<br /><br /><br /> > Another thing, (I may be stupid and naive here) does PostgreSQL re-uses<br /> >the<br /> > hash which has been already created for sort. In this case the inner query<br /> > must have createda hash for windoing aggregate. Can't we use that same<br /> > one<br /> > while applying the the filter "rn=1"?<br /><br /> Probably but others more knowledgable will need to answer authoritatively.<br /><br /> David J.<br /><br/><br /><br /><br /><br /> --<br /> View this message in context: <a href="http://postgresql.1045698.n5.nabble.com/Using-indexes-for-ORDER-BY-and-PARTITION-BY-clause-in-windowing-functions-tp5775605p5775708.html" target="_blank">http://postgresql.1045698.n5.nabble.com/Using-indexes-for-ORDER-BY-and-PARTITION-BY-clause-in-windowing-functions-tp5775605p5775708.html</a><br />Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.<br /><br /><br /> --<br /> Sent via pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To makechanges to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></blockquote></div>
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Kyotaro HORIGUCHI
Date:
Hello, > Agree that windowing function will return all the rows compared to max and > group by returing only max rows per group. But even while arriving at the > aggregate/sorting windowing function seems to spend more effort than group > by/order by. (I'll apologise in advance for possible misreading..) The most cause of the difference in time comes from sorting. Over 90% of total execution time has elapsed while sorting (49ms->2733ms) for the one using windowing function. If this sort were useless, the execution time would be less than 300 ms - seems comparable enough to group-by query. | Subquery Scan on __unnamed_subquery_0 | (actual time=2606.075..2953.937 rows=558 loops=1) | Filter: (__unnamed_subquery_0.rn = 1) | -> WindowAgg (actual time=2606.063..2928.061 rows=122880 loops=1) | -> Sort (actual time=2606.020..2733.677 rows=122880 loops=1) | Sort Key: student_score.course, student_score.score | -> Seq Scan on student_score | (actual time=0.009..49.026 rows=122880 loops=1) As you see in above plan, sorting key is (course, score). If your point is the overall performance but not reusing a kind of 'hash', there's a big chance to eliminate this sorting if you are able to have an additional index, say, =# create index idx_co_sc on student_score using btree (course, score); With this index, you will get a different plan like this, > uniontest=# explain analyze select student_name from (select student_name, dense_rank() over(partition by course orderby score) rn, score from student_score) rnn where rn=2; > QUERY PLAN > ------------------------------------------------------------------------------- > Subquery Scan on rnn (actual time=0.088..319.403 rows=135 loops=1) > Filter: (rnn.rn = 2) > Rows Removed by Filter: 122746 > -> WindowAgg (actual time=0.037..296.851 rows=122881 loops=1) > -> Index Scan using idx_co_sc on student_score > (actual time=0.027..111.333 rows=122881 loops=1) > Total runtime: 319.483 ms Does this satisfies your needs? ======= > Another thing, (I may be stupid and naive here) does PostgreSQL > re-uses the hash which has been already created for sort. In > this case the inner query must have created a hash for windoing > aggregate. Can't we use that same one while applying the the > filter "rn=1" ? Generally saying, hashes cannot yield ordered output by its nature, I believe. Windowing function (execnode) always receives tuples sequentially in the window-defined order (as you see in the explained plan above) then processes the tuples in semi tuple-by-tuple manner to perform per-frame aggregaion, and finally outputs tuples of the same number to input. And furthermore, dense_rank() doesn't even need per-frame aggregations. So none of the planners so far seems to have chance to use a kind of hash tables to culculate/execute windowing fucntions. On the another point, automatically preserving some internal data within a query beyond the end of the query brings in 'when to discard it?' problem. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Sameer Kumar
Date:
(I'll apologise in advance for possible misreading..)
> Agree that windowing function will return all the rows compared to max and
> group by returing only max rows per group. But even while arriving at the
> aggregate/sorting windowing function seems to spend more effort than group
> by/order by.
The most cause of the difference in time comes from sorting. Over
90% of total execution time has elapsed while sorting
(49ms->2733ms) for the one using windowing function. If this sort
were useless, the execution time would be less than 300 ms -
seems comparable enough to group-by query.
I will agree with you on this point.
| Subquery Scan on __unnamed_subquery_0
| (actual time=2606.075..2953.937 rows=558 loops=1)
| Filter: (__unnamed_subquery_0.rn = 1)
| -> WindowAgg (actual time=2606.063..2928.061 rows=122880 loops=1)
| -> Sort (actual time=2606.020..2733.677 rows=122880 loops=1)
| Sort Key: student_score.course, student_score.score
| -> Seq Scan on student_score
| (actual time=0.009..49.026 rows=122880 loops=1)
As you see in above plan, sorting key is (course, score). If your
point is the overall performance but not reusing a kind of
'hash', there's a big chance to eliminate this sorting if you are
able to have an additional index, say,
=# create index idx_co_sc on student_score using btree (course, score);
With this index, you will get a different plan like this,
Exactly my point, can we look at making windowing functions smart and make use of available indexes?
> uniontest=# explain analyze select student_name from (select student_name, dense_rank() over(partition by course order by score) rn, score from student_score) rnn where rn=2;
> QUERY PLAN
> -------------------------------------------------------------------------------
> Subquery Scan on rnn (actual time=0.088..319.403 rows=135 loops=1)
> Filter: (rnn.rn = 2)
> Rows Removed by Filter: 122746
> -> WindowAgg (actual time=0.037..296.851 rows=122881 loops=1)
> -> Index Scan using idx_co_sc on student_score
> (actual time=0.027..111.333 rows=122881 loops=1)
> Total runtime: 319.483 ms
Does this satisfies your needs?
Not exactly. If I have missed to mention, this is not a production issue for me. I am trying to see if PostgreSQL planner produces best plans for Data Warehouse and mining oriented queries.
=======> Another thing, (I may be stupid and naive here) does PostgreSQLGenerally saying, hashes cannot yield ordered output by its
> re-uses the hash which has been already created for sort. In
> this case the inner query must have created a hash for windoing
> aggregate. Can't we use that same one while applying the the
> filter "rn=1" ?
nature, I believe.
I think Hashes can be efficiently used for sorting (and I believe they are used for joins too when a pre-sorted data set is not available via indexes). This again could my misinterpretation.
Windowing function (execnode) always receives tuples sequentially
in the window-defined order (as you see in the explained plan
above) then processes the tuples in semi tuple-by-tuple manner to
perform per-frame aggregaion, and finally outputs tuples of the
same number to input. And furthermore, dense_rank() doesn't even
need per-frame aggregations. So none of the planners so far seems
to have chance to use a kind of hash tables to culculate/execute
windowing fucntions. On the another point, automatically
preserving some internal data within a query beyond the end of
the query brings in 'when to discard it?' problem.
I lost you somewhere here. My be this is above my pay-grade :-)
Well, at least with Oracle and DB2 planners I have seen that the plan produced with dense_rank performs better than a series of nested SELECT MAX().
Regards
Sameer
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Kyotaro HORIGUCHI
Date:
Hello, > > With this index, you will get a different plan like this, > > > Exactly my point, can we look at making windowing functions > smart and make use of available indexes? I might have guessed.. > > Does this satisfies your needs? > > > Not exactly. If I have missed to mention, this is not a > production issue for me. I am trying to see if PostgreSQL > planner produces best plans for Data Warehouse and mining > oriented queries. I agree to the point. > I think Hashes can be efficiently used for sorting (and I > believe they are used for joins too when a pre-sorted data set > is not available via indexes). This again could my > misinterpretation. It is true if 'Sorting' means 'key classification without orderings'. Hashes should always appear at inner side of a join, I'm convinced. The "ordered' nature is not required for the case if outer side is already ordered. If not, separate sorting will needed. > I lost you somewhere here. My be this is above my pay-grade :-) Sorry for my crumsy english :-< > Well, at least with Oracle and DB2 planners I have seen that > the plan produced with dense_rank performs better than a series > of nested SELECT MAX(). I see your point. Although I don't know what plans they generates, and I don't see how to ordering and ranking without sorting. Could you let me see what they look like? # Nevertheless, I don't have the confidence that I can be of some # help.. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Sameer Kumar
Date:
Hello,I might have guessed..
> > With this index, you will get a different plan like this,
> >
> Exactly my point, can we look at making windowing functions
> smart and make use of available indexes?I agree to the point.
> > Does this satisfies your needs?
> >
> Not exactly. If I have missed to mention, this is not a
> production issue for me. I am trying to see if PostgreSQL
> planner produces best plans for Data Warehouse and mining
> oriented queries.It is true if 'Sorting' means 'key classification without
> I think Hashes can be efficiently used for sorting (and I
> believe they are used for joins too when a pre-sorted data set
> is not available via indexes). This again could my
> misinterpretation.
orderings'. Hashes should always appear at inner side of a join,
I'm convinced. The "ordered' nature is not required for the case
if outer side is already ordered. If not, separate sorting will
needed.Sorry for my crumsy english :-<
> I lost you somewhere here. My be this is above my pay-grade :-)
No, it was not your English. :-)
When I read it again and try to relate, I get your point. Actually true, hashes must always be performed as last option (if that is what you too meant) and if there are few other operations they must be the last one to be performed especially after sorting/grouping. Hashes must somehow make use of already sorted data (I think this something even you indicated)
When I read it again and try to relate, I get your point. Actually true, hashes must always be performed as last option (if that is what you too meant) and if there are few other operations they must be the last one to be performed especially after sorting/grouping. Hashes must somehow make use of already sorted data (I think this something even you indicated)
> Well, at least with Oracle and DB2 planners I have seen thatI see your point. Although I don't know what plans they
> the plan produced with dense_rank performs better than a series
> of nested SELECT MAX().
generates, and I don't see how to ordering and ranking without
sorting. Could you let me see what they look like?
# Nevertheless, I don't have the confidence that I can be of some
# help..
I will do that if I get a DB2 system or Oracle system running. I will try to replicate the same 2 test cases and share the plan. One thing which I am sure is, the below part of the plan
QUERY PLAN | Subquery Scan on __unnamed_subquery_0 (cost=12971.39..16964.99 rows=614 width=43) (actual time=2606.075..2953.937 rows=558 loops=1)
would be generated as RID scan in DB2 (which I have seen to perform better than normal subquery scans in DB2).
Regards
Sameer
Ashnik Pte Ltd
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Kyotaro HORIGUCHI
Date:
Hello, > When I read it again and try to relate, I get your point. Actually true, > hashes must always be performed as last option (if that is what you too > meant) and if there are few other operations they must be the last one to > be performed especially after sorting/grouping. Hashes must somehow make > use of already sorted data (I think this something even you indicated) Yes, some 'hash'es could preserve order selecting such a function for hash function. But at least PostgreSQL's 'HashAggregation' uses not-order-preserving function as hash function. So output cannot preserve input ordering. > I will do that if I get a DB2 system or Oracle system running. I will try > to replicate the same 2 test cases and share the plan. One thing which I am > sure is, the below part of the plan > > QUERY PLAN | Subquery Scan on __unnamed_subquery_0 > (cost=12971.39..16964.99 rows=614 width=43) (actual > time=2606.075..2953.937 rows=558 loops=1) > > would be generated as RID scan in DB2 (which I have seen to perform better > than normal subquery scans in DB2). DB2's document says it is used for 'index ORing' corresponds OR or IN ops, which seems to be a relative to BitmapOr of PostgreSQL, perhaps not to HashAggregates/SemiJoin. I tried to imagin the plan for the group_by case with repeated index scan and merging.. > select student_name > from student_score > where (course,score) in (select course,max(score) > from student_score group by course); Taking the advantage that the cardinarity of course is 8, this query could be transformed into 8 times of index scan and bitmaping. With hypothetical plan node LOOP, and BitmapScanAdd the plan could be, | BitmapHeapScan (rows = 155, loops = 1) | -> LOOP | ON Subquery (select distinct course from student_course) as c0 | -> BitmapScanAdd (loops = 8) | BitmapCond: (student_score.score = x) | -> Limit (rows = 1, loops = 8) AS x | -> Unique (rows = 1, loops = 8) | -> IndexScan using idx_score on student_course (rows = 1, loops = 8) | Filter (student_course.course = c0) I suppose this is one possibility of what DB2 is doing. If DB2 does the same optimization for ranking > 1 with the dense_rank() case, this plan might be like this, | BitmapHeapScan (rows = 133, loops = 1) | -> LOOP | ON Subquery (select distinct course from student_course) as c0 | -> BitmapScanAdd (loops = 8) | BitmapCond: (student_score.score = x) | -> Limit (rows = 1, loops = 8) AS x | -> Unique (rows = 2, loops = 8) | -> IndexScan using idx_score on student_course (rows = 18,loops= 8) | Filter (student_course.course = c0) Both plans surely seem to be done shortly for relatively small n's and number of courses. On the other hand, using semijoin as PostgreSQL does, creating HashAggregate storing nth place score for every course requires some memory to work on for each course. | Hash Semi Join | Hash Cond: (a.course = b.course and a.score = b.score) | -> SeqScan on student_score as a | -> Hash | -> HashAggregatesFunc (rows = 8) | Key = course, func = rankn(dense_rank(), n, key, val) | -> SeqScan on student_score (rows = 122880) Where, rankn() must keep socres down to nth rank and emits nth score as final value. I don't get more generic form for this mechanism right now, and the value to do in this specific manner seems not so much.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
From
Sameer Kumar
Date:
On Thu, Nov 14, 2013 at 6:51 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,Yes, some 'hash'es could preserve order selecting such a function
> When I read it again and try to relate, I get your point. Actually true,
> hashes must always be performed as last option (if that is what you too
> meant) and if there are few other operations they must be the last one to
> be performed especially after sorting/grouping. Hashes must somehow make
> use of already sorted data (I think this something even you indicated)
for hash function. But at least PostgreSQL's 'HashAggregation'
uses not-order-preserving function as hash function. So output
cannot preserve input ordering.DB2's document says it is used for 'index ORing' corresponds OR
> I will do that if I get a DB2 system or Oracle system running. I will try
> to replicate the same 2 test cases and share the plan. One thing which I am
> sure is, the below part of the plan
>
> QUERY PLAN | Subquery Scan on __unnamed_subquery_0
> (cost=12971.39..16964.99 rows=614 width=43) (actual
> time=2606.075..2953.937 rows=558 loops=1)
>
> would be generated as RID scan in DB2 (which I have seen to perform better
> than normal subquery scans in DB2).
or IN ops, which seems to be a relative to BitmapOr of
PostgreSQL, perhaps not to HashAggregates/SemiJoin.
I tried to imagin the plan for the group_by case with repeated
index scan and merging..
> select student_name
> from student_score
> where (course,score) in (select course,max(score)> from student_score group by course);Taking the advantage that the cardinarity of course is 8, this
query could be transformed into 8 times of index scan and
bitmaping.
With hypothetical plan node LOOP, and BitmapScanAdd the plan
could be,
| BitmapHeapScan (rows = 155, loops = 1)
| -> LOOP
| ON Subquery (select distinct course from student_course) as c0
| -> BitmapScanAdd (loops = 8)
| BitmapCond: (student_score.score = x)
| -> Limit (rows = 1, loops = 8) AS x
| -> Unique (rows = 1, loops = 8)
| -> IndexScan using idx_score on student_course (rows = 1, loops = 8)
| Filter (student_course.course = c0)
I suppose this is one possibility of what DB2 is doing. If DB2
does the same optimization for ranking > 1 with the dense_rank()
case, this plan might be like this,
I can not be sure but this one seems logically correct from cost and cardinality perspective(am not sure the operations that the DB2 planner would generate ). Need to test it.
| BitmapHeapScan (rows = 133, loops = 1)
| -> LOOP
| ON Subquery (select distinct course from student_course) as c0
| -> BitmapScanAdd (loops = 8)
| BitmapCond: (student_score.score = x)
| -> Limit (rows = 1, loops = 8) AS x
| -> Unique (rows = 2, loops = 8)
| -> IndexScan using idx_score on student_course (rows = 18,loops= 8)
| Filter (student_course.course = c0)
Both plans surely seem to be done shortly for relatively small
n's and number of courses.
I guess they would do well even when the cardinality of courses is fairly high (unless we hit a scenario where courses offered are more than/in the same decimal range as students opting for them).
On the other hand, using semijoin as PostgreSQL does, creating
HashAggregate storing nth place score for every course requires
some memory to work on for each course.
| Hash Semi Join
| Hash Cond: (a.course = b.course and a.score = b.score)
| -> SeqScan on student_score as a
| -> Hash
| -> HashAggregatesFunc (rows = 8)
| Key = course, func = rankn(dense_rank(), n, key, val)
| -> SeqScan on student_score (rows = 122880)
Where, rankn() must keep socres down to nth rank and emits nth
score as final value. I don't get more generic form for this
mechanism right now, and the value to do in this specific manner
seems not so much..
I feel the advantage could be more when dealing with a DW environment which has more complex aggregate and windowing queries. Extending this to other windowing function, it could be a great gain for DW and OLAP queries.
Regards
Sameer