Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions - Mailing list pgsql-hackers
From | Sameer Kumar |
---|---|
Subject | Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions |
Date | |
Msg-id | CADp-Sm5npBTnnt20w84=00P2d8XsQQ0pogoPHAgq2okttm+vdw@mail.gmail.com Whole thread Raw |
In response to | Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Using indexes for ORDER BY and PARTITION BY clause in windowing
functions
|
List | pgsql-hackers |
<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><divclass="gmail_extra"><br /><div class="gmail_quote">On Wed, Oct 23, 2013 at 10:58 PM, Tom Lane <span dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">SameerKumar <<a href="mailto:sameer.kumar@ashnik.com">sameer.kumar@ashnik.com</a>> writes:<br /> > Iam not sure why but my PostgreSQL does not seem to be using indexes for<br /> > ORDER BY clause or PARTITION BY CLAUSEwhich I use with windowing function.<br /><br /></div>When the entire contents of the table have to be read, a seqscan-and-sort<br/> will frequently be estimated as cheaper than an indexscan. If you think<br /> this is not true onyour hardware, you might need to adjust<br /> random_page_cost.<br /><br /> regards, tom lane<br/></blockquote></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Mymistake. I had understood the issue wrongly. </div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Actually when I usefunctions like max to find the maximum value grouped by another column I get a better performance when I try to do thesame operation using max() over().</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br/></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Takea look at below plan:</div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">edb=# \x</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">Expanded display is on.</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# \dS= student_score;</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"> Table "enterprisedb.student_score"</font></div><div class="gmail_default"><font color="#339999" face="arial,helvetica, sans-serif" size="1"> Column | Type | Modifiers</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">--------------+-------------------------+-----------</font></div><divclass="gmail_default"><font color="#339999"face="arial, helvetica, sans-serif" size="1"> id | integer | not null</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"> student_name| character varying(1000) |</font></div><div class="gmail_default"><font color="#339999" face="arial,helvetica, sans-serif" size="1"> score | integer |</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1"> course | character varying(100) |</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">Indexes:</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"> "student_score_pkey" PRIMARY KEY, btree (id)</font></div><div class="gmail_default"><font color="#339999" face="arial,helvetica, sans-serif" size="1"> "idx_course" btree (course)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1"> "idx_score" btree (score)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"><br /></font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# selectcount(*) from student_score ;</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">-[ RECORD 1 ]-</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">count | 122880</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1"><br /></font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">edb=# explain analyze select max(score) from student_score group by course;</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD1 ]-------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | HashAggregate (cost=3198.20..3198.26rows=6 width=9) (actual time=110.792..110.793 rows=6 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan onstudent_score (cost=0.00..2583.80 rows=122880 width=9) (actual time=0.011..23.055 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD3 ]-------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 110.862ms</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"><br /></font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explainanalyze select max(score) over(partition by course) from student_score ;</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | WindowAgg (cost=0.00..10324.65rows=122880 width=9) (actual time=36.145..224.504 rows=122880 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Index Scanusing idx_course on student_score (cost=0.00..8481.45 rows=122880 width=9) (actual time=0.037..85.283 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD3 ]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 242.949ms</font></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif">ASyou can see there is a difference of twice. On similarlines, when I have to find students who "topped" (had highest score) per course, I will fire something like below:</span><br/></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explain analyze select student_namefrom student_score where (course,score)in (select course,max(score) from student_score group by course);</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Hash Semi Join (cost=3198.41..6516.76rows=7300 width=43) (actual time=113.727..181.045 rows=555 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Hash Cond: (((enterprisedb.student_score.course)::text= (enterprisedb.student_score.course)::text) AND (enterprisedb.student_score.score= (max(enterprisedb.student_score.score))))</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan onstudent_score (cost=0.00..2583.80 rows=122880 width=52) (actual time=0.009..22.702 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD4 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Hash (cost=3198.32..3198.32rows=6 width=9) (actual time=111.521..111.521 rows=6 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Buckets: 1024 Batches: 1 Memory Usage: 1kB</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"size="1">-[ RECORD 6 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> HashAggregate (cost=3198.20..3198.26 rows=6 width=9) (actual time=111.506..111.507 rows=6 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 7 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan on student_score (cost=0.00..2583.80 rows=122880 width=9) (actual time=0.002..23.303 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD8 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 181.284ms</font></div><div class="gmail_default" style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></div></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Analternative way of doing this could be:</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explain analyzeselect student_name from (select student_name, dense_rank() over(partition by course order by score)rn from student_score)where rn=1 ;</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"size="1">-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Subquery Scan on __unnamed_subquery_0 (cost=12971.39..16964.99 rows=614 width=43) (actual time=2606.075..2953.937 rows=558 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD2 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Filter: (__unnamed_subquery_0.rn= 1)</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"size="1">-[ RECORD 3 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> WindowAgg (cost=12971.39..15428.99rows=122880 width=52) (actual time=2606.063..2928.061 rows=122880 loops=1)</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 4 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Sort (cost=12971.39..13278.59 rows=122880 width=52) (actual time=2606.020..2733.677 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD5 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Sort Key:student_score.course, student_score.score</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">-[ RECORD 6 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Sort Method:external merge Disk: 7576kB</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,sans-serif" size="1">-[ RECORD 7 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | -> Seq Scan on student_score (cost=0.00..2583.80 rows=122880 width=52) (actual time=0.009..49.026 rows=122880 loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD8 ]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime: 2958.653ms</font></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><fontsize="1"><br /></font></div><div class="gmail_default"style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">The second format of query couldbe more useful in DW and Data mining operations where I might not be always looking highest scorer. I may have to lookfor 2nd highest scorer. I can make this 2nd query parameterized and filter on rn could be 2 or 3 based on my interest. </div><br/></div><div class="gmail_extra"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Anotherthing, (I may be stupid and naive here) doesPostgreSQL re-uses the hash which has been already created for sort. In this case the inner query must have created ahash for windoing aggregate. Can't we use that same one while applying the the filter "rn=1" ?</div><br /></div></div>
pgsql-hackers by date: