Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions - Mailing list pgsql-hackers

From Sameer Kumar
Subject Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Date
Msg-id CADp-Sm5npBTnnt20w84=00P2d8XsQQ0pogoPHAgq2okttm+vdw@mail.gmail.com
Whole thread Raw
In response to Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions  (David Johnston <polobo@yahoo.com>)
List pgsql-hackers
<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br
/></div><divclass="gmail_extra"><br /><div class="gmail_quote">On Wed, Oct 23, 2013 at 10:58 PM, Tom Lane <span
dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div
class="im">SameerKumar <<a href="mailto:sameer.kumar@ashnik.com">sameer.kumar@ashnik.com</a>> writes:<br /> >
Iam not sure why but my PostgreSQL does not seem to be using indexes for<br /> > ORDER BY clause or PARTITION BY
CLAUSEwhich I use with windowing function.<br /><br /></div>When the entire contents of the table have to be read, a
seqscan-and-sort<br/> will frequently be estimated as cheaper than an indexscan.  If you think<br /> this is not true
onyour hardware, you might need to adjust<br /> random_page_cost.<br /><br />                         regards, tom
lane<br/></blockquote></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Mymistake. I had understood the issue
wrongly. </div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br
/></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Actually when I
usefunctions like max to find the maximum value grouped by another column I get a better performance when I try to do
thesame operation using max() over().</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Takea look at below plan:</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br /></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">edb=# \x</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">Expanded display is
on.</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# \dS=
student_score;</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"
size="1">        Table "enterprisedb.student_score"</font></div><div class="gmail_default"><font color="#339999"
face="arial,helvetica, sans-serif" size="1">    Column    |          Type           | Modifiers</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif"
size="1">--------------+-------------------------+-----------</font></div><divclass="gmail_default"><font
color="#339999"face="arial, helvetica, sans-serif" size="1"> id           | integer                 | not
null</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"
size="1"> student_name| character varying(1000) |</font></div><div class="gmail_default"><font color="#339999"
face="arial,helvetica, sans-serif" size="1"> score        | integer                 |</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1"> course       | character
varying(100) |</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"
size="1">Indexes:</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif"
size="1">   "student_score_pkey" PRIMARY KEY, btree (id)</font></div><div class="gmail_default"><font color="#339999"
face="arial,helvetica, sans-serif" size="1">    "idx_course" btree (course)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">    "idx_score" btree
(score)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"><br
/></font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=#
selectcount(*) from student_score ;</font></div><div class="gmail_default"><font color="#339999" face="arial,
helvetica,sans-serif" size="1">-[ RECORD 1 ]-</font></div><div class="gmail_default"><font color="#339999" face="arial,
helvetica,sans-serif" size="1">count | 122880</font></div><div class="gmail_default"><font color="#339999" face="arial,
helvetica,sans-serif" size="1"><br /></font></div><div class="gmail_default"><font color="#339999" face="arial,
helvetica,sans-serif" size="1">edb=# explain analyze select max(score) from student_score group by
course;</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD1
]-------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | HashAggregate
 (cost=3198.20..3198.26rows=6 width=9) (actual time=110.792..110.793 rows=6 loops=1)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2
]-------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   ->  Seq Scan
onstudent_score  (cost=0.00..2583.80 rows=122880 width=9) (actual time=0.011..23.055 rows=122880
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD3
]-------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime:
110.862ms</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1"><br
/></font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=#
explainanalyze select max(score) over(partition by course) from student_score ;</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 1
]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | WindowAgg
 (cost=0.00..10324.65rows=122880 width=9) (actual time=36.145..224.504 rows=122880 loops=1)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2
]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   ->  Index
Scanusing idx_course on student_score  (cost=0.00..8481.45 rows=122880 width=9) (actual time=0.037..85.283 rows=122880
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD3
]--------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime:
242.949ms</font></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif">ASyou can see there is a difference of twice. On
similarlines, when I have to find students who "topped" (had highest score) per course, I will fire something like
below:</span><br/></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explain analyze select
student_namefrom student_score where (course,score)in (select course,max(score) from student_score group by
course);</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD1
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Hash Semi Join
 (cost=3198.41..6516.76rows=7300 width=43) (actual time=113.727..181.045 rows=555 loops=1)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 2
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   Hash Cond:
(((enterprisedb.student_score.course)::text= (enterprisedb.student_score.course)::text) AND
(enterprisedb.student_score.score= (max(enterprisedb.student_score.score))))</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 3
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   ->  Seq Scan
onstudent_score  (cost=0.00..2583.80 rows=122880 width=52) (actual time=0.009..22.702 rows=122880
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD4
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   ->  Hash
 (cost=3198.32..3198.32rows=6 width=9) (actual time=111.521..111.521 rows=6 loops=1)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 5
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |         Buckets:
1024 Batches: 1  Memory Usage: 1kB</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,
sans-serif"size="1">-[ RECORD 6
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |         ->
 HashAggregate (cost=3198.20..3198.26 rows=6 width=9) (actual time=111.506..111.507 rows=6 loops=1)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 7
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |              
-> Seq Scan on student_score  (cost=0.00..2583.80 rows=122880 width=9) (actual time=0.002..23.303 rows=122880
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD8
]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime:
181.284ms</font></div><div class="gmail_default"
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></div></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"><span
style="color:rgb(51,153,153);font-family:arial,helvetica,sans-serif"><br/></span></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Analternative way of doing this could
be:</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><br
/></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">edb=# explain
analyzeselect student_name from (select student_name, dense_rank() over(partition by course order by score)rn from
student_score)where rn=1 ;</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,
sans-serif"size="1">-[ RECORD 1
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Subquery Scan on
__unnamed_subquery_0 (cost=12971.39..16964.99 rows=614 width=43) (actual time=2606.075..2953.937 rows=558
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD2
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   Filter:
(__unnamed_subquery_0.rn= 1)</font></div><div class="gmail_default"><font color="#339999" face="arial, helvetica,
sans-serif"size="1">-[ RECORD 3
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |   ->  WindowAgg
 (cost=12971.39..15428.99rows=122880 width=52) (actual time=2606.063..2928.061 rows=122880 loops=1)</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">-[ RECORD 4
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |         ->
 Sort (cost=12971.39..13278.59 rows=122880 width=52) (actual time=2606.020..2733.677 rows=122880
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD5
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |               Sort
Key:student_score.course, student_score.score</font></div><div class="gmail_default"><font color="#339999" face="arial,
helvetica,sans-serif" size="1">-[ RECORD 6
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |               Sort
Method:external merge  Disk: 7576kB</font></div><div class="gmail_default"><font color="#339999" face="arial,
helvetica,sans-serif" size="1">-[ RECORD 7
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN |              
-> Seq Scan on student_score  (cost=0.00..2583.80 rows=122880 width=52) (actual time=0.009..49.026 rows=122880
loops=1)</font></div><divclass="gmail_default"><font color="#339999" face="arial, helvetica, sans-serif" size="1">-[
RECORD8
]--------------------------------------------------------------------------------------------------------------------------------------</font></div><div
class="gmail_default"><fontcolor="#339999" face="arial, helvetica, sans-serif" size="1">QUERY PLAN | Total runtime:
2958.653ms</font></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)"><fontsize="1"><br /></font></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">The second format of query
couldbe more useful in DW and Data mining operations where I might not be always looking highest scorer. I may have to
lookfor 2nd highest scorer. I can make this 2nd query parameterized and filter on rn could be 2 or 3 based on my
interest. </div><br/></div><div class="gmail_extra"><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif;color:rgb(51,153,153)">Anotherthing, (I may be stupid and naive here)
doesPostgreSQL re-uses the hash which has been already created for sort. In this case the inner query must have created
ahash for windoing aggregate. Can't we use that same one while applying the the filter "rn=1" ?</div><br /></div></div> 

pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Location for external scripts for Extensions?
Next
From: Mike Blackwell
Date:
Subject: RULE regression test fragility?