Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries? - Mailing list pgsql-general

From Markus Winand
Subject Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?
Date
Msg-id B80CF88C-F9F6-41B3-A800-FD0A9A42CE7C@winand.at
Whole thread Raw
In response to Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?  (Thomas Kellerer <spam_eater@gmx.net>)
List pgsql-general
Hi!

I'd like to clarify this as the original author of the page in question.

Fist of all, I also recommend the row-value syntax as you can see on the "previous" page:
http://use-the-index-luke.com/sql/partial-results/fetch-next-page

I've also explained this procedure at conferences. Here are the slides:
http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way


And now about this quote:
> However, only SQL Server and the Oracle database can use them for a pipelined top-N query. PostgreSQL does not use
indexesfor those queries and therefore executes them very inefficiently. 

The very important thing here is "pipelined top-N query". This term is introduced two pages earlier:
http://use-the-index-luke.com/sql/partial-results/top-n-queries

The pipelined top-n query has two very important properties:
(1) it utilizes the index order to avoid the sort operation required to satisfy ORDER BY clause
(2) it realizes that it can stop processing as soon as it has delivered enough rows.

The execution plan from Thomas Kellerer sees to fulfill requirement (1) but definitively not (2).

Even with 9.3.2, I were not able to reproduce the result of Thomas (not showing any sort operation in the execution
plan)with the test data I also published at my website: 
   http://use-the-index-luke.com/sql/example-schema/postgresql/partial-results

Then I started fiddling around with the planner's cost settings and finally managed to get a plan similar to Thomas'
whensetting random_page_cost to 0.1 (setting it to 1, equal to seq_page_cost was not enough). However, that proves the
pointthat PostgreSQL can use an index to avoid the sort operation caused by order by (even for window functions). I'd
becurious what settings caused Thomas to get this result. 

The second requirement for pipelined top-n queries is not satisfied in Thomas' execution plan: it does read the full
index(actual rows=1000000), and applies the window function over all rows. Only in the end it throws away all
non-confirmingrows (Rows Removed by Filter: 999899). A pipelined top-n execution would not cause more than 300 rows
readfrom the index, and only 200 rows removed by filter. That's what the Oracle DB and SQL Server manage to do (Thomas:
pingme to track down why Oracle didn't for you). Considering that PG can use the index order to avoid the sort, it
stilldoesn't make very much sense if it cannot abort the index scan after fetching enough rows. So, not using the index
mighteven be right choice in unfiltered cases like this. 

Interestingly, I did not even get an Index Only Scan when just selecting column from the index with random_page_cost=1
andseq_page_cost=1. Again I had to reduce random_page_cost further down (0.1) to get an Index Only Scan. That does not
makesense to me. When both _page_costs are same, the Index Only Scan should get lower costs because the index is
smaller(338mb vs. 31 mb in my case). On top of that, the Index Only Scan avoids the sort. However, with 9.3.2 I get the
samecost for Index Only Scan as for Index Scan (had to enable_seqscan=off and enable_bitmapscan=off to get that). 

So, I have to change my page  (&book) to say something like this:

> PostgreSQL does not abort the index scan after fetching enough rows for those queries and therefore executes them
veryinefficiently. 


Thanks for the hint and always feel free to put my on CC regarding questions about stuff on Use The Index, Luke!

-markus

ps.: It's perfectly possible that PG could use indexes for window-functions before 9.3. I did definitively not fiddle
aroundwith cost settings at that time to force it into this plan. 
pps.: sorry for the delay, I'm not subscribed (just too much) but somebody was nice enough to ping me about this.
ppps: then I wondered how to properly reply without having the original messages. So I downloaded the .mbox from the
archiveand pushed reply there. Hope it ends up in the right thread :) 

Markus Winand
markus.winand@winand.at
T +43 1 9444047

> "A wonderful book…I highly recommend it." -Anders Janmyr
> http://sql-performance-explained.com/

Maderspergerstr. 1-3/9/11
1160 Wien
AUSTRIA


pgsql-general by date:

Previous
From: "Antman, Jason (CMG-Atlanta)"
Date:
Subject: Re: High Level Committers Wanted
Next
From: Pavel Stehule
Date:
Subject: Re: Is PostgreSQL 9.3 using indexes for pipelined top-N window function queries?