Re: Window functions and index usage - Mailing list pgsql-performance

From Robert Klemme
Subject Re: Window functions and index usage
Date
Msg-id CAM9pMnPOkMLCH6b8ToHY9ui-6LFKmwXZNzPNyLoDEVRzB-fQZg@mail.gmail.com
Whole thread Raw
In response to Window functions and index usage  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
Responses Re: Window functions and index usage  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
List pgsql-performance
On Tue, Oct 4, 2011 at 11:39 AM, Anssi Kääriäinen
<anssi.kaariainen@thl.fi> wrote:
> I have the following setup:
>
> create table test(id integer, seq integer);
> insert into test select generate_series(0, 100), generate_series(0, 1000);
> create unique index test_idx on test(id, seq);
> analyze test;
>
> Now I try to fetch the latest 5 values per id, ordered by seq from the
> table:
>
> select * from (
> select id, seq, row_number() over (partition by id order by seq)
>  from test
>  where id in (1, 2, 3)
> ) where row_number() <= 5;

It seems this fetches the *first* 5 values per id - and not the
latest.  For that you would need to "order by seq desc" in the window
and probably also in the index.

> This does not use the index on id, seq for sorting the data. It uses a
> bitmap index scan, and sequential scan when issued SET enable_bitmapscan to
> false. Tested both on git head and 8.4.8. See end of post for plans. It
> seems it would be possible to fetch the first values per id using the index,
> or at least skip the sorting.

Just guessing: since row_number is an analytic function and it can be
combined with any type of windowing only in "rare" cases do the
ordering of index columns and the ordering in the window align.  So
while your particular use case could benefit from this optimization
the overall judgement might be that it's not worthwhile to make the
optimizer more complex to cover this case - or I fail to see the more
general pattern here. :-)

> Is there some other way to run the query so that it would use the index? Is
> there plans to support the index usage for the above query assuming that it
> is possible to use the index for that query?
>
> The real world use case would be to fetch latest 5 threads per discussion
> forum in one query. Or fetch 3 latest comments for all patches in given
> commit fest in single query.

Is it really that realistic that someone wants the latest n entries
for *all* threads / patches?  It seems since this can result in a very
large data set this is probably not the type of query which is done
often.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: pkey is not used on productive database
Next
From: Anssi Kääriäinen
Date:
Subject: Re: Window functions and index usage