Thread: Window functions and index usage

Window functions and index usage

From

Anssi Kääriäinen

Date:

04 October 2011, 09:47:58

I have the following setup:

create table test(id integer, seq integer);
insert into test select generate_series(0, 100), generate_series(0, 1000);
create unique index test_idx on test(id, seq);
analyze test;

Now I try to fetch the latest 5 values per id, ordered by seq from the
table:

select * from (
select id, seq, row_number() over (partition by id order by seq)
   from test
  where id in (1, 2, 3)
) where row_number() <= 5;

This does not use the index on id, seq for sorting the data. It uses a
bitmap index scan, and sequential scan when issued SET enable_bitmapscan
to false. Tested both on git head and 8.4.8. See end of post for plans.
It seems it would be possible to fetch the first values per id using the
index, or at least skip the sorting.

If I emulate the behavior I want by using:
  (select id, seq from test where id = 1 order by seq limit 5)
union
  (select id, seq from test where id = 2 order by seq limit 5)
union
  (select id, seq from test where id = 2 order by seq limit 5);
I get two orders of magnitude faster execution.

Is there some other way to run the query so that it would use the index?
Is there plans to support the index usage for the above query assuming
that it is possible to use the index for that query?

The real world use case would be to fetch latest 5 threads per
discussion forum in one query. Or fetch 3 latest comments for all
patches in given commit fest in single query.

  - Anssi Kääriäinen

Normal plan (by the way, note the wildly inaccurate topmost row estimate):

                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Subquery Scan on tmp  (cost=711.65..805.05 rows=958 width=16) (actual
time=10.543..27.469 rows=15 loops=1)
    Filter: (tmp.row_number <= 5)
    ->  WindowAgg  (cost=711.65..769.13 rows=2874 width=8) (actual
time=10.537..23.551 rows=3003 loops=1)
          ->  Sort  (cost=711.65..718.83 rows=2874 width=8) (actual
time=10.528..13.798 rows=3003 loops=1)
                Sort Key: test.id, test.seq
                Sort Method: quicksort  Memory: 182kB
                ->  Bitmap Heap Scan on test  (cost=59.04..546.55
rows=2874 width=8) (actual time=0.580..4.750 rows=3003 loops=1)
                      Recheck Cond: (id = ANY ('{1,2,3}'::integer[]))
                      ->  Bitmap Index Scan on test_idx
(cost=0.00..58.32 rows=2874 width=0) (actual time=0.490..0.490 rows=3003
loops=1)
                            Index Cond: (id = ANY ('{1,2,3}'::integer[]))
  Total runtime: 27.531 ms
(11 rows)

Plan with set enable_bitmapscan set to off:

                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
  Subquery Scan on tmp  (cost=2003.23..2096.64 rows=958 width=16)
(actual time=32.907..47.279 rows=15 loops=1)
    Filter: (tmp.row_number <= 5)
    ->  WindowAgg  (cost=2003.23..2060.71 rows=2874 width=8) (actual
time=32.898..44.053 rows=3003 loops=1)
          ->  Sort  (cost=2003.23..2010.42 rows=2874 width=8) (actual
time=32.883..36.067 rows=3003 loops=1)
                Sort Key: test.id, test.seq
                Sort Method: quicksort  Memory: 182kB
                ->  Seq Scan on test  (cost=0.00..1838.14 rows=2874
width=8) (actual time=0.017..26.733 rows=3003 loops=1)
                      Filter: (id = ANY ('{1,2,3}'::integer[]))
  Total runtime: 47.334 ms
(9 rows)

The UNION approach:

                                                               QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=28.47..28.62 rows=15 width=8) (actual
time=0.176..0.194 rows=15 loops=1)
    ->  Append  (cost=0.00..28.40 rows=15 width=8) (actual
time=0.026..0.149 rows=15 loops=1)
          ->  Limit  (cost=0.00..9.42 rows=5 width=8) (actual
time=0.024..0.045 rows=5 loops=1)
                ->  Index Scan using test_idx on test
(cost=0.00..1820.87 rows=967 width=8) (actual time=0.022..0.034 rows=5
loops=1)
                      Index Cond: (id = 1)
          ->  Limit  (cost=0.00..9.42 rows=5 width=8) (actual
time=0.017..0.034 rows=5 loops=1)
                ->  Index Scan using test_idx on test
(cost=0.00..1820.87 rows=967 width=8) (actual time=0.015..0.023 rows=5
loops=1)
                      Index Cond: (id = 2)
          ->  Limit  (cost=0.00..9.42 rows=5 width=8) (actual
time=0.021..0.037 rows=5 loops=1)
                ->  Index Scan using test_idx on test
(cost=0.00..1820.87 rows=967 width=8) (actual time=0.019..0.026 rows=5
loops=1)
                      Index Cond: (id = 3)
  Total runtime: 0.258 ms
(12 rows)

Re: Window functions and index usage

From

Robert Klemme

Date:

04 October 2011, 13:36:18

On Tue, Oct 4, 2011 at 11:39 AM, Anssi Kääriäinen
<anssi.kaariainen@thl.fi> wrote:
> I have the following setup:
>
> create table test(id integer, seq integer);
> insert into test select generate_series(0, 100), generate_series(0, 1000);
> create unique index test_idx on test(id, seq);
> analyze test;
>
> Now I try to fetch the latest 5 values per id, ordered by seq from the
> table:
>
> select * from (
> select id, seq, row_number() over (partition by id order by seq)
>  from test
>  where id in (1, 2, 3)
> ) where row_number() <= 5;

It seems this fetches the *first* 5 values per id - and not the
latest.  For that you would need to "order by seq desc" in the window
and probably also in the index.

> This does not use the index on id, seq for sorting the data. It uses a
> bitmap index scan, and sequential scan when issued SET enable_bitmapscan to
> false. Tested both on git head and 8.4.8. See end of post for plans. It
> seems it would be possible to fetch the first values per id using the index,
> or at least skip the sorting.

Just guessing: since row_number is an analytic function and it can be
combined with any type of windowing only in "rare" cases do the
ordering of index columns and the ordering in the window align.  So
while your particular use case could benefit from this optimization
the overall judgement might be that it's not worthwhile to make the
optimizer more complex to cover this case - or I fail to see the more
general pattern here. :-)

> Is there some other way to run the query so that it would use the index? Is
> there plans to support the index usage for the above query assuming that it
> is possible to use the index for that query?
>
> The real world use case would be to fetch latest 5 threads per discussion
> forum in one query. Or fetch 3 latest comments for all patches in given
> commit fest in single query.

Is it really that realistic that someone wants the latest n entries
for *all* threads / patches?  It seems since this can result in a very
large data set this is probably not the type of query which is done
often.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: Window functions and index usage

From

Anssi Kääriäinen

Date:

04 October 2011, 14:06:39

On 10/04/2011 04:27 PM, Robert Klemme wrote:
> On Tue, Oct 4, 2011 at 11:39 AM, Anssi Kääriäinen
> <anssi.kaariainen@thl.fi>  wrote:
>> I have the following setup:
>>
>> create table test(id integer, seq integer);
>> insert into test select generate_series(0, 100), generate_series(0, 1000);
>> create unique index test_idx on test(id, seq);
>> analyze test;
>>
>> Now I try to fetch the latest 5 values per id, ordered by seq from the
>> table:
>>
>> select * from (
>> select id, seq, row_number() over (partition by id order by seq)
>>   from test
>>   where id in (1, 2, 3)
>> ) where row_number()<= 5;
> It seems this fetches the *first* 5 values per id - and not the
> latest.  For that you would need to "order by seq desc" in the window
> and probably also in the index.
>
Yeah. Sorry, wrong order. And the last line is wrong, it should be ")
tmp where row_number <= 5;".
>> This does not use the index on id, seq for sorting the data. It uses a
>> bitmap index scan, and sequential scan when issued SET enable_bitmapscan to
>> false. Tested both on git head and 8.4.8. See end of post for plans. It
>> seems it would be possible to fetch the first values per id using the index,
>> or at least skip the sorting.
> Just guessing: since row_number is an analytic function and it can be
> combined with any type of windowing only in "rare" cases do the
> ordering of index columns and the ordering in the window align.  So
> while your particular use case could benefit from this optimization
> the overall judgement might be that it's not worthwhile to make the
> optimizer more complex to cover this case - or I fail to see the more
> general pattern here. :-)
>
I think there are common use cases for this - see end of message for an
example.
>> Is there some other way to run the query so that it would use the index? Is
>> there plans to support the index usage for the above query assuming that it
>> is possible to use the index for that query?
>>
>> The real world use case would be to fetch latest 5 threads per discussion
>> forum in one query. Or fetch 3 latest comments for all patches in given
>> commit fest in single query.
> Is it really that realistic that someone wants the latest n entries
> for *all* threads / patches?  It seems since this can result in a very
> large data set this is probably not the type of query which is done
> often.
The idea is that the dataset isn't that large. And you don't have to
fetch them for all threads / patches. You might fetch them only for
patches in currently viewed commit fest. See
https://commitfest.postgresql.org/action/commitfest_view?id=12 for one
such use. What I have in mind is fetching first all the patches in the
commit fest in one go. Then issue query which would look something like:
  select * from
     (select comment_data, row_number() over (partition by patch_id
order by comment_date desc)
        from patch_comments
      where patch_id in (list of patch_ids fetched in first query)
    ) tmp where row_number <= 3;

Now you have all the data needed for the first column in the above
mentioned page.

I guess the commit fest application is fetching all the comments for the
patches in the commit fest in one query, and then doing the limit in
application code. This is fine because there aren't that many comments
per patch. But if you have dozens of forums and thousands of threads per
forum you can't do that.

This is useful in any situation where you want to show n latest entries
instead of the last entry in a list view. Latest modifications to an
object, latest commits to a file, latest messages to a discussion thread
or latest payments per project. Or 5 most popular videos per category,
10 highest paid employees per department and so on.

  - Anssi Kääriäinen

Re: Window functions and index usage

From

Tom Lane

Date:

04 October 2011, 14:36:57

=?ISO-8859-1?Q?Anssi_K=E4=E4ri=E4inen?= <anssi.kaariainen@thl.fi> writes:
> I have the following setup:
> create table test(id integer, seq integer);
> insert into test select generate_series(0, 100), generate_series(0, 1000);
> create unique index test_idx on test(id, seq);
> analyze test;

> Now I try to fetch the latest 5 values per id, ordered by seq from the
> table:

> select * from (
> select id, seq, row_number() over (partition by id order by seq)
>    from test
>   where id in (1, 2, 3)
> ) where row_number() <= 5;

> This does not use the index on id, seq for sorting the data. It uses a
> bitmap index scan, and sequential scan when issued SET enable_bitmapscan
> to false.

The cost estimates I get are 806 for bitmap scan and sort, 2097 for
seqscan and sort, 4890 for indexscan without sort.  It *can* use the
index for that query ... it just doesn't think it's a good idea.  It's
probably right, too.  At least, the actual runtimes go in the same order
on my machine.  Seqscan-and-sort very often beats an indexscan for
sorting a table, unless the table is clustered on the index or nearly so.

Note that it cannot use the index for both ordering and satisfying the
IN condition.  If it used the =ANY clause as an index condition, what
that would imply is three separate index searches and so the results
wouldn't necessarily be correctly ordered.  This is why the plain
indexscan costs out so expensive: it's a full-table scan.

            regards, tom lane

Re: Window functions and index usage

From

Anssi Kääriäinen

Date:

04 October 2011, 15:00:47

On 10/04/2011 05:36 PM, Tom Lane wrote:
> The cost estimates I get are 806 for bitmap scan and sort, 2097 for
> seqscan and sort, 4890 for indexscan without sort.  It *can* use the
> index for that query ... it just doesn't think it's a good idea.  It's
> probably right, too.  At least, the actual runtimes go in the same order
> on my machine.  Seqscan-and-sort very often beats an indexscan for
> sorting a table, unless the table is clustered on the index or nearly so.
I tested it and yes, it can use the index scan. But not in the way I
though it would be used.
> Note that it cannot use the index for both ordering and satisfying the
> IN condition.  If it used the =ANY clause as an index condition, what
> that would imply is three separate index searches and so the results
> wouldn't necessarily be correctly ordered.  This is why the plain
> indexscan costs out so expensive: it's a full-table scan.
This I don't understand. I would imagine it would be possible to execute
this query as get 5 first values for id 1, get 5 first values for id 2,
get 5 first values for id 3. At least if I do this by hand using UNION I
get two orders of magnitude faster execution time. I am not saying it
would be easy to do that, but to me it seems it would be possible to use
the index more efficiently for the example query. Or is the following
UNION query not equivalent to the window function query, assuming I am
not interested in the row_number column itself?

(select id, seq from test where id = 1 order by seq limit 5)
union
(select id, seq from test where id = 2 order by seq limit 5)
union
(select id, seq from test where id = 3 order by seq limit 5);

The results are in different order, but there is no order by in the
original query except in the OVER clause, so it should not matter.

  - Anssi Kääriäinen

Re: Window functions and index usage

From

Anssi Kääriäinen

Date:

04 October 2011, 15:05:05

On 10/04/2011 05:52 PM, Robert Klemme wrote:
> But then why do require using the second index column in the first
> place?  If the data set is small then the query is likely fast if the
> selection via id can use any index.
I mean the fetched dataset is not large, I didn't mean the dataset in
total isn't large. Imagine the commit fest application, but with 10000
comments per patch. You want to fetch the 100 patches in the current
commit fest, and 3 latest comments per patch.
>> And you don't have to fetch
>> them for all threads / patches. You might fetch them only for patches in
>> currently viewed commit fest. See
>> https://commitfest.postgresql.org/action/commitfest_view?id=12 for one such
>> use. What I have in mind is fetching first all the patches in the commit
>> fest in one go. Then issue query which would look something like:
>>   select * from
>>     (select comment_data, row_number() over (partition by patch_id order by
>> comment_date desc)
>>        from patch_comments
>>      where patch_id in (list of patch_ids fetched in first query)
>>    ) tmp where row_number<= 3;
> Interesting: I notice that I the query cannot successfully be simplified on 8.4:
>
> rklemme=>  select *,
> row_number() over (partition by id order by seq desc) as rn
> from test
> where id in (1,2,3)
> and rn<= 3
> ;
That can't be done, where conditions targeting window functions must be
done using subquery. There is no difference in 9.1 as far as I know.

> Again, what is easy for you as a human will likely be quite complex
> for the optimizer (knowing that the order by and the row_number output
> align).
I am not trying to say it is easy.

  - Anssi

Re: Window functions and index usage

From

Robert Klemme

Date:

04 October 2011, 15:18:23

On Tue, Oct 4, 2011 at 4:06 PM, Anssi Kääriäinen
<anssi.kaariainen@thl.fi> wrote:
> On 10/04/2011 04:27 PM, Robert Klemme wrote:
>>
>> On Tue, Oct 4, 2011 at 11:39 AM, Anssi Kääriäinen
>> <anssi.kaariainen@thl.fi>  wrote:
>>>
>>> I have the following setup:
>>>
>>> create table test(id integer, seq integer);
>>> insert into test select generate_series(0, 100), generate_series(0,
>>> 1000);
>>> create unique index test_idx on test(id, seq);
>>> analyze test;
>>>
>>> Now I try to fetch the latest 5 values per id, ordered by seq from the
>>> table:
>>>
>>> select * from (
>>> select id, seq, row_number() over (partition by id order by seq)
>>>  from test
>>>  where id in (1, 2, 3)
>>> ) where row_number()<= 5;
>>
>> It seems this fetches the *first* 5 values per id - and not the
>> latest.  For that you would need to "order by seq desc" in the window
>> and probably also in the index.
>>
> Yeah. Sorry, wrong order. And the last line is wrong, it should be ") tmp
> where row_number <= 5;".
>>>
>>> This does not use the index on id, seq for sorting the data. It uses a
>>> bitmap index scan, and sequential scan when issued SET enable_bitmapscan
>>> to
>>> false. Tested both on git head and 8.4.8. See end of post for plans. It
>>> seems it would be possible to fetch the first values per id using the
>>> index,
>>> or at least skip the sorting.
>>
>> Just guessing: since row_number is an analytic function and it can be
>> combined with any type of windowing only in "rare" cases do the
>> ordering of index columns and the ordering in the window align.  So
>> while your particular use case could benefit from this optimization
>> the overall judgement might be that it's not worthwhile to make the
>> optimizer more complex to cover this case - or I fail to see the more
>> general pattern here. :-)
>>
> I think there are common use cases for this - see end of message for an
> example.
>>>
>>> Is there some other way to run the query so that it would use the index?
>>> Is
>>> there plans to support the index usage for the above query assuming that
>>> it
>>> is possible to use the index for that query?
>>>
>>> The real world use case would be to fetch latest 5 threads per discussion
>>> forum in one query. Or fetch 3 latest comments for all patches in given
>>> commit fest in single query.
>>
>> Is it really that realistic that someone wants the latest n entries
>> for *all* threads / patches?  It seems since this can result in a very
>> large data set this is probably not the type of query which is done
>> often.
>
> The idea is that the dataset isn't that large.

But then why do require using the second index column in the first
place?  If the data set is small then the query is likely fast if the
selection via id can use any index.

> And you don't have to fetch
> them for all threads / patches. You might fetch them only for patches in
> currently viewed commit fest. See
> https://commitfest.postgresql.org/action/commitfest_view?id=12 for one such
> use. What I have in mind is fetching first all the patches in the commit
> fest in one go. Then issue query which would look something like:
>  select * from
>    (select comment_data, row_number() over (partition by patch_id order by
> comment_date desc)
>       from patch_comments
>     where patch_id in (list of patch_ids fetched in first query)
>   ) tmp where row_number <= 3;

Interesting: I notice that I the query cannot successfully be simplified on 8.4:

rklemme=> select *,
row_number() over (partition by id order by seq desc) as rn
from test
where id in (1,2,3)
and rn <= 3
;
ERROR:  column "rn" does not exist
LINE 5: and rn <= 3
            ^
rklemme=> select *,
row_number() over (partition by id order by seq desc) as rn
from test
where id in (1,2,3)
and row_number() <= 3;
ERROR:  window function call requires an OVER clause
LINE 5: and row_number() <= 3;
            ^
rklemme=> select *,
row_number() over (partition by id order by seq desc) as rn
from test
where id in (1,2,3)
and row_number() over (partition by id order by seq desc) <= 3;
ERROR:  window functions not allowed in WHERE clause
LINE 5: and row_number() over (partition by id order by seq desc) <=...
            ^
rklemme=>

I think I need to switch to 9.1 soon. :-)

> Now you have all the data needed for the first column in the above mentioned
> page.
>
> I guess the commit fest application is fetching all the comments for the
> patches in the commit fest in one query, and then doing the limit in
> application code. This is fine because there aren't that many comments per
> patch. But if you have dozens of forums and thousands of threads per forum
> you can't do that.
>
> This is useful in any situation where you want to show n latest entries
> instead of the last entry in a list view. Latest modifications to an object,
> latest commits to a file, latest messages to a discussion thread or latest
> payments per project. Or 5 most popular videos per category, 10 highest paid
> employees per department and so on.

Again, what is easy for you as a human will likely be quite complex
for the optimizer (knowing that the order by and the row_number output
align).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/