Thread: Using window functions to get the unpaginated count for paginated queries

Using window functions to get the unpaginated count for paginated queries

From
Clemens Park
Date:
Hi all,

Recently, during a performance improvement sweep for an application at my company, one of the hotspots that was discovered was pagination.

In order to display the correct pagination links on the page, the pagination library we used (most pagination libraries for that matter) ran the query with OFFSET and LIMIT to get the paginated results, and then re-ran the query without the OFFSET and LIMIT and wrapped them in a SELECT COUNT(*) FROM main_query to get the total number of rows.

In an attempt to optimize this, we used a window function as follows:

Given a query that looked as follows:

SELECT a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;

add total_entries_count column as follows:

SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;

This calculates the total number of unpaginated rows correctly, without affecting the runtime of the query.  At least as far as I can tell.

The questions I have are:

1) Are there any adverse effects that the above window function can have?
2) Are there any cases where the count would return incorrectly?
3) In general, is this an appropriate use-case for using window functions?

Thanks,
Clemens

Re: Using window functions to get the unpaginated count for paginated queries

From
"Albe Laurenz"
Date:
Clemens Park wrote:
> Recently, during a performance improvement sweep for an application at
my company, one of the hotspots
> that was discovered was pagination.
>
> In order to display the correct pagination links on the page, the
pagination library we used (most
> pagination libraries for that matter) ran the query with OFFSET and
LIMIT to get the paginated
> results, and then re-ran the query without the OFFSET and LIMIT and
wrapped them in a SELECT COUNT(*)
> FROM main_query to get the total number of rows.
>
> In an attempt to optimize this, we used a window function as follows:
>
> Given a query that looked as follows:
>
> SELECT a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> add total_entries_count column as follows:
>
> SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
> FROM table
> WHERE clauses
> OFFSET x LIMIT y;
>
> This calculates the total number of unpaginated rows correctly,
without affecting the runtime of the
> query.  At least as far as I can tell.

It can affect the runtime considerably.

I created a 10^6 row test table and tried:

test=> EXPLAIN ANALYZE SELECT id, val FROM large OFFSET 100 LIMIT 10;
                                                    QUERY PLAN
------------------------------------------------------------------------
------------------------------------------
 Limit  (cost=1.49..1.64 rows=10 width=12) (actual time=0.177..0.195
rows=10 loops=1)
   ->  Seq Scan on large  (cost=0.00..14902.00 rows=1000000 width=12)
(actual time=0.028..0.114 rows=110 loops=1)
 Total runtime: 0.251 ms
(3 rows)

test=> EXPLAIN ANALYZE SELECT id, val, COUNT(*) OVER () AS
total_entries_count FROM large OFFSET 100 LIMIT 10;
                                                          QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------
 Limit  (cost=2.74..3.01 rows=10 width=12) (actual
time=1893.606..1893.625 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..27402.00 rows=1000000 width=12) (actual
time=1893.435..1893.559 rows=110 loops=1)
         ->  Seq Scan on large  (cost=0.00..14902.00 rows=1000000
width=12) (actual time=0.025..647.182 rows=1000000 loops=1)
 Total runtime: 1915.255 ms
(4 rows)

That is because the second query will have to scan all rows, while the
first one can stop scanning after 110 rows.

> The questions I have are:
>
> 1) Are there any adverse effects that the above window function can
have?

I can only think of the performance degradation mentioned above.

> 2) Are there any cases where the count would return incorrectly?

No.

> 3) In general, is this an appropriate use-case for using window
functions?

I think it is.

Maybe you can do better if you don't retrieve the total count
of rows for every set of rows you select.

Yours,
Laurenz Albe


Re: Using window functions to get the unpaginated count for paginated queries

From
Igor Romanchenko
Date:
1) Are there any adverse effects that the above window function can have?

It can cause severe performance degradation, as mentioned before.

2) Are there any cases where the count would return incorrectly?
 
It could return incorrect result if there are some rows with  table.id = NULL . count(table_field) returns the number of rows, where table_field is not NULL.

3) In general, is this an appropriate use-case for using window functions?

It does the job => it is an appropriate use-case for using window functions.


If this query causes performance degradation and you do not need the exact count of rows, it is better to use something from http://wiki.postgresql.org/wiki/Count_estimate or google for "fast postgresql count".

Re: Using window functions to get the unpaginated count for paginated queries

From
Clemens Park
Date:
Thanks for the reply everyone.

In my case, it looks like there is no real drawback then, since what used to happen is:

SELECT a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;

followed by:

SELECT COUNT(*) FROM (
  SELECT a,b,c
  FROM table
  WHERE clauses
);
(notice the lack of OFFSET and LIMIT)

and both of them were replaced with:

SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c
FROM table
WHERE clauses
OFFSET x LIMIT y;


On Wed, Nov 14, 2012 at 5:11 AM, Igor Romanchenko <igor.a.romanchenko@gmail.com> wrote:
1) Are there any adverse effects that the above window function can have?

It can cause severe performance degradation, as mentioned before.

2) Are there any cases where the count would return incorrectly?
 
It could return incorrect result if there are some rows with  table.id = NULL . count(table_field) returns the number of rows, where table_field is not NULL.

3) In general, is this an appropriate use-case for using window functions?

It does the job => it is an appropriate use-case for using window functions.


If this query causes performance degradation and you do not need the exact count of rows, it is better to use something from http://wiki.postgresql.org/wiki/Count_estimate or google for "fast postgresql count".