Thread: Pagination count strategies

Pagination count strategies

From
Leonardo M. Ramé
Date:
Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



Re: Pagination count strategies

From
Andrew Sullivan
Date:
On Thu, Apr 03, 2014 at 10:34:32AM -0300, Leonardo M. Ramé wrote:
>
> What strategy for showing the total number of records returned do you
> recommend?.

The best answer for this I've ever seen is to limit the number of rows
you're counting (at least at first) to some reasonably small number --
say 5000.  This is usually reasonably fast for a well-indexed query,
and your pagination can say something like "First n of at least 5000
results", unless you have fewer than 5000 results, in which case you
know the number (and the count returned quickly anyway).  As you're
displaying those first 5000 results, you can work in the background
getting a more accurate number.  This is more work for your
application, but it provides a much better user experience (and you
can delay getting the detailed number until the user pages through to
the second page of results, so you don't count everything needlessly
in case the user just uses the first page, which IME happens a lot).
Note that even Google doesn't give you an accurate number -- they just
say "about ten trillion" or whatever.

Hope that's useful,

A

--
Andrew Sullivan
ajs@crankycanuck.ca


Re: Pagination count strategies

From
Alban Hertroys
Date:
On 3 April 2014 15:34, Leonardo M. Ramé <l.rame@griensu.com> wrote:
> Hi, in one of our systems, we added a kind of pagination feature, that
> shows N records of Total records.
>
> To do this, we added a "count(*) over() as Total" field in our queries
> in replacement of doing two queries, one for fetching the records, and
> other for getting the count. This improved the performance, but we are't
> happy with the results yet, by removing the count, the query takes
> 200ms vs 2000ms with it.
>
> We are thinking of removing the count, but if we do that, the system
> will lack an interesting feature.
>
> What strategy for showing the total number of records returned do you
> recommend?.

Assuming your results are unique (and what would be the point of
showing duplicate results?) in a specific order, it should be possible
to (uniquely) identify the last record shown on a previous page and
display n results from there on.

To add a result counter for displaying purposes to that, since you
need to remember the last displayed result anyway, you might as well
store the relative record number of that result with it and count from
there on.

This works well if you only have a next/previous results link, but not
so well when people can pick arbitrary page numbers. It can work
relative to the current page regardless of which page the user chose
to navigate to next, but you'll have to go through all the results in
between that page and the current page... That should still be faster
than always counting from the start though (and you can be smart about
from which end you start counting).

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Pagination count strategies

From
Dorian Hoxha
Date:
Cache the total ?


On Thu, Apr 3, 2014 at 3:34 PM, Leonardo M. Ramé <l.rame@griensu.com> wrote:
Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
in replacement of doing two queries, one for fetching the records, and
other for getting the count. This improved the performance, but we are't
happy with the results yet, by removing the count, the query takes
200ms vs 2000ms with it.

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Pagination count strategies

From
Torsten Förtsch
Date:
On 03/04/14 15:34, Leonardo M. Ramé wrote:
> Hi, in one of our systems, we added a kind of pagination feature, that
> shows N records of Total records.
>
> To do this, we added a "count(*) over() as Total" field in our queries
> in replacement of doing two queries, one for fetching the records, and
> other for getting the count. This improved the performance, but we are't
> happy with the results yet, by removing the count, the query takes
> 200ms vs 2000ms with it.
>
> We are thinking of removing the count, but if we do that, the system
> will lack an interesting feature.
>
> What strategy for showing the total number of records returned do you
> recommend?.

If you need only an estimated number and if your planner statistics are
up to date, you can use the planner.

Here is my implementation of the explain function. The COMMENT below
shows how to use it:

CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[])
RETURNS JSON AS $$
    DECLARE
        tmp TEXT;
    BEGIN
        EXECUTE 'EXPLAIN ('
             || array_to_string(array_append($1[2:array_upper($1, 1)],
'FORMAT JSON'), ', ')
             || ') '
             || $1[1] INTO tmp;
        RETURN tmp::JSON;
    END;
$$ LANGUAGE plpgsql VOLATILE;

COMMENT ON FUNCTION explain(VARIADIC TEXT[])
IS $def$
This function is a SQL interface to the planner. It returns the plan
(result of EXPLAIN) of the query passed as TEXT string as the first
parameter as JSON object.

The remaining parameters are EXPLAIN-modifiers, like ANALYZE or
BUFFERS.

The function can be used to store plans in the database.

Another interesting usage is when you need only an estimated row
count for a query. You can use

  SELECT count(*) ...

This gives you an exact number but is usually slow. If your planner
statistics are up to date and the query is not too complicated, the
planner usually gives a good estimate and is much faster.

  SELECT explain('SELECT 1 FROM tb WHERE id>80000000')
           ->0->'Plan'->'Plan Rows';
$def$;


Torsten


Re: Pagination count strategies

From
Leonardo M. Ramé
Date:
On 2014-04-03 10:00:18 -0400, Andrew Sullivan wrote:
> On Thu, Apr 03, 2014 at 10:34:32AM -0300, Leonardo M. Ramé wrote:
> >
> > What strategy for showing the total number of records returned do you
> > recommend?.
>
> The best answer for this I've ever seen is to limit the number of rows
> you're counting (at least at first) to some reasonably small number --
> say 5000.  This is usually reasonably fast for a well-indexed query,
> and your pagination can say something like "First n of at least 5000
> results", unless you have fewer than 5000 results, in which case you
> know the number (and the count returned quickly anyway).  As you're
> displaying those first 5000 results, you can work in the background
> getting a more accurate number.  This is more work for your
> application, but it provides a much better user experience (and you
> can delay getting the detailed number until the user pages through to
> the second page of results, so you don't count everything needlessly
> in case the user just uses the first page, which IME happens a lot).
> Note that even Google doesn't give you an accurate number -- they just
> say "about ten trillion" or whatever.
>
> Hope that's useful,
>
> A
>

Sounds nice, is it possible to modify my "count(*) over()" to what you
suggest?.

--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



Re: Pagination count strategies

From
Leonardo M. Ramé
Date:
On 2014-04-03 17:19:56 +0200, Torsten Förtsch wrote:
> On 03/04/14 15:34, Leonardo M. Ramé wrote:
> > Hi, in one of our systems, we added a kind of pagination feature, that
> > shows N records of Total records.
> >
> > To do this, we added a "count(*) over() as Total" field in our queries
> > in replacement of doing two queries, one for fetching the records, and
> > other for getting the count. This improved the performance, but we are't
> > happy with the results yet, by removing the count, the query takes
> > 200ms vs 2000ms with it.
> >
> > We are thinking of removing the count, but if we do that, the system
> > will lack an interesting feature.
> >
> > What strategy for showing the total number of records returned do you
> > recommend?.
>
> If you need only an estimated number and if your planner statistics are
> up to date, you can use the planner.
>
> Here is my implementation of the explain function. The COMMENT below
> shows how to use it:
>
> CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[])
> RETURNS JSON AS $$
>     DECLARE
>         tmp TEXT;
>     BEGIN
>         EXECUTE 'EXPLAIN ('
>              || array_to_string(array_append($1[2:array_upper($1, 1)],
> 'FORMAT JSON'), ', ')
>              || ') '
>              || $1[1] INTO tmp;
>         RETURN tmp::JSON;
>     END;
> $$ LANGUAGE plpgsql VOLATILE;
>
> COMMENT ON FUNCTION explain(VARIADIC TEXT[])
> IS $def$
> This function is a SQL interface to the planner. It returns the plan
> (result of EXPLAIN) of the query passed as TEXT string as the first
> parameter as JSON object.
>
> The remaining parameters are EXPLAIN-modifiers, like ANALYZE or
> BUFFERS.
>
> The function can be used to store plans in the database.
>
> Another interesting usage is when you need only an estimated row
> count for a query. You can use
>
>   SELECT count(*) ...
>
> This gives you an exact number but is usually slow. If your planner
> statistics are up to date and the query is not too complicated, the
> planner usually gives a good estimate and is much faster.
>
>   SELECT explain('SELECT 1 FROM tb WHERE id>80000000')
>            ->0->'Plan'->'Plan Rows';
> $def$;
>
>
> Torsten

Nice!, do you know if this will work on 8.4?.

--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



Re: Pagination count strategies

From
Andrew Sullivan
Date:
On Thu, Apr 03, 2014 at 12:44:23PM -0300, Leonardo M. Ramé wrote:
> Sounds nice, is it possible to modify my "count(*) over()" to what you
> suggest?.

I think the window_definition inside over() can contain a LIMIT, can't
it?  I didn't check just now, but I can't think any reason why not.
(ISTR when I did this in the past we didn't have window functions, so
I simulated it another way.)

A

--
Andrew Sullivan
ajs@crankycanuck.ca


Re: Pagination count strategies

From
Torsten Förtsch
Date:
On 03/04/14 17:44, Leonardo M. Ramé wrote:
> Nice!, do you know if this will work on 8.4?.
>
no way