Thread: Using Index-only scans to speed up count(*)

Using Index-only scans to speed up count(*)

From

Gurjeet Singh

Date:

07 June 2014, 12:35:59

While reading [1] in context of Postgres Hibernator, I see that
Mitsuru mentioned one of the ways other RDBMS allows count(*) to be
driven by an index.

> 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index blocks

I am not sure if Postgres planner already allows this, but it would be
great if the planner considered driving a count(*) query using a
non-partial index, in the hopes that it turns into an index-only scan,
and hence returns count(*) result faster.The non-partial index may not
necessarily be the primary key index, it can be chosen purely based on
size, favouring smaller indexes.

This may alleviate some of the concerns of people migrating
applications from other DBMS' that perform count(*) in a blink of an
eye.

[1]: http://www.postgresql.org/message-id/20110507.022228.83883502.iwasaki@jp.FreeBSD.org

Best regards,

PS: Please note that I am not proposing to add support for the
optimizer hint embedded in Mitsuru's query.
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com

Re: Using Index-only scans to speed up count(*)

From

Cédric Villemain

Date:

07 June 2014, 12:56:34

Le samedi 7 juin 2014 08:35:27 Gurjeet Singh a écrit :
> While reading [1] in context of Postgres Hibernator, I see that
> Mitsuru mentioned one of the ways other RDBMS allows count(*) to be
> driven by an index.
>
> > 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index
> > blocks
> I am not sure if Postgres planner already allows this, but it would be
> great if the planner considered driving a count(*) query using a
> non-partial index, in the hopes that it turns into an index-only
> scan, and hence returns count(*) result faster.The non-partial index
> may not necessarily be the primary key index, it can be chosen purely
> based on size, favouring smaller indexes.

IIRC it is not (yet) possible to switch from index-scan to indexonly-
scan on the fly because the structure used are different (indexonly scan
needs to prepare a tuple struct to hold data, I'm not sure of the
details).
Indexonly scan is already used to answer count(*) but decision is done
during planning.

Now, it happens that this is an interesting idea which has already been
discussed if not on postgresql-hacker at least during pre/post-
conferences social events: being able to switch the plan during
execution if things are not working as expected (in the same topic you
have 'progress bar' for query execution, at least some mechanisms should
be shared by both features).

> PS: Please note that I am not proposing to add support for the
> optimizer hint embedded in Mitsuru's query.

:-)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Using Index-only scans to speed up count(*)

From

Gurjeet Singh

Date:

07 June 2014, 13:09:25

On Sat, Jun 7, 2014 at 8:56 AM, Cédric Villemain <cedric@2ndquadrant.com> wrote:
> Le samedi 7 juin 2014 08:35:27 Gurjeet Singh a écrit :
>
>> PS: Please note that I am not proposing to add support for the
>> optimizer hint embedded in Mitsuru's query.
>
> :-)

Even though I (sometimes) favor hints, and developed the optimizer
hints feature in EDB (PPAS), I know how much Postgres **hates** [1]
optimizer hints :) So just trying to wade off potential flamewar-ish
comments.

[1]: http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com

Re: Using Index-only scans to speed up count(*)

From

Cédric Villemain

Date:

07 June 2014, 13:22:37

Le samedi 7 juin 2014 09:09:00 Gurjeet Singh a écrit :
> On Sat, Jun 7, 2014 at 8:56 AM, Cédric Villemain
<cedric@2ndquadrant.com> wrote:
> > Le samedi 7 juin 2014 08:35:27 Gurjeet Singh a écrit :
> >> PS: Please note that I am not proposing to add support for the
> >> optimizer hint embedded in Mitsuru's query.
> >>
> > :-)
>
> Even though I (sometimes) favor hints, and developed the optimizer
> hints feature in EDB (PPAS), I know how much Postgres **hates** [1]
> optimizer hints :) So just trying to wade off potential flamewar-ish
> comments.

There is a large benefits to users in preventing HINT in core: it makes
it mandatory for PostgreSQL to keep improving, and it makes it mandatory
for developers to find solutions around this constraint. There is at
least planner_hint contribution (search Oleg website), and added to a
postgresql hook on the parser you're done implementing HINT in userland.

I'm not arguing pro/cons about the feature (as you said the question has
been answered already) but arguing that arbitrary constraints challenge
us and produces good things for PostgreSQL in return.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Using Index-only scans to speed up count(*)

From

Noah Misch

Date:

07 June 2014, 15:04:25

On Sat, Jun 07, 2014 at 08:35:27AM -0400, Gurjeet Singh wrote:
> While reading [1] in context of Postgres Hibernator, I see that
> Mitsuru mentioned one of the ways other RDBMS allows count(*) to be
> driven by an index.
> 
> > 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load index blocks
> 
> I am not sure if Postgres planner already allows this, 

It does:

create table t (c, junk) as select *, repeat('a', 100) from generate_series(1,10000);
alter table t add primary key (c);
vacuum t;
analyze t;
explain select count(*) from t;

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com