Thread: Query on postgresql 7.4.2 not using index

Query on postgresql 7.4.2 not using index

From
Arnau
Date:
Hi all,

   I have the following running on postgresql version 7.4.2:

CREATE SEQUENCE agenda_user_group_id_seq
MINVALUE 1
MAXVALUE 9223372036854775807
CYCLE
INCREMENT 1
START 1;

CREATE TABLE AGENDA_USERS_GROUPS
(
  AGENDA_USER_GROUP_ID  INT8
                        CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
                        DEFAULT NEXTVAL('agenda_user_group_id_seq'),
  USER_ID               NUMERIC(10)
                        CONSTRAINT fk_agenda_uid  REFERENCES
AGENDA_USERS (USER_ID)
                        ON DELETE CASCADE
                        NOT NULL,
  GROUP_ID              NUMERIC(10)
                        CONSTRAINT fk_agenda_gid  REFERENCES
AGENDA_GROUPS (GROUP_ID)
                        ON DELETE CASCADE
                        NOT NULL,
  CREATION_DATE         DATE
                        DEFAULT CURRENT_DATE,
                        CONSTRAINT un_agndusrgrp_usergroup
UNIQUE(USER_ID, GROUP_ID)
);

CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );


When I execute:

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;

it does a sequential scan and doesn't use the index and I don't
understand why, any idea? I have the same in postgresql 8.1 and it uses
the index :-|

Thanks
--
Arnau

Re: Query on postgresql 7.4.2 not using index

From
"chris smith"
Date:
On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
> Hi all,
>
>    I have the following running on postgresql version 7.4.2:
>
> CREATE SEQUENCE agenda_user_group_id_seq
> MINVALUE 1
> MAXVALUE 9223372036854775807
> CYCLE
> INCREMENT 1
> START 1;
>
> CREATE TABLE AGENDA_USERS_GROUPS
> (
>   AGENDA_USER_GROUP_ID  INT8
>                         CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
>                         DEFAULT NEXTVAL('agenda_user_group_id_seq'),
>   USER_ID               NUMERIC(10)
>                         CONSTRAINT fk_agenda_uid  REFERENCES
> AGENDA_USERS (USER_ID)
>                         ON DELETE CASCADE
>                         NOT NULL,
>   GROUP_ID              NUMERIC(10)
>                         CONSTRAINT fk_agenda_gid  REFERENCES
> AGENDA_GROUPS (GROUP_ID)
>                         ON DELETE CASCADE
>                         NOT NULL,
>   CREATION_DATE         DATE
>                         DEFAULT CURRENT_DATE,
>                         CONSTRAINT un_agndusrgrp_usergroup
> UNIQUE(USER_ID, GROUP_ID)
> );
>
> CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
> CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );
>
>
> When I execute:
>
> EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
> WHERE group_id = 9;

Try

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id::int8 = 9;

or

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = '9';

and let us know what happens.

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Query on postgresql 7.4.2 not using index

From
Arnau
Date:
chris smith wrote:
> On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
>
>>Hi all,
>>
>>   I have the following running on postgresql version 7.4.2:
>>
>>CREATE SEQUENCE agenda_user_group_id_seq
>>MINVALUE 1
>>MAXVALUE 9223372036854775807
>>CYCLE
>>INCREMENT 1
>>START 1;
>>
>>CREATE TABLE AGENDA_USERS_GROUPS
>>(
>>  AGENDA_USER_GROUP_ID  INT8
>>                        CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
>>                        DEFAULT NEXTVAL('agenda_user_group_id_seq'),
>>  USER_ID               NUMERIC(10)
>>                        CONSTRAINT fk_agenda_uid  REFERENCES
>>AGENDA_USERS (USER_ID)
>>                        ON DELETE CASCADE
>>                        NOT NULL,
>>  GROUP_ID              NUMERIC(10)
>>                        CONSTRAINT fk_agenda_gid  REFERENCES
>>AGENDA_GROUPS (GROUP_ID)
>>                        ON DELETE CASCADE
>>                        NOT NULL,
>>  CREATION_DATE         DATE
>>                        DEFAULT CURRENT_DATE,
>>                        CONSTRAINT un_agndusrgrp_usergroup
>>UNIQUE(USER_ID, GROUP_ID)
>>);
>>
>>CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
>>CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );
>>
>>
>>When I execute:
>>
>>EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
>>WHERE group_id = 9;
>
>
> Try
>
> EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
> WHERE group_id::int8 = 9;
>
> or
>
> EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
> WHERE group_id = '9';
>
> and let us know what happens.
>


  The same, the table has 2547556 entries:

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups
espsm_moviltelevision-# WHERE group_id::int8 = 9;
                                                            QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on agenda_users_groups  (cost=0.00..59477.34 rows=12738
width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)
    Filter: ((group_id)::bigint = 9)
  Total runtime: 13452.114 ms
(3 filas)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups
espsm_moviltelevision-# WHERE group_id = '9';
                                                            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
    Filter: (group_id = 9::numeric)
  Total runtime: 7259.861 ms
(3 filas)

espsm_moviltelevision=# select count(*) from agenda_users_groups ;
   count
---------
  2547556


Thanks
--
Arnau

Re: Query on postgresql 7.4.2 not using index

From
"Guillaume Smet"
Date:
On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups
> espsm_moviltelevision-# WHERE group_id = '9';
>                                                             QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
> width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
>     Filter: (group_id = 9::numeric)
>   Total runtime: 7259.861 ms
> (3 filas)

Arnau,

Why do you use a numeric instead of an integer/bigint??

IIRC, there were a few problems with index on numeric column on older
version of PostgreSQL.

You can't change the type of a column with 7.4, so create a new
integer column then copy the values in this new column, drop the old
one, rename the new one. Run vacuum analyze and recreate your index.

It should work far better with an int.

Note that you will have to update all the tables referencing this key...

--
Guillaume

Re: Query on postgresql 7.4.2 not using index

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 08:49, Arnau wrote:
> chris smith wrote:
> > On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
> >
> >>Hi all,
> >>
> >>   I have the following running on postgresql version 7.4.2:
> >>
> >>CREATE SEQUENCE agenda_user_group_id_seq
> >>MINVALUE 1
> >>MAXVALUE 9223372036854775807
> >>CYCLE
> >>INCREMENT 1
> >>START 1;
> >>
> >>CREATE TABLE AGENDA_USERS_GROUPS
> >>(
> >>  AGENDA_USER_GROUP_ID  INT8
> >>                        CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
> >>                        DEFAULT NEXTVAL('agenda_user_group_id_seq'),
> >>  USER_ID               NUMERIC(10)
> >>                        CONSTRAINT fk_agenda_uid  REFERENCES
> >>AGENDA_USERS (USER_ID)
> >>                        ON DELETE CASCADE
> >>                        NOT NULL,
> >>  GROUP_ID              NUMERIC(10)
> >>                        CONSTRAINT fk_agenda_gid  REFERENCES
> >>AGENDA_GROUPS (GROUP_ID)
> >>                        ON DELETE CASCADE
> >>                        NOT NULL,
> >>  CREATION_DATE         DATE
> >>                        DEFAULT CURRENT_DATE,
> >>                        CONSTRAINT un_agndusrgrp_usergroup
> >>UNIQUE(USER_ID, GROUP_ID)
> >>);
> >>
> >>CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
> >>CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );

SNIP

>   The same, the table has 2547556 entries:
>
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups
> espsm_moviltelevision-# WHERE group_id::int8 = 9;
>                                                             QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on agenda_users_groups  (cost=0.00..59477.34 rows=12738
> width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)
>     Filter: ((group_id)::bigint = 9)
>   Total runtime: 13452.114 ms
> (3 filas)
>
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups
> espsm_moviltelevision-# WHERE group_id = '9';
>                                                             QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
> width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
>     Filter: (group_id = 9::numeric)
>   Total runtime: 7259.861 ms
> (3 filas)
>
> espsm_moviltelevision=# select count(*) from agenda_users_groups ;
>    count
> ---------
>   2547556

OK, a few points.

1:  7.4.2 is WAY out of date for the 7.4 series.  The 7.4 series, also
it a bit out of date, and many issues in terms of performance have been
enhanced in the 8.x series.  You absolutely should update to the latest
7.4 series, as there are known data loss bugs and other issues in the
7.4.2 version.

2:  An index scan isn't always faster.  In this instance, it looks like
the number of rows that match in the last version of your query is well
over 10% of the rows.  Assuming your average row takes up <10% or so of
a block, which is pretty common, then you're going to have to hit almost
every block anyway to get your data.  So, an index scan is no win.

3:  To test whether or not an index scan IS a win, you can use the
enable_xxx settings to prove it to yourself:

set enable_seqscan = off;
explain analyze <your query here>;

and compare.  Note that the enable_seqscan = off thing is a sledge
hammer, not a nudge, and generally should NOT be used in production.  If
an index scan is generally a win for you, but the database isn't using
it, you might need to tune the database for your machine.  note that you
should NOT tune your database based on a single query.  You'll need to
reach a compromise on your settings that makes all your queries run
reasonably fast without the planner making insane decisions.  One of the
better postgresql tuning docs out there is the one at:
http://www.varlena.com/GeneralBits/Tidbits/perf.html .

Good luck.

Re: Query on postgresql 7.4.2 not using index

From
Tom Lane
Date:
Arnau <arnaulist@andromeiberica.com> writes:

>   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
> width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
>     Filter: (group_id = 9::numeric)
>   Total runtime: 7259.861 ms
> (3 filas)

> espsm_moviltelevision=# select count(*) from agenda_users_groups ;
>    count
> ---------
>   2547556

So the SELECT is fetching nearly 15% of the rows in the table.  The
planner is doing *the right thing* to use a seqscan, at least for
this particular group_id value.

            regards, tom lane

Re: Query on postgresql 7.4.2 not using index

From
Arnau
Date:
Tom Lane wrote:
> Arnau <arnaulist@andromeiberica.com> writes:
>
>
>>  Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
>>width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
>>    Filter: (group_id = 9::numeric)
>>  Total runtime: 7259.861 ms
>>(3 filas)
>
>
>>espsm_moviltelevision=# select count(*) from agenda_users_groups ;
>>   count
>>---------
>>  2547556
>
>
> So the SELECT is fetching nearly 15% of the rows in the table.  The
> planner is doing *the right thing* to use a seqscan, at least for
> this particular group_id value.


I have done the same tests on 8.1.0.


espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id = 9;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78
rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
    Recheck Cond: (group_id = 9::numeric)
    ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26
rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
          Index Cond: (group_id = 9::numeric)
  Total runtime: 1004.966 ms
(5 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id::int8 = 9;
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777
width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
    Filter: ((group_id)::bigint = 9)
  Total runtime: 2571.496 ms
(3 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id::int8 = '9';
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777
width=8) (actual time=407.193..2182.880 rows=367026 loops=1)
    Filter: ((group_id)::bigint = 9::bigint)
  Total runtime: 2506.998 ms
(3 rows)

espsm_moviltelevision=# select count(*) from agenda_users_groups ;
   count
---------
  2555437
(1 row)


   Postgresql then uses the index, I don't understand why? in this
server I tried to tune the configuration, it's because of the tuning?
Because it's a newer version of postgresql?


Thanks for all the replies
--
Arnau

Re: Query on postgresql 7.4.2 not using index

From
Scott Marlowe
Date:
On Tue, 2006-04-25 at 10:47, Arnau wrote:
> Tom Lane wrote:
> > Arnau <arnaulist@andromeiberica.com> writes:
> >
> >
> >>espsm_moviltelevision=# select count(*) from agenda_users_groups ;
> >>   count
> >>---------
> >>  2547556
> >
> >
> > So the SELECT is fetching nearly 15% of the rows in the table.  The
> > planner is doing *the right thing* to use a seqscan, at least for
> > this particular group_id value.
>
>
> I have done the same tests on 8.1.0.
>
>
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups WHERE group_id = 9;
>                                                             QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------
>   Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78
> rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
>     Recheck Cond: (group_id = 9::numeric)
>     ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26
> rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
>           Index Cond: (group_id = 9::numeric)
>   Total runtime: 1004.966 ms
> (5 rows)

How big are these individual records?  I'm guessing a fairly good size,
since an index scan is winning.

> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups WHERE group_id::int8 = 9;
>                                                            QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777
> width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
>     Filter: ((group_id)::bigint = 9)
>   Total runtime: 2571.496 ms
> (3 rows)

OK.  Stop and think about what you're telling postgresql to do here.

You're telling it to cast the field group_id to int8, then compare it to
9.  How can it cast the group_id to int8 without fetching it?  That's
right, you're ensuring a seq scan.  You need to put the int8 cast on the
other side of that equality comparison, like:

where group_id = 9::int8



Re: Query on postgresql 7.4.2 not using index

From
Tom Lane
Date:
Arnau <arnaulist@andromeiberica.com> writes:
> I have done the same tests on 8.1.0.

Bitmap scans are a totally different animal that doesn't exist in 7.4.
A plain indexscan, such as 7.4 knows about, is generally not effective
for fetching more than a percent or two of the table.  The crossover
point for a bitmap scan is much higher (don't know exactly, but probably
something like 30-50%).

            regards, tom lane

Re: Query on postgresql 7.4.2 not using index

From
Arnau
Date:
>>I have done the same tests on 8.1.0.
>>
>>
>>espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
>>agenda_users_groups WHERE group_id = 9;
>>                                                            QUERY PLAN

>>----------------------------------------------------------------------------------------------------------------------------------------------
>>  Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78
>>rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
>>    Recheck Cond: (group_id = 9::numeric)
>>    ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26
>>rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
>>          Index Cond: (group_id = 9::numeric)
>>  Total runtime: 1004.966 ms
>>(5 rows)
>
>
> How big are these individual records?  I'm guessing a fairly good size,
> since an index scan is winning.

   How I could know the size on an individual record?

>
>
>>espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
>>agenda_users_groups WHERE group_id::int8 = 9;
>>                                                           QUERY PLAN

>>-------------------------------------------------------------------------------------------------------------------------------
>>  Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777
>>width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
>>    Filter: ((group_id)::bigint = 9)
>>  Total runtime: 2571.496 ms
>>(3 rows)
>
>
> OK.  Stop and think about what you're telling postgresql to do here.
>
> You're telling it to cast the field group_id to int8, then compare it to
> 9.  How can it cast the group_id to int8 without fetching it?  That's
> right, you're ensuring a seq scan.  You need to put the int8 cast on the
> other side of that equality comparison, like:
>
> where group_id = 9::int8

   I just did what Chris Smith asked me to do :), here I paste the
results I get when I change the cast.

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id = 9::int8;
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on agenda_users_groups  (cost=2722.33..30343.06
rows=400379 width=8) (actual time=147.723..714.473 rows=367026 loops=1)
    Recheck Cond: (group_id = 9::numeric)
    ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.33
rows=400379 width=0) (actual time=145.015..145.015 rows=367026 loops=1)
          Index Cond: (group_id = 9::numeric)
  Total runtime: 1038.537 ms
(5 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id = '9'::int8;
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on agenda_users_groups  (cost=2722.33..30343.06
rows=400379 width=8) (actual time=153.858..1192.838 rows=367026 loops=1)
    Recheck Cond: (group_id = 9::numeric)
    ->  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.33
rows=400379 width=0) (actual time=151.298..151.298 rows=367026 loops=1)
          Index Cond: (group_id = 9::numeric)
  Total runtime: 1527.039 ms
(5 rows)


Thanks
--
Arnau

Re: Query on postgresql 7.4.2 not using index

From
"chris smith"
Date:
> OK.  Stop and think about what you're telling postgresql to do here.
>
> You're telling it to cast the field group_id to int8, then compare it to
> 9.  How can it cast the group_id to int8 without fetching it?  That's
> right, you're ensuring a seq scan.  You need to put the int8 cast on the
> other side of that equality comparison, like:

Yeh that one was my fault :) I couldn't remember which way it went and
if 7.4.x had issues with int8 indexes..

--
Postgresql & php tutorials
http://www.designmagick.com/