Thread: Query on postgresql 7.4.2 not using index
Hi all, I have the following running on postgresql version 7.4.2: CREATE SEQUENCE agenda_user_group_id_seq MINVALUE 1 MAXVALUE 9223372036854775807 CYCLE INCREMENT 1 START 1; CREATE TABLE AGENDA_USERS_GROUPS ( AGENDA_USER_GROUP_ID INT8 CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY DEFAULT NEXTVAL('agenda_user_group_id_seq'), USER_ID NUMERIC(10) CONSTRAINT fk_agenda_uid REFERENCES AGENDA_USERS (USER_ID) ON DELETE CASCADE NOT NULL, GROUP_ID NUMERIC(10) CONSTRAINT fk_agenda_gid REFERENCES AGENDA_GROUPS (GROUP_ID) ON DELETE CASCADE NOT NULL, CREATION_DATE DATE DEFAULT CURRENT_DATE, CONSTRAINT un_agndusrgrp_usergroup UNIQUE(USER_ID, GROUP_ID) ); CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID ); CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID ); When I execute: EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9; it does a sequential scan and doesn't use the index and I don't understand why, any idea? I have the same in postgresql 8.1 and it uses the index :-| Thanks -- Arnau
On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote: > Hi all, > > I have the following running on postgresql version 7.4.2: > > CREATE SEQUENCE agenda_user_group_id_seq > MINVALUE 1 > MAXVALUE 9223372036854775807 > CYCLE > INCREMENT 1 > START 1; > > CREATE TABLE AGENDA_USERS_GROUPS > ( > AGENDA_USER_GROUP_ID INT8 > CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY > DEFAULT NEXTVAL('agenda_user_group_id_seq'), > USER_ID NUMERIC(10) > CONSTRAINT fk_agenda_uid REFERENCES > AGENDA_USERS (USER_ID) > ON DELETE CASCADE > NOT NULL, > GROUP_ID NUMERIC(10) > CONSTRAINT fk_agenda_gid REFERENCES > AGENDA_GROUPS (GROUP_ID) > ON DELETE CASCADE > NOT NULL, > CREATION_DATE DATE > DEFAULT CURRENT_DATE, > CONSTRAINT un_agndusrgrp_usergroup > UNIQUE(USER_ID, GROUP_ID) > ); > > CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID ); > CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID ); > > > When I execute: > > EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups > WHERE group_id = 9; Try EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = 9; or EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = '9'; and let us know what happens. -- Postgresql & php tutorials http://www.designmagick.com/
chris smith wrote: > On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote: > >>Hi all, >> >> I have the following running on postgresql version 7.4.2: >> >>CREATE SEQUENCE agenda_user_group_id_seq >>MINVALUE 1 >>MAXVALUE 9223372036854775807 >>CYCLE >>INCREMENT 1 >>START 1; >> >>CREATE TABLE AGENDA_USERS_GROUPS >>( >> AGENDA_USER_GROUP_ID INT8 >> CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY >> DEFAULT NEXTVAL('agenda_user_group_id_seq'), >> USER_ID NUMERIC(10) >> CONSTRAINT fk_agenda_uid REFERENCES >>AGENDA_USERS (USER_ID) >> ON DELETE CASCADE >> NOT NULL, >> GROUP_ID NUMERIC(10) >> CONSTRAINT fk_agenda_gid REFERENCES >>AGENDA_GROUPS (GROUP_ID) >> ON DELETE CASCADE >> NOT NULL, >> CREATION_DATE DATE >> DEFAULT CURRENT_DATE, >> CONSTRAINT un_agndusrgrp_usergroup >>UNIQUE(USER_ID, GROUP_ID) >>); >> >>CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID ); >>CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID ); >> >> >>When I execute: >> >>EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups >>WHERE group_id = 9; > > > Try > > EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups > WHERE group_id::int8 = 9; > > or > > EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups > WHERE group_id = '9'; > > and let us know what happens. > The same, the table has 2547556 entries: espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups espsm_moviltelevision-# WHERE group_id::int8 = 9; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Seq Scan on agenda_users_groups (cost=0.00..59477.34 rows=12738 width=8) (actual time=3409.541..11818.794 rows=367026 loops=1) Filter: ((group_id)::bigint = 9) Total runtime: 13452.114 ms (3 filas) espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups espsm_moviltelevision-# WHERE group_id = '9'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675 width=8) (actual time=916.903..5763.830 rows=367026 loops=1) Filter: (group_id = 9::numeric) Total runtime: 7259.861 ms (3 filas) espsm_moviltelevision=# select count(*) from agenda_users_groups ; count --------- 2547556 Thanks -- Arnau
On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote: > espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM > agenda_users_groups > espsm_moviltelevision-# WHERE group_id = '9'; > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675 > width=8) (actual time=916.903..5763.830 rows=367026 loops=1) > Filter: (group_id = 9::numeric) > Total runtime: 7259.861 ms > (3 filas) Arnau, Why do you use a numeric instead of an integer/bigint?? IIRC, there were a few problems with index on numeric column on older version of PostgreSQL. You can't change the type of a column with 7.4, so create a new integer column then copy the values in this new column, drop the old one, rename the new one. Run vacuum analyze and recreate your index. It should work far better with an int. Note that you will have to update all the tables referencing this key... -- Guillaume
On Tue, 2006-04-25 at 08:49, Arnau wrote: > chris smith wrote: > > On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote: > > > >>Hi all, > >> > >> I have the following running on postgresql version 7.4.2: > >> > >>CREATE SEQUENCE agenda_user_group_id_seq > >>MINVALUE 1 > >>MAXVALUE 9223372036854775807 > >>CYCLE > >>INCREMENT 1 > >>START 1; > >> > >>CREATE TABLE AGENDA_USERS_GROUPS > >>( > >> AGENDA_USER_GROUP_ID INT8 > >> CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY > >> DEFAULT NEXTVAL('agenda_user_group_id_seq'), > >> USER_ID NUMERIC(10) > >> CONSTRAINT fk_agenda_uid REFERENCES > >>AGENDA_USERS (USER_ID) > >> ON DELETE CASCADE > >> NOT NULL, > >> GROUP_ID NUMERIC(10) > >> CONSTRAINT fk_agenda_gid REFERENCES > >>AGENDA_GROUPS (GROUP_ID) > >> ON DELETE CASCADE > >> NOT NULL, > >> CREATION_DATE DATE > >> DEFAULT CURRENT_DATE, > >> CONSTRAINT un_agndusrgrp_usergroup > >>UNIQUE(USER_ID, GROUP_ID) > >>); > >> > >>CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID ); > >>CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID ); SNIP > The same, the table has 2547556 entries: > > espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM > agenda_users_groups > espsm_moviltelevision-# WHERE group_id::int8 = 9; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on agenda_users_groups (cost=0.00..59477.34 rows=12738 > width=8) (actual time=3409.541..11818.794 rows=367026 loops=1) > Filter: ((group_id)::bigint = 9) > Total runtime: 13452.114 ms > (3 filas) > > espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM > agenda_users_groups > espsm_moviltelevision-# WHERE group_id = '9'; > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675 > width=8) (actual time=916.903..5763.830 rows=367026 loops=1) > Filter: (group_id = 9::numeric) > Total runtime: 7259.861 ms > (3 filas) > > espsm_moviltelevision=# select count(*) from agenda_users_groups ; > count > --------- > 2547556 OK, a few points. 1: 7.4.2 is WAY out of date for the 7.4 series. The 7.4 series, also it a bit out of date, and many issues in terms of performance have been enhanced in the 8.x series. You absolutely should update to the latest 7.4 series, as there are known data loss bugs and other issues in the 7.4.2 version. 2: An index scan isn't always faster. In this instance, it looks like the number of rows that match in the last version of your query is well over 10% of the rows. Assuming your average row takes up <10% or so of a block, which is pretty common, then you're going to have to hit almost every block anyway to get your data. So, an index scan is no win. 3: To test whether or not an index scan IS a win, you can use the enable_xxx settings to prove it to yourself: set enable_seqscan = off; explain analyze <your query here>; and compare. Note that the enable_seqscan = off thing is a sledge hammer, not a nudge, and generally should NOT be used in production. If an index scan is generally a win for you, but the database isn't using it, you might need to tune the database for your machine. note that you should NOT tune your database based on a single query. You'll need to reach a compromise on your settings that makes all your queries run reasonably fast without the planner making insane decisions. One of the better postgresql tuning docs out there is the one at: http://www.varlena.com/GeneralBits/Tidbits/perf.html . Good luck.
Arnau <arnaulist@andromeiberica.com> writes: > Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675 > width=8) (actual time=916.903..5763.830 rows=367026 loops=1) > Filter: (group_id = 9::numeric) > Total runtime: 7259.861 ms > (3 filas) > espsm_moviltelevision=# select count(*) from agenda_users_groups ; > count > --------- > 2547556 So the SELECT is fetching nearly 15% of the rows in the table. The planner is doing *the right thing* to use a seqscan, at least for this particular group_id value. regards, tom lane
Tom Lane wrote: > Arnau <arnaulist@andromeiberica.com> writes: > > >> Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675 >>width=8) (actual time=916.903..5763.830 rows=367026 loops=1) >> Filter: (group_id = 9::numeric) >> Total runtime: 7259.861 ms >>(3 filas) > > >>espsm_moviltelevision=# select count(*) from agenda_users_groups ; >> count >>--------- >> 2547556 > > > So the SELECT is fetching nearly 15% of the rows in the table. The > planner is doing *the right thing* to use a seqscan, at least for > this particular group_id value. I have done the same tests on 8.1.0. espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on agenda_users_groups (cost=2722.26..30341.78 rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1) Recheck Cond: (group_id = 9::numeric) -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.26 rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1) Index Cond: (group_id = 9::numeric) Total runtime: 1004.966 ms (5 rows) espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = 9; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777 width=8) (actual time=457.963..2244.928 rows=367026 loops=1) Filter: ((group_id)::bigint = 9) Total runtime: 2571.496 ms (3 rows) espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id::int8 = '9'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777 width=8) (actual time=407.193..2182.880 rows=367026 loops=1) Filter: ((group_id)::bigint = 9::bigint) Total runtime: 2506.998 ms (3 rows) espsm_moviltelevision=# select count(*) from agenda_users_groups ; count --------- 2555437 (1 row) Postgresql then uses the index, I don't understand why? in this server I tried to tune the configuration, it's because of the tuning? Because it's a newer version of postgresql? Thanks for all the replies -- Arnau
On Tue, 2006-04-25 at 10:47, Arnau wrote: > Tom Lane wrote: > > Arnau <arnaulist@andromeiberica.com> writes: > > > > > >>espsm_moviltelevision=# select count(*) from agenda_users_groups ; > >> count > >>--------- > >> 2547556 > > > > > > So the SELECT is fetching nearly 15% of the rows in the table. The > > planner is doing *the right thing* to use a seqscan, at least for > > this particular group_id value. > > > I have done the same tests on 8.1.0. > > > espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM > agenda_users_groups WHERE group_id = 9; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------- > Bitmap Heap Scan on agenda_users_groups (cost=2722.26..30341.78 > rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1) > Recheck Cond: (group_id = 9::numeric) > -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.26 > rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1) > Index Cond: (group_id = 9::numeric) > Total runtime: 1004.966 ms > (5 rows) How big are these individual records? I'm guessing a fairly good size, since an index scan is winning. > espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM > agenda_users_groups WHERE group_id::int8 = 9; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777 > width=8) (actual time=457.963..2244.928 rows=367026 loops=1) > Filter: ((group_id)::bigint = 9) > Total runtime: 2571.496 ms > (3 rows) OK. Stop and think about what you're telling postgresql to do here. You're telling it to cast the field group_id to int8, then compare it to 9. How can it cast the group_id to int8 without fetching it? That's right, you're ensuring a seq scan. You need to put the int8 cast on the other side of that equality comparison, like: where group_id = 9::int8
Arnau <arnaulist@andromeiberica.com> writes: > I have done the same tests on 8.1.0. Bitmap scans are a totally different animal that doesn't exist in 7.4. A plain indexscan, such as 7.4 knows about, is generally not effective for fetching more than a percent or two of the table. The crossover point for a bitmap scan is much higher (don't know exactly, but probably something like 30-50%). regards, tom lane
>>I have done the same tests on 8.1.0. >> >> >>espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM >>agenda_users_groups WHERE group_id = 9; >> QUERY PLAN >>---------------------------------------------------------------------------------------------------------------------------------------------- >> Bitmap Heap Scan on agenda_users_groups (cost=2722.26..30341.78 >>rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1) >> Recheck Cond: (group_id = 9::numeric) >> -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.26 >>rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1) >> Index Cond: (group_id = 9::numeric) >> Total runtime: 1004.966 ms >>(5 rows) > > > How big are these individual records? I'm guessing a fairly good size, > since an index scan is winning. How I could know the size on an individual record? > > >>espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM >>agenda_users_groups WHERE group_id::int8 = 9; >> QUERY PLAN >>------------------------------------------------------------------------------------------------------------------------------- >> Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777 >>width=8) (actual time=457.963..2244.928 rows=367026 loops=1) >> Filter: ((group_id)::bigint = 9) >> Total runtime: 2571.496 ms >>(3 rows) > > > OK. Stop and think about what you're telling postgresql to do here. > > You're telling it to cast the field group_id to int8, then compare it to > 9. How can it cast the group_id to int8 without fetching it? That's > right, you're ensuring a seq scan. You need to put the int8 cast on the > other side of that equality comparison, like: > > where group_id = 9::int8 I just did what Chris Smith asked me to do :), here I paste the results I get when I change the cast. espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = 9::int8; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on agenda_users_groups (cost=2722.33..30343.06 rows=400379 width=8) (actual time=147.723..714.473 rows=367026 loops=1) Recheck Cond: (group_id = 9::numeric) -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.33 rows=400379 width=0) (actual time=145.015..145.015 rows=367026 loops=1) Index Cond: (group_id = 9::numeric) Total runtime: 1038.537 ms (5 rows) espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups WHERE group_id = '9'::int8; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on agenda_users_groups (cost=2722.33..30343.06 rows=400379 width=8) (actual time=153.858..1192.838 rows=367026 loops=1) Recheck Cond: (group_id = 9::numeric) -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.33 rows=400379 width=0) (actual time=151.298..151.298 rows=367026 loops=1) Index Cond: (group_id = 9::numeric) Total runtime: 1527.039 ms (5 rows) Thanks -- Arnau
> OK. Stop and think about what you're telling postgresql to do here. > > You're telling it to cast the field group_id to int8, then compare it to > 9. How can it cast the group_id to int8 without fetching it? That's > right, you're ensuring a seq scan. You need to put the int8 cast on the > other side of that equality comparison, like: Yeh that one was my fault :) I couldn't remember which way it went and if 7.4.x had issues with int8 indexes.. -- Postgresql & php tutorials http://www.designmagick.com/