Thread: BIGINT indexes still with problems
Hi, I have read in this list and elsewhere the problem with indexes and big int. Still, I have an index that is used or not, depending of the parameter value used in the query. I am using PostgreSQL 7.4.3 on Linux RH ES3. Here's the table: test=> \d dmaildatum Table "public.dmaildatum" Column | Type | Modifiers ----------------+-----------------------------+----------- id | bigint | not null type | integer | not null idowneruser | bigint | not null idparent | bigint | not null creationdate | timestamp without time zone | not null name | character varying(1024) | not null deletedate | timestamp without time zone | not null lastmodifydate | timestamp without time zone | ffrom | character varying(255) | fto | text | fcc | text | fbcc | text | fsubject | text | fdate | timestamp without time zone | fread | boolean | nattachments | integer | fsize | bigint | hashtml | boolean | replyto | text | pop3id | character varying(255) | nvattachments | integer | Indexes: "dmaildatum_pkey" primary key, btree (id) "ix_dmaildatum_fdate" btree (fdate) "ix_dmaildatum_idowneruser" btree (idowneruser) "ix_dmaildatum_idparent" btree (idparent) "ix_dmaildatum_pop3id" btree (pop3id) Foreign-key constraints: "fk_dmaildatum_dmailfolderdatum" FOREIGN KEY (idparent) REFERENCES dmailfold erdatum(id) ON DELETE CASCADE Now, if I run this query (note the int8 cast - also tried with the '' cast to String, same results): test=> explain select * from dmaildatum where idparent=int8(783219); QUERY PLAN ------------------------------------------------------------------ Seq Scan on dmaildatum (cost=0.00..2241.71 rows=2229 width=272) Filter: (idparent = 783219::bigint) (2 rows) The index is not used. But with an identical query, only different parameter value: desknow=> explain select * from dmaildatum where idparent=int8(1187838); QUERY PLAN -------------------------------------------------------------------------------- --------------- Index Scan using ix_dmaildatum_idparent on dmaildatum (cost=0.00..284.05 rows= 102 width=272) Index Cond: (idparent = 1187838::bigint) (2 rows) The index is used! I also did a vacuum analyze, and restarted Postgres and it did not make any difference. I tried many other ID values (ex 783218 and 783220), and they seem to use the index correctly. Only that value doesn't. Can anyone explain why Postgres behaves differently in these two cases, or at least point to some hints? Thanks in advance for all your help -- _______________________________________________ Find what you are looking for with the Lycos Yellow Pages http://r.lycos.com/r/yp_emailfooter/http://yellowpages.lycos.com/default.asp?SRC=lycos10
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dan Ruthers wrote: | Now, if I run this query (note the int8 cast - also tried with the '' cast to String, same results): | test=> explain select * from dmaildatum where idparent=int8(783219); | QUERY PLAN | ------------------------------------------------------------------ | Seq Scan on dmaildatum (cost=0.00..2241.71 rows=2229 width=272) | Filter: (idparent = 783219::bigint) | (2 rows) | | The index is not used. But with an identical query, only different parameter value: | desknow=> explain select * from dmaildatum where idparent=int8(1187838); | QUERY PLAN | | -------------------------------------------------------------------------------- | --------------- | Index Scan using ix_dmaildatum_idparent on dmaildatum (cost=0.00..284.05 rows= | 102 width=272) | Index Cond: (idparent = 1187838::bigint) | (2 rows) | | The index is used! | I also did a vacuum analyze, and restarted Postgres and it did not make any difference. | I tried many other ID values (ex 783218 and 783220), and they seem to use the index correctly. Only that value doesn't. | | Can anyone explain why Postgres behaves differently in these two cases, or at least point to some hints? Because this means that a sequential scan is better for that value. Perform this selects: (1) select count(*) from dmaildatum; (2) select count(*) from dmaildatum where idparent=int8(783219); (3) select count(*) from dmaildatum where idparent=int8(1187838); I bet that the ratio (2)/(1) is greater then (3)/(1). Now show us the following results: explain analyze select * from dmaildatum where idparent=int8(783219); explain analyze select * from dmaildatum where idparent=int8(1187838); and repeat it again but executing before: set enable_seqscan = off; Depending on the results that you get may be you need to lower the index scan cost tuning the cpu related GUC variables. Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBGSfL7UpzwH2SGd4RAgBsAKCXvs2L/XUEmSGxBzEiAHmWasgShACeLvjp 9m12DSnj2tBuGSgldr4D9Po= =KTil -----END PGP SIGNATURE-----
On Mon, 9 Aug 2004, Dan Ruthers wrote: > Now, if I run this query (note the int8 cast - also tried with the '' cast to String, same results): > test=> explain select * from dmaildatum where idparent=int8(783219); > QUERY PLAN > ------------------------------------------------------------------ > Seq Scan on dmaildatum (cost=0.00..2241.71 rows=2229 width=272) > Filter: (idparent = 783219::bigint) > (2 rows) > > The index is not used. But with an identical query, only different parameter value: > desknow=> explain select * from dmaildatum where idparent=int8(1187838); > QUERY PLAN > > -------------------------------------------------------------------------------- > --------------- > Index Scan using ix_dmaildatum_idparent on dmaildatum (cost=0.00..284.05 rows= > 102 width=272) > Index Cond: (idparent = 1187838::bigint) > (2 rows) Look at the row estimates for the two cases. How many rows are actually returned and how long the queries take (explain analyze will give that information)?
Dan Ruthers wrote: > The index is used! > I also did a vacuum analyze, and restarted Postgres and it did not > make any difference. I tried many other ID values (ex 783218 and > 783220), and they seem to use the index correctly. Only that value > doesn't. Possibly, that is the most common value and the cost calculation yields that it would be more efficient to not use the index. If you disagree, please show the timings generated by EXPLAIN ANALYZE. -- Peter Eisentraut http://developer.postgresql.org/~petere/