index not used for bigint without explicit cast - Mailing list pgsql-bugs

From Dave E Martin XXIII
Subject index not used for bigint without explicit cast
Date
Msg-id 3ED2E46C.2060507@dave.to
Whole thread Raw
Responses Re: index not used for bigint without explicit cast
List pgsql-bugs
I could have swore there was a searchable bug database, but I can't seem
to find it, Therefore i'm assuming this is now the place.

On to the real issue:
 PostgreSQL 7.3.2 on i386-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3

I have a table with a bigint as its primary key. Alas, the optimizer
seems to always think a seq_scan is more efficient than an index scan on
this 170106 entry table (even after analyzing/vacuuming). This is
causing an application to take unacceptably long (it must perform
several of these insertions per second) to insert new entries into this
and another table, as the resulting referential integrity checks seem to
be taking relatively forever. Further experimention has revealed that an
index scan *will* be performed if the value in question is explicitly
converted to a bigint first. The JDBC driver (which i'm using) does not
do this however, and just puts the raw number (which I assume is correct
behaviour on its part).

I tried my application with both bigint primary keys and integer primary
keys (I did a dump and restore, massaging the datatype between, so the
application would be working with the same database)

With bigint primary keys:
processed 30 in 8233ms
processed 30 in 8482ms
processed 30 in 1748ms
processed 30 in 5072ms
processed 30 in 6142ms
processed 30 in 4529ms

With integer primary keys:
processed 30 in 1364ms
processed 30 in 1279ms
processed 30 in 704ms
processed 30 in 791ms
processed 30 in 707ms
processed 30 in 663ms
processed 30 in 753ms
processed 30 in 540ms

Unfortunately, for this application, it really needs bigint primary keys.

Below are the output from assorted analyze query experiements:

select count(*) from connection_data;
 count
--------
 170106
(1 row)

explain analyze select * from connection_data where id=23457;
                                                  QUERY
PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on connection_data  (cost=0.00..3716.32 rows=1 width=86)
(actual time=51.52..529.14 rows=1 loops=1)
   Filter: (id = 23457)
 Total runtime: 529.23 msec
(3 rows)

explain analyze select * from connection_data where id=cast (23457 as
bigint);
                                                             QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using connection_data_pkey on connection_data
(cost=0.00..3.02 rows=1 width=86) (actual time=0.41..0.42 rows=1 loops=1)
   Index Cond: (id = 23457::bigint)
 Total runtime: 0.49 msec
(3 rows)

select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 7.3.2 on i386-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.3
(1 row)

Also possibly of interest:

explain analyze select * from connection_data where id=234570000000000;
                                                              QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using connection_data_pkey on connection_data
(cost=0.00..3.02 rows=1 width=86) (actual time=44.60..44.60 rows=0 loops=1)
   Index Cond: (id = 234570000000000::bigint)
 Total runtime: 44.70 msec
(3 rows)

netflow=# explain analyze select * from connection_data where id=23457000;
                                                  QUERY
PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on connection_data  (cost=0.00..3716.32 rows=1 width=86)
(actual time=531.56..531.56 rows=0 loops=1)
   Filter: (id = 23457000)
 Total runtime: 531.64 msec
(3 rows)

Alas, even the following doesn't help:

create index connection_data_id4_tmp on connection_data(int4(id));
CREATE INDEX
netflow=# explain analyze select * from connection_data where id=23457000;
                                                  QUERY
PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on connection_data  (cost=0.00..3716.32 rows=1 width=86)
(actual time=570.67..570.67 rows=0 loops=1)
   Filter: (id = 23457000)
 Total runtime: 570.81 msec
(3 rows)

netflow=# analyze connection_data;
ANALYZE
netflow=# explain analyze select * from connection_data where id=23457000;
                                                  QUERY
PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on connection_data  (cost=0.00..3716.62 rows=1 width=86)
(actual time=651.57..651.57 rows=0 loops=1)
   Filter: (id = 23457000)
 Total runtime: 651.70 msec
(3 rows)

Even turning off enable_seqscan doesn't help. for some reason, its
convinced it can't use these indexes.

pgsql-bugs by date:

Previous
From: Serge Obeuf
Date:
Subject: Question to You
Next
From: Bruce Momjian
Date:
Subject: Re: Continue: Bug #924: Segmentation fault in libpq/PQconsumeInput