Thread: Using indices with long unique IDs.
Hello! I am planning to use unique IDs in the little system I am building. Now being more than a little paranoid (and having no idea about expected loads), I am wary of using int4 as a basis for uids (for the fear of ever running out of them). So the logical choice would be int8, right? Unfortunately quite wrong. Statement of the form: "SELECT * FROM table WHERE id=1" will never use index for id (assumming id is int8) since '1' is of type int4. This is confirmed both by documentation and SQL EXPLAIN (after set enable_seqscan TO 'off'). There are two suggested work-arounds: "SELECT * FROM table WHERE id='1'" "SELECT * FROM table WHERE id=1::int8" Unfortunately neither of them seem to be portable (for example, the one with single quotes fails if I create PreparedStatement in Java: con.prepareStatement("SELECT * FROM table WHERE id='?'"); apparently Java doesn't parse question mark inside quotes). I cannot have non-portable SQL as I am not convinced that PostgreSQL is the right choice for my system. Oh, and by the way, I am using Java + JDBC for my DB needs. So what other options do I have? One seems to be NUMERIC( 13, 0 ) [or something to that effect]. But what impact does it have performance-wise? (code-wise it makes no difference through JDBC whether I use int4, int8, or NUMERIC; I still represent that as 'long' in Java) One thing I am definitely noticing is that NUMERIC( 13, 0 ) does not seem to ever use sequential scan for queries (verified with EXPLAIN), it always goes for indices. It that bad (performance wise)? Are there any other issues I need to be aware of before settling on an uid type? Thanks in advance, ------------- Sergey Olefir Exigen Latvia, system analyst Honesty is a virtue. That is if you manage to survive.
Sergey Olefir wrote: >So the logical choice would be int8, right? Unfortunately quite wrong. >Statement of the form: "SELECT * FROM table WHERE id=1" >will never use index for id (assumming id is int8) since '1' is of type >int4. This is confirmed both by documentation and SQL EXPLAIN (after set >enable_seqscan TO 'off'). > > I'm using BIGSERIAL as the primary key for all my tables. Please tell me that what is described above will not be true for me as well! When I say: SELECT x, y, z FROM mytable WHERE pk_mybigint = 1; That had better be using an index, or in a few months, OMG! Let me check: leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on leads (cost=0.00..334.66 rows=1 width=263) (actual time=21.35..21.46 rows=1 loops=1) Filter: (lead_id = 555300) Total runtime: 21.53 msec (3 rows) leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300::bigint; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Index Scan using pk_leads on leads (cost=0.00..5.36 rows=1 width=263) (actual time=0.18..0.18 rows=1 loops=1) Index Cond: (lead_id = 555300::bigint) Total runtime: 0.24 msec (3 rows) Well, that just plain sucks. That means I've gotta go back and add casts to all my queries? Tell me it isn't so! Dante
On Fri, Jan 09, 2004 at 04:11:08AM -0600, D. Dante Lorenso wrote: > Sergey Olefir wrote: > > >So the logical choice would be int8, right? Unfortunately quite wrong. > >Statement of the form: "SELECT * FROM table WHERE id=1" > >will never use index for id (assumming id is int8) since '1' is of type > >int4. This is confirmed both by documentation and SQL EXPLAIN (after set > >enable_seqscan TO 'off'). I think you can use WHERE id=cast(1 AS bigint) > Well, that just plain sucks. That means I've gotta go back and > add casts to all my queries? > > Tell me it isn't so! Yeah, it is :-( AFAIU it is fixed in the current development version, but who knows if it will be done in time before you tables grow too big ... -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Porque francamente, si para saber manejarse a uno mismo hubiera que rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)
"Sergey Olefir" <so3lv@yahoo.com> writes: > Unfortunately neither of them seem to be portable (for example, the one with > single quotes fails if I create PreparedStatement in Java: > con.prepareStatement("SELECT * FROM table WHERE id='?'"); apparently Java > doesn't parse question mark inside quotes). That's wrong. You should not quote the ? in the query. If the driver is textually substituting the argument (as was necessary in 7.3 and prior) then it probably ought to quote it. If it's not there should be an option to tell it to quote even integer arguments. If it's any consolation 7.5 will use the index even if you put a plain integer in the query. If production is a long way off you could consider developing against a CVS build today and plan to use 7.5 for production when it comes out. -- greg
Hi, D. Dante Lorenso wrote: > Sergey Olefir wrote: > >> So the logical choice would be int8, right? Unfortunately quite wrong. >> Statement of the form: "SELECT * FROM table WHERE id=1" >> will never use index for id (assumming id is int8) since '1' is of type >> int4. This is confirmed both by documentation and SQL EXPLAIN (after set >> enable_seqscan TO 'off'). >> >> > I'm using BIGSERIAL as the primary key for all my tables. Please tell > me that what is described above will not be true for me as well! > When I say: > > SELECT x, y, z > FROM mytable > WHERE pk_mybigint = 1; > > That had better be using an index, or in a few months, OMG! Let me check: > > leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300; > QUERY > PLAN > --------------------------------------------------------------------------------------------------- > > Seq Scan on leads (cost=0.00..334.66 rows=1 width=263) (actual > time=21.35..21.46 rows=1 loops=1) > Filter: (lead_id = 555300) > Total runtime: 21.53 msec > (3 rows) > leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = > 555300::bigint; > QUERY > PLAN > ---------------------------------------------------------------------------------------------------------------- > > Index Scan using pk_leads on leads (cost=0.00..5.36 rows=1 > width=263) (actual time=0.18..0.18 rows=1 loops=1) > Index Cond: (lead_id = 555300::bigint) > Total runtime: 0.24 msec > (3 rows) > > Well, that just plain sucks. That means I've gotta go back and > add casts to all my queries? > > Tell me it isn't so! use always '' even for numbers. where int = '1' or bigint = '1'. Change your programing style :) C.