Thread: Using indices with long unique IDs.

Using indices with long unique IDs.

From
"Sergey Olefir"
Date:
Hello!

I am planning to use unique IDs in the little system I am building. Now
being more than a little paranoid (and having no idea about expected loads),
I am wary of using int4 as a basis for uids (for the fear of ever running
out of them).

So the logical choice would be int8, right? Unfortunately quite wrong.
Statement of the form:
"SELECT * FROM table WHERE id=1"
will never use index for id (assumming id is int8) since '1' is of type
int4. This is confirmed both by documentation and SQL EXPLAIN (after set
enable_seqscan TO 'off').

There are two suggested work-arounds:
"SELECT * FROM table WHERE id='1'"
"SELECT * FROM table WHERE id=1::int8"

Unfortunately neither of them seem to be portable (for example, the one with
single quotes fails if I create PreparedStatement in Java:
con.prepareStatement("SELECT * FROM table WHERE id='?'"); apparently Java
doesn't parse question mark inside quotes).

I cannot have non-portable SQL as I am not convinced that PostgreSQL is the
right choice for my system. Oh, and by the way, I am using Java + JDBC for
my DB needs.


So what other options do I have? One seems to be NUMERIC( 13, 0 ) [or
something to that effect]. But what impact does it have performance-wise?
(code-wise it makes no difference through JDBC whether I use int4, int8, or
NUMERIC; I still represent that as 'long' in Java)

One thing I am definitely noticing is that NUMERIC( 13, 0 ) does not seem to
ever use sequential scan for queries (verified with EXPLAIN), it always goes
for indices. It that bad (performance wise)? Are there any other issues I
need to be aware of before settling on an uid type?

Thanks in advance,
-------------
Sergey Olefir
Exigen Latvia, system analyst

Honesty is a virtue.
That is if you manage to survive.


Re: Using indices with long unique IDs.

From
"D. Dante Lorenso"
Date:
Sergey Olefir wrote:

>So the logical choice would be int8, right? Unfortunately quite wrong.
>Statement of the form: "SELECT * FROM table WHERE id=1"
>will never use index for id (assumming id is int8) since '1' is of type
>int4. This is confirmed both by documentation and SQL EXPLAIN (after set
>enable_seqscan TO 'off').
>
>
I'm using BIGSERIAL as the primary key for all my tables.  Please tell
me that what is described above will not be true for me as well!
When I say:

    SELECT x, y, z
    FROM mytable
    WHERE pk_mybigint = 1;

That had better be using an index, or in a few months, OMG!  Let me check:

    leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300;
                                                QUERY
PLAN

---------------------------------------------------------------------------------------------------
     Seq Scan on leads  (cost=0.00..334.66 rows=1 width=263) (actual
time=21.35..21.46 rows=1 loops=1)
       Filter: (lead_id = 555300)
     Total runtime: 21.53 msec
    (3 rows)

    leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id =
555300::bigint;
                                                       QUERY
PLAN

----------------------------------------------------------------------------------------------------------------
     Index Scan using pk_leads on leads  (cost=0.00..5.36 rows=1
width=263) (actual time=0.18..0.18 rows=1 loops=1)
       Index Cond: (lead_id = 555300::bigint)
     Total runtime: 0.24 msec
    (3 rows)

Well, that just plain sucks.  That means I've gotta go back and
add casts to all my queries?

Tell me it isn't so!

Dante





Re: Using indices with long unique IDs.

From
Alvaro Herrera
Date:
On Fri, Jan 09, 2004 at 04:11:08AM -0600, D. Dante Lorenso wrote:
> Sergey Olefir wrote:
>
> >So the logical choice would be int8, right? Unfortunately quite wrong.
> >Statement of the form: "SELECT * FROM table WHERE id=1"
> >will never use index for id (assumming id is int8) since '1' is of type
> >int4. This is confirmed both by documentation and SQL EXPLAIN (after set
> >enable_seqscan TO 'off').

I think you can use WHERE id=cast(1 AS bigint)


> Well, that just plain sucks.  That means I've gotta go back and
> add casts to all my queries?
>
> Tell me it isn't so!

Yeah, it is :-(  AFAIU it is fixed in the current development version,
but who knows if it will be done in time before you tables grow too big ...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?"  (Mafalda)

Re: Using indices with long unique IDs.

From
Greg Stark
Date:
"Sergey Olefir" <so3lv@yahoo.com> writes:

> Unfortunately neither of them seem to be portable (for example, the one with
> single quotes fails if I create PreparedStatement in Java:
> con.prepareStatement("SELECT * FROM table WHERE id='?'"); apparently Java
> doesn't parse question mark inside quotes).

That's wrong. You should not quote the ? in the query.

If the driver is textually substituting the argument (as was necessary in 7.3
and prior) then it probably ought to quote it. If it's not there should be an
option to tell it to quote even integer arguments.

If it's any consolation 7.5 will use the index even if you put a plain integer
in the query. If production is a long way off you could consider developing
against a CVS build today and plan to use 7.5 for production when it comes
out.

--
greg

Re: Using indices with long unique IDs.

From
CoL
Date:
Hi,

D. Dante Lorenso wrote:
> Sergey Olefir wrote:
>
>> So the logical choice would be int8, right? Unfortunately quite wrong.
>> Statement of the form: "SELECT * FROM table WHERE id=1"
>> will never use index for id (assumming id is int8) since '1' is of type
>> int4. This is confirmed both by documentation and SQL EXPLAIN (after set
>> enable_seqscan TO 'off').
>>
>>
> I'm using BIGSERIAL as the primary key for all my tables.  Please tell
> me that what is described above will not be true for me as well!
> When I say:
>
>    SELECT x, y, z
>    FROM mytable
>    WHERE pk_mybigint = 1;
>
> That had better be using an index, or in a few months, OMG!  Let me check:
>
>    leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id = 555300;
>                                                QUERY
> PLAN
> ---------------------------------------------------------------------------------------------------
>
>     Seq Scan on leads  (cost=0.00..334.66 rows=1 width=263) (actual
> time=21.35..21.46 rows=1 loops=1)
>       Filter: (lead_id = 555300)
>     Total runtime: 21.53 msec
>    (3 rows)
>      leads=> EXPLAIN ANALYSE SELECT * FROM leads WHERE lead_id =
> 555300::bigint;
>                                                       QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------
>
>     Index Scan using pk_leads on leads  (cost=0.00..5.36 rows=1
> width=263) (actual time=0.18..0.18 rows=1 loops=1)
>       Index Cond: (lead_id = 555300::bigint)
>     Total runtime: 0.24 msec
>    (3 rows)
>
> Well, that just plain sucks.  That means I've gotta go back and
> add casts to all my queries?
>
> Tell me it isn't so!

use always '' even for numbers. where int = '1' or bigint = '1'. Change
your programing style :)

C.