Re: int8 primary keys still not using index without manual - Mailing list pgsql-general

From Bruce Momjian
Subject Re: int8 primary keys still not using index without manual
Date
Msg-id 200311071827.hA7IR5503637@candle.pha.pa.us
Whole thread Raw
In response to Re: int8 primary keys still not using index without manual  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: int8 primary keys still not using index without manual
List pgsql-general
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We can try removing most int2 functions and see if that makes such
> > conversions much easier, or try casting incoming constants to int2 to
> > see what happens.
>
> I already did that --- that was exactly the substance of the tests I was
> reporting in that message.
>
> I have been thinking lately that the fundamental approach is wrong
> anyway.  Basically the idea was to make the world safe for
> single-datatype index handling by removing all the cross-type comparison
> operators.  The reason
>
>     SELECT ... WHERE int8col = 42
>
> isn't indexable is that the = operator is int84eq, which is not to be
> found in the set of operators associated with an index on int8.  What
> we were thinking was that if we didn't have int84eq then the parser
> would be forced to promote the 42 to int8, and then the comparison using
> int8eq would be recognized as indexable.

Could we not always promote int4 to int8 for indexing purposes?  I
realize OID has issues, though, as you mention in that URL:

    This algorithm was wrong on both practical and theoretical levels;
    in the first place it's not very helpful to only be able to handle
    binary-compatible transformations, and in the second place there isn't
    any good guarantee that it's not changing the semantics when it replaces
    the operator.  For instance int4 < and oid < do not act the same.
    Depending on equality of operator names was a bad idea even then, and
    would be quite unworkable now in the world of schema search paths.

Could we just promote int4 constants to int8 always?  I just checked and
2^32-1 is already promoted to int8:

    select 4294967295;

so the funny thing is that:

    SELECT ... WHERE int8col = 4294967296;

already uses the int8 index.

I think the complex case you mentioned was oid.  Let's look at the
various possible constant comparisons against an oid column:

    -1    This constant would match no oid, so we could just allow the
    sequential scan.  I don't think anyone would expect good behavior for
    such a comparison.


    1    This could be promoted to oid cleanly.

    2^31+1    This will come in a int8, so we can just downcast to oid
    automatically.

I know your case was "<" comparison.  It would be:

    SELECT ... WHERE oidcol < -1;
    SELECT ... WHERE oidcol < 1;
    SELECT ... WHERE oidcol < 2147483649;  2^31+1

These all seem to work, I think.  -1 will not use an index, which is OK.

I am concerned about having to add catalog maintenance for every index
case, which seems it could be a lot.

Here is my logic.  I am having trouble getting the big picture on this:

    int2 fits in the int4 range
    int4 fits in the int8 range

    oid fits only in the int8 range, some oids fit in int4

This means a valid oid could come in as int4 or int8.

I realize this requires hard-coded comparisons to C include defines to
get the maxium for each type.  I know this breaks our type-neutral
style, but in this case, it seems it might be the cleanest way ---
abstracting this out into a table seems too hard.

Now for int2-based indexes.  Can't we just downcast constants to int2 if
they fit in the int2 valid range?

> I think this might actually be workable for int8, but it's not going to
> work for int2 without changing the initial typing of small integer
> constants, and we already know that that opens a Pandora's box of other
> problems.
>
> But quite aside from the semantic difficulties of rejiggering all that
> stuff, it's going to break other parts of the optimizer if we do it.
> In particular it will interfere with handling of mergejoins and
> recognizing transitive equality.  For example consider
>
>     SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42;
>
> Currently we are able to deduce a.int8col = 42 (where the operator
> is int84eq).  If we remove int84eq then the output of the parser for
> this example will look like
>
>     SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42;
>
> and the transitive equality will not be recognized because
> b.int4col::int8 is not the same expression as b.int4col.

Seems we should keep those cross-type comparisons around for col op col
comparisons, at least, as well as internal optimizer use as you
described.

> So I'm currently thinking we'd be better off not to try to eliminate
> the cross-type comparison operators.  Instead we need some solution
> that is narrowly focused on the problem of making a non-indexable
> comparison indexable, by converting a comparison value of the wrong
> datatype into the right datatype locally to the indexscan plan
> generation code.  I posted some speculation about that here:
> http://archives.postgresql.org/pgsql-hackers/2003-09/msg00983.php

Agreed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-general by date:

Previous
From: Jeff
Date:
Subject: Re: Recovery Data Cant Be!!!
Next
From: Alvaro Herrera
Date:
Subject: Re: Service doesnt UP!!!!