Re: Query planner isn't using my indices - Mailing list pgsql-general

From Tom Lane
Subject Re: Query planner isn't using my indices
Date
Msg-id 20848.1010678924@sss.pgh.pa.us
Whole thread Raw
In response to Re: Query planner isn't using my indices  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
Martijn van Oosterhout <kleptog@svana.org> writes:
> Interestingly, I find the easiest fix is: never use unquoted constants.

> If it didn't break other peoples queries I'd suggest modifying the parser so
> that numbers are parsed as strings of type unknown instead of numbers.

Well, we don't want to lose the information that the input looked like a
number and not a string; that's a reasonable thing for the type
heuristics to use.  There was a proposal on the table to initially treat
numeric literals as type UNKNOWNNUMERIC and then have a resolution
mechanism for that, much like UNKNOWN but with the possible target types
restricted to the numeric category.  It still had some problems though.
One that I recall was that OIDs weren't accounted for; another was that
it's not real clear how to let the size of the value affect resolution
(eg, it's reasonable to resolve 123456 as int4 or int8, but not int2).

I think we had also talked about having the scanner initially resolve
a literal as the "smallest" possible containing type and then make sure
there are useful automatic up-conversions in place.  The trouble here
is that the numeric hierarchy isn't strictly ordered (think about float8
vs numeric; OID is also a fly in the ointment) so "smallest" isn't very
well defined.

Something I have been thinking about is that we may actually have too
many entries in pg_proc and pg_operator.  Earlier in this thread,
someone complained about this:

regression=# select 'foo'::char(3) || 'bar'::varchar;
ERROR:  Unable to identify an operator '||' for types 'character' and 'character varying'
        You will have to retype this query using an explicit cast

But it works fine if one of the operands is text:

regression=# select 'foo'::char(3) || 'bar'::text;
 ?column?
----------
 foobar
(1 row)

Why is that?  It's because we have too many || operators:

regression=# \do ||
                                    List of operators
 Name |   Left arg type   |  Right arg type   |    Result type    |      Description
------+-------------------+-------------------+-------------------+-------------
----------
 ||   | bit               | bit               | bit               | bitwise concatenation
 ||   | bytea             | bytea             | bytea             | concatenate
 ||   | character         | character         | character         | concatenate
 ||   | character varying | character varying | character varying | concatenate
 ||   | text              | text              | text              | concatenate
(5 rows)

There's no exact match for char || varchar, and the resolver can't figure
out whether to prefer char || char or varchar || varchar, so it punts.
OTOH text is the "preferred type" in the string category, so it will
select text || text if either operand is text.

IMHO the correct fix is to remove the entries for char || char and
varchar || varchar, so that there's only one textual || operator,
namely text || text.  This would mean the result was always considered
text and never char or varchar, but so what?  Those types are all
assignment-compatible anyway.

A related thought is that our numeric-hierarchy problems might be
greatly simplified if we eliminated most or all of the cross-datatype
numeric operators (eg, int4 plus int2) and insisted that these be
handled as a type coercion step plus a single-datatype operator.
In that way the resolver would be much less likely to be faced with
multiple alternatives that it couldn't choose among.

See the pghackers archives for past discussions if you want to
pursue this.

            regards, tom lane

pgsql-general by date:

Previous
From: "Christian Meunier"
Date:
Subject: Re: URGENT: pg_dump & Postgres 7.2b4
Next
From: Stephan Szabo
Date:
Subject: Re: duplicating table