Thread: regproc's lack of certainty is dangerous

regproc's lack of certainty is dangerous

From
Tom Lane
Date:
Deepak Bhole of Red Hat just pointed out to me a failure that he got
after some 7.3 stress testing:

> [ "`~!@#$%^&*()''| \final_db\n,.;:'" ]=# SELECT n.nspname, p.proname,
> o.oprcode::oid FROM pg_operator o, pg_proc p, pg_namespace n WHERE
> o.oid=270447::oid AND p.oid=o.oprcode::oid AND p.pronamespace=n.oid;

> ERROR:  There is more than one procedure named "[ ""`~!@#$%^&*()''|
> \final_schema\n,.;:'"" ]"."[ ""`~!@#$%^&*''| \ {func_for_op}\n,.;:'"" ]"

This error comes out of regprocin() when it finds multiple candidate
functions with the same name (and, presumably, different argument lists
or different schemas).  No big surprise, since that's what he had.
But it's a bit odd that regprocin() is being invoked, when there's no
regproc literal in the given query.

After some digging, it turns out that the error is appearing because
that function name is present in the pg_statistic entry for
pg_operator.oprcode.  *Any* query that causes the optimizer to become
interested in pg_operator.oprcode will fail under these circumstances
:-(.  And the user can't readily avoid this, since there's no way to
be sure which function names will happen to end up in the histogram.
"Never ANALYZE the table" isn't going to fly as a workaround.

I am not real sure what we should do about it.  Clearly there is more
risk than I'd realized in datatypes whose input routines may reject
strings that their output routines had produced in good faith.

One possible route is to try to eliminate the ambiguity, but I doubt
that that will work very effectively for regproc and friends --- the
whole point of those types is to resolve ambiguous input, and so the
possibility of failures in the input routine can't easily be removed.
regproc in particular needs its special behavior to be useful for
bootstrapping.

Another approach is to try to fix pg_statistic to avoid the problem by
not doing I/O conversions.  For scalar datatypes (those that have
associated array types) it'd be probably be feasible to store the
histogram and most-common-value arrays as arrays of the datatype itself,
not arrays of text; that should be a win for performance as well as
avoiding risky conversions.  I am not sure what to do about columns that
have datatypes without matching array types, though (that would include
array columns and domains, IIRC).  Maybe use array of bytea to hold the
internal representation of the type?

Any comments or better ideas out there?
        regards, tom lane


Re: regproc's lack of certainty is dangerous

From
Joe Conway
Date:
Tom Lane wrote:
> Another approach is to try to fix pg_statistic to avoid the problem by
> not doing I/O conversions.  For scalar datatypes (those that have
> associated array types) it'd be probably be feasible to store the
> histogram and most-common-value arrays as arrays of the datatype itself,
> not arrays of text; that should be a win for performance as well as
> avoiding risky conversions.  I am not sure what to do about columns that
> have datatypes without matching array types, though (that would include
> array columns and domains, IIRC).  Maybe use array of bytea to hold the
> internal representation of the type?

ISTM that the best (if not the only feasible) approach is using array of 
bytea to hold the internal representation of the type.

Joe




Re: regproc's lack of certainty is dangerous

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> I am not sure what to do about columns that
>> have datatypes without matching array types, though (that would include
>> array columns and domains, IIRC).  Maybe use array of bytea to hold the
>> internal representation of the type?

> ISTM that the best (if not the only feasible) approach is using array of 
> bytea to hold the internal representation of the type.

I'd like "select * from pg_statistic" to still produce readable output
whenever possible, though.  The bytea approach falls down badly on that
score, so I don't want to resort to it except where I absolutely must.

I think that we can actually get away (from an implementation point of
view) with a column containing arrays of different base types; array_out
will still work AFAIR.  It's an interesting question though how such a
column could reasonably be declared.  This ties into your recent
investigations into polymorphic array functions, perhaps.

Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?
        regards, tom lane


Re: regproc's lack of certainty is dangerous

From
Joe Conway
Date:
Tom Lane wrote:
> I think that we can actually get away (from an implementation point of
> view) with a column containing arrays of different base types; array_out
> will still work AFAIR.  It's an interesting question though how such a
> column could reasonably be declared.  This ties into your recent
> investigations into polymorphic array functions, perhaps.
> 
> Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?

I was having similar thoughts when you first posted this, but I wasn't 
sure you'd want to go there. I wonder what changes are required other 
than promoting the typtype from a 'p' to a 'b' and the I/O functions to 
array_out/array_in?

Joe





Re: regproc's lack of certainty is dangerous

From
Joe Conway
Date:
Tom Lane wrote:
> I think that we can actually get away (from an implementation point of
> view) with a column containing arrays of different base types; array_out
> will still work AFAIR.  It's an interesting question though how such a
> column could reasonably be declared.  This ties into your recent
> investigations into polymorphic array functions, perhaps.
> 
> Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?
> 

More on this idea; here is a simple experiment:

regression=# update pg_type set typtype = 'b', typinput = 'array_in', 
typoutput = 'array_out' where oid = 2277;
UPDATE 1
regression=# create table bar(f1 int, f2 anyarray);
CREATE TABLE
regression=# insert into bar values (1,'{1,2}'::integer[]);
INSERT 744428 1
regression=# insert into bar values (2,'{a,b}'::text[]);
INSERT 744429 1
regression=# select * from bar; f1 |  f2
----+-------  1 | {1,2}  2 | {a,b}
(2 rows)

Interesting ... but then there is:
regression=# select f1, f2[2] from bar;
ERROR:  transformArraySubscripts: type anyarray is not an array

A bit more to do I guess.

Joe



Re: regproc's lack of certainty is dangerous

From
Tom Lane
Date:
I wrote:
>> I think that we can actually get away (from an implementation point of
>> view) with a column containing arrays of different base types; array_out
>> will still work AFAIR.  It's an interesting question though how such a
>> column could reasonably be declared.  This ties into your recent
>> investigations into polymorphic array functions, perhaps.
>> 
>> Maybe "anyarray" shouldn't be quite so pseudo a pseudotype?

I have committed a fix for these problems that makes pg_statistic's
columns into "anyarray" columns.  It turns out that my original concerns
about datatypes without associated array types don't matter --- we can
physically build such an array, regardless of whether we can point to a
pg_type entry that describes it.

This is certainly something of a kluge at the moment, because
pg_statistic is making use of facilities that don't exist at the SQL
level.  It gets the job done, but I'd like to see some fuller support
for "anyarray" in future.
        regards, tom lane