Thread: Simplifying unknown-literal handling

From
Tom Lane
Date:
For the past couple of releases we've had support for cstring
(null-terminated string) as a full-fledged datatype: you set
typlen = -2 to indicate that strlen() must be used to calculate
the actual size of a Datum.

It occurs to me that we should change type UNKNOWN's internal
representation to be like cstring rather than like text.  The
advantage of this is that the places in the parser that currently
call unknownin and unknownout could be replaced by just
CStringGetDatum and DatumGetCString, respectively, thus saving
two palloc's and two memcpy's per string literal.  It's not much,
but considering that this happens every time we parse a string
literal, I think it'll add up to a savings worth the small amount
of effort needed.

Anyone see a reason not to change this?
        regards, tom lane
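
[Editor's sketch, not part of the original message.] The cost difference described above can be illustrated with a small standalone C program fragment. It does not use PostgreSQL headers: SketchVarlena, sketch_text_in, sketch_text_out, sketch_cstring_in and sketch_cstring_size are hypothetical names standing in for a text-like length-prefixed value and for the conversions the parser would otherwise perform, and malloc stands in for palloc. It is only meant to show where the two allocations and two copies come from, assuming the text-like path copies once on input and once on output.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a text-like, length-prefixed value. */
typedef struct
{
    int32_t len;    /* total size in bytes, header included */
    char    data[]; /* payload, NOT null-terminated */
} SketchVarlena;

/* Text-like input: one allocation and one copy per literal. */
static SketchVarlena *
sketch_text_in(const char *literal)
{
    size_t         n = strlen(literal);
    SketchVarlena *v = malloc(sizeof(SketchVarlena) + n);

    if (v == NULL)
        abort();
    v->len = (int32_t) (sizeof(SketchVarlena) + n);
    memcpy(v->data, literal, n);
    return v;
}

/* Text-like output: a second allocation and a second copy. */
static char *
sketch_text_out(const SketchVarlena *v)
{
    size_t n = (size_t) v->len - sizeof(SketchVarlena);
    char  *s = malloc(n + 1);

    if (s == NULL)
        abort();
    memcpy(s, v->data, n);
    s[n] = '\0';
    return s;
}

/* cstring-like: the pointer itself is the value, so nothing is copied. */
static const char *
sketch_cstring_in(const char *literal)
{
    return literal;
}

/* With a typlen = -2 style type, the size is found by strlen(). */
static size_t
sketch_cstring_size(const char *value)
{
    return strlen(value) + 1;   /* include the terminator */
}
```

Round-tripping a literal through sketch_text_in and sketch_text_out allocates twice and copies twice; the cstring-like path hands back the same pointer it was given.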


Re: Simplifying unknown-literal handling

From
Alvaro Herrera
Date:
On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote:
> For the past couple of releases we've had support for cstring
> (null-terminated string) as a full-fledged datatype: you set
> typlen = -2 to indicate that strlen() must be used to calculate
> the actual size of a Datum.
> 
> It occurs to me that we should change type UNKNOWN's internal
> representation to be like cstring rather than like text.  The
> advantage of this is that the places in the parser that currently
> call unknownin and unknownout could be replaced by just
> CStringGetDatum and DatumGetCString, respectively, thus saving
> two palloc's and two memcpy's per string literal.  It's not much,
> but considering that this happens every time we parse a string
> literal, I think it'll add up to a savings worth the small amount
> of effort needed.
> 
> Anyone see a reason not to change this?

Is there any way we use UNKNOWN to represent bytea literals?
Say, comparing an untyped literal to a bytea column?

-- 
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)


Re: Simplifying unknown-literal handling

From
Tom Lane
Date:
Alvaro Herrera <alvherre@surnet.cl> writes:
> On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote:
>> Anyone see a reason not to change this?

> Is there any way we use UNKNOWN to represent bytea literals?
> Say, comparing an untyped literal to a bytea column?

We use UNKNOWN to represent the raw string literal before we've
figured out that we need to feed it to byteain.  There aren't
going to be any embedded nulls at that point, if that's what
you are wondering.

If we ever decide to try to support embedded nulls in datatype
external representations, there are going to be way more changes
needed than just changing UNKNOWN again ... for starters, changing
the I/O functions of every single built-in and user-defined data type.
I don't think that's ever going to happen, so I'm not particularly
worried about propagating the assumption into one more place.
        regards, tom lane


Re: Simplifying unknown-literal handling

From
Andrew - Supernews
Date:
On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@surnet.cl> writes:
>> On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote:
>>> Anyone see a reason not to change this?
>
>> Is there any way we use UNKNOWN to represent bytea literals?
>> Say, comparing an untyped literal to a bytea column?
>
> We use UNKNOWN to represent the raw string literal before we've
> figured out that we need to feed it to byteain.  There aren't
> going to be any embedded nulls at that point, if that's what
> you are wondering.

Are there any cases where UNKNOWN can be received from the frontend as
a binary value? I suspect there are.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Simplifying unknown-literal handling

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> Are there any cases where UNKNOWN can be received from the frontend as
> a binary value? I suspect there are.

Sure, but that's transparent because we have binary I/O converters.
You will have trouble if you try to inject an embedded zero that way,
but the end result will look about the same as when you try to inject
an embedded zero now: the data after the zero will be dropped on readout.
        regards, tom lane
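
[Editor's sketch, not part of the original message.] The truncation described above follows directly from using strlen() to find the end of a value. The fragment below is a standalone illustration under that assumption; sketch_recv and sketch_readout_len are hypothetical names, not PostgreSQL functions, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical binary receive step: copy nbytes from the wire and append
 * a terminator, producing a cstring-style value.
 */
static char *
sketch_recv(const char *bytes, size_t nbytes)
{
    char *v = malloc(nbytes + 1);

    if (v == NULL)
        abort();
    memcpy(v, bytes, nbytes);
    v[nbytes] = '\0';
    return v;
}

/*
 * Hypothetical readout step: the value's length is whatever strlen()
 * reports, so anything after an embedded zero byte is not seen.
 */
static size_t
sketch_readout_len(const char *value)
{
    return strlen(value);
}
```

Feeding the five bytes 'a', 'b', 0, 'c', 'd' through sketch_recv keeps all of them in memory, but sketch_readout_len reports 2, so only "ab" would be emitted on readout.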


Re: Simplifying unknown-literal handling

From
Andrew - Supernews
Date:
On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> Are there any cases where UNKNOWN can be received from the frontend as
>> a binary value? I suspect there are.
>
> Sure, but that's transparent because we have binary I/O converters.
> You will have trouble if you try to inject an embedded zero that way,
> but the end result will look about the same as when you try to inject
> an embedded zero now: the data after the zero will be dropped on readout.

What happens if you send an UNKNOWN from the frontend as binary, and then
when the desired type is figured out, it turns out to be a bytea? It's
obviously not acceptable then to truncate after a zero byte.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Simplifying unknown-literal handling

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> What happens if you send an UNKNOWN from the frontend as binary, and then
> when the desired type is figured out, it turns out to be a bytea? It's
> obviously not acceptable then to truncate after a zero byte.

This isn't an issue, because if the desired type is something other than
UNKNOWN, we won't be using UNKNOWN's binary input converter.  The actual
flow of information in the case you're thinking of is:

1. Client sends Parse message with, say, query
    INSERT INTO tab(byteacol) VALUES($1);
and the type of param 1 either not specified or given as UNKNOWN.

2. Backend infers actual type of param 1 from context as BYTEA.

3. Client may or may not bother issuing a Describe to find out actual
type of parameter(s).

4. Client sends BIND with a binary value; backend applies BYTEA's input
converter (which is essentially memcpy).

Offhand I think the only way you could actually invoke UNKNOWN's binary
input converter is by executing a PREPARE with a parameter position
specifically declared as UNKNOWN, viz
    PREPARE foo(unknown) AS ...
and then using foo as the target of a binary BIND message.  I don't
think we are under contract to promise that such a thing will have any
particular behavior; and certainly not to promise that it will behave
more like bytea than like text.  In any case there is no runtime
coercion from UNKNOWN to BYTEA, so you'd really have to work at it
to cons up a case where you got behavior you didn't like.
        regards, tom lane


Re: Simplifying unknown-literal handling

From
Andrew - Supernews
Date:
On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> What happens if you send an UNKNOWN from the frontend as binary, and then
>> when the desired type is figured out, it turns out to be a bytea? It's
>> obviously not acceptable then to truncate after a zero byte.
>
> This isn't an issue, because if the desired type is something other than
> UNKNOWN, we won't be using UNKNOWN's binary input converter.  The actual
> flow of information in the case you're thinking of is:
>
> 1. Client sends Parse message with, say, query
>     INSERT INTO tab(byteacol) VALUES($1);
> and the type of param 1 either not specified or given as UNKNOWN.
>
> 2. Backend infers actual type of param 1 from context as BYTEA.

Hrm. I was thinking of the case where the backend can't necessarily do
this, but in fact in that case the Parse seems to fail.

> Offhand I think the only way you could actually invoke UNKNOWN's binary
> input converter is by executing a PREPARE with a parameter position
> specifically declared as UNKNOWN, viz

Which of course leads to the question of why UNKNOWN has a binary input
converter at all...

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Simplifying unknown-literal handling

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. Backend infers actual type of param 1 from context as BYTEA.

> Hrm. I was thinking of the case where the backend can't necessarily do
> this, but in fact in that case the Parse seems to fail.

Right, deliberately so, for precisely the reason that we need to know
the correct input converters to use.

>> Offhand I think the only way you could actually invoke UNKNOWN's binary
>> input converter is by executing a PREPARE with a parameter position
>> specifically declared as UNKNOWN, viz

> Which of course leads to the question of why UNKNOWN has a binary input
> converter at all...

Maybe it shouldn't.  It does need a binary output converter, to avoid
gratuitous failures in cases like
    SELECT 'foo';
so I figure it's probably best to leave the input converter there ...
        regards, tom lane