Thread: Simplifying unknown-literal handling
For the past couple of releases we've had support for cstring (null-terminated string) as a full-fledged datatype: you set typlen = -2 to indicate that strlen() must be used to calculate the actual size of a Datum. It occurs to me that we should change type UNKNOWN's internal representation to be like cstring rather than like text. The advantage of this is that the places in the parser that currently call unknownin and unknownout could be replaced by just CStringGetDatum and DatumGetCString, respectively, thus saving two palloc's and two memcpy's per string literal. It's not much, but considering that this happens every time we parse a string literal, I think it'll add up to a savings worth the small amount of effort needed. Anyone see a reason not to change this? regards, tom lane
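The saving described above can be illustrated with a small standalone model. This is only a sketch, not PostgreSQL code: `ToyText`, `toy_text_in`, `toy_text_out`, and the `toy_cstring_*` helpers are hypothetical names, `malloc` stands in for palloc, and counting the terminator in the cstring size is an assumption of the sketch. The length-prefixed path allocates and copies once on input and once on output, while the cstring-like path just passes the pointer through.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed ("text-like") value. */
typedef struct {
    size_t len;    /* payload length in bytes, no terminator stored */
    char   data[]; /* payload bytes */
} ToyText;

/* Text-like input: one allocation plus one copy of the literal. */
static ToyText *toy_text_in(const char *literal)
{
    assert(literal != NULL);
    size_t len = strlen(literal);
    ToyText *t = malloc(sizeof(ToyText) + len);
    if (t == NULL)
        return NULL;
    t->len = len;
    memcpy(t->data, literal, len);
    return t;
}

/* Text-like output: a second allocation and copy, re-adding the terminator. */
static char *toy_text_out(const ToyText *t)
{
    assert(t != NULL);
    char *s = malloc(t->len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, t->data, t->len);
    s[t->len] = '\0';
    return s;
}

/* cstring-like: the value is the pointer itself, so no allocation or copy. */
static const char *toy_cstring_in(const char *literal) { return literal; }
static const char *toy_cstring_out(const char *value) { return value; }

/* Size is discovered with strlen(); the terminator is counted (assumption). */
static size_t toy_cstring_size(const char *value) { return strlen(value) + 1; }
```

Round-tripping a literal through `toy_text_in` and `toy_text_out` costs two allocations and two copies, whereas `toy_cstring_out(toy_cstring_in(lit))` returns the same pointer it was given.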
On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote: > For the past couple of releases we've had support for cstring > (null-terminated string) as a full fledged datatype: you set > typlen = -2 to indicate that strlen() must be used to calculate > the actual size of a Datum. > > It occurs to me that we should change type UNKNOWN's internal > representation to be like cstring rather than like text. The > advantage of this is that the places in the parser that currently > call unknownin and unknownout could be replaced by just > CStringGetDatum and DatumGetCString, respectively, thus saving > two palloc's and two memcpy's per string literal. It's not much, > but considering that this happens every time we parse a string > literal, I think it'll add up to a savings worth the small amount > of effort needed. > > Anyone see a reason not to change this? Is there any way we use UNKNOWN to represent bytea literals? Say, comparing an untyped literal to a bytea column? -- Alvaro Herrera (<alvherre[a]surnet.cl>) "Sallah, I said NO camels! That's FIVE camels; can't you count?" (Indiana Jones)
Alvaro Herrera <alvherre@surnet.cl> writes: > On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote: >> Anyone see a reason not to change this? > Is there any way we use UNKNOWN to represent bytea literals? > Say, comparing a untyped literal to a bytea column? We use UNKNOWN to represent the raw string literal before we've figured out that we need to feed it to byteain. There aren't going to be any embedded nulls at that point, if that's what you are wondering. If we ever decide to try to support embedded nulls in datatype external representations, there are going to be way more changes needed than just changing UNKNOWN again ... for starters, changing the I/O functions of every single built-in and user-defined data type. I don't think that's ever going to happen, so I'm not particularly worried about propagating the assumption into one more place. regards, tom lane
On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@surnet.cl> writes: >> On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote: >>> Anyone see a reason not to change this? > >> Is there any way we use UNKNOWN to represent bytea literals? >> Say, comparing a untyped literal to a bytea column? > > We use UNKNOWN to represent the raw string literal before we've > figured out that we need to feed it to byteain. There aren't > going to be any embedded nulls at that point, if that's what > you are wondering. Are there any cases where UNKNOWN can be received from the frontend as a binary value? I suspect there are. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > Are there any cases where UNKNOWN can be received from the frontend as > a binary value? I suspect there are. Sure, but that's transparent because we have binary I/O converters. You will have trouble if you try to inject an embedded zero that way, but the end result will look about the same as when you try to inject an embedded zero now: the data after the zero will be dropped on readout. regards, tom lane
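The truncation effect mentioned above follows directly from how null-terminated data is read back. A minimal sketch, assuming a hypothetical helper `visible_bytes` (not a PostgreSQL function): it reports how many bytes of a buffer a null-terminated reader would see, using a bounded `memchr` so the search never runs past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Number of bytes of buf[0..len) that survive when the buffer is read back
 * as a null-terminated string: everything from the first zero byte onward
 * is invisible to the reader.
 */
static size_t visible_bytes(const char *buf, size_t len)
{
    assert(buf != NULL);
    const char *zero = memchr(buf, '\0', len);
    return zero ? (size_t)(zero - buf) : len;
}
```

For a five-byte buffer holding `'a', 'b', 0, 'c', 'd'`, only the first two bytes are visible on readout; the trailing `'c', 'd'` are dropped.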
On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: >> Are there any cases where UNKNOWN can be received from the frontend as >> a binary value? I suspect there are. > > Sure, but that's transparent because we have binary I/O converters. > You will have trouble if you try to inject an embedded zero that way, > but the end result will look about the same as when you try to inject > an embedded zero now: the data after the zero will be dropped on readout. What happens if you send an UNKNOWN from the frontend as binary, and then when the desired type is figured out, it turns out to be a bytea? It's obviously not acceptable then to truncate after a zero byte. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > What happens if you send an UNKNOWN from the frontend as binary, and then > when the desired type is figured out, it turns out to be a bytea? It's > obviously not acceptable then to truncate after a zero byte. This isn't an issue, because if the desired type is something other than UNKNOWN, we won't be using UNKNOWN's binary input converter. The actual flow of information in the case you're thinking of is:
1. Client sends Parse message with, say, query
   INSERT INTO tab(byteacol) VALUES($1);
   and the type of param 1 either not specified or given as UNKNOWN.
2. Backend infers actual type of param 1 from context as BYTEA.
3. Client may or may not bother issuing a Describe to find out actual type of parameter(s).
4. Client sends BIND with a binary value; backend applies BYTEA's input converter (which is essentially memcpy).
Offhand I think the only way you could actually invoke UNKNOWN's binary input converter is by executing a PREPARE with a parameter position specifically declared as UNKNOWN, viz
   PREPARE foo(unknown) AS ...
and then using foo as the target of a binary BIND message. I don't think we are under contract to promise that such a thing will have any particular behavior; and certainly not to promise that it will behave more like bytea than like text. In any case there is no runtime coercion from UNKNOWN to BYTEA, so you'd really have to work at it to cons up a case where you got behavior you didn't like. regards, tom lane
On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: >> What happens if you send an UNKNOWN from the frontend as binary, and then >> when the desired type is figured out, it turns out to be a bytea? It's >> obviously not acceptable then to truncate after a zero byte. > > This isn't an issue, because if the desired type is something other than > UNKNOWN, we won't be using UNKNOWN's binary input converter. The actual > flow of information in the case you're thinking of is: > > 1. Client sends Parse message with, say, query > INSERT INTO tab(byteacol) VALUES($1); > and the type of param 1 either not specified or given as UNKNOWN. > > 2. Backend infers actual type of param 1 from context as BYTEA. Hrm. I was thinking of the case where the backend can't necessarily do this, but in fact in that case the Parse seems to fail. > Offhand I think the only way you could actually invoke UNKNOWN's binary > input converter is by executing a PREPARE with a parameter position > specifically declared as UNKNOWN, viz Which of course leads to the question of why UNKNOWN has a binary input converter at all... -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 2. Backend infers actual type of param 1 from context as BYTEA. > Hrm. I was thinking of the case where the backend can't necessarily do > this, but in fact in that case the Parse seems to fail. Right, deliberately so, for precisely the reason that we need to know the correct input converters to use. >> Offhand I think the only way you could actually invoke UNKNOWN's binary >> input converter is by executing a PREPARE with a parameter position >> specifically declared as UNKNOWN, viz > Which of course leads to the question of why UNKNOWN has a binary input > converter at all... Maybe it shouldn't. It does need a binary output converter, to avoid gratuitous failures in cases like SELECT 'foo'; so I figure it's probably best to leave the input converter there ... regards, tom lane
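The Parse/Bind flow discussed in this thread can be sketched as a toy model. Nothing here is backend code: `ToyType`, `toy_resolve_param`, and `toy_bytea_recv` are hypothetical names, and the sketch covers only the Parse-message case (it does not model a PREPARE that explicitly declares a parameter as unknown). The point it illustrates is that the parameter type is fixed before any binary value arrives, so a bytea parameter goes through a copy-everything converter and never through a null-terminated one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags for the sketch. */
typedef enum { TOY_UNKNOWN, TOY_BYTEA, TOY_TEXT } ToyType;

/*
 * Parse step: a parameter declared as unknown takes its type from context.
 * If there is no context to infer from, the Parse fails (returns 0), so no
 * value is ever bound without a known input converter.
 */
static int toy_resolve_param(ToyType declared, const ToyType *context,
                             ToyType *resolved)
{
    assert(resolved != NULL);
    if (declared != TOY_UNKNOWN) {
        *resolved = declared;
        return 1;
    }
    if (context == NULL)
        return 0;
    *resolved = *context;
    return 1;
}

/*
 * Bind step for a bytea-like parameter: copy all len bytes, zero bytes
 * included. Returns the number of bytes stored, or 0 if dst is too small.
 */
static size_t toy_bytea_recv(char *dst, size_t cap, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > cap)
        return 0;
    memcpy(dst, src, len);
    return len;
}
```

With a context type of `TOY_BYTEA`, an unknown parameter resolves to bytea, and a three-byte binary value containing a zero byte is stored in full.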