Thread: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
Sayyid Ali Sajjad Rizavi
Date:
Hi !
I discovered an interesting behavior in PostgreSQL bulk update query using `from (values %s)` syntax.
Let's see an example;
```
update persons p
set age = t.age
from (
values
('uuid1', null),
('uuid2', null)
) as t(id, age)
```
The `age` column is of type integer. The above query will give this error: "age" is of type integer but expression is of type text. (PostgreSQL resolves the type as a text).
But if we change the values to these;
```
values
('uuid1', 21),
('uuid2', null)
```
We won't get any error because PostgreSQL will detect that at least one integer value exists in the 2nd position, so let's resolve this guy to `integer`.
The issue here is that it's very unexpected behavior which might succeed in most of the cases and fail in one case. This behavior can be seen in the `parser/parse_coerce.c` file.
```
/*
* If all the inputs were UNKNOWN type --- ie, unknown-type literals ---
* then resolve as type TEXT. This situation comes up with constructs
* like SELECT (CASE WHEN foo THEN 'bar' ELSE 'baz' END); SELECT 'foo'
* UNION SELECT 'bar'; It might seem desirable to leave the construct's
* output type as UNKNOWN, but that really doesn't work, because we'd
* probably end up needing a runtime coercion from UNKNOWN to something
* else, and we usually won't have it. We need to coerce the unknown
* literals while they are still literals, so a decision has to be made
* now.
*/
if (ptype == UNKNOWNOID)
ptype = TEXTOID;
* If all the inputs were UNKNOWN type --- ie, unknown-type literals ---
* then resolve as type TEXT. This situation comes up with constructs
* like SELECT (CASE WHEN foo THEN 'bar' ELSE 'baz' END); SELECT 'foo'
* UNION SELECT 'bar'; It might seem desirable to leave the construct's
* output type as UNKNOWN, but that really doesn't work, because we'd
* probably end up needing a runtime coercion from UNKNOWN to something
* else, and we usually won't have it. We need to coerce the unknown
* literals while they are still literals, so a decision has to be made
* now.
*/
if (ptype == UNKNOWNOID)
ptype = TEXTOID;
```
So here are the 2 options I suggest:
Option 1: Cast to the relevant column type in that position (to `integer` in this case), whenever we have an unknown type.
Option 2: Always give error if unknown type is not casted to desired type (`null::integer` will be necessary).
Re: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
"David G. Johnston"
Date:
On Wednesday, January 4, 2023, Sayyid Ali Sajjad Rizavi <sasrizavi@gmail.com> wrote:
Option 1: Cast to the relevant column type in that position (to `integer` in this case), whenever we have an unknown type.
This happens when possible so any remaining cases are not possible. Or, at least apparently not worth the effort it would take to make work.
Option 2: Always give error if unknown type is not casted to desired type (`null::integer` will be necessary).
Breaking working queries for this is not acceptable.
David J.
Re: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
Sayyid Ali Sajjad Rizavi
Date:
Breaking working queries for this is not acceptable.
Good point, let's exclude Option 2.
This happens when possible so any remaining cases are not possible. Or, at least apparently not worth the effort it would take to make work.
Actually this doesn't happen when all of the values in that position are null. Or maybe I don't understand what you mean.
If we don't consider the effort it would take to make it work, do you think Option 1 would be good to have? Because when I
have an integer column in that position, I wouldn't want the unknown (null) values I supply to be resolved to `text` type.
On Thu, Jan 5, 2023 at 11:23 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wednesday, January 4, 2023, Sayyid Ali Sajjad Rizavi <sasrizavi@gmail.com> wrote:Option 1: Cast to the relevant column type in that position (to `integer` in this case), whenever we have an unknown type.This happens when possible so any remaining cases are not possible. Or, at least apparently not worth the effort it would take to make work.Option 2: Always give error if unknown type is not casted to desired type (`null::integer` will be necessary).Breaking working queries for this is not acceptable.David J.
Re: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
"David G. Johnston"
Date:
Please don’t top-post
Breaking working queries for this is not acceptable.Good point, let's exclude Option 2.This happens when possible so any remaining cases are not possible. Or, at least apparently not worth the effort it would take to make work.Actually this doesn't happen when all of the values in that position are null. Or maybe I don't understand what you mean.If we don't consider the effort it would take to make it work, do you think Option 1 would be good to have? Because when Ihave an integer column in that position, I wouldn't want the unknown (null) values I supply to be resolved to `text` type.
The VALUES subquery has to produce its tabular output without being aware of how the outer query is going to use it. The second column of your values subquery lacks type information so the system chooses a default - text.
Dealing with types is one of the harder medium-hard problems in computer science…encountering this problem in real life has never seen me motivated enough to gripe about it rather than just add an explicit cast and move on. And I’ve been around long enough to know that the project is, and long has been, aware of the dull pain points in this area.
David J.
Re: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
Ashutosh Bapat
Date:
On Thu, Jan 5, 2023 at 12:42 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > The VALUES subquery has to produce its tabular output without being aware of how the outer query is going to use it. Thesecond column of your values subquery lacks type information so the system chooses a default - text. > > Dealing with types is one of the harder medium-hard problems in computer science…encountering this problem in real lifehas never seen me motivated enough to gripe about it rather than just add an explicit cast and move on. And I’ve beenaround long enough to know that the project is, and long has been, aware of the dull pain points in this area. > being here for quite a few years now I agree. It's tempting to trying to fix a problem in this area since it seems the fix is simple but it is hard to realize the wider impact that simple fix has. Still let me try to propose something :) we cast a quoted value to UNKNOWN type, but this is a special value null which can be casted to any SQL data type. Probably we could add a ANYNULLTYPE or some such generic null type which can be casted to any data type. Then a null value without any type is labeled as ANYNULLTYPE if specific type information is not available. This problem wouldn't arise then. Of course that's a lot of code to fix seemingly rare problem so may not be worth it still. -- Best Wishes, Ashutosh Bapat
Re: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
Tom Lane
Date:
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes: > we cast a quoted value to UNKNOWN type, but this is a special value > null which can be casted to any SQL data type. Probably we could add a > ANYNULLTYPE or some such generic null type which can be casted to any > data type. Then a null value without any type is labeled as > ANYNULLTYPE if specific type information is not available. And ... how does that differ from the existing behavior of UNKNOWN? regards, tom lane
Re: Resolve iso-8859-1 type to relevant type instead of text type while bulk update using values
From
Ashutosh Bapat
Date:
On Fri, Jan 6, 2023 at 8:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes: > > we cast a quoted value to UNKNOWN type, but this is a special value > > null which can be casted to any SQL data type. Probably we could add a > > ANYNULLTYPE or some such generic null type which can be casted to any > > data type. Then a null value without any type is labeled as > > ANYNULLTYPE if specific type information is not available. > > And ... how does that differ from the existing behavior of UNKNOWN? > From the below comment /* * If all the inputs were UNKNOWN type --- ie, unknown-type literals --- * then resolve as type TEXT. This situation comes up with constructs * like SELECT (CASE WHEN foo THEN 'bar' ELSE 'baz' END); SELECT 'foo' * UNION SELECT 'bar'; It might seem desirable to leave the construct's * output type as UNKNOWN, but that really doesn't work, because we'd * probably end up needing a runtime coercion from UNKNOWN to something * else, and we usually won't have it. We need to coerce the unknown * literals while they are still literals, so a decision has to be made * now. */ A constant null can be coerced to be null of any data type. So it doesn't need to be coerced to text or anything for the reason mentioned in the comment. Using UNKNOWN type, we have problem of not being able to coerce it to another type. But ANYNULLVALUE can be coerced to anything and thus can continue to be used till a point where we know the data type it needs to be coerced to. -- Best Wishes, Ashutosh Bapat