Thread: Broken type checking for empty subqueries
Hello, after upgrade from PG11 to PG14.8 the following statement started failing: select a::BIGINT FROM ( SELECT '$maybeNumber'::VARCHAR AS a WHERE NOT '$maybeNumber' LIKE '%maybeNumber' ) AS foo; Expected (as in PG11): The query should return empty result with one column. Actual: SQL Error [22P02]: ERROR: invalid input syntax for type bigint: "$maybeNumber". Where is a workaround that shows the behavior is quite non-stable. Following passes normally while still using the same subquery with one more empty result from another one: select a::BIGINT FROM ( SELECT '$maybeNumber'::VARCHAR AS a WHERE NOT '$maybeNumber' LIKE '%maybeNumber' UNION SELECT '0' AS a WHERE false ) AS foo; Thanks all for the great work you do on PG. Marek Malevic
On 9/28/23 11:08, Marek Malevič wrote: > Hello, > > after upgrade from PG11 to PG14.8 the following statement started failing: > > select > a::BIGINT > FROM ( > SELECT > '$maybeNumber'::VARCHAR AS a > WHERE NOT '$maybeNumber' LIKE '%maybeNumber' > ) AS foo; > > Expected (as in PG11): The query should return empty result with one > column. > Actual: SQL Error [22P02]: ERROR: invalid input syntax for type bigint: > "$maybeNumber". > > Where is a workaround that shows the behavior is quite non-stable. > Following passes normally while still using the same subquery with one > more empty result from another one: > > select > a::BIGINT > FROM ( > SELECT > '$maybeNumber'::VARCHAR AS a > WHERE NOT '$maybeNumber' LIKE '%maybeNumber' > UNION > SELECT '0' AS a WHERE false > ) AS foo; I bisected this back to 4be058fe9ec5e630239b656af21fc083371f30ed, "In the planner, replace an empty FROM clause with a dummy RTE." -- Vik Fearing
On Fri, 29 Sept 2023 at 00:28, Vik Fearing <vik@postgresfriends.org> wrote: > I bisected this back to 4be058fe9ec5e630239b656af21fc083371f30ed, "In > the planner, replace an empty FROM clause with a dummy RTE." This just seems to be due to the fact that that commit allowed FROM-less subqueries to be pulled up and that now allows constant folding to attempt to perform the cast to BIGINT. An OFFSET 0 in the subquery would prevent the subquery being pulled up. After the pullup, and constant folding the WHERE clause, the query becomes: select '$maybeNumber'::bigint where false; When we constant-fold the target list, we'll get the error as it'll try and convert the string to a bigint. Without that ability, we'd have to do more run-time evaluation in queries such as: postgres=# explain verbose select '10' + 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=4) Output: 11 but as you can see, the planner knows the value is 11. There have been reports about these things before. For example [1]. I've not looked at the code to see if this would be practical or not, but I wonder if we could reduce these bug reports by using the new soft error reporting that's now done in the input functions to have constant folding just silently not do any folding for the expression if a cast fails. I don't particularly see anything wrong with making these queries work if they currently fail. I imagine most people who have had them fail during constant folding have just redesigned or found some hack to prevent the folding from taking place anyway. Overall, it wouldn't be great if we had to ensure we keep these sorts of queries working as they did in previous versions. That would just prevent us from adding certain optimisations that might be very useful. David [1] https://www.postgresql.org/message-id/17637-5904e3fdee533c7f@postgresql.org
On Fri, 29 Sept 2023 at 01:41, David Rowley <dgrowleyml@gmail.com> wrote: > I've not looked at the code to see if this would be practical or not, > but I wonder if we could reduce these bug reports by using the new > soft error reporting that's now done in the input functions to have > constant folding just silently not do any folding for the expression > if a cast fails. I don't particularly see anything wrong with making > these queries work if they currently fail. I imagine most people who > have had them fail during constant folding have just redesigned or > found some hack to prevent the folding from taking place anyway. I got curious if this would be difficult or not and came up with the attached attempt-to-do-it-to-see-if-it-works grade patch. On a very quick test, it does seem to work. Patched: postgres=# explain verbose select '123a'::varchar::bigint where false; -- previously failed QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=8) Output: ('123a'::cstring)::bigint One-Time Filter: false (3 rows) postgres=# explain verbose select '123'::varchar::bigint where false; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=8) Output: '123'::bigint One-Time Filter: false (3 rows) Master: postgres=# explain verbose select '123a'::varchar::bigint where false; ERROR: invalid input syntax for type bigint: "123a" postgres=# explain verbose select '123'::varchar::bigint where false; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=8) Output: '123'::bigint One-Time Filter: false (3 rows) I didn't really follow the soft error development work done in d9f7f5d32 very closely. I'm assuming any extension that has not updated its type input function(s) to use errsave/ereturn instead of elog/ereport will just continue to fail and there's nothing really we can do about that. Maybe that's not really a problem as constant folding would raise errors today for invalid inputs for the given type anyway. It would at least work nicer for all our built-in types that did have their input functions modified to suppress invalid inputs when there's an ErrorSaveContext. Should we consider doing something like this? David
Attachment
David Rowley <dgrowleyml@gmail.com> writes: > I've not looked at the code to see if this would be practical or not, > but I wonder if we could reduce these bug reports by using the new > soft error reporting that's now done in the input functions to have > constant folding just silently not do any folding for the expression > if a cast fails. Sadly, I doubt that would cover enough of the problem space to make much difference to people who try to do this sort of thing. > ... I imagine most people who > have had them fail during constant folding have just redesigned or > found some hack to prevent the folding from taking place anyway. Yeah. The given query looks like it was already hacked to avoid constant-folding, though I wonder if whoever wrote it really understood that. Otherwise it'd be a lot more natural to use something like CASE. Anyway, adding OFFSET 0 to suppress sub-select folding is probably the best answer here. regards, tom lane
On Fri, 29 Sept 2023 at 03:57, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I've not looked at the code to see if this would be practical or not, > > but I wonder if we could reduce these bug reports by using the new > > soft error reporting that's now done in the input functions to have > > constant folding just silently not do any folding for the expression > > if a cast fails. > > Sadly, I doubt that would cover enough of the problem space to make > much difference to people who try to do this sort of thing. hmm, I guess it does nothing for stuff like overflow errors since we didn't adjust the ereports of those functions: We'd still get: postgres=# explain verbose select 0x7FFFFFFF + 1 where false; ERROR: integer out of range David
David Rowley <dgrowleyml@gmail.com> writes: > On Fri, 29 Sept 2023 at 03:57, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Sadly, I doubt that would cover enough of the problem space to make >> much difference to people who try to do this sort of thing. > hmm, I guess it does nothing for stuff like overflow errors since we > didn't adjust the ereports of those functions: Yeah. I suppose we could start trying to expand the use of soft error reporting, but it'd be a herculean, probably multi-year task to get to where even "most cases" would be covered. I was actually wondering after sending my previous message whether CASE wouldn't do the job for the OP. We do have a rule that once a CASE test expression has reduced to constant-true or constant-false, we skip const-folding for CASE arms that are thereby proven unreachable. Thus for instance regression=# select case when 1 < 2 then 42 else 1/0 end; case ------ 42 (1 row) The big trick in using this is to understand what is covered by const-folding and what is not. Initial conversion of a literal string isn't, because that's done at parse time. So this still fails: regression=# select case when 1 < 2 then 42 else 'foo'::int end; ERROR: invalid input syntax for type integer: "foo" LINE 1: select case when 1 < 2 then 42 else 'foo'::int end; ^ but you can get around that like this: regression=# select case when 1 < 2 then 42 else 'foo'::text::int end; case ------ 42 (1 row) because the text-to-int conversion is considered a run-time not parse-time operation. So I wonder whether that weird-looking sub-select couldn't be converted to a simple SELECT CASE with both more readability and better future-proofing against optimizer improvements. Another way is to push it all into procedural logic, ie a plpgsql function that doesn't try to evaluate the problematic expression until it's checked the test expression. regards, tom lane
On Fri, 29 Sept 2023 at 10:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah. I suppose we could start trying to expand the use of soft > error reporting, but it'd be a herculean, probably multi-year > task to get to where even "most cases" would be covered. I had a go at trying to categorise the reports we've received over the years about this. My search was just limited to the search term "constant folding" It looks like of the 9 below, the input function soft errors would cover #0 (this report), #1 and #4. Division by 0 covers another 2 (#5 and #8). So we could fix "most cases" if we added soft errors to arithmetic. Also, from looking at [1], there has been some interest in the past to stop these surprising const-folding errors. 0. This thread. Casting issue. text can't be parsed by type's input function. effectively: # explain select '$maybeNumber'::bigint where false; ERROR: invalid input syntax for type bigint: "$maybeNumber" 1. Bug 17637 https://www.postgresql.org/message-id/17637-5904e3fdee533c7f%40postgresql.org Casting issue in case statement with tmp as (select 1::int8 id, '123.4'::text v) select t.id, case m.mapped_to when 1 then v::float8 else null end, case m.mapped_to when 2 then v::bool else null end from tmp t join tmap m on m.id = t.id; ERROR: invalid input syntax for type boolean: "123.4" 2. https://www.postgresql.org/message-id/CAAT35tGXUYgjjViNZ5%2B9nFkrOcmgE4ce%2BVvekjgKDy_C38RT2g%40mail.gmail.com immutable function raising an exception SELECT raise_exception_immutable('foo') WHERE false; ERROR: foo 3. Bug 16545 https://www.postgresql.org/message-id/16545-affff840bc4e72ca%40postgresql.org Complaint about coalesce evaluating arguments after the first non-NULL value. # SELECT coalesce((SELECT 'ONE'), (SELECT 'TWO' WHERE '123' ~ ((xpath('/tag/text()','<tag>[</tag>'))[1]::TEXT) ) ); ERROR: invalid regular expression: brackets [] not balanced ereport in RE_compile_and_cache(). 4. https://www.postgresql.org/message-id/C0FDEC5E-0E01-4FAB-A7A6-3FAC1F94B51E%40gmail.com Appears an error from the t7.value::numeric case below. -- CASE WHEN t9.is_internal_namespace = true -- AND t9.code = 'STORAGE_POSITION.STORAGE_RACK_ROW' -- AND (t10.code = 'INTEGER' OR t10.code = 'REAL') -- THEN t7.value::numeric = 1 -- ELSE false -- END not much further details. 5. https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B17BB4EF8%40ntex2010a.host.magwien.gv.at division by zero. test=> CREATE FUNCTION zero() RETURNS integer IMMUTABLE LANGUAGE SQL AS 'SELECT 0'; CREATE FUNCTION test=> SELECT CASE WHEN (SELECT zero()) = 0 THEN -1 ELSE 1/zero() END; ERROR: division by zero 6. https://www.postgresql.org/message-id/flat/11494.1144794560%40sss.pgh.pa.us#68696f2d794cd2d64f9c596782ee8f3a Some procedural language error. 7. https://www.postgresql.org/message-id/flat/815.1049942992%40sss.pgh.pa.us#0874c4dffad1d389fe3f812f18039583 A very old one. Unsure if it relates to casting or a bug that was fixed. 8. https://www.postgresql.org/message-id/20020414165222.914FB475451%40postgresql.org A very old one from 2002 SELECT CASE WHEN 1 = 2 THEN 1 / 0 WHEN 1 = 1 THEN 1.0 END; ERROR: floating point exception! The last floating point operation either exceeded legal ranges or was a divide by zero Stopped as these reports are getting very old and less valuable. David [1] https://www.postgresql.org/message-id/265964.1595523454%40sss.pgh.pa.us