Re: BUG #17364: Errors caused by views - Mailing list pgsql-bugs
From: Tom Lane
Subject: Re: BUG #17364: Errors caused by views
Date:
Msg-id: 3011206.1642010290@sss.pgh.pa.us
In response to: BUG #17364: Errors caused by views (PG Bug reporting form <noreply@postgresql.org>)
List: pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> I run the following SQL statements on postgres13.2. They are simple, but an
> error occurs.

> create table test1 (id int, c_data varchar);
> insert into test1 values (1, '201201');
> insert into test1 values (2, 'error');
> create table test2(id int, status varchar);
> insert into test2 values (1, 'on');
> insert into test2 values (2, 'off');
> create view test_view as
>     SELECT test1.c_data
>     from test1 LEFT JOIN test2
>       ON (test2.id = test1.id)
>     WHERE (test2.status)::text = 'on'::text;

> select * from test_view;
>   c_data
> ----------
>  20201101
> (1 row)

> select * from
>     (select to_date(c_data, 'YYYYMMDD') as date
>      from test_view) t1
> where t1.date >= '2020-10-01'::TIMESTAMP;
> ERROR:  invalid value "erro" for "YYYY"
> DETAIL:  Value must be an integer.

The reason this is failing is that the outer WHERE clause is being
pushed down to the scan of test1, so it's applied before the join
occurs, and then to_date spits up on the "error" value.  You can see
that in EXPLAIN:

=# explain select * from
   (select to_date(c_data, 'YYYYMMDD') as date
    from test_view) t1
   where t1.date >= '2020-10-01'::TIMESTAMP;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=25.95..56.75 rows=13 width=4)
   Hash Cond: (test1.id = test2.id)
   ->  Seq Scan on test1  (cost=0.00..29.05 rows=423 width=36)
         Filter: (to_date((c_data)::text, 'YYYYMMDD'::text) >= '2020-10-01 00:00:00'::timestamp without time zone)
   ->  Hash  (cost=25.88..25.88 rows=6 width=4)
         ->  Seq Scan on test2  (cost=0.00..25.88 rows=6 width=4)
               Filter: ((status)::text = 'on'::text)
(7 rows)

That's not something we're going to consider a bug, because if we
prevented the push-down from happening it would be quite disastrous
for the performance of many queries.
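[To make the mechanism concrete: the planner inlines a simple view like
test_view into the referencing query and then optimizes the whole thing
as one unit, which is what exposes the filter to push-down.  A sketch of
the failing query with the view expanded by hand (same tables as the
report; this expansion is illustrative, not taken from the thread) should
produce the same plan shape:]

```sql
-- test_view inlined by hand: once the subquery is flattened, the outer
-- date filter can be pushed down to the scan of test1, i.e. evaluated
-- before the join has filtered out the row containing 'error'.
explain
select * from
  (select to_date(c_data, 'YYYYMMDD') as date
     from test1 left join test2 on test2.id = test1.id
    where (test2.status)::text = 'on'::text) t1
where t1.date >= '2020-10-01'::timestamp;
```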
There are a couple of things you can do to control it yourself, though:

* Insert "OFFSET 0" as an optimization fence:

=# explain select * from
   (select to_date(c_data, 'YYYYMMDD') as date
    from test_view offset 0) t1
   where t1.date >= '2020-10-01'::TIMESTAMP;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Subquery Scan on t1  (cost=25.95..54.36 rows=13 width=4)
   Filter: (t1.date >= '2020-10-01 00:00:00'::timestamp without time zone)
   ->  Hash Join  (cost=25.95..53.89 rows=38 width=4)
         Hash Cond: (test1.id = test2.id)
         ->  Seq Scan on test1  (cost=0.00..22.70 rows=1270 width=36)
         ->  Hash  (cost=25.88..25.88 rows=6 width=4)
               ->  Seq Scan on test2  (cost=0.00..25.88 rows=6 width=4)
                     Filter: ((status)::text = 'on'::text)
(8 rows)

OFFSET 0 in the view definition would serve too, I think, but I didn't
try it.

* Wrap the possibly-error-causing function in a VOLATILE wrapper function:

=# create or replace function mytodate(varchar) returns date as
   $$ select to_date($1, 'YYYYMMDD') $$ language sql volatile;
CREATE FUNCTION
=# explain verbose select * from
   (select mytodate(c_data) as date from test_view) t1
   where t1.date >= '2020-10-01'::TIMESTAMP;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Subquery Scan on t1  (cost=25.95..54.36 rows=13 width=4)
   Output: t1.date
   Filter: (t1.date >= '2020-10-01 00:00:00'::timestamp without time zone)
   ->  Hash Join  (cost=25.95..53.89 rows=38 width=4)
         Output: to_date((test1.c_data)::text, 'YYYYMMDD'::text)
         Hash Cond: (test1.id = test2.id)
         ->  Seq Scan on public.test1  (cost=0.00..22.70 rows=1270 width=36)
               Output: test1.id, test1.c_data
         ->  Hash  (cost=25.88..25.88 rows=6 width=4)
               Output: test2.id
               ->  Seq Scan on public.test2  (cost=0.00..25.88 rows=6 width=4)
                     Output: test2.id
                     Filter: ((test2.status)::text = 'on'::text)
(13 rows)

The presence of a volatile function in the subquery's output list has
about the same effect as OFFSET 0, preventing the WHERE condition from
being pushed down past it.  In both cases you're significantly
de-optimizing your query, so these answers might not be very
satisfactory if you have a large amount of data to work with.  I'd
really suggest reconsidering your data design, or making the query a
bit less fragile, perhaps with a sub-select expression along the lines
of

    case when c_data ~ '^\d+$' then to_date(c_data, 'YYYYMMDD')
         else null end

			regards, tom lane