Thread: [HACKERS] Function Volatility and Views Unexpected Behavior
David Kohn <djk447@gmail.com> writes: > I encountered some unexpected behavior when debugging a query that was > taking longer than expected, basically, a volatile function that makes a > column in a view is called even when that column is not selected in the > query, making it so that the function is called for every row in the view, > I'm not sure that that would necessarily be the expected behavior, as it > was my understanding that columns that are not selected are not evaluated, > for instance if there was a join in a view that produced some columns and > said columns were not selected, I would expect it to be optimized away. No, this is the expected behavior; we don't like optimization to change the number of calls of a volatile function from what would occur in naive evaluation of the query. If that prospect doesn't bother you, it's likely because your function isn't really volatile ... > The other problem is that the function call does not appear in the query > plan. I think "explain verbose" will fix that for you. regards, tom lane
`explain analyze verbose select data1 from table1_silly_view where id >=10 and id <= 100;`
does an index scan and only runs the volatile function for rows in the view where id >= 10 and id <=100
Subquery Scan on table1_silly_view (cost=0.29..33.77 rows=91 width=8) (actual time=2.552..206.563 rows=91 loops=1)
Output: table1_silly_view.data1
-> Index Scan using table1_pkey on public.table1 (cost=0.29..32.86 rows=91 width=20) (actual time=2.550..206.425 rows=91 loops=1)
Output: NULL::integer, table1.data1, something_silly(table1.id)
Index Cond: ((table1.id >= 10) AND (table1.id <= 100))
Planning time: 0.526 ms
Execution time: 206.724 ms
whereas
`explain analyze verbose select data1 from table1_silly_view where id in (select id from table1 where id >= 10 and id <=100);`
does a full sequential scan, over the view, producing whatever side effects the volatile function does for every row in the view even though they produce the same output and have what should be equivalent quals.
Hash Semi Join (cost=11.24..2793.50 rows=91 width=8) (actual time=23.603..22759.297 rows=91 loops=1)
Output: table1_1.data1
Hash Cond: (table1_1.id = table1.id)
-> Seq Scan on public.table1 table1_1 (cost=0.00..2655.00 rows=10000 width=20) (actual time=2.468..22720.942 rows=10000 loops=1)
Output: table1_1.id, table1_1.data1, something_silly(table1_1.id)
-> Hash (cost=10.11..10.11 rows=91 width=4) (actual time=0.484..0.484 rows=91 loops=1)
Output: table1.id
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Index Only Scan using table1_pkey on public.table1 (cost=0.29..10.11 rows=91 width=4) (actual time=0.383..0.430 rows=91 loops=1)
Output: table1.id
Index Cond: ((table1.id >= 10) AND (table1.id <= 100))
Heap Fetches: 91
Planning time: 0.877 ms
Execution time: 22759.448 ms
I recognize that it is an anti-pattern to put a volatile function call in a view, and don't know that there's a better way of dealing with it, as not using indexes in a view that has a volatile function call in it at all seems like a very bad choice, but still think it might be something to document better.
-David
David Kohn <djk447@gmail.com> writes:
> I encountered some unexpected behavior when debugging a query that was
> taking longer than expected, basically, a volatile function that makes a
> column in a view is called even when that column is not selected in the
> query, making it so that the function is called for every row in the view,
> I'm not sure that that would necessarily be the expected behavior, as it
> was my understanding that columns that are not selected are not evaluated,
> for instance if there was a join in a view that produced some columns and
> said columns were not selected, I would expect it to be optimized away.
No, this is the expected behavior; we don't like optimization to change
the number of calls of a volatile function from what would occur in naive
evaluation of the query. If that prospect doesn't bother you, it's
likely because your function isn't really volatile ...
> The other problem is that the function call does not appear in the query
> plan.
I think "explain verbose" will fix that for you.
regards, tom lane
On Wed, Jul 12, 2017 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Kohn <djk447@gmail.com> writes: >> I encountered some unexpected behavior when debugging a query that was >> taking longer than expected, basically, a volatile function that makes a >> column in a view is called even when that column is not selected in the >> query, making it so that the function is called for every row in the view, >> I'm not sure that that would necessarily be the expected behavior, as it >> was my understanding that columns that are not selected are not evaluated, >> for instance if there was a join in a view that produced some columns and >> said columns were not selected, I would expect it to be optimized away. > > No, this is the expected behavior; we don't like optimization to change > the number of calls of a volatile function from what would occur in naive > evaluation of the query. If that prospect doesn't bother you, it's > likely because your function isn't really volatile ... I don't think I agree with that. If something is VOLATILE, that means you want it to be recalculated each time, but it doesn't necessarily mean that you want it calculated if it in no way changes the result set. I guess maybe there's a difference between a VOLATILE function like random(), which is expected to produce a different answer each time but probably has no side effects that you care about (unless you care about the fact that the state of the PRNG has changed) and pg_sleep(), whose return value is always the same but whose side effects are of critical importance. Maybe we need separate terms for volatile-because-the-answer-is-unstable and volatile-because-it-has-side-effects. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company