Re: Strange behavior of function date_trunc - Mailing list pgsql-general
From | Pavel Luzanov |
---|---|
Subject | Re: Strange behavior of function date_trunc |
Date | |
Msg-id | 1b32883d-4078-54d2-96ae-297178f8ea01@postgrespro.ru Whole thread Raw |
In response to | Re: Strange behavior of function date_trunc (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Strange behavior of function date_trunc
|
List | pgsql-general |
Hello, On 05.05.2021 16:55, Tomas Vondra wrote: > Well, it'd not like date_trunc is executed for each row while now() is > executed only once. The functions are executed for each row in both > cases... A couple more experiments. Since I can't to track usage of system functions, I decided to play with user defined stable function. In this case it is a wrapper for date_trunc: CREATE OR REPLACE FUNCTION user_date_trunc(field text, source timestamptz) RETURNS timestamptz AS $$ BEGIN RETURN date_trunc(field, source); END; $$ LANGUAGE plpgsql STABLE; SET track_functions = 'all'; SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc); Let's create a test table: CREATE TABLE t AS SELECT g.x, g.x::text AS y FROM generate_series('2021-01-01', '2021-06-01', '1 s'::interval) AS g(x) ORDER BY random(); VACUUM ANALYZE t; SET random_page_cost = 1.1; -- for ssd disks Previous query with user defined function: EXPLAIN (ANALYZE, SETTINGS) SELECT * FROM t WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..3520631.00 rows=13045175 width=31) (actual time=63.977..7157.852 rows=13046401 loops=1) Filter: (x >= user_date_trunc('year'::text, '2021-05-06 00:00:00+03'::timestamp with time zone)) Settings: random_page_cost = '1.1' Planning Time: 0.373 ms JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.404 ms, Inlining 53.122 ms, Optimization 7.100 ms, Emission 3.600 ms, Total 64.227 ms Execution Time: 7502.564 ms SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = 'user_date_trunc'::regproc; funcname | calls -----------------+---------- user_date_trunc | 13046402 The function is executed once for each row in the table +1. As you explained earlier. Ok. But when there is an index on the column, the situation changes: CREATE INDEX ON t(x); ANALYZE t; SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc); EXPLAIN (ANALYZE, SETTINGS) SELECT * FROM t WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Index Scan using t_x_idx on t (cost=0.69..373188.48 rows=13046480 width=31) (actual time=3.163..26476.012 rows=13046401 loops=1) Index Cond: (x >= user_date_trunc('year'::text, '2021-05-06 00:00:00+03'::timestamp with time zone)) Settings: random_page_cost = '1.1' Planning Time: 0.294 ms JIT: Functions: 3 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.551 ms, Inlining 0.000 ms, Optimization 0.380 ms, Emission 2.587 ms, Total 3.518 ms Execution Time: 26787.377 ms SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = 'user_date_trunc'::regproc; funcname | calls -----------------+------- user_date_trunc | 2 Only two times! Does having an index allow the function value to be cached? By the way, an index is used to access the table, although seq scan are much better. Now with scalar subquery: EXPLAIN (ANALYZE, SETTINGS) SELECT * FROM t WHERE t.x >= (SELECT user_date_trunc('year', '2021-05-06'::timestamptz)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Index Scan using t_x_idx on t (cost=0.70..194745.67 rows=4348827 width=31) (actual time=3.000..26543.395 rows=13046401 loops=1) Index Cond: (x >= $0) InitPlan 1 (returns $0) -> Result (cost=0.00..0.26 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1) Settings: random_page_cost = '1.1' Planning Time: 0.071 ms JIT: Functions: 4 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.671 ms, Inlining 0.000 ms, Optimization 0.188 ms, Emission 2.583 ms, Total 3.441 ms Execution Time: 26875.105 ms SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = 'user_date_trunc'::regproc; funcname | calls -----------------+------- user_date_trunc | 3 The function is executed once, but the index is still in use. The second question. What is the reason for choosing an index scan? Seq scan with constant value only: EXPLAIN (ANALYZE, SETTINGS) SELECT * FROM t WHERE t.x >= '2021-01-01'::timestamptz; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..259011.00 rows=13046480 width=31) (actual time=2.135..913.628 rows=13046401 loops=1) Filter: (x >= '2021-01-01 00:00:00+03'::timestamp with time zone) Settings: random_page_cost = '1.1' Planning Time: 0.140 ms JIT: Functions: 2 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.392 ms, Inlining 0.000 ms, Optimization 0.163 ms, Emission 1.837 ms, Total 2.391 ms Execution Time: 1195.985 ms The statistics on t.x shows that the condition in the query is not selective and seq scan are preferred over index scan. SELECT n_distinct, (histogram_bounds::text::text[])[1] FROM pg_stats WHERE tablename = 't' AND attname = 'x'; n_distinct | histogram_bounds ------------+------------------------ -1 | 2021-01-01 00:01:10+03 -- Pavel Luzanov Postgres Professional: https://postgrespro.com The Russian Postgres Company
pgsql-general by date: