Re: Function-based index not used in a simple query - Mailing list pgsql-sql

From Tom Lane
Subject Re: Function-based index not used in a simple query
Date
Msg-id 3425.959722596@sss.pgh.pa.us
Whole thread Raw
In response to Function-based index not used in a simple query  (Rostislav Opocensky <orbis@pictus.org>)
Responses Re: Function-based index not used in a simple query
List pgsql-sql
Rostislav Opocensky <orbis@pictus.org> writes:
>     SELECT *
>         FROM vals
>         WHERE trunc_to_day(timestamp) = trunc_to_day('28.5.2000');
>
>     Seq Scan on vals  (cost=0.00..27.50 rows=10 width=44)

The problem here is that the optimizer will only consider an indexscan
for a clause that looks like index_key OP constant.  It doesn't think
that trunc_to_day('28.5.2000') is a constant, because you haven't told
it that it can pre-evaluate that function call --- and for all it knows,
the function might generate different values when evaluated at different
rows of the table, rendering an indexscan incorrect.  (Consider a
function like random() ...)  If you mark trunc_to_day() as a "cachable"
function then the system will know it's safe to collapse the righthand
side to a constant:

CREATE FUNCTION trunc_to_day(timestamp) RETURNS timestamp AS 'DECLARE tstamp ALIAS FOR $1;BEGIN    RETURN
date_trunc(''day'',tstamp);END;' LANGUAGE 'plpgsql'WITH (isCachable);
 

This is a little bit of a cheat for this function, because what
isCachable actually implies is that the function's value depends
on the presented arguments *and nothing else*.  I believe date_trunc
depends on the current TZ setting, so it's not really cachable.
You could possibly get away with it for your application, but you
might run into problems like this:

1. You write trunc_to_day('28.5.2000') in a table's column  default expression, stored rule, or some such.  The system
decidesit can fold that expression down to a simple constant,  giving you (say) 2000-05-20 00:00-04 --- ie, midnight in
whatever timezone you are using.  That value gets stored in  the column or rule definition.
 

2. You then access the database while running with a different  TZ setting.  You wonder why you don't get the answers
youexpected.
 

Come to think of it, your index on trunc_to_day() is fundamentally
risky because different backends running with different TZ settings
would generate different index entries for the "same" data entry.
For example, if you make an entry dated '28.5.2000' while running in
one timezone, and then try to query with "= trunc_to_day('28.5.2000')"
while running in another timezone, you won't match that entry if the
index is used, because the timestamp generated on-the-fly from
trunc_to_day('28.5.2000') won't equal the one stored in the index.

(Hmm ... I wonder if that suggests that we shouldn't allow indexes
to be built using functions that are not isCachable?)

You could avoid some of these risks by having trunc_to_day return
a 'date' rather than a 'timestamp', thereby removing at least some
of the timezone dependency from the index definition.
        regards, tom lane


pgsql-sql by date:

Previous
From: mikeo
Date:
Subject: Re: [GENERAL] Re: remove line type?
Next
From: Ryan Bradetich
Date:
Subject: Re: Use of index in 7.0 vs 6.5