Thread: Why is the comparison between timestamp and date so much slower than between two dates
Why is the comparison between timestamp and date so much slower than between two dates
From
Thomas Kellerer
Date:
I came across something strange today. Consider the following table:

    CREATE TABLE price_history
    (
        product_id integer,
        valid_from date,
        valid_to   date,
        price      integer
    );

    CREATE INDEX i1 ON price_history (product_id, valid_from, valid_to);

The table contains 5 million rows and 5000 distinct product_ids.

To get the current price for each product I used the following query:

    select *
    from price_history
    where current_date between valid_from and valid_to;

The execution plan is not really surprising:

    Index Scan using i1 on public.price_history  (cost=0.44..61980.61 rows=5133 width=16) (actual time=0.177..527.693 rows=5000 loops=1)
      Output: product_id, valid_from, price, valid_to
      Index Cond: ((('now'::cstring)::date >= ph.valid_from) AND (('now'::cstring)::date <= ph.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.395 ms
    Execution time: 528.193 ms

Now I tried the same query using now() instead of current_date, which shows exactly the same execution plan, but is 3 times slower:

    Index Scan using i1 on public.price_history  (cost=0.44..61980.60 rows=5133 width=16) (actual time=0.406..1902.241 rows=5000 loops=1)
      Output: product_id, valid_from, price, valid_to
      Index Cond: ((now() >= ph.valid_from) AND (now() <= ph.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.615 ms
    Execution time: 1902.777 ms

The above plans were taken on my Windows laptop with Postgres 9.5.1. On a CentOS server with 9.5.0 I can see the same difference.

Plan using now():

    Index Scan using i1 on public.price_history  (cost=0.44..110570.50 rows=44944 width=16) (actual time=0.182..837.903 rows=5000 loops=1)
      Output: product_id, valid_from, valid_to, price
      Index Cond: ((now() >= price_history.valid_from) AND (now() <= price_history.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.106 ms
    Execution time: 838.529 ms

Plan using current_date:

    Index Scan using i1 on public.price_history  (cost=0.44..110570.51 rows=44944 width=16) (actual time=0.052..180.856 rows=5000 loops=1)
      Output: product_id, valid_from, valid_to, price
      Index Cond: ((('now'::cstring)::date >= price_history.valid_from) AND (('now'::cstring)::date <= price_history.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.115 ms
    Execution time: 181.226 ms

(I don't know why the estimates on the CentOS installation are so different from the ones on my laptop, given that both tables contain exactly the same data and were analyzed properly before running explain plan - but that is a different question.)

So my question is: why is comparing a timestamp to a date so much slower?

Thomas
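For readers who want to reproduce the numbers, a setup along these lines should give a comparable table. Only the schema, index, and row counts come from the post; the data-generation strategy (1000 non-overlapping yearly ranges per product) is my own guess:

```sql
-- Schema and index as posted; the INSERT is one illustrative way to get
-- 5 million rows across 5000 distinct product_ids.
CREATE TABLE price_history
(
    product_id integer,
    valid_from date,
    valid_to   date,
    price      integer
);

INSERT INTO price_history
SELECT p,
       date '2000-01-01' + g * 365,
       date '2000-01-01' + g * 365 + 364,
       (random() * 1000)::int
FROM generate_series(1, 5000) AS p,
     generate_series(0, 999)  AS g;

CREATE INDEX i1 ON price_history (product_id, valid_from, valid_to);
ANALYZE price_history;
```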
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> So my question is: why is comparing a timestamp to a date so much slower?

The date has to be up-converted to a timestamptz (not timestamp).
I think the expensive part of that is determining what timezone
applies, in particular whether DST is active. You could try it
with "localtimestamp" (no parens) instead of "now()" to see how
it performs with a non-tz timestamp.

			regards, tom lane
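Tom's suggestion amounts to replacing the timestamptz constant with a plain (timezone-less) timestamp, something along these lines:

```sql
-- localtimestamp is a plain timestamp (no time zone), so the date
-- columns only need to be up-converted to timestamp, not timestamptz,
-- which skips the timezone/DST lookup:
SELECT *
FROM price_history
WHERE localtimestamp BETWEEN valid_from AND valid_to;
```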
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Alban Hertroys
Date:
On 13 April 2016 at 15:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>> So my question is: why is comparing a timestamp to a date so much slower?
>
> The date has to be up-converted to a timestamptz (not timestamp).
> I think the expensive part of that is determining what timezone
> applies, in particular whether DST is active. You could try it
> with "localtimestamp" (no parens) instead of "now()" to see how
> it performs with a non-tz timestamp.
>
>                         regards, tom lane

Or... you can manually down-convert the timestamptz now() to a date ;) That's basically what you're doing when you use current_date instead of now().

The reason that the other way around is so much more expensive is that the database needs to do that conversion twice for every row in the table. When down-converting now(), the DB only needs to do that once for all rows.

Regards,
Alban Hertroys.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
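Alban's down-conversion can be written as an explicit cast, which turns the comparison back into a pure date-vs-date one:

```sql
-- Cast the single constant down to a date once, instead of up-converting
-- two date columns per row. Note the cast truncates the time of day, so
-- this matches current_date semantics, not now() semantics.
SELECT *
FROM price_history
WHERE now()::date BETWEEN valid_from AND valid_to;
```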
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Thomas Kellerer
Date:
Tom Lane schrieb am 13.04.2016 um 15:45:
>> So my question is: why is comparing a timestamp to a date so much slower?
>
> The date has to be up-converted to a timestamptz (not timestamp).
> I think the expensive part of that is determining what timezone
> applies, in particular whether DST is active. You could try it
> with "localtimestamp" (no parens) instead of "now()" to see how
> it performs with a non-tz timestamp.

localtimestamp is indeed faster than now(), but still a bit slower than current_date (700ms vs 500ms after 5 runs of each).

But as the value of now() won't change throughout the runtime of the statement (actually the transaction), I wonder why it is being converted for every row.

Thomas
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Thomas Kellerer
Date:
Alban Hertroys schrieb am 13.04.2016 um 16:39:
>>> So my question is: why is comparing a timestamp to a date so much slower?
>
> The reason that the other way around is so much more expensive is that
> the database needs to do that conversion twice for every row in the
> table. When down-converting now(), the DB only needs to do that once
> for all rows.

Why does it do that for each row? The value of now() won't change while the statement is running, so this conversion could be done once at the start of the statement.
Re: Re: Why is the comparison between timestamp and date so much slower than between two dates
From
"Mike Sofen"
Date:
|-----Original Message-----
|From: Thomas Kellerer Sent: Wednesday, April 13, 2016 11:37 PM
|
|Alban Hertroys schrieb am 13.04.2016 um 16:39:
|>>> So my question is: why is comparing a timestamp to a date so much slower?
|>
|> The reason that the other way around is so much more expensive is that
|> the database needs to do that conversion twice for every row in the
|> table. When down-converting now(), the DB only needs to do that once
|> for all rows.
|
|Why does it do that for each row? The value of now() won't change while the
|statement is running, so this conversion could be done once at the start of the
|statement.

The general rule in the SQL Server world is that using a function in a WHERE clause or join will eliminate usage of an index that would have been leveraged if the function didn't exist. The reason is that functions are non-deterministic, so the optimizer can't possibly tell in advance what the outcome will be and thus takes the safest route to completion. I'm betting that the same logic holds in PG (I just haven't tested it enough to be absolutely sure).

In the case of now() in the WHERE clause, to avoid the conversion/loss of index usage, I always place the output of now() (which should be a static value anyway) into a local variable and then use that in the WHERE clause... and get my index back.

This is just a style of coding (no functions in WHERE clauses/joins), but one that doesn't seem prevalent in PG... instead I see people using functions within functions within functions, the cascading impact of which becomes very hard to unravel.

Mike Sofen
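In PostgreSQL terms, Mike's "capture it in a local variable" style could be sketched in PL/pgSQL roughly as follows (the variable name is illustrative):

```sql
DO $$
DECLARE
    today date := current_date;  -- evaluated once, before the query runs
BEGIN
    -- The query now compares the date columns against a plain date value,
    -- so no per-row up-conversion to timestamptz is needed.
    PERFORM *
    FROM price_history
    WHERE today BETWEEN valid_from AND valid_to;
END
$$;
```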
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Thomas Kellerer
Date:
Mike Sofen schrieb am 14.04.2016 um 14:29:
> The general rule in the SQL Server world is that using a function in
> a Where clause or join will eliminate usage of an index that would
> have been leveraged if the function didn't exist. The reason is that
> functions are non-deterministic, so the optimizer can't possibly tell
> in advance what the outcome will be and thus takes the safest route
> to completion. I'm betting that the same logic holds in PG (I just
> haven't tested it enough to be absolutely sure).

Well, this is only true if the function "hides" the value of a column, or if the function is not marked stable.

A condition like:

    where x = some_function(42)

can absolutely use an index on the column x (and I'm pretty sure this is true for SQL Server as well).

You can even create an index on a function expression, so that something like:

    where some_function(x) = 42

can make use of an index if that index is defined as:

    on table_name ((some_function(x)))

(Something SQL Server can't do.)

You can only create such an index if the function is marked as "immutable", which basically says that calling the function twice with the same value will return the exact same result:

http://www.postgresql.org/docs/current/static/xfunc-volatility.html

But in general I do agree that one should be very careful with conditions where the types don't match or where expressions are used that can't make use of an index.

> In the case of now() in the Where clause, to avoid the
> conversion/loss of index usage, I always place (what should be a
> static value anyway) the output of now() into a local variable and
> then use that in the Where clause...and get my index back.

now() (and current_timestamp as well) are defined to return the same value throughout the entire transaction.

So the optimizer _should_ be smart enough to do the conversion only once at the beginning of the statement and then use that converted value during the execution of the statement, without the need to re-evaluate it for each row.

But my question wasn't about whether it's a good idea to use a function in the where clause, but why there is such a huge(!) difference in performance between now() and current_date, especially given the fact that both are only evaluated once.

Thomas
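The expression-index point can be made concrete; the function and index names below are made up purely for illustration:

```sql
-- An expression index is only allowed on an IMMUTABLE function:
CREATE FUNCTION price_band(p integer) RETURNS integer
AS $$ SELECT p / 100 $$
LANGUAGE sql IMMUTABLE;

CREATE INDEX idx_price_band ON price_history ((price_band(price)));

-- A predicate like the following can now use idx_price_band:
-- SELECT * FROM price_history WHERE price_band(price) = 4;
```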
Re: Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> So the optimizer _should_ be smart enough to do the conversion only
> once at the beginning of the statement and then use that converted
> value during the execution of the statement without the need
> to re-evaluate it for each row.

It's not; especially not in your originally posted case, where the up-conversion happens on the variable, not the pseudo-constant.

			regards, tom lane
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Thomas Kellerer
Date:
Tom Lane schrieb am 14.04.2016 um 15:57:
>> So the optimizer _should_ be smart enough to do the conversion only
>> once at the beginning of the statement and then use that converted
>> value during the execution of the statement without the need
>> to re-evaluate it for each row.
>
> It's not; especially not in your originally posted case where the
> up-conversion happens on the variable not the pseudo-constant.

Which variable are you referring to? The original query only uses a simple BETWEEN condition where the comparison value is a "constant" (the value of now()).
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Alban Hertroys
Date:
> On 14 Apr 2016, at 15:12, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> now() (and current_timestamp as well) are defined to return the
> same value throughout the entire transaction.
>
> So the optimizer _should_ be smart enough to do the conversion only
> once at the beginning of the statement and then use that converted
> value during the execution of the statement without the need
> to re-evaluate it for each row.

As I understand it, that's not how it works.

If the optimizer were to down-convert the value of now() from a timestamp to a date, it would lose precision, possibly resulting in wrong results for corner cases. For that reason, it chooses to do the opposite and up-converts the dates. But, because the dates are fields and not constants, it has to do so for every row.

If that's indeed what happens, then now() does get evaluated only once, but the slow-down is caused by having to do the conversion (for two field values) for every row.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
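Alban's precision argument can be sketched outside the database. The following Python snippet is purely illustrative (not PostgreSQL's implementation); it shows why down-converting the timestamp and up-converting the date are not equivalent on the boundary day:

```python
from datetime import date, datetime, timezone

# A timestamp with a time-of-day component, and a date column value.
now = datetime(2016, 4, 13, 15, 45, tzinfo=timezone.utc)
valid_to = date(2016, 4, 13)

# Up-converting the date keeps full precision: 2016-04-13 becomes
# midnight of that day, and the 15:45 timestamp no longer satisfies <=.
up = now <= datetime(valid_to.year, valid_to.month, valid_to.day,
                     tzinfo=timezone.utc)

# Down-converting now() to a date discards the time of day, so the
# same comparison succeeds.
down = now.date() <= valid_to

assert up is False
assert down is True
```

This is the corner case the optimizer avoids by up-converting each row's date values instead of truncating the constant once.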
Re: Why is the comparison between timestamp and date so much slower than between two dates
From
Thomas Kellerer
Date:
Alban Hertroys schrieb am 14.04.2016 um 21:22:
>> now() (and current_timestamp as well) are defined to return the
>> same value throughout the entire transaction.
>>
>> So the optimizer _should_ be smart enough to do the conversion
>> only once at the beginning of the statement and then use that
>> converted value during the execution of the statement without the
>> need to re-evaluate it for each row.
>
> As I understand it, that's not how it works.
>
> If the optimizer would down-convert the value of now() from a
> timestamp to a date, it would lose precision, possibly resulting in
> wrong results for corner cases in general. For that reason, it
> chooses to do the opposite and up-converts the dates. But, because
> the dates are fields and not constants, it has to do so for every
> row.
>
> If that's indeed what happens, then indeed, now() gets evaluated only
> once, but the slow-down is caused by having to do conversions (for
> two field values) for every row.

Ah, that makes sense.