Thread: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?
I have an abstract solution for a problem in postgresql's handling of what should be constant data. We had problem with a query taking way too long, basically we had this: select date_part('hour',t_date) as hour, transval as val from st where id = 500 AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp ; turning it into: select date_part('hour',t_date) as hour, transval as val from st where id = 500 AND hit_date >= '2000-12-07 14:27:24-08'::timestamp AND hit_date <= '2000-12-07 14:27:24-08'::timestamp ; (doing the -24 hours seperately) The values of cost went from: (cost=0.00..127.24 rows=11 width=12) to: (cost=0.00..4.94 rows=1 width=12) By simply assigning each sql "function" a taint value for constness one could easily reduce: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan to: '2000-12-07 14:27:24-08'::timestamp by applying the expression and rewriting the query. Each function should have a marker that explains whether when given a const input if the output might vary, that way subexpressions can be collapsed until an input becomes non-const. Here, let's break up: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan What we have is: timestamp(const) - timespan(const) we have timestamp defined like so: const timestamp(const string) non-const timestamp(non-const) and timespan like so: const timespan(const string) non-const timespan(non-const) So now we have: const timestamp((const string)'2000-12-07 14:27:24-08')- const timespan((const string)'24 hours') ----------------------------------------------------------- const- const ---------------- const then eval the query. You may want to allow a function to have a hook where it can eval a const because depending on the const it may or may not be able to return a const, for instance if some string you passed to timestamp() caused it to return non-const data. Or maybe this is fixed in 7.1? -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
> We had problem with a query taking way too long, basically > we had this: > > select > date_part('hour',t_date) as hour, > transval as val > from st > where > id = 500 > AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 > hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp > ; > > turning it into: > > select > date_part('hour',t_date) as hour, > transval as val > from st > where > id = 500 > AND hit_date >= '2000-12-07 14:27:24-08'::timestamp > AND hit_date <= '2000-12-07 14:27:24-08'::timestamp > ; Perhaps I'm being daft, but why should hit_date be both >= and <= the exact same time and date? (or did you mean to subtract 24 hours from your example and forgot?) > (doing the -24 hours seperately) > > The values of cost went from: > (cost=0.00..127.24 rows=11 width=12) > to: > (cost=0.00..4.94 rows=1 width=12) > > By simply assigning each sql "function" a taint value for constness > one could easily reduce: > '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan > to: > '2000-12-07 14:27:24-08'::timestamp You mean '2000-12-06', don't you? > Each function should have a marker that explains whether when given a > const input if the output might vary, that way subexpressions can be > collapsed until an input becomes non-const. There is "with (iscachable)". Does CREATE FUNCTION YESTERDAY(timestamp) RETURNS timestamp AS 'SELECT $1-''24 hours''::interval' WITH (iscachable) work faster? -- Joel Burton, Director of Information Systems -*- jburton@scw.org Support Center of Washington (www.scw.org)
Alfred Perlstein <bright@wintelcom.net> writes: > Each function should have a marker that explains whether when given > a const input if the output might vary, that way subexpressions can > be collapsed until an input becomes non-const. We already have that and do that. The reason the datetime-related routines are generally not marked 'proiscachable' is that there's this weird notion of a CURRENT time value, which means that the result of a datetime calculation may vary depending on when you do it, even though the inputs don't. Note that CURRENT here does not mean translating 'now' to current time during input conversion, it's a special-case data value inside the system. I proposed awhile back (see pghackers thread "Constant propagation and similar issues" from mid-September) that we should eliminate the CURRENT concept, so that datetime calculations can be constant-folded safely. That, um, didn't meet with universal approval... but I still think it would be a good idea. In the meantime you can cheat by defining functions that you choose to mark ISCACHABLE, as has been discussed several times in the archives. regards, tom lane
* Joel Burton <jburton@scw.org> [001207 15:52] wrote: > > We had problem with a query taking way too long, basically > > we had this: > > > > select > > date_part('hour',t_date) as hour, > > transval as val > > from st > > where > > id = 500 > > AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 > > hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp > > ; > > > > turning it into: > > > > select > > date_part('hour',t_date) as hour, > > transval as val > > from st > > where > > id = 500 > > AND hit_date >= '2000-12-07 14:27:24-08'::timestamp > > AND hit_date <= '2000-12-07 14:27:24-08'::timestamp > > ; > > Perhaps I'm being daft, but why should hit_date be both >= and <= > the exact same time and date? (or did you mean to subtract 24 > hours from your example and forgot?) Yes, typo. > > (doing the -24 hours seperately) > > > > The values of cost went from: > > (cost=0.00..127.24 rows=11 width=12) > > to: > > (cost=0.00..4.94 rows=1 width=12) > > > > By simply assigning each sql "function" a taint value for constness > > one could easily reduce: > > '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan > > to: > > '2000-12-07 14:27:24-08'::timestamp > > You mean '2000-12-06', don't you? Yes, typo. :) > > > Each function should have a marker that explains whether when given a > > const input if the output might vary, that way subexpressions can be > > collapsed until an input becomes non-const. > > There is "with (iscachable)". > > Does > > CREATE FUNCTION YESTERDAY(timestamp) RETURNS timestamp AS > 'SELECT $1-''24 hours''::interval' WITH (iscachable) > > work faster? It could be, but it could be done in the sql compiler/planner explicitly to save me from myself, no? -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
* Tom Lane <tgl@sss.pgh.pa.us> [001207 16:45] wrote: > Alfred Perlstein <bright@wintelcom.net> writes: > > Each function should have a marker that explains whether when given > > a const input if the output might vary, that way subexpressions can > > be collapsed until an input becomes non-const. > > We already have that and do that. > > The reason the datetime-related routines are generally not marked > 'proiscachable' is that there's this weird notion of a CURRENT time > value, which means that the result of a datetime calculation may > vary depending on when you do it, even though the inputs don't. > > Note that CURRENT here does not mean translating 'now' to current > time during input conversion, it's a special-case data value inside > the system. > > I proposed awhile back (see pghackers thread "Constant propagation and > similar issues" from mid-September) that we should eliminate the CURRENT > concept, so that datetime calculations can be constant-folded safely. > That, um, didn't meet with universal approval... but I still think it > would be a good idea. I agree with you that doing anything to be able to fold these would be nice. However there's a hook mentioned in my abstract that explains that if a constant makes it into a function, you can provide a hook so that the function can return whether or not that constant is cacheable. If the date functions used that hook to get a glimpse of the constant data passed in, they could return 'cachable' if it doesn't contain the 'CURRENT' stuff you're talking about. something like this could be called on input to "maybe-cachable" functions: int date_cachable_hook(const char *datestr) { if (strcasecmp("current", datestr) == 0) return (UNCACHEABLE);return (CACHEABLE); } Or maybe I'm missunderstanding what CURRENT implies? I do see that on: http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html both you and Thomas Lockhart agree that CURRENT is a broken concept because it can cause btree inconsistancies and should probably be removed anyway. No one seems to dispute that, and then the thread leads off into discussions about optimizer hints. > In the meantime you can cheat by defining functions that you choose > to mark ISCACHABLE, as has been discussed several times in the archives. Yes, but it doesn't help the niave user (me :) ) much. :( Somehow I doubt that if 'CURRENT' was ifdef'd people would complain. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
Alfred Perlstein <bright@wintelcom.net> writes: > ... However there's a hook mentioned in my abstract that > explains that if a constant makes it into a function, you can > provide a hook so that the function can return whether or not that > constant is cacheable. Oh, I see --- you're right, I missed that part of your proposal. I dunno ... if we had more than one example of a case where this was needed (and if that example weren't demonstrably broken for other reasons), maybe that'd be worth doing. But it seems like a lot of mechanism to add to solve a problem we shouldn't have anyway. > I do see that on: > http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html > both you and Thomas Lockhart agree that CURRENT is a broken concept > because it can cause btree inconsistancies and should probably be > removed anyway. I had forgotten the btree argument, actually ... thanks for reminding me! I think it's too late to do anything about this for 7.1, in any case, but I'll put removing CURRENT back on the front burner for 7.2. regards, tom lane