Thread: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

From
Alfred Perlstein
Date:
I have an abstract solution for a problem in postgresql's
handling of what should be constant data.

We had problem with a query taking way too long, basically
we had this:

select date_part('hour',t_date) as hour, transval as val
from st
where id = 500  AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan AND hit_date <= '2000-12-07
14:27:24-08'::timestamp
;

turning it into:

select date_part('hour',t_date) as hour, transval as val
from st
where id = 500  AND hit_date >= '2000-12-07 14:27:24-08'::timestamp AND hit_date <= '2000-12-07
14:27:24-08'::timestamp
;

(doing the -24 hours seperately)

The values of cost went from:
(cost=0.00..127.24 rows=11 width=12)
to:
(cost=0.00..4.94 rows=1 width=12)

By simply assigning each sql "function" a taint value for constness
one could easily reduce: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan
to: '2000-12-07 14:27:24-08'::timestamp
by applying the expression and rewriting the query.

Each function should have a marker that explains whether when given
a const input if the output might vary, that way subexpressions can
be collapsed until an input becomes non-const.

Here, let's break up: '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan

What we have is:  timestamp(const) - timespan(const)

we have timestamp defined like so:
const timestamp(const string)
non-const timestamp(non-const)

and timespan like so:
const timespan(const string)
non-const timespan(non-const)

So now we have:  const timestamp((const string)'2000-12-07 14:27:24-08')- const timespan((const string)'24 hours')
-----------------------------------------------------------  const- const
----------------  const

then eval the query.

You may want to allow a function to have a hook where it can
eval a const because depending on the const it may or may not
be able to return a const, for instance if some string
you passed to timestamp() caused it to return non-const data.

Or maybe this is fixed in 7.1?

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

From
"Joel Burton"
Date:
> We had problem with a query taking way too long, basically
> we had this:
> 
> select
>   date_part('hour',t_date) as hour,
>   transval as val
> from st
> where
>   id = 500 
>   AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24
>   hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
> ;
> 
> turning it into:
> 
> select
>   date_part('hour',t_date) as hour,
>   transval as val
> from st
> where
>   id = 500 
>   AND hit_date >= '2000-12-07 14:27:24-08'::timestamp
>   AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
> ;

Perhaps I'm being daft, but why should hit_date be both >= and <= 
the exact same time and date? (or did you mean to subtract 24 
hours from your example and forgot?)

> (doing the -24 hours seperately)
> 
> The values of cost went from:
> (cost=0.00..127.24 rows=11 width=12)
> to:
> (cost=0.00..4.94 rows=1 width=12)
> 
> By simply assigning each sql "function" a taint value for constness
> one could easily reduce:
>   '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan
> to:
>   '2000-12-07 14:27:24-08'::timestamp

You mean '2000-12-06', don't you?

> Each function should have a marker that explains whether when given a
> const input if the output might vary, that way subexpressions can be
> collapsed until an input becomes non-const.

There is "with (iscachable)".

Does

CREATE FUNCTION YESTERDAY(timestamp) RETURNS timestamp AS
'SELECT $1-''24 hours''::interval' WITH (iscachable)

work faster?

--
Joel Burton, Director of Information Systems -*- jburton@scw.org
Support Center of Washington (www.scw.org)


Re: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

From
Tom Lane
Date:
Alfred Perlstein <bright@wintelcom.net> writes:
> Each function should have a marker that explains whether when given
> a const input if the output might vary, that way subexpressions can
> be collapsed until an input becomes non-const.

We already have that and do that.

The reason the datetime-related routines are generally not marked
'proiscachable' is that there's this weird notion of a CURRENT time
value, which means that the result of a datetime calculation may
vary depending on when you do it, even though the inputs don't.

Note that CURRENT here does not mean translating 'now' to current
time during input conversion, it's a special-case data value inside
the system.

I proposed awhile back (see pghackers thread "Constant propagation and
similar issues" from mid-September) that we should eliminate the CURRENT
concept, so that datetime calculations can be constant-folded safely.
That, um, didn't meet with universal approval... but I still think it
would be a good idea.

In the meantime you can cheat by defining functions that you choose
to mark ISCACHABLE, as has been discussed several times in the archives.
        regards, tom lane


Re: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

From
Alfred Perlstein
Date:
* Joel Burton <jburton@scw.org> [001207 15:52] wrote:
> > We had problem with a query taking way too long, basically
> > we had this:
> > 
> > select
> >   date_part('hour',t_date) as hour,
> >   transval as val
> > from st
> > where
> >   id = 500 
> >   AND hit_date >= '2000-12-07 14:27:24-08'::timestamp - '24
> >   hours'::timespan AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
> > ;
> > 
> > turning it into:
> > 
> > select
> >   date_part('hour',t_date) as hour,
> >   transval as val
> > from st
> > where
> >   id = 500 
> >   AND hit_date >= '2000-12-07 14:27:24-08'::timestamp
> >   AND hit_date <= '2000-12-07 14:27:24-08'::timestamp
> > ;
> 
> Perhaps I'm being daft, but why should hit_date be both >= and <= 
> the exact same time and date? (or did you mean to subtract 24 
> hours from your example and forgot?)

Yes, typo.

> > (doing the -24 hours seperately)
> > 
> > The values of cost went from:
> > (cost=0.00..127.24 rows=11 width=12)
> > to:
> > (cost=0.00..4.94 rows=1 width=12)
> > 
> > By simply assigning each sql "function" a taint value for constness
> > one could easily reduce:
> >   '2000-12-07 14:27:24-08'::timestamp - '24 hours'::timespan
> > to:
> >   '2000-12-07 14:27:24-08'::timestamp
> 
> You mean '2000-12-06', don't you?

Yes, typo. :)

> 
> > Each function should have a marker that explains whether when given a
> > const input if the output might vary, that way subexpressions can be
> > collapsed until an input becomes non-const.
> 
> There is "with (iscachable)".
> 
> Does
> 
> CREATE FUNCTION YESTERDAY(timestamp) RETURNS timestamp AS
> 'SELECT $1-''24 hours''::interval' WITH (iscachable)
> 
> work faster?

It could be, but it could be done in the sql compiler/planner
explicitly to save me from myself, no?

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

From
Alfred Perlstein
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [001207 16:45] wrote:
> Alfred Perlstein <bright@wintelcom.net> writes:
> > Each function should have a marker that explains whether when given
> > a const input if the output might vary, that way subexpressions can
> > be collapsed until an input becomes non-const.
> 
> We already have that and do that.
> 
> The reason the datetime-related routines are generally not marked
> 'proiscachable' is that there's this weird notion of a CURRENT time
> value, which means that the result of a datetime calculation may
> vary depending on when you do it, even though the inputs don't.
> 
> Note that CURRENT here does not mean translating 'now' to current
> time during input conversion, it's a special-case data value inside
> the system.
> 
> I proposed awhile back (see pghackers thread "Constant propagation and
> similar issues" from mid-September) that we should eliminate the CURRENT
> concept, so that datetime calculations can be constant-folded safely.
> That, um, didn't meet with universal approval... but I still think it
> would be a good idea.

I agree with you that doing anything to be able to fold these would
be nice.  However there's a hook mentioned in my abstract that
explains that if a constant makes it into a function, you can
provide a hook so that the function can return whether or not that
constant is cacheable.

If the date functions used that hook to get a glimpse of the constant
data passed in, they could return 'cachable' if it doesn't contain
the 'CURRENT' stuff you're talking about.

something like this could be called on input to "maybe-cachable"
functions:

int
date_cachable_hook(const char *datestr)
{
if (strcasecmp("current", datestr) == 0)    return (UNCACHEABLE);return (CACHEABLE);
}

Or maybe I'm missunderstanding what CURRENT implies?

I do see that on: http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html

both you and Thomas Lockhart agree that CURRENT is a broken concept
because it can cause btree inconsistancies and should probably be
removed anyway.

No one seems to dispute that, and then the thread leads off into
discussions about optimizer hints.

> In the meantime you can cheat by defining functions that you choose
> to mark ISCACHABLE, as has been discussed several times in the archives.

Yes, but it doesn't help the niave user (me :) ) much. :(

Somehow I doubt that if 'CURRENT' was ifdef'd people would complain.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: abstract: fix poor constant folding in 7.0.x, fixed in 7.1?

From
Tom Lane
Date:
Alfred Perlstein <bright@wintelcom.net> writes:
> ... However there's a hook mentioned in my abstract that
> explains that if a constant makes it into a function, you can
> provide a hook so that the function can return whether or not that
> constant is cacheable.

Oh, I see --- you're right, I missed that part of your proposal.

I dunno ... if we had more than one example of a case where this was
needed (and if that example weren't demonstrably broken for other
reasons), maybe that'd be worth doing.  But it seems like a lot of
mechanism to add to solve a problem we shouldn't have anyway.

> I do see that on:
>   http://www.postgresql.org/mhonarc/pgsql-hackers/2000-09/msg00408.html
> both you and Thomas Lockhart agree that CURRENT is a broken concept
> because it can cause btree inconsistancies and should probably be
> removed anyway.

I had forgotten the btree argument, actually ... thanks for reminding
me!

I think it's too late to do anything about this for 7.1, in any case,
but I'll put removing CURRENT back on the front burner for 7.2.
        regards, tom lane