Thread: Constant propagation and similar issues

Constant propagation and similar issues

From
Jules Bean
Date:
I'm sure your aware of these limitations, but I'd thought I'd mention
them just in case, and to see if you have plans to sort them out:

I have a query of the form:

SELECT * FROM .... WHERE (now()-date1) > 'interval 1 day';

..i.e. all rows 'older' than 1 day.  This could be efficiently
processed using the index on date1, but sadly pg doesn't know this ;-(
This transforms an operation which should be O(1) to O(rows)....

More worryingly, when I investigated the above, I find it doesn't even
use the index for

SELECT * FROM .... 
WHERE date1 > '2000-09-11 00:00:00'::datetime - '1 hour'::interval;

...so it doesn't realise that constant-constant is constant,
notwithstanding the more complex issues that now() is pseudo-constant.

This could be fixed by constant folding, I guess; any plans for that?

Jules


Re: Constant propagation and similar issues

From
Tom Lane
Date:
Jules Bean <jules@jellybean.co.uk> writes:
> I have a query of the form:
> SELECT * FROM .... WHERE (now()-date1) > 'interval 1 day';
> ..i.e. all rows 'older' than 1 day.  This could be efficiently
> processed using the index on date1, but sadly pg doesn't know this ;-(

No, and I don't think it should.  Should we implement a general
algebraic equation solver, and fire it up for every single query,
in order to see if the user has written an indexable condition in
a peculiar form?  I don't think we want to expend either the development
effort or the runtime on that.  If you are concerned about performance
of this sort of query, you'll need to transform it to
SELECT * FROM .... WHERE date1 < now() - interval '1 day';

Of course that still leaves you with problem (b),

> SELECT * FROM .... 
> WHERE date1 > '2000-09-11 00:00:00'::datetime - '1 hour'::interval;

> ...so it doesn't realise that constant-constant is constant,
> notwithstanding the more complex issues that now() is pseudo-constant.

Most of the datetime operations are not considered constant-foldable.
The reason is that type timestamp has a special value CURRENT that
is a symbolic representation of current time (this is NOT what now()
produces, but might be thought of as a data-driven way of invoking
now()).  This value will get reduced to a simple constant when it is
fed into an arithmetic operation.  Hence, premature evaluation changes
the results and would not be a correct optimization.

AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
proposing that we eliminate it to make the world safe for constant-
folding timestamp operations.  (Thomas, any comments here?)

In the meantime, there is a workaround that's been discussed on the
mailing lists before --- create a function that hides the
"unsafe-to-fold" operations and mark it iscachable:
create function ago(interval) returns timestamp as'select now() - $1' language 'sql' with (iscachable);

Then something like
SELECT * FROM .... WHERE date1 < ago('1 day');

will be considered indexable.  You can shoot yourself in the foot with
this --- don't try to write ago(constant) in a rule or function
definition --- but in interactive queries it'll get the job done.
        regards, tom lane


Re: Constant propagation and similar issues

From
"Ross J. Reedstrom"
Date:
On Mon, Sep 11, 2000 at 11:15:58AM -0400, Tom Lane wrote:
> 
> Most of the datetime operations are not considered constant-foldable.
> The reason is that type timestamp has a special value CURRENT that
> is a symbolic representation of current time (this is NOT what now()
> produces, but might be thought of as a data-driven way of invoking
> now()).  This value will get reduced to a simple constant when it is
> fed into an arithmetic operation.  Hence, premature evaluation changes
> the results and would not be a correct optimization.
> 
> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
> proposing that we eliminate it to make the world safe for constant-
> folding timestamp operations.  (Thomas, any comments here?)
> 

I checked the ansi SQL'99 docs, and CURRENT as a date special constant
is not a part of the standard (although CURRENT is a keyword: it is 
used in the context of cursors)

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: Constant propagation and similar issues

From
Jules Bean
Date:
On Mon, Sep 11, 2000 at 11:15:58AM -0400, Tom Lane wrote:
> Jules Bean <jules@jellybean.co.uk> writes:
> > I have a query of the form:
> > SELECT * FROM .... WHERE (now()-date1) > 'interval 1 day';
> > ..i.e. all rows 'older' than 1 day.  This could be efficiently
> > processed using the index on date1, but sadly pg doesn't know this ;-(
> 
> No, and I don't think it should.  Should we implement a general
> algebraic equation solver, and fire it up for every single query,
> in order to see if the user has written an indexable condition in
> a peculiar form?  I don't think we want to expend either the development
> effort or the runtime on that.  If you are concerned about performance
> of this sort of query, you'll need to transform it to
> 
>     SELECT * FROM .... WHERE date1 < now() - interval '1 day';

Well, I shall speak quietly and timidly, for I'm not offering to do
the work, and I respect that other tasks are both more interesting and
more important. However, it does seem to me that PostgreSQL /should/
be able to make these transformations (at least, it should IMO
recognise that given an expression of the form a + b - c + d < e - f
+ g where exactly one of a..g is a column name, and the rest are
constant, that is a candidate for using the index).

> 
> Of course that still leaves you with problem (b),
> 
> > SELECT * FROM .... 
> > WHERE date1 > '2000-09-11 00:00:00'::datetime - '1 hour'::interval;
> 
> > ...so it doesn't realise that constant-constant is constant,
> > notwithstanding the more complex issues that now() is pseudo-constant.
> 
> Most of the datetime operations are not considered constant-foldable.
> The reason is that type timestamp has a special value CURRENT that
> is a symbolic representation of current time (this is NOT what now()
> produces, but might be thought of as a data-driven way of invoking
> now()).  This value will get reduced to a simple constant when it is
> fed into an arithmetic operation.  Hence, premature evaluation changes
> the results and would not be a correct optimization.
> 
> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
> proposing that we eliminate it to make the world safe for constant-
> folding timestamp operations.  (Thomas, any comments here?)

Yes.  I came across CURRENT in some examples somewhere, got very
confused, decided I didn't like it, and used now() instead ;-)  I now
understand the problem.  Personally, I'm thinking drop CURRENT, but
only because I've never used it myself...

> 
> In the meantime, there is a workaround that's been discussed on the
> mailing lists before --- create a function that hides the
> "unsafe-to-fold" operations and mark it iscachable:
> 
>     create function ago(interval) returns timestamp as
>     'select now() - $1' language 'sql' with (iscachable);
> 
> Then something like
> 
>     SELECT * FROM .... WHERE date1 < ago('1 day');
> 
> will be considered indexable.  You can shoot yourself in the foot with
> this --- don't try to write ago(constant) in a rule or function
> definition --- but in interactive queries it'll get the job done.

Thanks very much.  I shall try that.

Jules


Re: Constant propagation and similar issues

From
Jules Bean
Date:
On Mon, Sep 11, 2000 at 12:22:39PM -0400, Tom Lane wrote:
> Jules Bean <jules@jellybean.co.uk> writes:
> > However, it does seem to me that PostgreSQL /should/
> > be able to make these transformations (at least, it should IMO
> > recognise that given an expression of the form a + b - c + d < e - f
> > + g where exactly one of a..g is a column name, and the rest are
> > constant, that is a candidate for using the index).
> 
> Mumble.  I think that'd be a very difficult thing to do without losing
> the datatype extensibility of the system.  Right now, the only reason
> that "a < b" is considered indexable is that the optimizer has a table
> that tells it "<" is an indexable operator for btree indexes with
> certain datatypes (opclasses).  Neither the optimizer nor the btree code
> has any real understanding of the relationships between "<" and "-", say.
> There is no part of the system anywhere with understanding of algebraic
> identities like "a - b < c can be transformed to a < b + c", and no way
> I can see to add such knowledge without making it *substantially* harder
> to add new datatypes and operators.

Yes, actually something like this occurred to me after I sent the
above email.  I had forgotten about the (rather pretty) extensible
type system; I can see that makes spotting optimisations such as the
above much more difficult. Seems like it might make a nice subject for
a paper, actually.

> 
> Between that and the runtime that would be wasted during typical queries
> (IMHO searching for rearrangeable clauses would usually be fruitless),
> I really doubt that this is a good goal to pursue in Postgres.

I'm afraid I can't buy that second argument ;-) The time it takes to
optimise a query is asymptotically irrelevant, after all...

Jules



Re: Constant propagation and similar issues

From
"Ross J. Reedstrom"
Date:
On Mon, Sep 11, 2000 at 10:47:04AM -0500, Ross J. Reedstrom wrote:
> On Mon, Sep 11, 2000 at 11:15:58AM -0400, Tom Lane wrote:
> > 
> > Most of the datetime operations are not considered constant-foldable.
> > The reason is that type timestamp has a special value CURRENT that
> > is a symbolic representation of current time (this is NOT what now()
> > produces, but might be thought of as a data-driven way of invoking
> > now()).  This value will get reduced to a simple constant when it is
> > fed into an arithmetic operation.  Hence, premature evaluation changes
> > the results and would not be a correct optimization.
> > 
> > AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
> > proposing that we eliminate it to make the world safe for constant-
> > folding timestamp operations.  (Thomas, any comments here?)
> > 
> 
> I checked the ansi SQL'99 docs, and CURRENT as a date special constant
> is not a part of the standard (although CURRENT is a keyword: it is 
> used in the context of cursors)
> 

Following up to myself: 

Ah, I had forgotten that CURRENT is a magic value, like 'infinity'.

The standard does specify in section 6.19:

CURRENT_DATE, CURRENT_TIME, LOCALTIME, CURRENT_TIMESTAMP, and LOCALTIMESTAMP

as <datetime value function>

Which are currently implemented as generating the special value 'CURRENT',
which then get's stored in the column. This strikes me as _not_ standards
compliant.  What do other DB's do with these? I think that they should
be equivalent to now(), returning a static date that is stored.

I do find the timestamp special values 'infinity' and '-infinity' very
useful, but have never found a use for 'current'.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: Constant propagation and similar issues

From
Tom Lane
Date:
Jules Bean <jules@jellybean.co.uk> writes:
> However, it does seem to me that PostgreSQL /should/
> be able to make these transformations (at least, it should IMO
> recognise that given an expression of the form a + b - c + d < e - f
> + g where exactly one of a..g is a column name, and the rest are
> constant, that is a candidate for using the index).

Mumble.  I think that'd be a very difficult thing to do without losing
the datatype extensibility of the system.  Right now, the only reason
that "a < b" is considered indexable is that the optimizer has a table
that tells it "<" is an indexable operator for btree indexes with
certain datatypes (opclasses).  Neither the optimizer nor the btree code
has any real understanding of the relationships between "<" and "-", say.
There is no part of the system anywhere with understanding of algebraic
identities like "a - b < c can be transformed to a < b + c", and no way
I can see to add such knowledge without making it *substantially* harder
to add new datatypes and operators.

Between that and the runtime that would be wasted during typical queries
(IMHO searching for rearrangeable clauses would usually be fruitless),
I really doubt that this is a good goal to pursue in Postgres.
        regards, tom lane


Re: Constant propagation and similar issues

From
Tom Lane
Date:
Jules Bean <jules@jellybean.co.uk> writes:
> Presumably then, the optimizer doesn't even know that + is
> commutative, so it can't fold the constant in (5 + a + 4).

Actually, it does know that + is commutative, courtesy of the oprcom
column in pg_operator.  But it doesn't know that + is associative,
which is something you must also assume to transform 5 + a + 4
(really (5 + a) + 4 in the parser output) into a + (5 + 4).

Since, in general, the associative law does NOT hold for computer
arithmetic (think about intermediate-result overflow, not to mention
roundoff error in floating-point cases), I'm hesitant to want to put
such assumptions in.

But the more serious problem is the search space involved in discovering
that there is a path of manipulations that leads to a more completely
reducible expression.  With a large expression that could be an
exponential time cost --- even if the resultant savings is zero.

I do not buy your previous comment that the cost of optimization is
negligible.  We've already had to put in heuristics to prevent the
system from blowing up in cnfify() for moderately large WHERE clauses
--- it used to be that a few dozen OR (a=1 AND b=2) OR (a=4 AND b=5)
kinds of clauses would bring the optimizer to its knees.  And that's
for a well-understood deterministic simplification that affects only
AND/OR/NOT operators, with no search involved to decide what to do
with them.  What you're proposing would affect many more operators
and require a combinatorial search to see what could be done to the
expression.

So I stand by my opinion that this isn't likely to be a profitable
path to pursue.
        regards, tom lane


Re: Constant propagation and similar issues

From
Thomas Lockhart
Date:
> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
> proposing that we eliminate it to make the world safe for constant-
> folding timestamp operations.  (Thomas, any comments here?)

Well, it is a feature from "the old days". Pretty neat one at that, and
is an example of a useful feature not found in other DBs or in
standards, but which might show up someday because they are useful.
Throwing those things away one at a time will end us up at the lowest
common denominator, eventually :(

Another way of looking at the problem is to ask how we could retain this
feature in the face of the other optimization "desirements". istm that
types which have multiple behaviors could be queried for the behavior of
a particular example by the optimizer. For most types, a "query" would
not be necessary (so there is minimal overhead), but for this case a
function could return the property of an example as either cachable or
not.

Perhaps a true "serial type" would need similar behaviors, as might
other future types.
                  - Thomas


Re: Constant propagation and similar issues

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
>> proposing that we eliminate it to make the world safe for constant-
>> folding timestamp operations.  (Thomas, any comments here?)

> Well, it is a feature from "the old days". Pretty neat one at that, and
> is an example of a useful feature not found in other DBs or in
> standards, but which might show up someday because they are useful.

I'm not convinced that it is useful.  What I think it is is a good way
of shooting yourself in the foot, because it's so hard to control when
'CURRENT' will be reduced to a specific time value.

I have no problem with the datetime input converters accepting the input
string 'CURRENT' and immediately replacing it with the current time.
That behavior is clearly useful and creates no semantic issues.  But
I don't think that a special data value that symbolically represents
current time is either useful or well-defined.

Just to give one example of why the concept is broken: consider an index
on a timestamp column that contains some CURRENT values.  Today the
index might look like2000-01-01 11:33:05-042000-09-17 14:39:44-04CURRENT2000-09-18 14:11:07-04
which is fine.  But twenty-four hours from now, this index will be out
of order and hence broken.  (The btree routines do not cope at all
gracefully with logically-inconsistent indexes.)

So I still recommend that we remove the special value CURRENT.  Then we
can mark the datetime-related operators constant-foldable, which will
eliminate a complaint that we can otherwise expect to hear constantly
(saw another instance today in pgsql-sql).
        regards, tom lane


Re: Constant propagation and similar issues

From
Philip Warner
Date:
At 16:37 11/09/00 +0000, Thomas Lockhart wrote:
>> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of
>> proposing that we eliminate it to make the world safe for constant-
>> folding timestamp operations.  (Thomas, any comments here?)
>
>Well, it is a feature from "the old days". Pretty neat one at that, and
>is an example of a useful feature not found in other DBs or in
>standards, but which might show up someday because they are useful.
>Throwing those things away one at a time will end us up at the lowest
>common denominator, eventually :(

Well, Dec RDB has it: the concept of a 'computed by' field. You can define
view-like elements in a table, eg:
 Create Table t1 (    f1 integer,    f2 real,    f12_avg computed by (f1 + f2)/2,    f4 computed by
current_timestamp).

It's actually quite a useful feature, but unlike PGSQL, Dec RDB does not
allow indexes to be created on the fields. Clearly one can just define a
view to get similar results, but it is not as clean.


>Another way of looking at the problem is to ask how we could retain this
>feature in the face of the other optimization "desirements". istm that
>types which have multiple behaviors could be queried for the behavior of
>a particular example by the optimizer. For most types, a "query" would
>not be necessary (so there is minimal overhead), but for this case a
>function could return the property of an example as either cachable or
>not.
>
>Perhaps a true "serial type" would need similar behaviors, as might
>other future types.

This seems like a nice and extensible idea - allowing types to define a
function for testing constancy. At least to allow:
   field < (current_timestamp - interval '1 hour')

to use an index properly.

I guess an alternative would be to define a special 'iscacheable' function:
eg.
  function constant(arg <all-types>) returns <same-type-as-arg>

so we could have:
   field < constant(current_timestamp - interval '1 hour')

although this looks suspiciously like an optimizer hint which I think
people have been opposed to in the past...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Constant propagation and similar issues

From
Philip Warner
Date:
At 19:05 18/09/00 +1000, Philip Warner wrote:
>
>I guess an alternative would be to define a special 'iscacheable' function:
>eg.
>
>   function constant(arg <all-types>) returns <same-type-as-arg>
>
>so we could have:
>
>    field < constant(current_timestamp - interval '1 hour')
>
>although this looks suspiciously like an optimizer hint which I think
>people have been opposed to in the past...
>

I just saw the flaw with this: iscacheable is not enough when the args are
deemed non-constant. I guess we'd need some kind of 'evaluate_once' flag,
which is getting a little obtuse.

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Constant propagation and similar issues

From
Thomas Lockhart
Date:
> So I still recommend that we remove the special value CURRENT.  Then we
> can mark the datetime-related operators constant-foldable, which will
> eliminate a complaint that we can otherwise expect to hear constantly
> (saw another instance today in pgsql-sql).

OK.
                   - Thomas