[HACKERS] More optimization effort?

From: Tatsuo Ishii
Currently, the following query does not use an index:

t-ishii@localhost: psql -p 5433 test
Pager usage is off.
psql (9.6.3)
Type "help" for help.

test=# explain select * from pgbench_accounts where aid*100 < 10000;
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..3319.00 rows=33333 width=97)
   Filter: ((aid * 100) < 10000)
(2 rows)

While the following one does use the index.

test=# explain select * from pgbench_accounts where aid < 10000/100;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..11.08 rows=102 width=97)
   Index Cond: (aid < 100)
(2 rows)

Is it worth making our optimizer a little bit smarter to convert the
first query into the second form?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] More optimization effort?

From: Craig Ringer
On 21 July 2017 at 07:11, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Currently, the following query does not use an index:
>
> t-ishii@localhost: psql -p 5433 test
> Pager usage is off.
> psql (9.6.3)
> Type "help" for help.
>
> test=# explain select * from pgbench_accounts where aid*100 < 10000;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Seq Scan on pgbench_accounts  (cost=0.00..3319.00 rows=33333 width=97)
>    Filter: ((aid * 100) < 10000)
> (2 rows)
>
> While the following one does use the index.
>
> test=# explain select * from pgbench_accounts where aid < 10000/100;
>                                             QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..11.08 rows=102 width=97)
>    Index Cond: (aid < 100)
> (2 rows)
>
> Is it worth making our optimizer a little bit smarter to convert the
> first query into the second form?

If I understand correctly, you're proposing that the optimiser should attempt algebraic simplification to fold more constants, rather than stopping pre-evaluation of constant expressions as soon as we see a non-constant, as we do now. Right?

I'm sure there are documented algorithms out there for algebraic manipulations like that, taking account of precedence and so on. But will they be cheap enough to run in the optimiser? And are they likely to benefit many queries?

There's also the hiccup of partial index matching. If Pg simplifies and rearranges expressions more, will we potentially fail to match partial indexes that we would've originally matched? I'm not sure it's a blocker, but it bears consideration, and Pg might have to do more work on partial index matching too.
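
As a hypothetical sketch of that scenario (the partial index here is invented for illustration):

create index pgbench_accounts_small on pgbench_accounts (aid)
    where aid * 100 < 10000;

-- Today the qual matches the index predicate verbatim, so the partial
-- index can be matched:
explain select abalance from pgbench_accounts where aid * 100 < 10000;

-- If the planner first rewrote the qual to "aid < 100", it would also
-- need to prove that "aid < 100" implies "aid * 100 < 10000" before the
-- partial index could be matched, which requires yet more algebra.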

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] More optimization effort?

From: Tom Lane
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> Currently, the following query does not use an index:
> test=# explain select * from pgbench_accounts where aid*100 < 10000;
> While the following one does use the index.
> test=# explain select * from pgbench_accounts where aid < 10000/100;

> Is it worth making our optimizer a little bit smarter to convert the
> first query into the second form?

That's a rather ambitious definition of "little bit" smarter.

For starters, I'm not sure those queries are even fully equivalent
w.r.t. issues like overflow and roundoff.  While a person might
decide that the transformation is OK anyway, I'm not sure that the
optimizer should be allowed to take such liberties.
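
For instance, with ordinary int4 arithmetic the two shapes can disagree (hypothetical variants of the original predicate):

select 3 * 3 < 10;      -- true   (the "aid * 3 < 10" form, for aid = 3)
select 3 < 10 / 3;      -- false  (the "aid < 10/3" form: integer division gives 10/3 = 3)

-- And the multiplied form can overflow where the rewritten form cannot:
select 2000000000 * 100 < 10000;    -- ERROR:  integer out of range
select 2000000000 < 10000 / 100;    -- false, no error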

But the bigger picture is that doing something that helps to any
useful extent would require a really substantial amount of new,
datatype- and operator-specific knowledge that doesn't exist in the
system today.  And as Craig noted, applying that knowledge would
be expensive, even in cases where it failed to help.

So, color me skeptical ...
        regards, tom lane



Re: [HACKERS] More optimization effort?

From: Robert Haas
On Fri, Jul 21, 2017 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But the bigger picture is that doing something that helps to any
> useful extent would require a really substantial amount of new,
> datatype- and operator-specific knowledge that doesn't exist in the
> system today.  And as Craig noted, applying that knowledge would
> be expensive, even in cases where it failed to help.
>
> So, color me skeptical ...

I agree, but with a caveat.  If somebody felt like doing all of that
work, and either made it cheap enough to justify enabling it by
default or added a controlling GUC, it'd be fine with me.  We've
talked before about having knobs to adjust how hard the optimizer
tries to optimize things, and this would be a good candidate for such
a thing.  The bigger issue from my perspective is that I really doubt
that anybody wants to put the effort into doing something like this in
a principled way.

Another very similar (but possibly easier) case is:

select * from pgbench_accounts where aid = 1.0;

This will use a sequential scan rather than an index scan, because the
query optimizer doesn't know that the only integer for which =(int4,
numeric) will return true is 1.  Therefore it has to scan the whole
table one row at a time and check, for each one, whether the =
operator returns true.  It can't cast the constant to an integer
because the user might have written 1.1 rather than 1.0, in which case
the cast would fail; but the query should return 0 rows, not ERROR.
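
Something like this shows the effect on a pgbench database (plans abbreviated, from a 9.6-era server):

test=# explain select * from pgbench_accounts where aid = 1.0;
 Seq Scan on pgbench_accounts
   Filter: ((aid)::numeric = 1.0)

test=# explain select * from pgbench_accounts where aid = 1;
 Index Scan using pgbench_accounts_pkey on pgbench_accounts
   Index Cond: (aid = 1)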

You can imagine fixing this by having some kind of datatype-specific
knowledge that would replace "aid = 1.0" with "aid = 1" and "aid =
1.1" with "false"; it would also have to know that "aid = 9999999999"
should be changed to "false" because 9999999999 isn't representable as
int4.
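
Applied by hand, each of those rewrites already plans well, since the planner reduces a constant-false qual to a one-time filter (plans abbreviated):

test=# explain select * from pgbench_accounts where aid = 1;   -- from "aid = 1.0"
 Index Scan using pgbench_accounts_pkey on pgbench_accounts
   Index Cond: (aid = 1)

test=# explain select * from pgbench_accounts where false;     -- from "aid = 1.1" or "aid = 9999999999"
 Result
   One-Time Filter: false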

I have, however, decided not to volunteer to be the one who works on
that project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] More optimization effort?

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> Another very similar (but possibly easier) case is:

> select * from pgbench_accounts where aid = 1.0;

> This will use a sequential scan rather than an index scan, because the
> query optimizer doesn't know that the only integer for which =(int4,
> numeric) will return true is 1.  Therefore it has to scan the whole
> table one row at a time and check, for each one, whether the =
> operator returns true.  It can't cast the constant to an integer
> because the user might have written 1.1 rather than 1.0, in which case
> the cast would fail; but the query should return 0 rows, not ERROR.

> You can imagine fixing this by having some kind of datatype-specific
> knowledge that would replace "aid = 1.0" with "aid = 1" and "aid =
> 1.1" with "false"; it would also have to know that "aid = 9999999999"
> should be changed to "false" because 9999999999 isn't representable as
> int4.

Couple thoughts about that:

* Another conceivable route to a solution is to add "int = numeric"
and friends to the btree opclasses for integers.  I'm not sure if
there is any fundamental reason we've not done that (it's possible
it would fall foul of the requirements about transitivity, not sure).
However, this would only fix things to the extent of allowing an
index scan to occur; it wouldn't help in non-indexed cases, nor would
it know anything about simplifying say "aid = 1.1" to "false".
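
For what it's worth, the cross-type members already present in the integer btree opfamily can be listed from the catalogs, and numeric is conspicuously absent (query written for illustration):

select amoplefttype::regtype  as lefttype,
       amoprighttype::regtype as righttype,
       amopopr::regoperator   as equality_op
from pg_amop a
     join pg_opfamily f on f.oid = a.amopfamily
where f.opfname = 'integer_ops'
  and a.amopmethod = (select oid from pg_am where amname = 'btree')
  and a.amopstrategy = 3;   -- btree strategy 3 is equality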

* The already-existing protransform infrastructure could be used to
address this type of problem; that is, you could imagine attaching
a transform function to numeric_eq that would look for cases like
"integer::numeric = nonintegralconstant" and simplify them accordingly.
When that feature went in, there was talk of using transforms to
simplify e.g. "variable + 0" or "variable * 1", but nobody's got round
to anything of the sort yet.
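
The transform functions that do exist can be listed from the catalogs (the column is protransform in current releases), and at present they are all length- or precision-coercion cases such as varchar(n) and timestamp(n):

select oid::regprocedure from pg_proc where protransform <> 0;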

* On the other hand, protransform doesn't offer any easy way to apply
similar optimizations to a bunch of different functions/operators.
For instance, if your goal is to simplify "variable + 0", there are
a depressingly large number of "+" operators to write transform
functions for.  Maybe there's no way around duplicate coding for that,
but it looks tedious and bulky.
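
The scale of that duplication is easy to check; a stock installation carries several dozen "+" operators:

select count(*) from pg_operator where oprname = '+';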

> I have, however, decided not to volunteer to be the one who works on
> that project.

Me either.  Any one of these things would require a *lot* of work in
order to have a coherent feature that provided useful behavior across
a bunch of different datatypes.
        regards, tom lane



Re: [HACKERS] More optimization effort?

From: Greg Stark
On 21 July 2017 at 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> I have, however, decided not to volunteer to be the one who works on
>> that project.
>
> Me either.  Any one of these things would require a *lot* of work in
> order to have a coherent feature that provided useful behavior across
> a bunch of different datatypes.

I had in the past idly thought about whether it would be possible to
link in one of the various general-purpose theorem-proving libraries
and use it to simplify the expressions. But I really have no idea how
much work it would be to teach one about all the properties and
constraints of our existing data types and operators, or, for that
matter, how easy it would be to figure out what theorems we would want
proven in order to be able to use an index.



-- 
greg