Thread: fool-toleranced optimizer

fool-toleranced optimizer

From
Oleg Bartunov
Date:
Hi there,

I just noticed a little optimizer problem - in second query there is
unused 'tycho t2' table alias which gets backend buried. This is 
artificial query, I just tried to check if optimizier could recognize
this.

tycho=# explain analyze select t.pm_ra,t.pm_dec from tycho t where t.pm_ra < 20.2 and t.pm_ra> 18;
                                  QUERY PLAN 
 

-----------------------------------------------------------------------------------------------------------------------------
IndexScan using pm_ra_idx on tycho t  (cost=0.00..9821.83 rows=2613 width=8) (actual time=0.061..12.518 rows=1466
loops=1)  Index Cond: ((pm_ra < 20.2::double precision) AND (pm_ra > 18::double precision)) Total runtime: 14.726 ms
 
(3 rows)

tycho=# explain analyze select t.pm_ra,t.pm_dec from tycho t, tycho t2 where t.pm_ra < 20.2 and t.pm_ra> 18;
...........................................
It's doing Nested Loop, probably, so I didn' wait until it completed....


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: fool-toleranced optimizer

From
Neil Conway
Date:
Oleg Bartunov wrote:
> I just noticed a little optimizer problem - in second query there is
> unused 'tycho t2' table alias which gets backend buried.

It's not an "unused table alias", it is specifying the cartesian product 
of `tycho' with itself. I don't see how this is an optimizer problem: 
it's a perfectly legitimate query, albeit one that is unlikely to 
execute very quickly.

-Neil


Re: fool-toleranced optimizer

From
Simon Riggs
Date:
On Wed, 2005-03-09 at 11:02 +1100, Neil Conway wrote:
> Oleg Bartunov wrote:
> > I just noticed a little optimizer problem - in second query there is
> > unused 'tycho t2' table alias which gets backend buried.
> 
> It's not an "unused table alias", it is specifying the cartesian product 
> of `tycho' with itself. I don't see how this is an optimizer problem: 
> it's a perfectly legitimate query, albeit one that is unlikely to 
> execute very quickly.

Turn this thought around a bit and the request makes sense.

Oleg is saying that the optimizer doesn't protect against foolish SQL
requests. His query is an example of a foolishly written query.

It is reasonably common SQL mistake to inadvertently request a cartesian
product join, when that was not actually desired. This is mostly
prevalent in Data Warehouse situations where people are attempting to
request complex result sets.

It seems a reasonable that there might be a GUC such as 
enable_cartesian = on (by default)

If an admin felt that this was a problem, they could enable it for their
novice users only, or perhaps across the whole system.

If enable_cartesian = off, then queries with cartesian product joins
would be made to fail. Which should be easy to detect in early stages of
optimization.

So, Oleg, for me, the request makes sense, though somebody would need to
code it...

Best Regards, Simon Riggs



Re: fool-toleranced optimizer

From
Neil Conway
Date:
Simon Riggs wrote:
> Oleg is saying that the optimizer doesn't protect against foolish SQL
> requests. His query is an example of a foolishly written query.

IMHO calling this a "foolishly written query" is completely arbitrary. I 
can imagine plenty of applications for which a cartesian join makes 
sense. In this case the user didn't write the query they meant to write 
-- but it is surely hopeless to prevent that in the general case :)

> It seems a reasonable that there might be a GUC such as 
> enable_cartesian = on (by default)

I think the bar for adding a new GUC ought to be significantly higher 
than that.

In any case, when this problem does occur, it is obvious to the user 
that something is wrong, and no harm is done. Given a complex SQL query, 
it might take a bit of examination to determine which join clause is 
missing -- but the proper way to fix that is better query visualization 
tools (perhaps similar RH's Visual Explain, for example). This would 
solve the general problem: "the user didn't write the query they 
intended to write", rather than a very narrow subset ("the user forgot a 
join clause and accidentally computed a cartesian product").

-Neil


Re: fool-toleranced optimizer

From
Simon Riggs
Date:
Oleg, this idea doesn't seem destine for greatness, so it might be worth
adding that you can avoid the general case problem of incorrectly-
specified-but-long-running query by using statement_timeout...

On Wed, 2005-03-09 at 22:38 +1100, Neil Conway wrote:
> Simon Riggs wrote:
> > Oleg is saying that the optimizer doesn't protect against foolish SQL
> > requests. His query is an example of a foolishly written query.
> 
> IMHO calling this a "foolishly written query" is completely arbitrary. 

Well, in this case "foolish" is defined by the person that wrote the
query, as an expression of regret.

> I 
> can imagine plenty of applications for which a cartesian join makes 
> sense. 

Yes, which is why I discussed using a GUC, set only by those people who
want to be protected *from themselves*. It's a safety harness that you
could choose to put on if you wished.

> In this case the user didn't write the query they meant to write 
> -- but it is surely hopeless to prevent that in the general case :)
> 
> > It seems a reasonable that there might be a GUC such as 
> > enable_cartesian = on (by default)
> 
> I think the bar for adding a new GUC ought to be significantly higher 
> than that.

Well, the point is moot until somebody writes the rest of the code
anyhow. So, add it to the ideas shelf...

> In any case, when this problem does occur, it is obvious to the user 
> that something is wrong, and no harm is done. Given a complex SQL query, 
> it might take a bit of examination to determine which join clause is 
> missing -- but the proper way to fix that is better query visualization 
> tools (perhaps similar RH's Visual Explain, for example). This would 
> solve the general problem: "the user didn't write the query they 
> intended to write", rather than a very narrow subset ("the user forgot a 
> join clause and accidentally computed a cartesian product").

This issue only occurs when using SQL as the user interface language,
which is common when using a database in iterative or exploratory mode
e.g. Data Warehousing. If you are using more advanced BI tools then they
seldom get the SQL wrong.

This is not useful in a situation where people are writing SQL for a
more static application.

Best Regards, Simon Riggs



Re: fool-toleranced optimizer

From
Oleg Bartunov
Date:
On Wed, 9 Mar 2005, Simon Riggs wrote:

> Oleg, this idea doesn't seem destine for greatness, so it might be worth
> adding that you can avoid the general case problem of incorrectly-
> specified-but-long-running query by using statement_timeout...

I have no problem with that ! I just wanted to take a note of such
"could be" mistaken errors.


>
> On Wed, 2005-03-09 at 22:38 +1100, Neil Conway wrote:
>> Simon Riggs wrote:
>>> Oleg is saying that the optimizer doesn't protect against foolish SQL
>>> requests. His query is an example of a foolishly written query.
>>
>> IMHO calling this a "foolishly written query" is completely arbitrary.
>
> Well, in this case "foolish" is defined by the person that wrote the
> query, as an expression of regret.
>
>> I
>> can imagine plenty of applications for which a cartesian join makes
>> sense.
>
> Yes, which is why I discussed using a GUC, set only by those people who
> want to be protected *from themselves*. It's a safety harness that you
> could choose to put on if you wished.
>
>> In this case the user didn't write the query they meant to write
>> -- but it is surely hopeless to prevent that in the general case :)
>>
>>> It seems a reasonable that there might be a GUC such as
>>> enable_cartesian = on (by default)
>>
>> I think the bar for adding a new GUC ought to be significantly higher
>> than that.
>
> Well, the point is moot until somebody writes the rest of the code
> anyhow. So, add it to the ideas shelf...
>
>> In any case, when this problem does occur, it is obvious to the user
>> that something is wrong, and no harm is done. Given a complex SQL query,
>> it might take a bit of examination to determine which join clause is
>> missing -- but the proper way to fix that is better query visualization
>> tools (perhaps similar RH's Visual Explain, for example). This would
>> solve the general problem: "the user didn't write the query they
>> intended to write", rather than a very narrow subset ("the user forgot a
>> join clause and accidentally computed a cartesian product").
>
> This issue only occurs when using SQL as the user interface language,
> which is common when using a database in iterative or exploratory mode
> e.g. Data Warehousing. If you are using more advanced BI tools then they
> seldom get the SQL wrong.
>
> This is not useful in a situation where people are writing SQL for a
> more static application.
>
> Best Regards, Simon Riggs
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: fool-toleranced optimizer

From
Greg Stark
Date:
Neil Conway <neilc@samurai.com> writes:

> In any case, when this problem does occur, it is obvious to the user that
> something is wrong, and no harm is done. 

I don't see why you say that. The whole complaint here is that it's *not*
obvious something is wrong and there *is* damage until it's realized.

If I run a query like this on a busy database backing a web site it could
easily kill the web site.

Or if I start this query and expect it to take an hour then after 2-3 hours
when I finally get suspicious I've just wasted 2-3 hours...

Or if I add it to the list of nightly jobs I could lose all the other jobs
that night that are preempted by this heavy query running for too long.

> Given a complex SQL query, it might take a bit of examination to determine
> which join clause is missing -- but the proper way to fix that is better
> query visualization tools (perhaps similar RH's Visual Explain, for
> example). This would solve the general problem: "the user didn't write the
> query they intended to write", rather than a very narrow subset ("the user
> forgot a join clause and accidentally computed a cartesian product").

I'm unconvinced any tool can make humans infallible. 

-- 
greg



Re: fool-toleranced optimizer

From
Josh Berkus
Date:
Simon, Neil, all:

> IMHO calling this a "foolishly written query" is completely arbitrary. I
> can imagine plenty of applications for which a cartesian join makes
> sense. In this case the user didn't write the query they meant to write
> -- but it is surely hopeless to prevent that in the general case :)

Hey, this reminds me, it's about time for us to set ADD_MISSING_FROM=FALSE as 
the default, for 8.1, yes?   When we added the option in 7.4, it was with the 
expectation of changing the default.

The reason this is relevant is "Missing FROM Clause" is a frequent cause of 
cartesian joins, because a user mixed up their table aliases.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: fool-toleranced optimizer

From
Kevin Brown
Date:
Neil Conway wrote:
> Simon Riggs wrote:
> >Oleg is saying that the optimizer doesn't protect against foolish SQL
> >requests. His query is an example of a foolishly written query.
> 
> IMHO calling this a "foolishly written query" is completely arbitrary. I 
> can imagine plenty of applications for which a cartesian join makes 
> sense. In this case the user didn't write the query they meant to write 
> -- but it is surely hopeless to prevent that in the general case :)

Sure, but this case, at least, is (hopefully) easily detectable (as
such things go), has a high cost when it occurs, and is *usually* not
what the user intended.

Hence, it makes sense to go ahead and run the query, but issue a
warning at the very beginning, e.g. "WARNING: query JOINs tables <list
of tables> without otherwise referencing or making use of those
tables.  This may cause excessively poor performance of the query".


That said, the real question is whether or not it's worth putting in
the effort to detect this condition and issue the warning.  I'd say
probably not, but if this is a big enough itch for someone then why
should we discourage them from coding up a fix?



-- 
Kevin Brown                          kevin@sysexperts.com


Re: fool-toleranced optimizer

From
Greg Stark
Date:
Kevin Brown <kevin@sysexperts.com> writes:

> Hence, it makes sense to go ahead and run the query, but issue a
> warning at the very beginning, e.g. "WARNING: query JOINs tables <list
> of tables> without otherwise referencing or making use of those
> tables.  This may cause excessively poor performance of the query".

Well the problem with a warning is what if it *is* intentional? It's not ok to
fill my logs up with warnings for every time the query is executed. That just
forces me to turn off warnings.

It would be ok to have an option to block cartesian joins entirely. I might
even choose to run with that enabled normally. I can always disable it for
queries I know need cartesion joins.

But outputing a warning and then continuing on to destroy performance just
gets the worst of both worlds.

For that matter, I wonder whether it's time to consider an option to disable
implicit (ie, pre-ansi join syntax) joins entirely. It seems like lots of
shops are likely imposing coding standards that require ansi join syntax
anyways. In environments like that you would expect "a CROSS JOIN b" not just
"select * from a,b" anyways. 

Shops like that might appreciate the ability to enforce a blanket coding
standard on that point and get protection from accidental cartesian joins as a
side benefit.

-- 
greg



Re: fool-toleranced optimizer

From
Kevin Brown
Date:
Greg Stark wrote:
> 
> Kevin Brown <kevin@sysexperts.com> writes:
> 
> > Hence, it makes sense to go ahead and run the query, but issue a
> > warning at the very beginning, e.g. "WARNING: query JOINs tables <list
> > of tables> without otherwise referencing or making use of those
> > tables.  This may cause excessively poor performance of the query".
> 
> Well the problem with a warning is what if it *is* intentional? It's
> not ok to fill my logs up with warnings for every time the query is
> executed. That just forces me to turn off warnings.

WARNING is probably the wrong level (I wasn't thinking in terms of PG
logging, though I probably should have been).  What about NOTICE?
Basically, you want something that will alert the interactive user
that what they're doing is likely to be stupid, but at the same time
won't be a burden on the system or the DBA...

> It would be ok to have an option to block cartesian joins entirely. I might
> even choose to run with that enabled normally. I can always disable it for
> queries I know need cartesion joins.

Which wouldn't work all that well for people who are trying to write
their software in a reasonably portable fashion, unfortunately.
However, the number of people who care would now be much smaller.

> For that matter, I wonder whether it's time to consider an option to
> disable implicit (ie, pre-ansi join syntax) joins entirely. It seems
> like lots of shops are likely imposing coding standards that require
> ansi join syntax anyways. In environments like that you would expect
> "a CROSS JOIN b" not just "select * from a,b" anyways.
> 
> Shops like that might appreciate the ability to enforce a blanket
> coding standard on that point and get protection from accidental
> cartesian joins as a side benefit.

That could be handy, but of course it should default to off, which
with respect to cross joins would unfortunately wind up benefitting
only those people who already are potentially aware of the issue and
care about it (or, at least, those people who have DBAs that care
about it).



-- 
Kevin Brown                          kevin@sysexperts.com


Re: fool-toleranced optimizer

From
Richard Huxton
Date:
Greg Stark wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> 
> 
>>Hence, it makes sense to go ahead and run the query, but issue a
>>warning at the very beginning, e.g. "WARNING: query JOINs tables <list
>>of tables> without otherwise referencing or making use of those
>>tables.  This may cause excessively poor performance of the query".
> 
> 
> Well the problem with a warning is what if it *is* intentional? It's not ok to
> fill my logs up with warnings for every time the query is executed. That just
> forces me to turn off warnings.
> 
> It would be ok to have an option to block cartesian joins entirely. I might
> even choose to run with that enabled normally. I can always disable it for
> queries I know need cartesion joins.

I'm not sure the cartesian join is the problem - it's the explosion in 
number of rows. Which suggests you want something analogous to 
statement_timeout. Perhaps something like:  statement_max_select_rows = 0  # 0=disabled  statement_max_update_rows = 0
#applies to insert/delete too
 

That has the bonus of letting you set statement_max_update_rows=1 in an 
interactive session and catching WHERE clause typos.

On the down-side, it means 2 more GUC variables and I'm not sure how 
practical/efficient it is to detect a resultset growing beyond that size.
--  Richard Huxton  Archonet Ltd


Re: fool-toleranced optimizer

From
Bruce Momjian
Date:
Added to TODO:
* Add GUC to issue notice about queries that use unjoined tables


---------------------------------------------------------------------------

Kevin Brown wrote:
> Greg Stark wrote:
> > 
> > Kevin Brown <kevin@sysexperts.com> writes:
> > 
> > > Hence, it makes sense to go ahead and run the query, but issue a
> > > warning at the very beginning, e.g. "WARNING: query JOINs tables <list
> > > of tables> without otherwise referencing or making use of those
> > > tables.  This may cause excessively poor performance of the query".
> > 
> > Well the problem with a warning is what if it *is* intentional? It's
> > not ok to fill my logs up with warnings for every time the query is
> > executed. That just forces me to turn off warnings.
> 
> WARNING is probably the wrong level (I wasn't thinking in terms of PG
> logging, though I probably should have been).  What about NOTICE?
> Basically, you want something that will alert the interactive user
> that what they're doing is likely to be stupid, but at the same time
> won't be a burden on the system or the DBA...
> 
> > It would be ok to have an option to block cartesian joins entirely. I might
> > even choose to run with that enabled normally. I can always disable it for
> > queries I know need cartesion joins.
> 
> Which wouldn't work all that well for people who are trying to write
> their software in a reasonably portable fashion, unfortunately.
> However, the number of people who care would now be much smaller.
> 
> > For that matter, I wonder whether it's time to consider an option to
> > disable implicit (ie, pre-ansi join syntax) joins entirely. It seems
> > like lots of shops are likely imposing coding standards that require
> > ansi join syntax anyways. In environments like that you would expect
> > "a CROSS JOIN b" not just "select * from a,b" anyways.
> > 
> > Shops like that might appreciate the ability to enforce a blanket
> > coding standard on that point and get protection from accidental
> > cartesian joins as a side benefit.
> 
> That could be handy, but of course it should default to off, which
> with respect to cross joins would unfortunately wind up benefitting
> only those people who already are potentially aware of the issue and
> care about it (or, at least, those people who have DBAs that care
> about it).
> 
> 
> 
> -- 
> Kevin Brown                          kevin@sysexperts.com
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073