Thread: Posible planner improvement?

Posible planner improvement?

From

Albert Cervera Areny

Date:

21 May 2008, 07:35:46

I've got a query similar to this:

select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;

That took > 84 minutes (the query was a bit longer but this is the part that
made the difference) after a little change the query took ~1 second:

select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id =
t2.id;

The change is pretty simple and it seems (note I don't have a clue on how the
planner works) it'd be possible for the planner to make this assumption
itself. Do you think it is really feasible/appropiate?

Re: Posible planner improvement?

From

Richard Huxton

Date:

21 May 2008, 07:48:20

Albert Cervera Areny wrote:
> I've got a query similar to this:
>
> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
>
> That took > 84 minutes (the query was a bit longer but this is the part that
> made the difference) after a little change the query took ~1 second:
>
> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id =
> t2.id;

Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
we'll see why it's better at the second one.

--
   Richard Huxton
   Archonet Ltd

Re: Posible planner improvement?

From

Albert Cervera Areny

Date:

21 May 2008, 08:11:35

A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> Albert Cervera Areny wrote:
> > I've got a query similar to this:
> >
> > select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> >
> > That took > 84 minutes (the query was a bit longer but this is the part
> > that made the difference) after a little change the query took ~1 second:
> >
> > select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id =
> > t2.id;
>
> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> we'll see why it's better at the second one.

Right, attached an example of such a difference.

Attachment

queries.txt

Re: Posible planner improvement?

From

Mark Mielke

Date:

21 May 2008, 08:25:04

A Dimecres 21 Maig 2008, Richard Huxton va escriure:
>> Albert Cervera Areny wrote:
>>
>>> I've got a query similar to this:
>>>
>>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
>>>
>>> That took > 84 minutes (the query was a bit longer but this is the part
>>> that made the difference) after a little change the query took ~1 second:
>>>
>>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id =
>>> t2.id;
>>>
>> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
>> we'll see why it's better at the second one.
>>

Even if the estimates were off (they look a bit off for the first
table), the above two queries are logically identical, and I would
expect the planner to make the same decision for both.

I am curious - what is the result of:

    select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;

Is it the same speed as the first or second, or is a third speed entirely?

If t1.id = t2.id, I would expect the planner to substitute them freely
in terms of identities?

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

Re: Posible planner improvement?

From

Albert Cervera Areny

Date:

21 May 2008, 08:30:37

A Dimecres 21 Maig 2008, Mark Mielke va escriure:
> A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> >> Albert Cervera Areny wrote:
> >>> I've got a query similar to this:
> >>>
> >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> >>>
> >>> That took > 84 minutes (the query was a bit longer but this is the part
> >>> that made the difference) after a little change the query took ~1
> >>> second:
> >>>
> >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id
> >>> = t2.id;
> >>
> >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> >> we'll see why it's better at the second one.
>
> Even if the estimates were off (they look a bit off for the first
> table), the above two queries are logically identical, and I would
> expect the planner to make the same decision for both.
>
> I am curious - what is the result of:
>
>     select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;
>
> Is it the same speed as the first or second, or is a third speed entirely?

Attached the same file with the third result at the end. The result is worst
than the other two cases. Note that I've analyzed both tables but results are
the same. One order of magnitude between the two first queries.

>
> If t1.id = t2.id, I would expect the planner to substitute them freely
> in terms of identities?
>
> Cheers,
> mark

Attachment

queries.txt

Re: Posible planner improvement?

From

Albert Cervera Areny

Date:

21 May 2008, 08:37:59

A Dimecres 21 Maig 2008, Albert Cervera Areny va escriure:
> A Dimecres 21 Maig 2008, Mark Mielke va escriure:
> > A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> > >> Albert Cervera Areny wrote:
> > >>> I've got a query similar to this:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> > >>>
> > >>> That took > 84 minutes (the query was a bit longer but this is the
> > >>> part that made the difference) after a little change the query took
> > >>> ~1 second:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
> > >>> t1.id = t2.id;
> > >>
> > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> > >> we'll see why it's better at the second one.
> >
> > Even if the estimates were off (they look a bit off for the first
> > table), the above two queries are logically identical, and I would
> > expect the planner to make the same decision for both.
> >
> > I am curious - what is the result of:
> >
> >     select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;
> >
> > Is it the same speed as the first or second, or is a third speed
> > entirely?
>
> Attached the same file with the third result at the end. The result is
> worst than the other two cases. Note that I've analyzed both tables but
> results are the same. One order of magnitude between the two first queries.

Sorry, it's not worse than the other two cases as shown in the file. However,
after repetition it seems the other two seem to decrease more than the third
one whose times vary a bit more and some times take up to 5 seconds.

Other queries are running in the same machine, so take times with a grain of
salt. What's clear is that always  there's a big difference between first and
second queries.

>
> > If t1.id = t2.id, I would expect the planner to substitute them freely
> > in terms of identities?
> >
> > Cheers,
> > mark

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12

====================================================================
........................  AVISO LEGAL  ............................
La   presente  comunicación  y sus anexos tiene como destinatario la
persona a  la  que  va  dirigida, por  lo  que  si  usted lo  recibe
por error  debe  notificarlo  al  remitente  y   eliminarlo   de  su
sistema,  no  pudiendo  utilizarlo,  total  o   parcialmente,   para
ningún  fin.  Su  contenido  puede  tener información confidencial o
protegida legalmente   y   únicamente   expresa  la  opinión     del
remitente.  El   uso   del   correo   electrónico   vía Internet  no
permite   asegurar    ni  la   confidencialidad   de   los  mensajes
ni    su    correcta     recepción.   En    el  caso   de   que   el
destinatario no consintiera la utilización  del correo  electrónico,
deberá ponerlo en nuestro conocimiento inmediatamente.
====================================================================
........................... DISCLAIMER .............................
This message and its  attachments are  intended  exclusively for the
named addressee. If you  receive  this  message  in   error,  please
immediately delete it from  your  system  and notify the sender. You
may  not  use  this message  or  any  part  of it  for any  purpose.
The   message   may  contain  information  that  is  confidential or
protected  by  law,  and  any  opinions  expressed  are those of the
individual    sender.  Internet  e-mail   guarantees   neither   the
confidentiality   nor  the  proper  receipt  of  the  message  sent.
If  the  addressee  of  this  message  does  not  consent to the use
of   internet    e-mail,    please    inform     us    inmmediately.
====================================================================

Re: Posible planner improvement?

From

"Luke Lonergan"

Date:

21 May 2008, 08:51:01

The problem is that the implied join predicate is not being propagated. This is definitely a planner deficiency.

- Luke

----- Original Message -----
From: pgsql-performance-owner@postgresql.org <pgsql-performance-owner@postgresql.org>
To: pgsql-performance@postgresql.org <pgsql-performance@postgresql.org>
Sent: Wed May 21 07:37:49 2008
Subject: Re: [PERFORM] Posible planner improvement?

A Dimecres 21 Maig 2008, Albert Cervera Areny va escriure:
> A Dimecres 21 Maig 2008, Mark Mielke va escriure:
> > A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> > >> Albert Cervera Areny wrote:
> > >>> I've got a query similar to this:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> > >>>
> > >>> That took > 84 minutes (the query was a bit longer but this is the
> > >>> part that made the difference) after a little change the query took
> > >>> ~1 second:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
> > >>> t1.id = t2.id;
> > >>
> > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> > >> we'll see why it's better at the second one.
> >
> > Even if the estimates were off (they look a bit off for the first
> > table), the above two queries are logically identical, and I would
> > expect the planner to make the same decision for both.
> >
> > I am curious - what is the result of:
> >
> >     select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;
> >
> > Is it the same speed as the first or second, or is a third speed
> > entirely?
>
> Attached the same file with the third result at the end. The result is
> worst than the other two cases. Note that I've analyzed both tables but
> results are the same. One order of magnitude between the two first queries.

Sorry, it's not worse than the other two cases as shown in the file. However,
after repetition it seems the other two seem to decrease more than the third
one whose times vary a bit more and some times take up to 5 seconds.

Other queries are running in the same machine, so take times with a grain of
salt. What's clear is that always there's a big difference between first and
second queries.

>
> > If t1.id = t2.id, I would expect the planner to substitute them freely
> > in terms of identities?
> >
> > Cheers,
> > mark

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12

====================================================================
........................ AVISO LEGAL ............................
La   presente comunicación y sus anexos tiene como destinatario la
persona a la que va dirigida, por lo que si usted lo recibe
por error debe notificarlo al remitente y   eliminarlo   de su
sistema, no pudiendo utilizarlo, total o   parcialmente,   para
ningún fin. Su contenido puede tener información confidencial o
protegida legalmente   y   únicamente   expresa la opinión     del
remitente. El   uso   del   correo   electrónico   vía Internet no
permite   asegurar    ni la   confidencialidad   de   los mensajes
ni    su    correcta     recepción.   En    el caso   de   que   el
destinatario no consintiera la utilización del correo electrónico,
deberá ponerlo en nuestro conocimiento inmediatamente.
====================================================================
........................... DISCLAIMER .............................
This message and its attachments are intended exclusively for the
named addressee. If you receive this message in   error, please
immediately delete it from your system and notify the sender. You
may not use this message or any part of it for any purpose.
The   message   may contain information that is confidential or
protected by law, and any opinions expressed are those of the
individual    sender. Internet e-mail   guarantees   neither   the
confidentiality   nor the proper receipt of the message sent.
If the addressee of this message does not consent to the use
of   internet    e-mail,    please    inform     us    inmmediately.
====================================================================

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Posible planner improvement?

From

Richard Huxton

Date:

21 May 2008, 10:09:58

Luke Lonergan wrote:
> The problem is that the implied join predicate is not being
> propagated.  This is definitely a planner deficiency.

IIRC only equality conditions are propagated and gt, lt, between aren't.
  I seem to remember that the argument given was that the cost of
checking for the ability to propagate was too high for the frequency
when it ocurred.

Of course, what was true for code and machines of 5 years ago might not
be so today.

--
   Richard Huxton
   Archonet Ltd

Re: Posible planner improvement?

From

PFC

Date:

21 May 2008, 13:12:05

On Wed, 21 May 2008 15:09:49 +0200, Richard Huxton <dev@archonet.com>
wrote:

> Luke Lonergan wrote:
>> The problem is that the implied join predicate is not being
>> propagated.  This is definitely a planner deficiency.
>
> IIRC only equality conditions are propagated and gt, lt, between aren't.
>   I seem to remember that the argument given was that the cost of
> checking for the ability to propagate was too high for the frequency
> when it ocurred.
>
> Of course, what was true for code and machines of 5 years ago might not
> be so today.
>

    Suggestion : when executing a one-off sql statement, optimizer should try
to offer "best effort while being fast" ; when making a plan that will be
reused many times (ie PREPARE, functions...) planning time could be
muuuuch longer...

Re: Posible planner improvement?

From

Albert Cervera Areny

Date:

21 May 2008, 13:22:51

A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> Luke Lonergan wrote:
> > The problem is that the implied join predicate is not being
> > propagated.  This is definitely a planner deficiency.
>
> IIRC only equality conditions are propagated and gt, lt, between aren't.
>   I seem to remember that the argument given was that the cost of
> checking for the ability to propagate was too high for the frequency
> when it ocurred.
>
> Of course, what was true for code and machines of 5 years ago might not
> be so today.

Hope this can be revisited given the huge difference in this case: 80 minutes
to 1 second.

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12

====================================================================
........................  AVISO LEGAL  ............................
La   presente  comunicación  y sus anexos tiene como destinatario la
persona a  la  que  va  dirigida, por  lo  que  si  usted lo  recibe
por error  debe  notificarlo  al  remitente  y   eliminarlo   de  su
sistema,  no  pudiendo  utilizarlo,  total  o   parcialmente,   para
ningún  fin.  Su  contenido  puede  tener información confidencial o
protegida legalmente   y   únicamente   expresa  la  opinión     del
remitente.  El   uso   del   correo   electrónico   vía Internet  no
permite   asegurar    ni  la   confidencialidad   de   los  mensajes
ni    su    correcta     recepción.   En    el  caso   de   que   el
destinatario no consintiera la utilización  del correo  electrónico,
deberá ponerlo en nuestro conocimiento inmediatamente.
====================================================================
........................... DISCLAIMER .............................
This message and its  attachments are  intended  exclusively for the
named addressee. If you  receive  this  message  in   error,  please
immediately delete it from  your  system  and notify the sender. You
may  not  use  this message  or  any  part  of it  for any  purpose.
The   message   may  contain  information  that  is  confidential or
protected  by  law,  and  any  opinions  expressed  are those of the
individual    sender.  Internet  e-mail   guarantees   neither   the
confidentiality   nor  the  proper  receipt  of  the  message  sent.
If  the  addressee  of  this  message  does  not  consent to the use
of   internet    e-mail,    please    inform     us    inmmediately.
====================================================================

Re: Posible planner improvement?

From

Decibel!

Date:

24 May 2008, 15:51:45

Moving to -hackers...

On May 21, 2008, at 9:09 AM, Richard Huxton wrote:
> Luke Lonergan wrote:
>> The problem is that the implied join predicate is not being
>> propagated.  This is definitely a planner deficiency.
>
> IIRC only equality conditions are propagated and gt, lt, between
> aren't.  I seem to remember that the argument given was that the
> cost of checking for the ability to propagate was too high for the
> frequency when it ocurred.
>
> Of course, what was true for code and machines of 5 years ago might
> not be so today.

Definitely...

How hard would it be to propagate all conditions (except maybe
functions, though perhaps the new function cost estimates make that
more practical) in cases of equality?

For reference, the original query as posted to -performance:

select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;

That took > 84 minutes (the query was a bit longer but this is the
part that made the difference) after a little change the query took
~1 second:

select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
t1.id = t2.id;

--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Attachment

smime.p7s

Re: Posible planner improvement?

From

ITAGAKI Takahiro

Date:

26 May 2008, 08:30:25

Hello Albert,

Albert Cervera Areny <albert@sedifa.com> wrote:

> I've got a query similar to this:
>
> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
>
> That took > 84 minutes (the query was a bit longer but this is the part that
> made the difference) after a little change the query took ~1 second:
>
> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id =
> t2.id;

I had a similar problem here:
  http://archives.postgresql.org/pgsql-general/2007-02/msg00850.php
and added a redundant inequality explicitly to make it work well.

I think it is worth trying to improve, but I'm not sure we can do it
against user defined types. Does postgres always require transitive law
to all types?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Re: Posible planner improvement?

From

"Kevin Grittner"

Date:

04 August 2008, 15:26:47

>>> On Mon, May 26, 2008 at  6:30 AM, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
> Albert Cervera Areny <albert@sedifa.com> wrote:
>
>> I've got a query similar to this:
>>
>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
>>
>> That took > 84 minutes (the query was a bit longer but this is the
part that
>> made the difference) after a little change the query took ~1
second:
>>
>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
t1.id =
>> t2.id;
>
> I had a similar problem here:
>   http://archives.postgresql.org/pgsql-general/2007-02/msg00850.php
> and added a redundant inequality explicitly to make it work well.
>
> I think it is worth trying to improve, but I'm not sure we can do it
> against user defined types. Does postgres always require transitive
law
> to all types?

I've recently run into this.  It would be a nice optimization,
if feasible.

-Kevin