Thread: Posible planner improvement?
I've got a query similar to this: select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; That took > 84 minutes (the query was a bit longer but this is the part that made the difference) after a little change the query took ~1 second: select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = t2.id; The change is pretty simple and it seems (note I don't have a clue on how the planner works) it'd be possible for the planner to make this assumption itself. Do you think it is really feasible/appropiate?
Albert Cervera Areny wrote: > I've got a query similar to this: > > select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; > > That took > 84 minutes (the query was a bit longer but this is the part that > made the difference) after a little change the query took ~1 second: > > select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = > t2.id; Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and we'll see why it's better at the second one. -- Richard Huxton Archonet Ltd
A Dimecres 21 Maig 2008, Richard Huxton va escriure: > Albert Cervera Areny wrote: > > I've got a query similar to this: > > > > select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; > > > > That took > 84 minutes (the query was a bit longer but this is the part > > that made the difference) after a little change the query took ~1 second: > > > > select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = > > t2.id; > > Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and > we'll see why it's better at the second one. Right, attached an example of such a difference.
Attachment
A Dimecres 21 Maig 2008, Richard Huxton va escriure: >> Albert Cervera Areny wrote: >> >>> I've got a query similar to this: >>> >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; >>> >>> That took > 84 minutes (the query was a bit longer but this is the part >>> that made the difference) after a little change the query took ~1 second: >>> >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = >>> t2.id; >>> >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and >> we'll see why it's better at the second one. >> Even if the estimates were off (they look a bit off for the first table), the above two queries are logically identical, and I would expect the planner to make the same decision for both. I am curious - what is the result of: select * from t1, t2 where t2.id > 158507 and t1.id = t2.id; Is it the same speed as the first or second, or is a third speed entirely? If t1.id = t2.id, I would expect the planner to substitute them freely in terms of identities? Cheers, mark -- Mark Mielke <mark@mielke.cc>
A Dimecres 21 Maig 2008, Mark Mielke va escriure: > A Dimecres 21 Maig 2008, Richard Huxton va escriure: > >> Albert Cervera Areny wrote: > >>> I've got a query similar to this: > >>> > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; > >>> > >>> That took > 84 minutes (the query was a bit longer but this is the part > >>> that made the difference) after a little change the query took ~1 > >>> second: > >>> > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id > >>> = t2.id; > >> > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and > >> we'll see why it's better at the second one. > > Even if the estimates were off (they look a bit off for the first > table), the above two queries are logically identical, and I would > expect the planner to make the same decision for both. > > I am curious - what is the result of: > > select * from t1, t2 where t2.id > 158507 and t1.id = t2.id; > > Is it the same speed as the first or second, or is a third speed entirely? Attached the same file with the third result at the end. The result is worst than the other two cases. Note that I've analyzed both tables but results are the same. One order of magnitude between the two first queries. > > If t1.id = t2.id, I would expect the planner to substitute them freely > in terms of identities? > > Cheers, > mark
Attachment
A Dimecres 21 Maig 2008, Albert Cervera Areny va escriure: > A Dimecres 21 Maig 2008, Mark Mielke va escriure: > > A Dimecres 21 Maig 2008, Richard Huxton va escriure: > > >> Albert Cervera Areny wrote: > > >>> I've got a query similar to this: > > >>> > > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; > > >>> > > >>> That took > 84 minutes (the query was a bit longer but this is the > > >>> part that made the difference) after a little change the query took > > >>> ~1 second: > > >>> > > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and > > >>> t1.id = t2.id; > > >> > > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and > > >> we'll see why it's better at the second one. > > > > Even if the estimates were off (they look a bit off for the first > > table), the above two queries are logically identical, and I would > > expect the planner to make the same decision for both. > > > > I am curious - what is the result of: > > > > select * from t1, t2 where t2.id > 158507 and t1.id = t2.id; > > > > Is it the same speed as the first or second, or is a third speed > > entirely? > > Attached the same file with the third result at the end. The result is > worst than the other two cases. Note that I've analyzed both tables but > results are the same. One order of magnitude between the two first queries. Sorry, it's not worse than the other two cases as shown in the file. However, after repetition it seems the other two seem to decrease more than the third one whose times vary a bit more and some times take up to 5 seconds. Other queries are running in the same machine, so take times with a grain of salt. What's clear is that always there's a big difference between first and second queries. > > > If t1.id = t2.id, I would expect the planner to substitute them freely > > in terms of identities? > > > > Cheers, > > mark -- Albert Cervera Areny Dept. Informàtica Sedifa, S.L. Av. Can Bordoll, 149 08202 - Sabadell (Barcelona) Tel. 93 715 51 11 Fax. 93 715 51 12 ==================================================================== ........................ AVISO LEGAL ............................ La presente comunicación y sus anexos tiene como destinatario la persona a la que va dirigida, por lo que si usted lo recibe por error debe notificarlo al remitente y eliminarlo de su sistema, no pudiendo utilizarlo, total o parcialmente, para ningún fin. Su contenido puede tener información confidencial o protegida legalmente y únicamente expresa la opinión del remitente. El uso del correo electrónico vía Internet no permite asegurar ni la confidencialidad de los mensajes ni su correcta recepción. En el caso de que el destinatario no consintiera la utilización del correo electrónico, deberá ponerlo en nuestro conocimiento inmediatamente. ==================================================================== ........................... DISCLAIMER ............................. This message and its attachments are intended exclusively for the named addressee. If you receive this message in error, please immediately delete it from your system and notify the sender. You may not use this message or any part of it for any purpose. The message may contain information that is confidential or protected by law, and any opinions expressed are those of the individual sender. Internet e-mail guarantees neither the confidentiality nor the proper receipt of the message sent. If the addressee of this message does not consent to the use of internet e-mail, please inform us inmmediately. ====================================================================
The problem is that the implied join predicate is not being propagated. This is definitely a planner deficiency.
- Luke
----- Original Message -----
From: pgsql-performance-owner@postgresql.org <pgsql-performance-owner@postgresql.org>
To: pgsql-performance@postgresql.org <pgsql-performance@postgresql.org>
Sent: Wed May 21 07:37:49 2008
Subject: Re: [PERFORM] Posible planner improvement?
A Dimecres 21 Maig 2008, Albert Cervera Areny va escriure:
> A Dimecres 21 Maig 2008, Mark Mielke va escriure:
> > A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> > >> Albert Cervera Areny wrote:
> > >>> I've got a query similar to this:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> > >>>
> > >>> That took > 84 minutes (the query was a bit longer but this is the
> > >>> part that made the difference) after a little change the query took
> > >>> ~1 second:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
> > >>> t1.id = t2.id;
> > >>
> > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> > >> we'll see why it's better at the second one.
> >
> > Even if the estimates were off (they look a bit off for the first
> > table), the above two queries are logically identical, and I would
> > expect the planner to make the same decision for both.
> >
> > I am curious - what is the result of:
> >
> > select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;
> >
> > Is it the same speed as the first or second, or is a third speed
> > entirely?
>
> Attached the same file with the third result at the end. The result is
> worst than the other two cases. Note that I've analyzed both tables but
> results are the same. One order of magnitude between the two first queries.
Sorry, it's not worse than the other two cases as shown in the file. However,
after repetition it seems the other two seem to decrease more than the third
one whose times vary a bit more and some times take up to 5 seconds.
Other queries are running in the same machine, so take times with a grain of
salt. What's clear is that always there's a big difference between first and
second queries.
>
> > If t1.id = t2.id, I would expect the planner to substitute them freely
> > in terms of identities?
> >
> > Cheers,
> > mark
--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.
Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12
====================================================================
........................ AVISO LEGAL ............................
La presente comunicación y sus anexos tiene como destinatario la
persona a la que va dirigida, por lo que si usted lo recibe
por error debe notificarlo al remitente y eliminarlo de su
sistema, no pudiendo utilizarlo, total o parcialmente, para
ningún fin. Su contenido puede tener información confidencial o
protegida legalmente y únicamente expresa la opinión del
remitente. El uso del correo electrónico vía Internet no
permite asegurar ni la confidencialidad de los mensajes
ni su correcta recepción. En el caso de que el
destinatario no consintiera la utilización del correo electrónico,
deberá ponerlo en nuestro conocimiento inmediatamente.
====================================================================
........................... DISCLAIMER .............................
This message and its attachments are intended exclusively for the
named addressee. If you receive this message in error, please
immediately delete it from your system and notify the sender. You
may not use this message or any part of it for any purpose.
The message may contain information that is confidential or
protected by law, and any opinions expressed are those of the
individual sender. Internet e-mail guarantees neither the
confidentiality nor the proper receipt of the message sent.
If the addressee of this message does not consent to the use
of internet e-mail, please inform us inmmediately.
====================================================================
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Luke Lonergan wrote: > The problem is that the implied join predicate is not being > propagated. This is definitely a planner deficiency. IIRC only equality conditions are propagated and gt, lt, between aren't. I seem to remember that the argument given was that the cost of checking for the ability to propagate was too high for the frequency when it ocurred. Of course, what was true for code and machines of 5 years ago might not be so today. -- Richard Huxton Archonet Ltd
On Wed, 21 May 2008 15:09:49 +0200, Richard Huxton <dev@archonet.com> wrote: > Luke Lonergan wrote: >> The problem is that the implied join predicate is not being >> propagated. This is definitely a planner deficiency. > > IIRC only equality conditions are propagated and gt, lt, between aren't. > I seem to remember that the argument given was that the cost of > checking for the ability to propagate was too high for the frequency > when it ocurred. > > Of course, what was true for code and machines of 5 years ago might not > be so today. > Suggestion : when executing a one-off sql statement, optimizer should try to offer "best effort while being fast" ; when making a plan that will be reused many times (ie PREPARE, functions...) planning time could be muuuuch longer...
A Dimecres 21 Maig 2008, Richard Huxton va escriure: > Luke Lonergan wrote: > > The problem is that the implied join predicate is not being > > propagated. This is definitely a planner deficiency. > > IIRC only equality conditions are propagated and gt, lt, between aren't. > I seem to remember that the argument given was that the cost of > checking for the ability to propagate was too high for the frequency > when it ocurred. > > Of course, what was true for code and machines of 5 years ago might not > be so today. Hope this can be revisited given the huge difference in this case: 80 minutes to 1 second. -- Albert Cervera Areny Dept. Informàtica Sedifa, S.L. Av. Can Bordoll, 149 08202 - Sabadell (Barcelona) Tel. 93 715 51 11 Fax. 93 715 51 12 ==================================================================== ........................ AVISO LEGAL ............................ La presente comunicación y sus anexos tiene como destinatario la persona a la que va dirigida, por lo que si usted lo recibe por error debe notificarlo al remitente y eliminarlo de su sistema, no pudiendo utilizarlo, total o parcialmente, para ningún fin. Su contenido puede tener información confidencial o protegida legalmente y únicamente expresa la opinión del remitente. El uso del correo electrónico vía Internet no permite asegurar ni la confidencialidad de los mensajes ni su correcta recepción. En el caso de que el destinatario no consintiera la utilización del correo electrónico, deberá ponerlo en nuestro conocimiento inmediatamente. ==================================================================== ........................... DISCLAIMER ............................. This message and its attachments are intended exclusively for the named addressee. If you receive this message in error, please immediately delete it from your system and notify the sender. You may not use this message or any part of it for any purpose. The message may contain information that is confidential or protected by law, and any opinions expressed are those of the individual sender. Internet e-mail guarantees neither the confidentiality nor the proper receipt of the message sent. If the addressee of this message does not consent to the use of internet e-mail, please inform us inmmediately. ====================================================================
Moving to -hackers... On May 21, 2008, at 9:09 AM, Richard Huxton wrote: > Luke Lonergan wrote: >> The problem is that the implied join predicate is not being >> propagated. This is definitely a planner deficiency. > > IIRC only equality conditions are propagated and gt, lt, between > aren't. I seem to remember that the argument given was that the > cost of checking for the ability to propagate was too high for the > frequency when it ocurred. > > Of course, what was true for code and machines of 5 years ago might > not be so today. Definitely... How hard would it be to propagate all conditions (except maybe functions, though perhaps the new function cost estimates make that more practical) in cases of equality? For reference, the original query as posted to -performance: select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; That took > 84 minutes (the query was a bit longer but this is the part that made the difference) after a little change the query took ~1 second: select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = t2.id; -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Attachment
Hello Albert, Albert Cervera Areny <albert@sedifa.com> wrote: > I've got a query similar to this: > > select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; > > That took > 84 minutes (the query was a bit longer but this is the part that > made the difference) after a little change the query took ~1 second: > > select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = > t2.id; I had a similar problem here: http://archives.postgresql.org/pgsql-general/2007-02/msg00850.php and added a redundant inequality explicitly to make it work well. I think it is worth trying to improve, but I'm not sure we can do it against user defined types. Does postgres always require transitive law to all types? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
>>> On Mon, May 26, 2008 at 6:30 AM, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote: > Albert Cervera Areny <albert@sedifa.com> wrote: > >> I've got a query similar to this: >> >> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id; >> >> That took > 84 minutes (the query was a bit longer but this is the part that >> made the difference) after a little change the query took ~1 second: >> >> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = >> t2.id; > > I had a similar problem here: > http://archives.postgresql.org/pgsql-general/2007-02/msg00850.php > and added a redundant inequality explicitly to make it work well. > > I think it is worth trying to improve, but I'm not sure we can do it > against user defined types. Does postgres always require transitive law > to all types? I've recently run into this. It would be a nice optimization, if feasible. -Kevin