Thread: BUG #13112: Catastrophic performance degradation without DISTINCT ON statement
BUG #13112: Catastrophic performance degradation without DISTINCT ON statement
From
oyvind.harboe@zylin.com
Date:
The following bug has been logged on the website: Bug reference: 13112 Logged by: Ãyvind Harboe Email address: oyvind.harboe@zylin.com PostgreSQL version: 9.1.0 Operating system: Ubuntu Description: I've been testing out PostgreSQL vs. Derby/MS SQL for our application when I ran into a problem where the performance of PostgreSQL went from great to abysmal for no apparent reason. After a bit of digging, I've found that the problem is with the SQL statement that Apache Cayenne generates. Apache Cayenne generates statements of the following form which yields bad performance on PostgreSQL with complicated WHERE statements and numerous columns: 1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement If I rewrite this statement to the form below using the 'DISTINCT ON()' syntax(which is PostgreSQL specific dialect), then I get great performance again: 2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement Numbers on my machine: 1) 44000ms 2) 4300ms Here's where I read up on the DISTINCT ON syntax: http://www.postgresql.org/docs/9.4/static/sql-select.html
Re: BUG #13112: Catastrophic performance degradation without DISTINCT ON statement
From
"David G. Johnston"
Date:
On Tue, Apr 21, 2015 at 12:16 AM, <oyvind.harboe@zylin.com> wrote: > The following bug has been logged on the website: > > Bug reference: 13112 > Logged by: =C3=98yvind Harboe > Email address: oyvind.harboe@zylin.com > PostgreSQL version: 9.1.0 > Operating system: Ubuntu > Description: > > I've been testing out PostgreSQL vs. Derby/MS SQL for our application whe= n > I > ran into a problem where the performance of PostgreSQL went from great to > abysmal for no apparent reason. > > After a bit of digging, I've found that the problem is with the SQL > statement that Apache Cayenne generates. > > Apache Cayenne generates statements of the following form which yields ba= d > performance on PostgreSQL with complicated WHERE statements and numerous > columns: > > 1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement > > If I rewrite this statement to the form below using the 'DISTINCT ON()' > syntax(which is PostgreSQL specific dialect), then I get great performanc= e > again: > > 2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement > > Numbers on my machine: > > 1) 44000ms > > 2) 4300ms > > Here's where I read up on the DISTINCT ON syntax: > http://www.postgresql.org/docs/9.4/static/sql-select.html So what - the two queries are not equivalent. There may be room for improvement here but you've given insufficient information to help anyone who cares to dig deeper. see =E2=80=8Bhttps://wiki.postgresql.org/wiki/SlowQueryQuestions =E2=80=8B Distinct requires sorting - the ON clause just limits how much - so using it in place of writing a correct query will always yield sub-optimal results. This is not a bug even if PostgreSQL could be taught to handle these kinds of ORM queries more effectively. David J.