Thread: BUG #13112: Catastrophic performance degradation without DISTINCT ON statement

BUG #13112: Catastrophic performance degradation without DISTINCT ON statement

From
oyvind.harboe@zylin.com
Date:
The following bug has been logged on the website:

Bug reference:      13112
Logged by:          Øyvind Harboe
Email address:      oyvind.harboe@zylin.com
PostgreSQL version: 9.1.0
Operating system:   Ubuntu
Description:

I've been testing out PostgreSQL vs. Derby/MS SQL for our application when I
ran into a problem where the performance of PostgreSQL went from great to
abysmal for no apparent reason.

After a bit of digging, I've found that the problem is with the SQL
statement that Apache Cayenne generates.

Apache Cayenne generates statements of the following form which yields bad
performance on PostgreSQL with complicated WHERE statements and numerous
columns:

1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement

If I rewrite this statement to the form below using the 'DISTINCT ON()'
syntax(which is PostgreSQL specific dialect), then I get great performance
again:

2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement

Numbers on my machine:

1) 44000ms

2) 4300ms

Here's where I read up on the DISTINCT ON syntax:
http://www.postgresql.org/docs/9.4/static/sql-select.html

Re: BUG #13112: Catastrophic performance degradation without DISTINCT ON statement

From
"David G. Johnston"
Date:
On Tue, Apr 21, 2015 at 12:16 AM, <oyvind.harboe@zylin.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      13112
> Logged by:          =C3=98yvind Harboe
> Email address:      oyvind.harboe@zylin.com
> PostgreSQL version: 9.1.0
> Operating system:   Ubuntu
> Description:
>
> I've been testing out PostgreSQL vs. Derby/MS SQL for our application whe=
n
> I
> ran into a problem where the performance of PostgreSQL went from great to
> abysmal for no apparent reason.
>
> After a bit of digging, I've found that the problem is with the SQL
> statement that Apache Cayenne generates.
>
> Apache Cayenne generates statements of the following form which yields ba=
d
> performance on PostgreSQL with complicated WHERE statements and numerous
> columns:
>
> 1) SELECT DISTINCT a,b,c,d,e,f ... WHERE somecomplicatedstatement
>
> If I rewrite this statement to the form below using the 'DISTINCT ON()'
> syntax(which is PostgreSQL specific dialect), then I get great performanc=
e
> again:
>
> 2) SELECT DISTINCT ON(a) a,b,c,d,e,f ... WHERE somecomplicatedstatement
>
> Numbers on my machine:
>
> 1) 44000ms
>
> 2) 4300ms
>
> Here's where I read up on the DISTINCT ON syntax:
> http://www.postgresql.org/docs/9.4/static/sql-select.html


So what - the two queries are not equivalent.

There may be room for improvement here but you've given insufficient
information to help anyone who cares to dig deeper.

see =E2=80=8Bhttps://wiki.postgresql.org/wiki/SlowQueryQuestions
=E2=80=8B
Distinct requires sorting - the ON clause just limits how much - so using
it in place of writing a correct query will always yield sub-optimal
results.

This is not a bug even if PostgreSQL could be taught to handle these kinds
of ORM queries more effectively.

David J.