Thread: select distinct runs slow on pg 10.6

select distinct runs slow on pg 10.6

From
yash mehta
Date:
We have a query that takes 1min to execute in postgres 10.6 and the same executes in 4 sec in Oracle database. The query is doing 'select distinct'. If I add a 'group by' clause, performance in postgres improves significantly and fetches results in 2 sec (better than oracle). But unfortunately, we cannot modify the query. Could you please suggest a way to improve performance in Postgres without modifying the query. 

Original condition: time taken 1min

Sort Method: external merge  Disk: 90656kB

 

After removing distinct from query: time taken 2sec

Sort Method: top-N heapsort  Memory: 201kB

 

After increasing work_mem to 180MB; it takes 20sec

Sort Method: quicksort  Memory: 172409kB

 

SELECT * FROM pg_stat_statements ORDER BY total_time DESC limit 1;

-[ RECORD 1 ]-------+-----------------------------------------------------------------------------------------------------------------------------------------

userid              | 174862

dbid                | 174861

queryid             | 1469376470

query               | <query is too long. It selects around 300 columns>

calls               | 1

total_time          | 59469.972661

min_time            | 59469.972661

max_time            | 59469.972661

mean_time           | 59469.972661

stddev_time         | 0

rows                | 25

shared_blks_hit     | 27436

shared_blks_read    | 2542

shared_blks_dirtied | 0

shared_blks_written | 0

local_blks_hit      | 0

local_blks_read     | 0

local_blks_dirtied  | 0

local_blks_written  | 0

temp_blks_read      | 257

temp_blks_written   | 11333

blk_read_time       | 0

blk_write_time      | 0

Re: select distinct runs slow on pg 10.6

From
Justin Pryzby
Date:
On Mon, Sep 09, 2019 at 02:00:01PM +0530, yash mehta wrote:
> We have a query that takes 1min to execute in postgres 10.6 and the same
> executes in 4 sec in Oracle database. The query is doing 'select distinct'.
> If I add a 'group by' clause, performance in postgres improves
> significantly and fetches results in 2 sec (better than oracle). But
> unfortunately, we cannot modify the query. Could you please suggest a way
> to improve performance in Postgres without modifying the query.

Not sure it helps, but I remember this:
https://www.postgresql.org/message-id/CAKJS1f9q0j3BgMUsDbtf9%3DecfVLnqvkYB44MXj0gpVuamcN8Xw%40mail.gmail.com



Re: select distinct runs slow on pg 10.6

From
Flo Rance
Date:
There are few things to consider:
- you don't need to use distinct on all columns (and therefore sort all columns)
- you should try to sort in memory, better than on-disk
- it seems that the planner doesn't predict the good number of rows

Regards,
Florian

On Mon, Sep 9, 2019 at 12:46 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Sep 09, 2019 at 02:00:01PM +0530, yash mehta wrote:
> We have a query that takes 1min to execute in postgres 10.6 and the same
> executes in 4 sec in Oracle database. The query is doing 'select distinct'.
> If I add a 'group by' clause, performance in postgres improves
> significantly and fetches results in 2 sec (better than oracle). But
> unfortunately, we cannot modify the query. Could you please suggest a way
> to improve performance in Postgres without modifying the query.

Not sure it helps, but I remember this:
https://www.postgresql.org/message-id/CAKJS1f9q0j3BgMUsDbtf9%3DecfVLnqvkYB44MXj0gpVuamcN8Xw%40mail.gmail.com


Re: select distinct runs slow on pg 10.6

From
Michael Lewis
Date:
If you can't modify the query, then there is nothing more to be done to optimize the execution afaik. Distinct is much slower than group by in scenarios like this with many columns. You already identified the disk sort and increased work mem to get it faster by 3x. There are not any other tricks of which I am aware.

Re: select distinct runs slow on pg 10.6

From
yash mehta
Date:
Hi Michael/Justin/Flo,

Thank you all for your assistance. As Michael said, looks like there are no more tricks left. 

On Mon, Sep 9, 2019 at 9:09 PM Michael Lewis <mlewis@entrata.com> wrote:
If you can't modify the query, then there is nothing more to be done to optimize the execution afaik. Distinct is much slower than group by in scenarios like this with many columns. You already identified the disk sort and increased work mem to get it faster by 3x. There are not any other tricks of which I am aware.

Re: select distinct runs slow on pg 10.6

From
Rick Otten
Date:

On Tue, Sep 10, 2019 at 12:53 AM yash mehta <yash215@gmail.com> wrote:
Hi Michael/Justin/Flo,

Thank you all for your assistance. As Michael said, looks like there are no more tricks left. 

On Mon, Sep 9, 2019 at 9:09 PM Michael Lewis <mlewis@entrata.com> wrote:
If you can't modify the query, then there is nothing more to be done to optimize the execution afaik. Distinct is much slower than group by in scenarios like this with many columns. You already identified the disk sort and increased work mem to get it faster by 3x. There are not any other tricks of which I am aware.

Could you put a view in between the real table and the query that does the group by ?  (since you can't change the query)
I'm wondering if the sort/processing time would be faster when that distinct is invoked if the rows are already distinct.
 

Re: select distinct runs slow on pg 10.6

From
Merlin Moncure
Date:
On Mon, Sep 9, 2019 at 3:55 AM yash mehta <yash215@gmail.com> wrote:
>
> We have a query that takes 1min to execute in postgres 10.6 and the same executes in 4 sec in Oracle database. The
queryis doing 'select distinct'. If I add a 'group by' clause, performance in postgres improves significantly and
fetchesresults in 2 sec (better than oracle). But unfortunately, we cannot modify the query. Could you please suggest a
wayto improve performance in Postgres without modifying the query. 

Well, here's the bad news.  Postgres doesn't optimize this specific
formulation as well as oracle does.  Normally tweaking the query along
with some creativity would get the expected result; it's pretty rare
that I can't coerce the planner to do something fairly optimally.  I'm
guessing this is an Oracle conversion app, and we do not have the
ability to change the underlying source code?  Can you elaborate why
not?

In lieu of changing the query in the application, we have  high level
strategies to consider.
*) Eat the 20 seconds, and gripe to your oracle buddies (they will
appreciate this)

*) Mess around with with planner variables to get a better plan.
Unfortunately, since we can't do tricks like SET before running the
query, the changes will be global, and I'm not expecting this to bear
fruit, unless we can have this query be separated from other queries
at the connection level (we might be able to intervene on connect and
set influential non-global planner settings there)

*) Experiment with pg11/pg12 to see if upcoming versions can handle
this strategy better.  pg12 is in beta obviously, but an upgrade
strategy would be the easiest out.

*) Attempt to intervene with views.  I think this is out, since all
the tables are schema qualified. To avoid a global change, the typical
strategy is to tuck some views into a private schema and manipulate
search_path to have them resolve first, but that won't work if you
don't have control of the query string.

*) Try to change the query string anyways.  Say, this is a compiled
application for which you don't have the code,  we might be able to
locate the query text within the compiled binary and modify it.  This
is actually a pretty effective trick (although in many scenarios we'd
want the query string to be the same length as before but you have
plenty of whitespace to play with) although in certain
legal/regulatory contexts we might not be able to do it.

*) Hack some C to adjust the query in flight.   This is *SUPER* hacky,
but let's say that the application was dynamically linked against the
libpq driver, but with some C change and a fearless attitude we could
adjust the query after it leaves the application but before it hits
the database. Other candidate interventions might be in the database
itself or in pgbouncer.  We could also do this in jdbc if your
application connects via that driver.  This is would be 'absolutely
last resort' tactics, but sometimes you simply must find a solution.

merlin



Re: select distinct runs slow on pg 10.6

From
Dinesh Somani
Date:
I think Merlin has outlined pretty much all the options and very neatly. (As an asides Merlin could you possibly elaborate on the "C Hack" how that might be accomplished.)

To OP, I am curious if the performance changes were the query rewritten such that all timestamp columns were listed first in the selection. I understand it might not be feasible to make this change in your real application without breaking the contract.

Regards
Dinesh

On Wed, Sep 11, 2019 at 8:54 AM Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Sep 9, 2019 at 3:55 AM yash mehta <yash215@gmail.com> wrote:
>
> We have a query that takes 1min to execute in postgres 10.6 and the same executes in 4 sec in Oracle database. The query is doing 'select distinct'. If I add a 'group by' clause, performance in postgres improves significantly and fetches results in 2 sec (better than oracle). But unfortunately, we cannot modify the query. Could you please suggest a way to improve performance in Postgres without modifying the query.

Well, here's the bad news.  Postgres doesn't optimize this specific
formulation as well as oracle does.  Normally tweaking the query along
with some creativity would get the expected result; it's pretty rare
that I can't coerce the planner to do something fairly optimally.  I'm
guessing this is an Oracle conversion app, and we do not have the
ability to change the underlying source code?  Can you elaborate why
not?

In lieu of changing the query in the application, we have  high level
strategies to consider.
*) Eat the 20 seconds, and gripe to your oracle buddies (they will
appreciate this)

*) Mess around with with planner variables to get a better plan.
Unfortunately, since we can't do tricks like SET before running the
query, the changes will be global, and I'm not expecting this to bear
fruit, unless we can have this query be separated from other queries
at the connection level (we might be able to intervene on connect and
set influential non-global planner settings there)

*) Experiment with pg11/pg12 to see if upcoming versions can handle
this strategy better.  pg12 is in beta obviously, but an upgrade
strategy would be the easiest out.

*) Attempt to intervene with views.  I think this is out, since all
the tables are schema qualified. To avoid a global change, the typical
strategy is to tuck some views into a private schema and manipulate
search_path to have them resolve first, but that won't work if you
don't have control of the query string.

*) Try to change the query string anyways.  Say, this is a compiled
application for which you don't have the code,  we might be able to
locate the query text within the compiled binary and modify it.  This
is actually a pretty effective trick (although in many scenarios we'd
want the query string to be the same length as before but you have
plenty of whitespace to play with) although in certain
legal/regulatory contexts we might not be able to do it.

*) Hack some C to adjust the query in flight.   This is *SUPER* hacky,
but let's say that the application was dynamically linked against the
libpq driver, but with some C change and a fearless attitude we could
adjust the query after it leaves the application but before it hits
the database. Other candidate interventions might be in the database
itself or in pgbouncer.  We could also do this in jdbc if your
application connects via that driver.  This is would be 'absolutely
last resort' tactics, but sometimes you simply must find a solution.

merlin


Re: select distinct runs slow on pg 10.6

From
Rick Otten
Date:


On Wed, Sep 11, 2019 at 12:38 PM Dinesh Somani <dinesh@opsveda.com> wrote:
I think Merlin has outlined pretty much all the options and very neatly. (As an asides Merlin could you possibly elaborate on the "C Hack" how that might be accomplished.)

To OP, I am curious if the performance changes were the query rewritten such that all timestamp columns were listed first in the selection. I understand it might not be feasible to make this change in your real application without breaking the contract.

Regards
Dinesh

It looks like AWS has a pgbouncer query re-writer service that might be a starting point:

I've never used it.

 

Re: select distinct runs slow on pg 10.6

From
Merlin Moncure
Date:
On Wed, Sep 11, 2019 at 12:57 PM Rick Otten <rottenwindfish@gmail.com> wrote:
>
> On Wed, Sep 11, 2019 at 12:38 PM Dinesh Somani <dinesh@opsveda.com> wrote:
>>
>> I think Merlin has outlined pretty much all the options and very neatly. (As an asides Merlin could you possibly
elaborateon the "C Hack" how that might be accomplished.)
 
>>
>> To OP, I am curious if the performance changes were the query rewritten such that all timestamp columns were listed
firstin the selection. I understand it might not be feasible to make this change in your real application without
breakingthe contract.
 
>>
>> Regards
>> Dinesh
>
>
> It looks like AWS has a pgbouncer query re-writer service that might be a starting point:
>
https://aws.amazon.com/blogs/big-data/query-routing-and-rewrite-introducing-pgbouncer-rr-for-amazon-redshift-and-postgresql/
>
> I've never used it.

Yeah, I haven't either.  Side note: this system also provides the
ability to load balance queries across distributed system; that's a
huge benefit.  Say you have master server and five replica, it seems
that you can round robin the read only queries using this system or
other neat little tricks.   I would be cautious about pgbouncer-rr
becoming the bottleneck itself for certain workloads though.

Anyways, a 'hack' strategy on linux might be to:
*) Check and verify that libpq is dynamically linked (which is almost
alwasys the case).  ldd /your/application should give the dynamic
library dependency to libpq.
*) Grab postgres sources for same version as production
*) configure
*) switch to interfaces/libpq
*) figure out which interface routine(s) being called into.  The
approach will be slightly different if the query is
prepared/paramterized or not.  Assuming it isn't, you'd have to modify
the PQsendQuery routine to check for the signature  (say, with
strcmp), create a new string, and have that be put instead of the
incoming const char* query.  The parameterized versions
(PQsendQueryParams) would be easier since you'd be able to use a
static string rather than parsing it out.
*) Build the library, do some testing with hand written C program
*) inject the modified libpq with LD_LIBRARY_PATH

It must be stated that some people might read this and be compelled to
barf :-) -- it's pretty gross. Having said that, sometimes you have to
find a solution.   I would definitely try the pgbouncer-rr approach
first however; this has a *lot* of potential benefit.

merlin



Re: select distinct runs slow on pg 10.6

From
Dinesh Somani
Date:
Thanks a lot, Merlin.

Yes, it could appear kinda gross to some ;-) 

On Thu, Sep 12, 2019 at 7:19 AM Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Sep 11, 2019 at 12:57 PM Rick Otten <rottenwindfish@gmail.com> wrote:
>
> On Wed, Sep 11, 2019 at 12:38 PM Dinesh Somani <dinesh@opsveda.com> wrote:
>>
>> I think Merlin has outlined pretty much all the options and very neatly. (As an asides Merlin could you possibly elaborate on the "C Hack" how that might be accomplished.)
>>
>> To OP, I am curious if the performance changes were the query rewritten such that all timestamp columns were listed first in the selection. I understand it might not be feasible to make this change in your real application without breaking the contract.
>>
>> Regards
>> Dinesh
>
>
> It looks like AWS has a pgbouncer query re-writer service that might be a starting point:
> https://aws.amazon.com/blogs/big-data/query-routing-and-rewrite-introducing-pgbouncer-rr-for-amazon-redshift-and-postgresql/
>
> I've never used it.

Yeah, I haven't either.  Side note: this system also provides the
ability to load balance queries across distributed system; that's a
huge benefit.  Say you have master server and five replica, it seems
that you can round robin the read only queries using this system or
other neat little tricks.   I would be cautious about pgbouncer-rr
becoming the bottleneck itself for certain workloads though.

Anyways, a 'hack' strategy on linux might be to:
*) Check and verify that libpq is dynamically linked (which is almost
alwasys the case).  ldd /your/application should give the dynamic
library dependency to libpq.
*) Grab postgres sources for same version as production
*) configure
*) switch to interfaces/libpq
*) figure out which interface routine(s) being called into.  The
approach will be slightly different if the query is
prepared/paramterized or not.  Assuming it isn't, you'd have to modify
the PQsendQuery routine to check for the signature  (say, with
strcmp), create a new string, and have that be put instead of the
incoming const char* query.  The parameterized versions
(PQsendQueryParams) would be easier since you'd be able to use a
static string rather than parsing it out.
*) Build the library, do some testing with hand written C program
*) inject the modified libpq with LD_LIBRARY_PATH

It must be stated that some people might read this and be compelled to
barf :-) -- it's pretty gross. Having said that, sometimes you have to
find a solution.   I would definitely try the pgbouncer-rr approach
first however; this has a *lot* of potential benefit.

merlin