Thread: [PERFORM] Slow query after 9.3 to 9.6 migration

[PERFORM] Slow query after 9.3 to 9.6 migration

From
Flávio Henrique
Date: Dec 27, 2016
Hi there, fellow experts!

I need some advice about a query that became slower after our 9.3 to 9.6 migration.

First of all, I'm from the dev team.

Before the migration, we (the programmers) made some modifications to the query, bringing its average time from 8s down to 2-3s.

As this query is the most executed one on our system (it builds the panel our users work from), every bit we can squeeze out of it will be nice.

Now, after the server migration to 9.6, we're experiencing bad times with this query again.

Unfortunately, I don't have the old query plan (from the 9.3 version) to show you, but in the current version (9.6) I can see some buffers being written, which tells me that something is wrong.

Our server has 250GB of memory available, but the database team says they can't do anything to make this query better. I'm not so sure, since some buffers are being written to disk.

Any tip/help will be much appreciated (even from the query side).

Thank you!


The query plan: https://explain.depesz.com/s/5KMn

Note: I already tried adding an index on the kilo_victor table, but PostgreSQL still thinks it is better to do a seq scan.


Flávio Henrique

Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Gerardo Herzig
Date:
> The query plan: https://explain.depesz.com/s/5KMn
>
> Note: I already tried adding an index on the kilo_victor table, but
> PostgreSQL still thinks it is better to do a seq scan.

I don't know about the data distribution in kilo_victor, but maybe a partial index
ON kilo_victor (juliet_romeo) WHERE NOT xray_seven
?
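
Spelled out, something like this (untested sketch; it assumes
xray_seven is a boolean column and that the slow query filters on
NOT xray_seven):

CREATE INDEX CONCURRENTLY kilo_victor_juliet_romeo_partial_idx
    ON kilo_victor (juliet_romeo)
    WHERE NOT xray_seven;

If most rows have xray_seven = true, the partial index stays small,
and the planner is more likely to prefer it over a seq scan.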

Gerardo


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Daniel Blanch Bataller
Date:
The biggest performance impact you can achieve is by using a materialized view. If the query is as heavily used as you say, even 2-3 seconds in a multiuser OLTP environment is still unacceptable, in my view. I don't know if this is the case, but if you have 1000 users connecting at 8 am all at the same time... it will freeze the app for a while.

Ask yourself: how stale can the data be? And take into account that you can refresh the materialized view as often as you want, even every 10 secs if you want.
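
A minimal sketch (every table and column name here is a hypothetical placeholder for whatever the panel query actually selects):

CREATE MATERIALIZED VIEW user_panel_mv AS
    SELECT user_id, count(*) AS pending_items
    FROM panel_items
    WHERE NOT done
    GROUP BY user_id;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX ON user_panel_mv (user_id);

-- run from cron or a scheduler as often as staleness allows:
REFRESH MATERIALIZED VIEW CONCURRENTLY user_panel_mv;

CONCURRENTLY lets users keep reading the view while it refreshes, which matters if the panel is hit constantly.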

Besides this, there's still some room for improvement. Perhaps you have not created the right index to avoid the seq scans. Have a look at indexes on expressions.
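
For example (a sketch; it assumes juliet_romeo is a text column and that the query filters on the same expression):

CREATE INDEX ON kilo_victor (lower(juliet_romeo));

-- usable only by queries that repeat the expression, e.g.
-- WHERE lower(juliet_romeo) = 'some value'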

On the systems side: ask them whether they have changed anything in the effective_cache_size and shared_buffers parameters; I presume they haven't changed anything related to costs.

Regards.

Daniel Blanch.



Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Kevin Grittner
Date:
On Tue, Dec 27, 2016 at 5:50 PM, Flávio Henrique <yoshimit@gmail.com> wrote:

> I can see some buffers being written, which tells me
> that something is wrong.

Try running VACUUM FREEZE ANALYZE on all tables involved in the
query (or just run it as a superuser on the whole database).  Do
*not* use the FULL option.  Among other things, this will ensure
that you have somewhat current statistics, and that all hint bits
are set.  (I remember my surprise the first time I converted a
table to PostgreSQL, ran SELECT count(*) on it to make sure all
rows made it, and saw a very long run time with disk writes as the
bottleneck.  That's when I learned about hint bits.)
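
In SQL terms that's just the following (the table name in the
second form is a placeholder):

VACUUM FREEZE ANALYZE;             -- whole database, as a superuser
VACUUM FREEZE ANALYZE some_table;  -- or per table involved in the query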

You should also make sure that autovacuum is aggressive enough on
the new cluster.  Without that, any performance benefit from the
above will slowly disappear.
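
If autovacuum is falling behind on specific small, frequently
updated tables, it can also be made more aggressive per table; a
sketch with an illustrative table name and thresholds:

ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);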

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Merlin Moncure
Date: Jan 5, 2017
On Tue, Dec 27, 2016 at 5:50 PM, Flávio Henrique <yoshimit@gmail.com> wrote:
> Unfortunately, I don't have the old query plan (from the 9.3 version)
> to show you, but in the current version (9.6) I can see some buffers
> being written, which tells me that something is wrong.
>
> The query plan: https://explain.depesz.com/s/5KMn
>
> Note: I already tried adding an index on the kilo_victor table, but
> PostgreSQL still thinks it is better to do a seq scan.

Hard to provide more without the query or the 'old' plan.   Here are
some things you can try:
*) Set effective_io_concurrency high.    You have some heap scanning
going on and this can sometimes help (but it should be marginal).
*) See if you can get any juice out of parallel query
*) try playing with enable_nestloop and enable_seqscan.   these are
hail mary passes but worth a shot.
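
As session-level experiments those look like this (a sketch; the
values are illustrative, and the enable_* switches are for
diagnosis only, not production):

SET effective_io_concurrency = 256;        -- prefetching on capable storage
SET max_parallel_workers_per_gather = 4;   -- 9.6's parallel query knob
SET enable_nestloop = off;                 -- hail mary toggles, session-local
SET enable_seqscan = off;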

Run the query back to back with same arguments in the same database
session. Does performance improve?

Big gains (if any) are likely due to indexing strategy.
I do see some suspicious casting, for example:

Join Filter: ((four_charlie.delta_tango)::integer =
(six_quebec.golf_bravo)::integer)

Are you casting in the query or joining through dissimilar data types?
 I suspect your database team might be incorrect.

merlin


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Flávio Henrique
Date: Jan 5, 2017
Hi all!
Sorry for the delay (holidays).

Well, the most expensive sequential scan was solved.
I asked the db team to drop the index and recreate it, and guess what: now PostgreSQL is using it and the time dropped.
(Thank you, @Gerardo Herzig!)

I think there's still room for improvement, but the problem is not so crucial right now.
I'll try to investigate every suggestion mentioned here. Thank you all.

@Daniel Blanch
I'll run some tests with a materialized view. Thank you.
> On the systems side: ask them whether they have changed anything in the effective_cache_size and shared_buffers parameters; I presume they haven't changed anything related to costs.
Replying to your comment, I think they tuned the server:
effective_cache_size = 196GB
shared_buffers = 24GB (shouldn't this be higher?)

@Kevin Grittner
Sorry, but I'm not sure how to tell whether autovacuum is aggressive enough; here are my related settings:
autovacuum                          |on        
autovacuum_analyze_scale_factor     |0.05      
autovacuum_analyze_threshold        |10        
autovacuum_freeze_max_age           |200000000 
autovacuum_max_workers              |3         
autovacuum_multixact_freeze_max_age |400000000 
autovacuum_naptime                  |15s       
autovacuum_vacuum_cost_delay        |10ms      
autovacuum_vacuum_cost_limit        |-1        
autovacuum_vacuum_scale_factor      |0.1       
autovacuum_vacuum_threshold         |10        
autovacuum_work_mem                 |-1        

@Merlin Moncure
> Big gains (if any) are likely due to indexing strategy.
> I do see some suspicious casting, for example:
> Join Filter: ((four_charlie.delta_tango)::integer =
> (six_quebec.golf_bravo)::integer)
> Are you casting in the query or joining through dissimilar data types?
No casts in the query. The joins are on the same data types.

Thank you all for the answers. Happy 2017!

Flávio Henrique
--------------------------------------------------------
"There are only 10 types of people in the world: Those who understand binary, and those who don't"
--------------------------------------------------------


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Kevin Grittner
Date:
On Thu, Jan 5, 2017 at 10:51 AM, Flávio Henrique <yoshimit@gmail.com> wrote:

> Replying to your comment, I think they tuned the server:
> effective_cache_size = 196GB
> shared_buffers = 24GB (this shouldn't be higher?)

Probably not, although it may be a good idea to try settings either
side of that (say, 16GB and 32GB) and monitor performance compared
to the current setting.

> autovacuum_max_workers              |3

If you ever see all workers busy at the same time for 30 minutes or
more, you should probably consider raising that so that small,
frequently updated tables are not neglected for too long.

> autovacuum_vacuum_cost_limit        |-1

That is going to default to vacuum_cost_limit, which is usually
200.  If the server is actually big enough to merit
"effective_cache_size = 196GB" then you should probably bump this
setting to something like 2000.

> autovacuum_work_mem                 |-1

That is going to default to maintenance_work_mem.  On a big
machine, you probably want that set to somewhere between 1GB and
2GB.
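
Both settings can be changed cluster-wide without a restart; for
example (a sketch using the values suggested above, run as a
superuser):

ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();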

Some other tuning to the cost parameters might be helpful, but
there's not enough data on the thread to know what else to suggest.
If you hit some other slow query, you might want to report it in
the manner suggested here:

https://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Daniel Blanch Bataller
Date: Jan 5, 2017
Hi,

If the index is used now after simply recreating it, it probably means the index was bloated, that is, it grew so big that a seq scan was cheaper.

I've seen another case recently where Postgres 9.6 wasn't using the right index in a query; I was able to reproduce the issue by crafting an index bigger, much bigger, than it should be.

Can you record the index sizes as they are now? Keep this info, and if the problem happens again, check the index sizes and see if they have grown too much.

i.e. SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'index_name'

This might help confirm whether the problem is that indexes are growing too much for some reason.
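
The same check with a human-readable on-disk size (substitute the real index name):

SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS size_on_disk,
       reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'index_name';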

Regards.

P.S. The other parameters don't seem to be the cause of the problem to me.



Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Merlin Moncure
Date:
On Thu, Jan 5, 2017 at 10:51 AM, Flávio Henrique <yoshimit@gmail.com> wrote:
> @Merlin Moncure
>>
>> Big gains (if any) are likely due to indexing strategy.
>> I do see some suspicious casting, for example:
>> Join Filter: ((four_charlie.delta_tango)::integer =
>> (six_quebec.golf_bravo)::integer)
>> Are you casting in the query or joining through dissimilar data types?
>
> No casts in the query. The joins are on the same data types.

well, something is going on.

create table t(i int);
create table t2(i int);
set enable_hashjoin to false;
set enable_mergejoin to false;

yields:

postgres=# explain select * from t join t2 on t.i = t2.i;
                            QUERY PLAN
──────────────────────────────────────────────────────────────────
 Nested Loop  (cost=0.00..97614.88 rows=32512 width=8)
   Join Filter: (t.i = t2.i)
   ->  Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
   ->  Materialize  (cost=0.00..48.25 rows=2550 width=4)
         ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)

please note the non-casted join filter.

however,

postgres=# explain select * from t join t2 on t.i::bigint = t2.i::bigint;
                            QUERY PLAN
──────────────────────────────────────────────────────────────────
 Nested Loop  (cost=0.00..130127.38 rows=32512 width=8)
   Join Filter: ((t.i)::bigint = (t2.i)::bigint)
   ->  Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
   ->  Materialize  (cost=0.00..48.25 rows=2550 width=4)
         ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)

notice the casts in the join filter.   Furthermore, please note the
higher  query cost due to the server accounting for the casting
involved in the join.  Any kind of non-equality based operation in a
join or the predicate side of a where condition can get very expensive
very quickly.   (it remains difficult to see if there's any way to
improve the join operation due to lack of visibility on the query
string).
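
If it turns out the production columns really are unlike types,
aligning them removes the cast from the plan entirely; in terms of
the toy tables above:

ALTER TABLE t ALTER COLUMN i TYPE bigint;
ALTER TABLE t2 ALTER COLUMN i TYPE bigint;

-- with both sides bigint, "t.i = t2.i" plans without any cast in
-- the join filter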

merlin

Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Filipe Oliveira
Date: Jan 6, 2017
Can you remove me from your mailing list?

Thanks.

Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Michael Paquier
Date: Jan 6, 2017
On Fri, Jan 6, 2017 at 6:14 AM, Filipe Oliveira <filipeo_11@hotmail.com> wrote:
> Can you remove me from your mailing list?

There is an unsubscribe action here:
https://www.postgresql.org/community/lists/subscribe/
--
Michael


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Filipe Oliveira
Date:
Thank you for the reply. I had been trying to find that option for a while now.

On Fri, Jan 6, 2017 at 12:51 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jan 6, 2017 at 6:14 AM, Filipe Oliveira <filipeo_11@hotmail.com> wrote:
> Can you remove me from your mailing list?

There is an unsubscribe action here:
https://www.postgresql.org/community/lists/subscribe/
--
Michael

Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Peter Geoghegan
Date:
On Thu, Jan 5, 2017 at 9:51 AM, Daniel Blanch Bataller
<daniel.blanch.bataller@gmail.com> wrote:
> If the index is used now after simply recreating it, it probably means
> the index was bloated, that is, it grew so big that a seq scan was
> cheaper.
>
> I've seen another case recently where Postgres 9.6 wasn't using the
> right index in a query; I was able to reproduce the issue by crafting
> an index bigger, much bigger, than it should be.
>
> Can you record the index sizes as they are now? Keep this info, and if
> the problem happens again, check the index sizes and see if they have
> grown too much.
>
> i.e. SELECT relname, relpages, reltuples FROM pg_class WHERE relname =
> 'index_name'
>
> This might help confirm whether the problem is that indexes are growing
> too much for some reason.

Are these unique indexes or not? Did Flávio have a workload with many UPDATEs?

I ask these questions because I think it's possible that this is
explained by a regression in 9.5's handling of index bloat, described
here:

http://postgr.es/m/CAH2-Wz=SfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA@mail.gmail.com

I'm trying to track down cases where this could be an issue, to get a
better sense of the problem.

--
Peter Geoghegan


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

From
Scott Marlowe
Date:
On Tue, Dec 27, 2016 at 3:50 PM, Flávio Henrique <yoshimit@gmail.com> wrote:
> Our server has 250GB of memory available, but the database team says
> they can't do anything to make this query better. I'm not so sure,
> since some buffers are being written to disk.

The second sorts etc. start spilling to disk, your performance is gonna
tank. Try increasing work_mem to something moderate like 256MB to 1GB.
Note that work_mem is per sort / action, so if you've got 100 users
running queries with 2 or 3 sorts at a time you can exhaust memory
real fast. OTOH, a db with proper pooling on connections etc (aka 10
to 20 live connections at a time) can easily handle 1GB work_mem if
it's got 256GB of RAM.
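
A quick way to test that per session (the value is illustrative):

SET work_mem = '256MB';
-- rerun the query with EXPLAIN (ANALYZE, BUFFERS) and check whether
-- "Sort Method: external merge  Disk: ..." lines become in-memory sorts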