Thread: Specific query taking time to process

Specific query taking time to process

From: Fahiz Mohamed
Hi team,

Could you please help me with a strange issue I am facing on the live server I maintain.

There is a specific search query I run to get a list of documents and their metadata from several tables in the DB.
We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance).

Our current DB consists of 500 GB of data and indexes. Most of the data is in one table, which consists of 454,078,915 rows.

With a fresh DB restored from the data (without any indexes), the search query performs relatively quickly; most of the time it takes less than a second.

But after about 3 weeks of use the DB suddenly starts to slow down for this particular query only, and it takes 20+ seconds to respond. If I restore the DB again it works fine, and the symptom reappears after about 3 weeks.

I suspect some cache or index maxing out is causing this issue.

Could you please guide me on what the root cause of this issue could be?


Thank you,
Fahiz

Re: Specific query taking time to process

From: Justin Pryzby
On Sat, Dec 07, 2019 at 08:05:59PM +0000, Fahiz Mohamed wrote:
> There is a specific search query I run to get a list of documents and their metadata from several tables in the DB.
> We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance).
> 
> Our current DB consists of 500 GB of data and indexes. Most of the data is in one table, which consists of 454,078,915 rows.

454M rows or ??

> With a fresh DB restored from the data (without any indexes), the search query performs relatively quickly; most of
> the time it takes less than a second.

> But after about 3 weeks of use the DB suddenly starts to slow down for this particular query only, and it takes 20+
> seconds to respond. If I restore the DB again it works fine, and the symptom reappears after about 3 weeks.
> 
> I suspect some cache or index maxing out is causing this issue.
> 
> Could you please guide me on what the root cause of this issue could be?

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Could you send the EXPLAIN ANALYZE output (attached here as a txt file or linked on depesz) now, and again shortly after a restore, so the two can be compared?
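
If it helps, one way to capture that from psql (a sketch; substitute your actual query):

\o plan_now.txt
EXPLAIN (ANALYZE, BUFFERS) <your search query>;
\o

and the same again right after a restore, so the two text files can be diffed or pasted into explain.depesz.com.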

Justin



Re: Specific query taking time to process

From: Michael Lewis
There is a specific search query I run to get a list of documents and their metadata from several tables in the DB.
We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance).

Our current DB consists of 500 GB of data and indexes. Most of the data is in one table, which consists of 454,078,915 rows.

With a fresh DB restored from the data (without any indexes), the search query performs relatively quickly; most of the time it takes less than a second.

But after about 3 weeks of use the DB suddenly starts to slow down for this particular query only, and it takes 20+ seconds to respond. If I restore the DB again it works fine, and the symptom reappears after about 3 weeks.


You haven't been quite clear on the situation and your use case, but assuming this table has 454 million rows and experiences updates/deletes then this sounds like you may be having problems with autovacuum. Have you customized parameters to ensure it is running more frequently than default? How are you doing those data restores? Perhaps that process is cleaning up the accumulated bloat and you can run fine again for a while. Check pg_stat_user_tables for the last (auto)vacuum that ran, assuming you didn't just restore again and are expecting the issue to occur again soon.
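
For example, something like this shows the tables with the most dead rows and when they were last vacuumed and analyzed (a sketch using the standard pg_stat_user_tables columns):

SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;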

Re: Specific query taking time to process

From: Fahiz Mohamed
Thank you very much for your prompt responses.

I have analysed this further and found the long-running query.

I ran "explain analyse" on this query and got the following results. (We have 2 identical DB instances and they contain the same data. Instance 1 took 20+ seconds to process and instance 2 took less than a second.)

Instance 1 (used by regular users - more than 600,000 requests a day; the result is the same even when there are no users on the server):
EXPLAIN ANALYZE
Nested Loop Semi Join  (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
  ->  Hash Semi Join  (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
        Hash Cond: (node.id = prop.node_id)
        ->  Bitmap Heap Scan on alf_node node  (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
              Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
              Rows Removed by Index Recheck: 54239131
              Filter: (NOT (hashed SubPlan 1))
              Rows Removed by Filter: 2816
              Heap Blocks: exact=24301 lossy=1875383
              ->  Bitmap Index Scan on idx_alf_node_mdq  (cost=0.00..646144.01 rows=20047144 width=0) (actual time=3232.067..3232.067 rows=44246360 loops=1)
                    Index Cond: ((store_id = 6) AND (type_qname_id = 240))
              SubPlan 1
                ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2503.51..347403.58 rows=128379 width=8) (actual time=25.447..65.392 rows=5635 loops=1)
                      Recheck Cond: (qname_id = 251)
                      Heap Blocks: exact=40765
                      ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2471.41 rows=128379 width=0) (actual time=18.835..18.835 rows=239610 loops=1)
                            Index Cond: (qname_id = 251)
        ->  Hash  (cost=3526.11..3526.11 rows=871 width=8) (actual time=0.045..0.045 rows=23 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3526.11 rows=871 width=8) (actual time=0.021..0.042 rows=23 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 23
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..2.01 rows=15 width=8) (actual time=0.011..0.011 rows=1 loops=22)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 22
Planning time: 0.639 ms
Execution time: 22946.036 ms

Instance 2 (used only by testers - 250 requests a day):

Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
  ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
        ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
              Group Key: prop.node_id
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 18
        ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
              Index Cond: (id = prop.node_id)
              Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
              Rows Removed by Filter: 0
              SubPlan 1
                ->  Materialize  (cost=2962.65..397912.89 rows=158204 width=8) (actual time=0.001..0.009 rows=85 loops=17)
                      ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2962.65..396503.87 rows=158204 width=8) (actual time=0.021..0.082 rows=85 loops=1)
                            Recheck Cond: (qname_id = 251)
                            Heap Blocks: exact=55
                            ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2923.10 rows=158204 width=0) (actual time=0.015..0.015 rows=87 loops=1)
                                  Index Cond: (qname_id = 251)
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loops=17)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 17
Planning time: 0.623 ms
Execution time: 0.540 ms

Configurations are the same on both servers.

Please advise me on this. Is there any configuration I should look at specifically, such as work_mem, shared_buffers, checkpoint_segments, effective_cache_size, enable_seqscan or checkpoint_completion_target?

Thanks in advance.

Fahiz


Re: Specific query taking time to process

From: Michael Lewis
On Mon, Dec 9, 2019 at 3:39 PM Fahiz Mohamed <fahiz@netwidz.com> wrote:
I ran "explain analyse" on this query and I got following result. (We have 2 identical DB instances and they consist of same data. Instane 1 took 20+ second to process and instance 2 took less than a second)

Instance 1 (used by regular users - more than 600,000 requests a day; the result is the same even when there are no users on the server):
     -> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
Rows Removed by Index Recheck: 54239131
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 2816
Heap Blocks: exact=24301 lossy=1875383

Planning time: 0.639 ms
Execution time: 22946.036 ms


That seems like a lot of lossy blocks. As I understand it, that means the system didn't have enough work_mem to fit all the references to the individual rows, which perhaps isn't surprising when it estimates it needs 4.5 million rows and ends up with 41 million.

Do both DB instances have the same data? I ask because the two plans are rather different, which makes me think the statistics about the data are not very similar. Are both configured the same, particularly for shared_buffers and work_mem, as well as the various planning cost parameters like random_page_cost? Can you provide these plans again with EXPLAIN (ANALYZE, BUFFERS) this time? Did you check the last time autovacuum ran in pg_stat_user_tables?
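
A quick way to compare the relevant settings (run on each instance and diff the output; the parameter list here is only a starting point):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size',
               'random_page_cost', 'seq_page_cost', 'cpu_tuple_cost');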


Re: Specific query taking time to process

From: Justin Pryzby
On Mon, Dec 09, 2019 at 10:39:38PM +0000, Fahiz Mohamed wrote:
> Thank you very much for your prompt responses.
> 
> I have analysed this further and found the long-running query.
> 
> I ran "explain analyse" on this query and got the following results. (We have 2 identical DB instances and they
> contain the same data. Instance 1 took 20+ seconds to process and instance 2 took less than a second.)
> 
> Instance 1 (used by regular users - more than 600,000 requests a day; the result is the same even when there are
> no users on the server):
> EXPLAIN ANALYZE
> Nested Loop Semi Join (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
>  -> Hash Semi Join (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
>     Hash Cond: (node.id = prop.node_id)
>     -> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85 rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
>        Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
>        Rows Removed by Index Recheck: 54239131
>        Filter: (NOT (hashed SubPlan 1))
>        Rows Removed by Filter: 2816
>        Heap Blocks: exact=24301 lossy=1875383
...

This is doing a bitmap scan on alf_node where the second is doing an Index Scan.
Is alf_node well-correlated on the 2nd server on store_id or type_qname_id ?
If you CLUSTER and ANALYZE on the 1st server, maybe it would perform similarly.
(But, that could hurt other queries).
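
As a sketch (assuming idx_alf_node_mdq covers (store_id, type_qname_id), as its index condition in the earlier plan suggests; note CLUSTER rewrites the table and takes an ACCESS EXCLUSIVE lock, so it needs a maintenance window):

CLUSTER alf_node USING idx_alf_node_mdq;
ANALYZE alf_node;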

> We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance)
> With a fresh DB restored from the data (without any indexes), the search query performs relatively quickly; most of
> the time it takes less than a second.
> 
> But after about 3 weeks of use the DB suddenly starts to slow down for this particular query only, and it takes 20+
> seconds to respond. If I restore the DB again it works fine, and the symptom reappears after about 3 weeks.

> Instance 2: (Only by testers - 250 request a day)
> 
> Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
>   ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
>         ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
>               Group Key: prop.node_id
>               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
>                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
>                     Heap Fetches: 18
>         ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
>               Index Cond: (id = prop.node_id)
>               Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
>               Rows Removed by Filter: 0
>               SubPlan 1



Re: Specific query taking time to process

From: Fahiz Mohamed
There is a slight difference in the two instances' data. Instance 1 contains the latest data, and instance 2 contains data that is 3 weeks older than instance 1.

Following are the numbers of rows in each table in both instances.

Instance 1
alf_node : 96493129 rows
alf_node_properties : 455599288 rows
alf_node_aspects : 150153006 rows

Instance 2
alf_node : 90831396 rows
alf_node_properties : 440792247 rows
alf_node_aspects : 146648241 rows

I am not sure whether that difference in data can make such a drastic difference. Please correct me if I am wrong.

I checked "pg_stat_user_tables" and autovacuum has never run against those tables.

I did execute VACUUM manually and I noticed the below in the output:

INFO: vacuuming "public.alf_node_aspects"
INFO: "alf_node_aspects": found 0 removable, 150264654 nonremovable row versions in 812242 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU 13.53s/33.35u sec elapsed 77.88 sec.


INFO: analyzing "public.alf_node_aspects"
INFO: "alf_node_aspects": scanned 30000 of 812242 pages, containing 5550000 live rows and 0 dead rows; 30000 rows in sample, 150264770 estimated total rows

Did the vacuum work fine?

I did "explain( analyze, buffers )" below are the results:

Instance 1: 

Nested Loop Semi Join  (cost=441133.07..1352496.76 rows=1 width=8) (actual time=9054.324..10029.748 rows=22 loops=1)
  Buffers: shared hit=1617812
  ->  Merge Semi Join  (cost=441132.50..1352359.19 rows=41 width=16) (actual time=9054.296..10029.547 rows=22 loops=1)
        Merge Cond: (node.id = prop.node_id)
        Buffers: shared hit=1617701
        ->  Index Only Scan using idx_alf_node_tqn on alf_node node  (cost=441041.93..1340831.48 rows=4593371 width=8) (actual time=4.418..7594.637 rows=40998148 loops=1)
              Index Cond: ((type_qname_id = 240) AND (store_id = 6))
              Filter: (NOT (hashed SubPlan 1))
              Rows Removed by Filter: 130
              Heap Fetches: 0
              Buffers: shared hit=1617681
              SubPlan 1
                ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=3418.63..440585.22 rows=182459 width=8) (actual time=0.583..2.992 rows=4697 loops=1)
                      Recheck Cond: (qname_id = 251)
                      Heap Blocks: exact=1774
                      Buffers: shared hit=1791
                      ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..3373.01 rows=182459 width=0) (actual time=0.384..0.384 rows=4697 loops=1)
                            Index Cond: (qname_id = 251)
                            Buffers: shared hit=17
        ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..41.74 rows=852 width=8) (actual time=0.022..0.037 rows=23 loops=1)
              Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
              Heap Fetches: 0
              Buffers: shared hit=20
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..38.80 rows=14 width=8) (actual time=0.007..0.007 rows=1 loops=22)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 22
        Buffers: shared hit=111
Planning time: 0.621 ms
Execution time: 10029.799 ms
(29 rows)


Instance 2: 

Nested Loop Semi Join  (cost=6471.94..173560891.23 rows=2 width=8) (actual time=0.162..0.470 rows=17 loops=1)
  Buffers: shared hit=257
  ->  Nested Loop  (cost=6471.37..173560734.51 rows=45 width=16) (actual time=0.154..0.392 rows=17 loops=1)
        Buffers: shared hit=172
        ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.039..0.043 rows=18 loops=1)
              Group Key: prop.node_id
              Buffers: shared hit=23
              ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866 width=8) (actual time=0.019..0.033 rows=18 loops=1)
                    Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
                    Heap Fetches: 18
                    Buffers: shared hit=23
        ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.17 rows=1 width=8) (actual time=0.019..0.019 rows=1 loops=18)
              Index Cond: (id = prop.node_id)
              Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
              Rows Removed by Filter: 0
              Buffers: shared hit=149
              SubPlan 1
                ->  Materialize  (cost=2962.65..397913.00 rows=158204 width=8) (actual time=0.002..0.009 rows=85 loops=17)
                      Buffers: shared hit=59
                      ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=2962.65..396503.98 rows=158204 width=8) (actual time=0.025..0.085 rows=85 loops=1)
                            Recheck Cond: (qname_id = 251)
                            Heap Blocks: exact=55
                            Buffers: shared hit=59
                            ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..2923.10 rows=158204 width=0) (actual time=0.019..0.019 rows=87 loops=1)
                                  Index Cond: (qname_id = 251)
                                  Buffers: shared hit=4
  ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loops=17)
        Index Cond: ((node_id = node.id) AND (qname_id = 245))
        Heap Fetches: 17
        Buffers: shared hit=85
Planning time: 0.642 ms
Execution time: 0.546 ms
(32 rows)

With the vacuum and reindex on those tables I managed to reduce the time from 30s to 10s, but I still don't get the same performance as instance 2.

work_mem was previously set to 4MB; on instance 1 it is now set to 160MB, and instance 2 is still using 4MB.

shared_buffers is set to 8GB on both.

Both are on Postgres version 9.6.9.

Thanks in advance.

Fahiz


Re: Specific query taking time to process

From: Michael Lewis
This seems beyond me at this point, but I am curious whether you also vacuumed the alf_node_properties and alf_node tables and checked when they last got (auto)vacuumed/analyzed. With default autovacuum parameters and tables with that many rows, they don't qualify for autovacuum very often. I don't have much experience with tables in excess of 50 million rows because we manually shard client data.
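
If it turns out autovacuum simply isn't triggering on these tables, one common approach is to lower the thresholds per table so it kicks in after a fixed number of changed rows rather than a percentage of the table. A sketch (the numbers are illustrative, not tuned for your workload):

ALTER TABLE alf_node SET (autovacuum_vacuum_scale_factor = 0,
                          autovacuum_vacuum_threshold = 1000000,
                          autovacuum_analyze_scale_factor = 0,
                          autovacuum_analyze_threshold = 1000000);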

You mention work_mem is set differently. Did you try setting work_mem back to 4MB in the session on instance 1, just to test the query? I don't know whether work_mem is considered at the planning stage, but I would think it may be. It would be odd for more available memory to end up with a slower plan, but I like to eliminate variables whenever possible.
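
That test is cheap to do from a psql session on instance 1, roughly:

SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS) <the search query>;
RESET work_mem;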

It might be worthwhile to see about increasing default_statistics_target to get more specific stats, but that can result in a dramatic increase in planning time for even simple queries.
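
That can be tried without changing the cluster-wide setting, e.g. for a single manual ANALYZE (a sketch; any per-column SET STATISTICS settings still take precedence):

SET default_statistics_target = 500;  -- session only; the default is 100
ANALYZE alf_node;
RESET default_statistics_target;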

Hopefully one of the real experts chimes in.

Re: Specific query taking time to process

From: Jeff Janes
On Tue, Dec 10, 2019 at 3:40 AM Fahiz Mohamed <fahiz@netwidz.com> wrote:
Thank you very much for your prompt responses.

I have analysed more regarding this and found the long running query.

I ran "explain analyse" on this query and I got following result. (We have 2 identical DB instances and they consist of same data. Instane 1 took 20+ second to process and instance 2 took less than a second)

They do not consist of the same data.  One returns 17 rows, the other 22.

One finds 5635 rows (scattered over 40765 blocks!) where qname_id = 251, the other finds 85 rows for the same condition.  It seems the first one is not very well vacuumed.

I don't know if these differences are enough to be driving the different plans (the estimation differences appear smaller than the actual differences), but clearly the data is not the same.

Your first query is using the index idx_alf_node_mdq in a way which seems to be counter-productive.  Perhaps you could inhibit it to see what plan it chooses then.  For example, specify "type_qname_id+0 = 240" in your query to prevent the use of that index.  Or you could drop the index, if it is not vital.
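
(The full query text isn't shown here, so just as a sketch: if the WHERE clause currently contains

    AND node.type_qname_id = 240

rewriting that one condition as

    AND node.type_qname_id + 0 = 240   -- same rows returned, but it no longer matches the index

keeps the result identical while discouraging the planner from using that index.)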

But if the data has not been ANALYZEd recently, you should do that before anything else.  Might as well make it a VACUUM ANALYZE.

Cheers,

Jeff

Re: Specific query taking time to process

From: Jeff Janes
On Wed, Dec 11, 2019 at 5:21 PM Fahiz Mohamed <fahiz@netwidz.com> wrote:
There is a slight difference in the two instances' data. Instance 1 contains the latest data, and instance 2 contains data that is 3 weeks older than instance 1.

In knowing where to look for differences in performance, there is a big difference between them being identical, and being generally similar, but not identical.


I am not sure whether that difference in data can make such a drastic difference. Please correct me if I am wrong.

They are similar in scale, but we know there is a big difference in the distribution of some values.  For example, we still know the slow plan has 4697 rows in aspect_1 where qname_id = 251, while the other plan has 85 rows in aspect_1 meeting that same criterion. That is a big difference, and it is a real difference in the data, not just a difference in planning or estimation.  Is this difference driving the difference in plan choice?  Probably not (plan choice is driven by estimated rows, not actual, and the estimates are quite similar), but it does demonstrate that the data is quite different between the two systems when you look under the hood.  It is likely that there are other, similar differences in the distribution of particular values which are driving the difference in plans.  It is just that we can't see those differences, because EXPLAIN ANALYZE only reports on the plan it ran, not other plans it could have run but didn't.

Your query is now using the index named  idx_alf_node_tqn in a way which is equally unproductive as the previous use of  idx_alf_node_mdq was.  It looks like they have the same columns, just in a different order.  My previous advice to try  "type_qname_id+0 = 240" should still apply.

If you can't get that to work, then another avenue is to run "explain (analyze, buffers) select count(*) from alf_node where (type_qname_id = 240) AND (store_id = 6)" on both instances.




I did execute vacuum manually and I noticed the below in the output

"INFO: vacuuming "public.alf_node_aspects"
INFO: "alf_node_aspects": found 0 removable, 150264654 nonremovable row versions in 812242 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU 13.53s/33.35u sec elapsed 77.88 sec.

I'm not really sure what that means.  I certainly would not have expected 0 removable. There should have been some prior output, something like:

INFO:  scanned index "fk_alf_nasp_qn" to remove 500000 row versions  
 
It could be that autovacuum had already gotten around to vacuuming after your initial email but before you ran the above, meaning there was not much for your manual vacuum to do.

But you can see that the vacuum did have an effect, by comparing these lines (despite them finding about the same number of rows):

Heap Blocks: exact=40765

Heap Blocks: exact=1774 

It wasn't all that large of an effect in this case, but it is still something worth fixing.

Cheers,

Jeff

Re: Specific query taking time to process

From: Fahiz Mohamed
Hi Jeff, 

Thank you for your email, and sorry I couldn't respond earlier. I am not working on this project at the moment. I have copied one of my colleagues who is working on it; he has made some progress and will update this thread with his findings.

Appreciate your support.

Thank you,
Fahiz

Re: Specific query taking time to process

From: Duncan Whitham

I have read through your comments so far – and they tally with the checks we have been making to an extent – thanks for them.

 

We now only need one environment, as we can replicate the performance problem on a copy of live (a snapshot/restore from AWS of live). We now have a vacuum analyse running every night on the 3 tables in question on live, to eliminate bloat and inaccurate stats as the root of the problem.

 

We can flip the performance by changing work_mem. For example, with work_mem up to and including 5096kB the query runs in well under 1 second; upping work_mem by just 1kB to 5097kB changes the query plan and performance is a lot worse.
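
Reproducing the flip from a psql session against the restored copy looks roughly like this (the query itself is elided):

SET work_mem = '5096kB';
EXPLAIN (ANALYZE, BUFFERS) <the search query>;   -- fast nested-loop plan below
SET work_mem = '5097kB';
EXPLAIN (ANALYZE, BUFFERS) <the search query>;   -- slow bitmap/hash plan below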

 

Fresh snapshot and restore.

5096kB – plan:

Nested Loop Semi Join  (cost=3494.04..184505785.24 rows=1 width=8) (actual time=6.404..130.145 rows=212 loops=1)
   Buffers: shared hit=7369
   ->  Nested Loop  (cost=3493.47..184505754.56 rows=36 width=16) (actual time=6.394..129.351 rows=212 loops=1)
         Buffers: shared hit=6318
         ->  HashAggregate  (cost=72.32..81.03 rows=871 width=8) (actual time=0.123..0.186 rows=228 loops=1)
               Group Key: prop.node_id
               Buffers: shared hit=99
               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..70.14 rows=872 width=8) (actual time=0.025..0.086 rows=228 loops=1)
                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'S434071'::text))
                     Heap Fetches: 2
                     Buffers: shared hit=99
         ->  Index Only Scan using idx_alf_node_tqn on alf_node node  (cost=3421.15..211831.99 rows=1 width=8) (actual time=0.566..0.566 rows=1 loops=228)
               Index Cond: ((type_qname_id = 240) AND (store_id = 6) AND (id = prop.node_id))
               Filter: (NOT (SubPlan 1))
               Heap Fetches: 13
               Buffers: shared hit=6219
               SubPlan 1
                 ->  Materialize  (cost=3420.59..419826.13 rows=163099 width=8) (actual time=0.007..0.246 rows=4909 loops=212)
                       Buffers: shared hit=5092
                       ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=3420.59..418372.63 rows=163099 width=8) (actual time=1.402..5.243 rows=4909 loops=1)
                             Recheck Cond: (qname_id = 251)
                             Heap Blocks: exact=4801
                             Buffers: shared hit=5092
                             ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..3379.81 rows=163099 width=0) (actual time=0.850..0.850 rows=7741 loops=1)
                                   Index Cond: (qname_id = 251)
                                   Buffers: shared hit=291
   ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..4.70 rows=15 width=8) (actual time=0.003..0.003 rows=1 loops=212)
         Index Cond: ((node_id = node.id) AND (qname_id = 245))
         Heap Fetches: 15
         Buffers: shared hit=1051
 Planning time: 0.624 ms
 Execution time: 130.222 ms
(32 rows)

 

5097kB – plan:

Nested Loop Semi Join  (cost=1019128.07..3400161.81 rows=1 width=8) (actual time=4832.639..32783.503 rows=212 loops=1)
   Buffers: shared hit=565 read=2191862
   ->  Hash Semi Join  (cost=1019127.50..3400131.13 rows=36 width=16) (actual time=4832.599..32779.613 rows=212 loops=1)
         Hash Cond: (node.id = prop.node_id)
         Buffers: shared hit=25 read=2191476
         ->  Bitmap Heap Scan on alf_node node  (cost=1019046.46..3388792.78 rows=4273414 width=8) (actual time=4219.440..29678.682 rows=41976707 loops=1)
               Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
               Rows Removed by Index Recheck: 58872880
               Filter: (NOT (hashed SubPlan 1))
               Rows Removed by Filter: 2453
               Heap Blocks: exact=32899 lossy=1939310
               Buffers: shared read=2191402
               ->  Bitmap Index Scan on idx_alf_node_mdq  (cost=0.00..599197.73 rows=19566916 width=0) (actual time=4186.449..4186.449 rows=41990879 loops=1)
                     Index Cond: ((store_id = 6) AND (type_qname_id = 240))
                     Buffers: shared read=214101
               SubPlan 1
                 ->  Bitmap Heap Scan on alf_node_aspects aspect_1  (cost=3420.59..418372.63 rows=163099 width=8) (actual time=2.635..21.981 rows=4909 loops=1)
                       Recheck Cond: (qname_id = 251)
                       Heap Blocks: exact=4801
                       Buffers: shared read=5092
                       ->  Bitmap Index Scan on fk_alf_nasp_qn  (cost=0.00..3379.81 rows=163099 width=0) (actual time=2.016..2.016 rows=7741 loops=1)
                             Index Cond: (qname_id = 251)
                             Buffers: shared read=291
         ->  Hash  (cost=70.14..70.14 rows=872 width=8) (actual time=0.357..0.357 rows=228 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               Buffers: shared hit=25 read=74
               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..70.14 rows=872 width=8) (actual time=0.047..0.325 rows=228 loops=1)
                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'S434071'::text))
                     Heap Fetches: 2
                     Buffers: shared hit=25 read=74
   ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..4.70 rows=15 width=8) (actual time=0.016..0.016 rows=1 loops=212)
         Index Cond: ((node_id = node.id) AND (qname_id = 245))
         Heap Fetches: 15
         Buffers: shared hit=540 read=386
 Planning time: 0.821 ms
 Execution time: 32783.609 ms
(36 rows)

 

Note the second plan is not even using any index on alf_node initially. 





Re: Specific query taking time to process

From: Tom Lane
Duncan Whitham <dwhitham@zaizi.com> writes:
> We now only need 1 environment as we  can replicate the performance problem
> on a copy of live – snapshot/restore from AWS of live. We now have a vacuum
> analyse running every night on the 3 tables in question on live – to
> eliminate bloat and inaccurate stats as the root of the problem.

Hmm, doesn't seem like that's getting the job done.  I can see at
least one serious misestimate in these plans:

>                        ->  Bitmap Heap Scan on alf_node_aspects aspect_1
> (cost=3420.59..418372.63 rows=163099 width=8) (actual time=1.402..5.243
> rows=4909 loops=1)
>                              Recheck Cond: (qname_id = 251)

It doesn't seem to me that such a simple condition ought to be
misestimated by a factor of 30, so either you need to crank up
the stats target for this column or you need to analyze the
table more often.

The other rowcount estimates don't seem so awful, but this one is
contributing to the planner thinking that "SubPlan 1" is going to
be very expensive, which probably accounts for it trying to avoid
what's actually a cheap plan.

            regards, tom lane



Re: Specific query taking time to process

From: Duncan Whitham
Thanks for the reply Tom - you've been a great help. 

I had been looking at changing default_statistics_target in a broad-brush fashion - changing it to 1000 on alf_node, alf_node_aspects and alf_node_properties - and that gets the query to run fine irrespective of the work_mem setting.

I then set them all back to the default and re-analyzed.

I then applied the change just to the qname_id column on alf_node_aspects:

alter table alf_node_aspects alter qname_id SET STATISTICS 1000;

I re-analyzed alf_node_aspects, and the estimate is now much better:


Nested Loop Semi Join  (cost=618944.02..3000749.68 rows=8 width=8) (actual time=5391.271..31441.085 rows=212 loops=1)
   Buffers: shared hit=756 read=2189959 written=1
   ->  Hash Semi Join  (cost=618943.45..3000659.89 rows=37 width=16) (actual time=5391.212..31437.065 rows=212 loops=1)
         Hash Cond: (node.id = prop.node_id)
         Buffers: shared hit=216 read=2189573 written=1
         ->  Bitmap Heap Scan on alf_node node  (cost=618862.32..2989274.54 rows=4290813 width=8) (actual time=4806.877..28240.746 rows=41976707 loops=1)
               Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
               Rows Removed by Index Recheck: 57797868
               Filter: (NOT (hashed SubPlan 1))
               Rows Removed by Filter: 2453
               Heap Blocks: exact=63327 lossy=1908882
               Buffers: shared hit=191 read=2189499 written=1
               ->  Bitmap Index Scan on idx_alf_node_mdq  (cost=0.00..600678.68 rows=19600211 width=0) (actual time=4773.927..4773.927 rows=41990879 loops=1)
                     Index Cond: ((store_id = 6) AND (type_qname_id = 240))
                     Buffers: shared read=214101 written=1
               SubPlan 1
                 ->  Index Scan using fk_alf_nasp_qn on alf_node_aspects aspect_1  (cost=0.57..17099.41 rows=4611 width=8) (actual time=0.036..13.453 rows=4909 loops=1)
                       Index Cond: (qname_id = 251)
                       Buffers: shared hit=191 read=3189
         ->  Hash  (cost=70.20..70.20 rows=875 width=8) (actual time=0.363..0.363 rows=228 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               Buffers: shared hit=25 read=74
               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..70.20 rows=875 width=8) (actual time=0.047..0.337 rows=228 loops=1)
                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'S434071'::text))
                     Heap Fetches: 2
                     Buffers: shared hit=25 read=74
   ->  Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect  (cost=0.57..3.99 rows=2 width=8) (actual time=0.016..0.016 rows=1 loops=212)
         Index Cond: ((node_id = node.id) AND (qname_id = 245))
         Heap Fetches: 15
         Buffers: shared hit=540 read=386
 Planning time: 0.903 ms
 Execution time: 31441.206 ms
(32 rows)


But there is still a heap scan on alf_node instead of using the index - what should my next move be? Change the stats collection level on store_id and type_qname_id on alf_node, or reindex alf_node?
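
(i.e. something along the lines of

ALTER TABLE alf_node ALTER store_id SET STATISTICS 1000;
ALTER TABLE alf_node ALTER type_qname_id SET STATISTICS 1000;
ANALYZE alf_node;

versus REINDEX INDEX idx_alf_node_mdq; / REINDEX INDEX idx_alf_node_tqn; - which is worth trying first?)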





