Re: Specific query taking time to process - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Specific query taking time to process
Date
Msg-id 20191210204206.GE2082@telsasoft.com
Whole thread Raw
In response to Re: Specific query taking time to process  (Fahiz Mohamed <fahiz@netwidz.com>)
List pgsql-performance
On Mon, Dec 09, 2019 at 10:39:38PM +0000, Fahiz Mohamed wrote:
> Thank you very much for your prompt responses.
> 
> I have analysed more regarding this and found the long running query.
> 
> I ran "explain analyse" on this query and I got following result. (We have 2 identical DB instances and they consist
ofsame data. Instane 1 took 20+ second to process and instance 2 took less than a second)
 
> 
> Instance 1: (This is used by regular User - More than 600,000 request a day) - The result is same even when there is
nouser in the server.
 
> EXPLAIN ANALYZE
> Nested Loop Semi Join (cost=998547.53..3319573.36 rows=1 width=8) (actual time=10568.217..22945.971 rows=22 loops=1)
>  -> Hash Semi Join (cost=998546.96..3319545.95 rows=41 width=16) (actual time=10568.198..22945.663 rows=22 loops=1)
>     Hash Cond: (node.id = prop.node_id)
>     -> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85 rows=4565737 width=8) (actual
time=3304.419..20465.551rows=41109751 loops=1)
 
>        Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
>        Rows Removed by Index Recheck: 54239131
>        Filter: (NOT (hashed SubPlan 1))
>        Rows Removed by Filter: 2816
>        Heap Blocks: exact=24301 lossy=1875383
...

This is doing a bitmap scan on alf_node where the second is doing an Index Scan.
Is alf_node well-correlated on the 2nd server on store_id or type_qname_id ?
If you CLUSTER and ANALYZE on the 1st server, maybe it would perform similarly.
(But, that could hurt other queries).

> We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance)
> With the fresh DB with the restore of the DATA without any indexes Search query performs relatively quick and most of
thetime its less than a second.
 
> But after 3 weeks of use of the DB it sudenly started to slowdown only for this perticular query and it takes 20+
secondsto respond. If I do a restore the DB again then it continues to work fine and the symptom pops out after 3 weeks
time.

> Instance 2: (Only by testers - 250 request a day)
> 
> Nested Loop Semi Join  (cost=6471.94..173560841.08 rows=2 width=8) (actual time=0.162..0.464 rows=17 loops=1)
>   ->  Nested Loop  (cost=6471.37..173560684.36 rows=45 width=16) (actual time=0.154..0.387 rows=17 loops=1)
>         ->  HashAggregate  (cost=3508.15..3516.80 rows=865 width=8) (actual time=0.041..0.047 rows=18 loops=1)
>               Group Key: prop.node_id
>               ->  Index Only Scan using idx_alf_nprop_s on alf_node_properties prop  (cost=0.70..3505.99 rows=866
width=8)(actual time=0.020..0.035 r
 
> ows=18 loops=1)
>                     Index Cond: ((qname_id = '242'::bigint) AND (string_value = 'E292432'::text))
>                     Heap Fetches: 18
>         ->  Index Scan using alf_node_pkey on alf_node node  (cost=2963.22..200644.11 rows=1 width=8) (actual
time=0.019..0.019rows=1 loops=18)
 
>               Index Cond: (id = prop.node_id)
>               Filter: ((type_qname_id <> 145) AND (store_id = 6) AND (type_qname_id = 240) AND (NOT (SubPlan 1)))
>               Rows Removed by Filter: 0
>               SubPlan 1



pgsql-performance by date:

Previous
From: Michael Lewis
Date:
Subject: Re: Specific query taking time to process
Next
From: Guillaume Lelarge
Date:
Subject: Re: unexpected result for wastedbytes query after vacuum full