Re: Please help! Query jumps from 1s -> 4m - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Please help! Query jumps from 1s -> 4m
Date
Msg-id 20200505213553.GF28974@telsasoft.com
Whole thread Raw
In response to Re: Please help! Query jumps from 1s -> 4m  (James Thompson <james@jthompson.dev>)
List pgsql-performance
On Tue, May 05, 2020 at 10:10:18PM +0100, James Thompson wrote:
> I've managed to replicate this now with prepared statements. Thanks for all
> the guidance so far.
> 
> The slowness occurs when the prepared statement changes to a generic plan.
> 
> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..2549.70 rows=70
width=36)(actual time=1.901..45.256 rows=65000 loops=1)
 
>     Output: table1alias2.uniqueid
>     Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY
('{c5986b02-3a02-4639-8147-f286972413ba,...98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))

The rowcount is off by a factor of 1000x.

> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1
width=36)(actual time=125.344..126877.822 rows=65000 loops=1)
 
>     Output: table1alias2.uniqueid
>     Index Cond: (table1alias2.col20 = $1001)
>     Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,$1000])::text[]))
>     Rows Removed by Filter: 2670023

And the generic plan is cheaper than the previous, custom plan; but slower,
undoubtedly due to rowcount mis-estimate.

> The second plan looks worse to me as it's applying a filter rather than
> using an index condition? I don't understand why it's not part of the
> condition and also why this is so much slower though.
> If I force a retrieval of all index rows for col20 = '12345' using an
> ad-hoc query (below, which in my mind is what the 'bad' plan is doing),
> that only takes 2s (2.7 mil rows). Where's the difference?
> 
> EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT COUNT(DISTINCT id) FROM table1
> WHERE datatype='TypeA' AND col20 = 12345;
> -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> (cost=0.56..2762.95 rows=55337 width=8) (actual time=3.976..1655.645 rows=2735023 loops=1)

I see you're querying on datetype, which I think isn't in the original query
(but I'm not sure if it's just renamed?).

Underestimate usually means that the conditions are redundant or correlated.
You could mitigate that either by creating an index on both columns (or add
datatype to the existing index), or CREATE STATISTICS ndistinct on those
columns.  Or maybe you just need to increase the stats target for col20 (?).
Then re-analyze.

-- 
Justin



pgsql-performance by date:

Previous
From: James Thompson
Date:
Subject: Re: Please help! Query jumps from 1s -> 4m
Next
From: Tom Lane
Date:
Subject: Re: Please help! Query jumps from 1s -> 4m