Re: Please help! Query jumps from 1s -> 4m - Mailing list pgsql-performance

From James Thompson
Subject Re: Please help! Query jumps from 1s -> 4m
Date
Msg-id CABoe=cTRMXFBTJ93nVk=JyPsFLTfxBZ+emGQU_wzPxiX40tMQA@mail.gmail.com
Whole thread Raw
In response to Re: Please help! Query jumps from 1s -> 4m  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Just to follow up on this...
Tried increasing stats targets last week + re-analyzing but the query was just as bad.
Ended up increasing the prepareThreshold to prevent server-side prepares for now (and thus later generic statements). This 'fixed' the issue and had no noticeable negative effect for our workloads.

I still don't understand why the plan being off makes the query so much slower in this case (the plans I shared in the last email don't look too different, I don't understand how the filter can add on 2mins of execution time to an index-only scan). If anyone does have thoughts on what could be happening I would be very interested to hear, but the main performance problem is effectively solved.

Thanks all for the valuable help getting to the bottom of what was happening.

On Tue, 5 May 2020 at 22:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Thompson <james@jthompson.dev> writes:
> The slowness occurs when the prepared statement changes to a generic plan.

> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual
> time=1.901..45.256 rows=65000 loops=1)
>     Output: table1alias2.uniqueid
>     Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.
> col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,...
> 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
>     Heap Fetches: 10
>     Buffers: shared hit=5048

> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual
> time=125.344..126877.822 rows=65000 loops=1)
>     Output: table1alias2.uniqueid
>     Index Cond: (table1alias2.col20 = $1001)
>     Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,
> $1000])::text[]))
>     Rows Removed by Filter: 2670023
>     Heap Fetches: 428
>     Buffers: shared hit=45933 read=42060 dirtied=4

Yeah, this is a dynamic we've seen before.  The rowcount estimate, and
hence the cost estimate, for the plan with explicit parameter values is
way off; but the estimate for the generic plan is even more way off,
causing the system to falsely decide that the latter is cheaper.

I've speculated about refusing to believe generic cost estimates if they are
more than epsilon less than the concrete cost estimate, but it's not quite
clear how that should work or whether it'd have its own failure modes.

The one thing that is totally clear is that these rowcount estimates are
crappy.  Can you improve them by increasing the stats target for that
table?  Maybe with less-garbage-y inputs, the system would make the right
plan choice here.

                        regards, tom lane

pgsql-performance by date:

Previous
From: Steve Pritchard
Date:
Subject: Re: Inaccurate Rows estimate for "Bitmap And" causes Planner tochoose wrong join
Next
From: "Matthew Nelson"
Date:
Subject: Plan not skipping unnecessary inner join