Re: Please help! Query jumps from 1s -> 4m - Mailing list pgsql-performance

From James Thompson
Subject Re: Please help! Query jumps from 1s -> 4m
Date
Msg-id CABoe=cR48w59g7PgVRFacsocWcfdgRejiWomQkxYr3U7OV0b9w@mail.gmail.com
Whole thread Raw
In response to Re: Please help! Query jumps from 1s -> 4m  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Please help! Query jumps from 1s -> 4m  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Please help! Query jumps from 1s -> 4m  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I've managed to replicate this now with prepared statements. Thanks for all the guidance so far.

The slowness occurs when the prepared statement changes to a generic plan.

Initial plan:
->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual time=1.901..45.256 rows=65000 loops=1)
    Output: table1alias2.uniqueid
    Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,... 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
    Heap Fetches: 10
    Buffers: shared hit=5048

after 5 executions of the statement:
->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual time=125.344..126877.822 rows=65000 loops=1)
    Output: table1alias2.uniqueid
    Index Cond: (table1alias2.col20 = $1001)
    Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
    Rows Removed by Filter: 2670023
    Heap Fetches: 428
    Buffers: shared hit=45933 read=42060 dirtied=4

The second plan looks worse to me as it's applying a filter rather than using an index condition? I don't understand why it's not part of the condition and also why this is so much slower though.
If I force a retrieval of all index rows for col20 = '12345' using an ad-hoc query (below, which in my mind is what the 'bad' plan is doing), that only takes 2s (2.7 mil rows). Where's the difference?

EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT COUNT(DISTINCT id) FROM table1 WHERE datatype='TypeA' AND col20 = 12345;
-> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1  (cost=0.56..2762.95 rows=55337 width=8) (actual time=3.976..1655.645 rows=2735023 loops=1)
         Index Cond: (col20 = 12345)
         Heap Fetches: 417
         Buffers: shared hit=43843 read=44147 dirtied=8

>You could do: SET track_io_timing TO on;
I've not tried this yet, and haven't used it before - sounds like there's some risks associated with running it on a production server / clocks going backwards?

>Also, you should be able to check if that's the problem by doing either:
>plan_cache_mode = force_generic_plan;
>Or (I would think) DISCARD PLANS;
I think plan_cache_mode is just for pg12+ unfortunately? We're on 11 currently.
Just tested DISCARD PLANS locally, it didn't switch back from the generic plan. Was that your expectation?

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Please help! Query jumps from 1s -> 4m
Next
From: Justin Pryzby
Date:
Subject: Re: Please help! Query jumps from 1s -> 4m