Thread: Postgres chooses slow query plan from time to time

Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

13 September 2021, 13:24:57

Dear community,

I have a query that most of the time gets executed in a few
milliseconds yet occasionally takes ~20+ seconds. The difference, as
far as I am able to tell, comes whether it uses the table Primary Key
(fast) or an additional index with smaller size. The table in question
is INSERT ONLY - no updates or deletes done there.

Pg 11.12, total OS mem 124G

shared_buffers: 31GB
work_mem: 27MB
effective_cache_size: 93GB

The query:

SELECT
    *
FROM
    myschema.mytable pbh
WHERE
    pbh.product_code = $1
    AND pbh.cage_player_id = $2
    AND pbh.cage_code = $3
    AND balance_type = $4
    AND pbh.modified_time < $5
ORDER BY
    pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY;

\d myschema.mytable
                    Table "myschema.mytable"
     Column     │            Type             │ Collation │ Nullable │ Default
────────────────┼─────────────────────────────┼───────────┼──────────┼─────────
 cage_code      │ integer                     │           │ not null │
 cage_player_id │ bigint                      │           │ not null │
 product_code   │ character varying(30)       │           │ not null │
 balance_type   │ character varying(30)       │           │ not null │
 version        │ bigint                      │           │ not null │
 modified_time  │ timestamp(3) with time zone │           │ not null │
 amount         │ numeric(38,8)               │           │ not null │
 change         │ numeric(38,8)               │           │ not null │
 transaction_id │ bigint                      │           │ not null │
Indexes:
    "mytable_pk" PRIMARY KEY, btree (cage_code, cage_player_id,
product_code, balance_type, version)
    "mytable_idx1" btree (modified_time)
    "mytable_idx2" btree (cage_code, cage_player_id, modified_time)

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
relname='mytable';
─[ RECORD 1 ]──┬───────────────────────
relname        │ mytable
relpages       │ 18630554
reltuples      │ 1.45045e+09
relallvisible  │ 18629741
relkind        │ r
relnatts       │ 9
relhassubclass │ f
reloptions     │ ¤
pg_table_size  │ 152695029760 (142 GB)

I have caught this with AUTOEXPLAIN:

Query Text: SELECT *   FROM myschema.mytable pbh WHERE
pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
$5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
  Limit  (cost=0.70..6.27 rows=1 width=66)
    ->  Index Scan Backward using mytable_idx2 on mytable pbh
(cost=0.70..21552.55 rows=3869 width=66)
          Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
(modified_time < $5))
          Filter: (((product_code)::text = ($1)::text) AND
((balance_type)::text = ($4)::text))

And when I run EXPLAIN ANALYZE on the same query with the same
parameters manually:

 Limit  (cost=177.75..177.75 rows=1 width=66) (actual
time=8.635..8.635 rows=1 loops=1)
   ->  Sort  (cost=177.75..178.21 rows=186 width=66) (actual
time=8.634..8.634 rows=1 loops=1)
         Sort Key: modified_time DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using mytable_pk on mytable pbh
(cost=0.70..176.82 rows=186 width=66) (actual time=1.001..8.610
rows=25 loops=1)
               Index Cond: ((cage_code = 123) AND (cage_player_id =
'12345'::bigint) AND ((product_code)::text = 'PRODUCT'::text) AND
((balance_type)::text = 'TOTAL'::text))
               Filter: (modified_time < '2021-09-13
04:00:00+00'::timestamp with time zone)
 Planning Time: 2.117 ms
 Execution Time: 8.658 ms

I have played around with SET STATISTICS, work_mem and even tried
CREATE STATISTICS although there is no functional dependency on the
table columns in questions, but nothing seems to work.

Any ideas, hints are very much appreciated!


With best regards,
--
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com

Re: Postgres chooses slow query plan from time to time

From

Tomas Vondra

Date:

13 September 2021, 13:50:46

On 9/13/21 3:24 PM, Kristjan Mustkivi wrote:
> Dear community,
> 
> I have a query that most of the time gets executed in a few
> milliseconds yet occasionally takes ~20+ seconds. The difference, as
> far as I am able to tell, comes whether it uses the table Primary Key
> (fast) or an additional index with smaller size. The table in question
> is INSERT ONLY - no updates or deletes done there.
> 

It'd be really useful to have explain analyze for the slow execution.

My guess is there's a poor estimate, affecting some of the parameter
values, and it probably resolves itself after autoanalyze run.

I see you mentioned SET STATISTICS, so you tried increasing the
statistics target for some of the columns? Have you tried lowering
autovacuum_analyze_scale_factor to make autoanalyze more frequent?

It's also possible most values are independent, but some values have a
rather strong dependency, skewing the estimates. The MCV would help with
that, but those are in PG12 :-(

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Postgres chooses slow query plan from time to time

From

Michael Lewis

Date:

13 September 2021, 14:19:40

Autovacuum will only run for freezing, right? Insert only tables don't get autovacuumed/analyzed until PG13 if I remember right.

Re: Postgres chooses slow query plan from time to time

From

Jeff Janes

Date:

13 September 2021, 19:21:39

On Mon, Sep 13, 2021 at 9:25 AM Kristjan Mustkivi <sonicmonkey@gmail.com> wrote:

SELECT
*
FROM
myschema.mytable pbh
WHERE
pbh.product_code = $1
AND pbh.cage_player_id = $2
AND pbh.cage_code = $3
AND balance_type = $4
AND pbh.modified_time < $5
ORDER BY
pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY;

"mytable_idx2" btree (cage_code, cage_player_id, modified_time)

Why does this index exist? It seems rather specialized, but what is it specialized for?

If you are into specialized indexes, the ideal index for this query would be:

btree (cage_code, cage_player_id, product_code, balance_type, modified_time)

But the first 4 columns can appear in any order if that helps you combine indexes. If this index existed, then it wouldn't have to choose between two other suboptimal indexes, and so would be unlikely to choose incorrectly between them.

Cheers,

Jeff

Re: Postgres chooses slow query plan from time to time

From

Jeff Janes

Date:

13 September 2021, 19:39:05

On Mon, Sep 13, 2021 at 9:25 AM Kristjan Mustkivi <sonicmonkey@gmail.com> wrote:

I have caught this with AUTOEXPLAIN:

Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
(modified_time < $5))
Filter: (((product_code)::text = ($1)::text) AND
((balance_type)::text = ($4)::text))

Is it always the case that autoexplain shows plans with $1 etc, rather than real values, for the slow queries?

If so, then it could be that the switch from custom to generic plans is causing the problem.

Cheers,

Jeff

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

14 September 2021, 07:55:01

Hello Tomas,

The auto explain analyze caught this:

2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
  Query Text: SELECT *   FROM myschema.mytable pbh WHERE
pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
$5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
  Limit  (cost=0.70..6.27 rows=1 width=66) (actual
time=5934.154..5934.155 rows=1 loops=1)
    Buffers: shared hit=7623 read=18217
    ->  Index Scan Backward using mytable_idx2 on mytable pbh
(cost=0.70..21639.94 rows=3885 width=66) (actual
time=5934.153..5934.153 rows=1 loops=1)
          Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
(modified_time < $5))

So it expected to get 3885 rows, but got just 1. So this is the
statistics issue, right?

For testing, I set autovacuum_vacuum_scale_factor = 0.0 and
autovacuum_vacuum_threshold = 10000 for the table and am now
monitoring the behavior.

Best regards,

Kristjan

On Mon, Sep 13, 2021 at 4:50 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 9/13/21 3:24 PM, Kristjan Mustkivi wrote:
> > Dear community,
> >
> > I have a query that most of the time gets executed in a few
> > milliseconds yet occasionally takes ~20+ seconds. The difference, as
> > far as I am able to tell, comes whether it uses the table Primary Key
> > (fast) or an additional index with smaller size. The table in question
> > is INSERT ONLY - no updates or deletes done there.
> >
>
> It'd be really useful to have explain analyze for the slow execution.
>
> My guess is there's a poor estimate, affecting some of the parameter
> values, and it probably resolves itself after autoanalyze run.
>
> I see you mentioned SET STATISTICS, so you tried increasing the
> statistics target for some of the columns? Have you tried lowering
> autovacuum_analyze_scale_factor to make autoanalyze more frequent?
>
> It's also possible most values are independent, but some values have a
> rather strong dependency, skewing the estimates. The MCV would help with
> that, but those are in PG12 :-(
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

14 September 2021, 08:03:38

Hi Jeff,

The specialized index is present due to some other queries and the
index is used frequently (according to the statistics). I do agree
that in this particular case the index btree (cage_code,
cage_player_id, product_code, balance_type, modified_time) would solve
the problem but at the moment it is not possible to change that
without unexpected consequences (this odd behavior manifests only in
one of our sites).

I will try if more aggressive autovacuum analyze will alleviate the
case as Tomas Vondra suggested.

Thank you for the help!

Kristjan

On Mon, Sep 13, 2021 at 10:21 PM Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Mon, Sep 13, 2021 at 9:25 AM Kristjan Mustkivi <sonicmonkey@gmail.com> wrote:
>
>>
>> SELECT
>>     *
>> FROM
>>     myschema.mytable pbh
>> WHERE
>>     pbh.product_code = $1
>>     AND pbh.cage_player_id = $2
>>     AND pbh.cage_code = $3
>>     AND balance_type = $4
>>     AND pbh.modified_time < $5
>> ORDER BY
>>     pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY;
>
>
>>
>>     "mytable_idx2" btree (cage_code, cage_player_id, modified_time)
>
>
> Why does this index exist?  It seems rather specialized, but what is it specialized for?
>
> If you are into specialized indexes, the ideal index for this query would be:
>
> btree (cage_code, cage_player_id, product_code, balance_type, modified_time)
>
> But the first 4 columns can appear in any order if that helps you combine indexes.  If this index existed, then it
wouldn'thave to choose between two other suboptimal indexes, and so would be unlikely to choose incorrectly between
them.
>
> Cheers,
>
> Jeff

-- 
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com

Re: Postgres chooses slow query plan from time to time

From

Laurenz Albe

Date:

14 September 2021, 12:11:32

On Tue, 2021-09-14 at 10:55 +0300, Kristjan Mustkivi wrote:
> 2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
> app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
>   Query Text: SELECT *   FROM myschema.mytable pbh WHERE
> pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
> pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
> $5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
>   Limit  (cost=0.70..6.27 rows=1 width=66) (actual time=5934.154..5934.155 rows=1 loops=1)
>     Buffers: shared hit=7623 read=18217
>     ->  Index Scan Backward using mytable_idx2 on mytable pbh (cost=0.70..21639.94 rows=3885 width=66) (actual
time=5934.153..5934.153rows=1 loops=1)
 
>           Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND (modified_time < $5))

If it scanned the index for 6 seconds before finding the first result,
I'd suspect one of the following:

- the index is terribly bloated

- there were lots of deleted rows, and the index entries were marked as "dead"

- something locked the table for a long time

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Postgres chooses slow query plan from time to time

From

Jeff Janes

Date:

14 September 2021, 12:26:02

On Tue, Sep 14, 2021 at 3:55 AM Kristjan Mustkivi <sonicmonkey@gmail.com> wrote:

Hello Tomas,

The auto explain analyze caught this:

2021-09-14 06:55:33 UTC, pid=12345 db=mydb, usr=myuser, client=ip,
app=PostgreSQL JDBC Driver, line=55 LOG: duration: 5934.165 ms plan:
Query Text: SELECT * FROM myschema.mytable pbh WHERE
pbh.product_code = $1 AND pbh.cage_player_id = $2 AND
pbh.cage_code = $3 AND balance_type = $4 AND pbh.modified_time <
$5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
Limit (cost=0.70..6.27 rows=1 width=66) (actual
time=5934.154..5934.155 rows=1 loops=1)
Buffers: shared hit=7623 read=18217
-> Index Scan Backward using mytable_idx2 on mytable pbh
(cost=0.70..21639.94 rows=3885 width=66) (actual
time=5934.153..5934.153 rows=1 loops=1)
Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
(modified_time < $5))

So it expected to get 3885 rows, but got just 1. So this is the
statistics issue, right?

That would be true if there were no LIMIT. But with the LIMIT, all this means is that it stopped actually scanning after it found one row, but it estimates that if it didn't stop it would have found 3885. So it is not very informative. But the above plan appears incomplete, there should be a line for "Rows Removed by Filter", and I think that that is what we really want to see in this case.

Cheers,

Jeff

Cheers,

Jeff

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

14 September 2021, 12:41:54

I am very sorry, I indeed copy-pasted an incomplete plan. Here it is in full:

2021-09-14 06:55:33 UTC, pid=27576  db=mydb, usr=myuser, client=ip,
app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
        Query Text: SELECT *   FROM myschema.mytable pbh WHERE
pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
$5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
        Limit  (cost=0.70..6.27 rows=1 width=66) (actual
time=5934.154..5934.155 rows=1 loops=1)
          Buffers: shared hit=7623 read=18217
          ->  Index Scan Backward using player_balance_history_idx2 on
mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual
time=5934.153..5934.153 rows=1 loops=1)
                Index Cond: ((cage_code = $3) AND (cage_player_id =
$2) AND (modified_time < $5))
                Filter: (((product_code)::text = ($1)::text) AND
((balance_type)::text = ($4)::text))
                Rows Removed by Filter: 95589
                Buffers: shared hit=7623 read=18217

Also, I have made incrementally the following changes: set autovacuum
more aggressive, then added column based stats targets and then
created a statistics object. Yet there is no change in the plan
behavior. Table as it is now:

\d+ myschema.mytable
                                        Table "myschema.mytable"
     Column     │            Type             │ Collation │ Nullable │
Default │ Storage  │ Stats target │ Description
────────────────┼─────────────────────────────┼───────────┼──────────┼─────────┼──────────┼──────────────┼─────────────
 cage_code      │ integer                     │           │ not null │
        │ plain    │ 500          │
 cage_player_id │ bigint                      │           │ not null │
        │ plain    │ 500          │
 product_code   │ character varying(30)       │           │ not null │
        │ extended │ 500          │
 balance_type   │ character varying(30)       │           │ not null │
        │ extended │ 500          │
 version        │ bigint                      │           │ not null │
        │ plain    │              │
 modified_time  │ timestamp(3) with time zone │           │ not null │
        │ plain    │ 500          │
 amount         │ numeric(38,8)               │           │ not null │
        │ main     │              │
 change         │ numeric(38,8)               │           │ not null │
        │ main     │              │
 transaction_id │ bigint                      │           │ not null │
        │ plain    │              │
Indexes:
    "mytable_pk" PRIMARY KEY, btree (cage_code, cage_player_id,
product_code, balance_type, version)
    "mytable_idx1" btree (modified_time)
    "mytable_idx2" btree (cage_code, cage_player_id, modified_time)
Statistics objects:
    "myschema"."statistics_pbh_1" (ndistinct, dependencies) ON
cage_code, cage_player_id, product_code, balance_type FROM
myschema.mytable
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000

I will investigate the index bloat and if something is blocking the
query as suggested by Laurenz Albe.

Best,

Kristjan

On Tue, Sep 14, 2021 at 3:26 PM Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Tue, Sep 14, 2021 at 3:55 AM Kristjan Mustkivi <sonicmonkey@gmail.com> wrote:
>>
>> Hello Tomas,
>>
>> The auto explain analyze caught this:
>>
>> 2021-09-14 06:55:33 UTC, pid=12345  db=mydb, usr=myuser, client=ip,
>> app=PostgreSQL JDBC Driver, line=55 LOG:  duration: 5934.165 ms  plan:
>>   Query Text: SELECT *   FROM myschema.mytable pbh WHERE
>> pbh.product_code = $1   AND pbh.cage_player_id = $2   AND
>> pbh.cage_code = $3   AND balance_type = $4   AND pbh.modified_time <
>> $5 ORDER BY pbh.modified_time DESC FETCH FIRST 1 ROWS ONLY
>>   Limit  (cost=0.70..6.27 rows=1 width=66) (actual
>> time=5934.154..5934.155 rows=1 loops=1)
>>     Buffers: shared hit=7623 read=18217
>>     ->  Index Scan Backward using mytable_idx2 on mytable pbh
>> (cost=0.70..21639.94 rows=3885 width=66) (actual
>> time=5934.153..5934.153 rows=1 loops=1)
>>           Index Cond: ((cage_code = $3) AND (cage_player_id = $2) AND
>> (modified_time < $5))
>>
>> So it expected to get 3885 rows, but got just 1. So this is the
>> statistics issue, right?
>
>
> That would be true if there were no LIMIT.  But with the LIMIT, all this means is that it stopped actually scanning
afterit found one row, but it estimates that if it didn't stop it would have found 3885.  So it is not very
informative. But the above plan appears incomplete, there should be a line for "Rows Removed by Filter", and I think
thatthat is what we really want to see in this case. 
>
> Cheers,
>
> Jeff
> Cheers,
>
> Jeff



--
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com

Re: Postgres chooses slow query plan from time to time

From

Tom Lane

Date:

14 September 2021, 14:15:30

Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
>           ->  Index Scan Backward using player_balance_history_idx2 on
> mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual
> time=5934.153..5934.153 rows=1 loops=1)
>                 Index Cond: ((cage_code = $3) AND (cage_player_id =
> $2) AND (modified_time < $5))
>                 Filter: (((product_code)::text = ($1)::text) AND
> ((balance_type)::text = ($4)::text))
>                 Rows Removed by Filter: 95589
>                 Buffers: shared hit=7623 read=18217

So indeed, the core issue is that that filter condition is very selective,
and applying it after the index scan is expensive.  Perhaps it would help
to create an index that includes those columns along with cage_code and
cage_player_id.  (It's not clear whether to bother with modified_time in
this specialized index, but if you do include it, it needs to be the last
column since you're putting a non-equality condition on it.)

            regards, tom lane

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

14 September 2021, 15:36:45

On Tue, Sep 14, 2021 at 5:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
> >           ->  Index Scan Backward using player_balance_history_idx2 on
> > mytable pbh  (cost=0.70..21639.94 rows=3885 width=66) (actual
> > time=5934.153..5934.153 rows=1 loops=1)
> >                 Index Cond: ((cage_code = $3) AND (cage_player_id =
> > $2) AND (modified_time < $5))
> >                 Filter: (((product_code)::text = ($1)::text) AND
> > ((balance_type)::text = ($4)::text))
> >                 Rows Removed by Filter: 95589
> >                 Buffers: shared hit=7623 read=18217
>
> So indeed, the core issue is that that filter condition is very selective,
> and applying it after the index scan is expensive.  Perhaps it would help
> to create an index that includes those columns along with cage_code and
> cage_player_id.  (It's not clear whether to bother with modified_time in
> this specialized index, but if you do include it, it needs to be the last
> column since you're putting a non-equality condition on it.)
>
>                         regards, tom lane

But the Primary Key is defined as btree (cage_code, cage_player_id,
product_code, balance_type, version) so this should be exactly that
(apart from the extra "version" column). And the majority of the query
plans are using the PK with only a small number of cases going for the
IDX2 that is btree (cage_code, cage_player_id, modified_time). So I am
wondering how to make them not do that.

But perhaps the index bloat is indeed playing a part here as both the
PK and IDX2 have ~50% bloat ratio. I will try REINDEX-ing the table
although finding a good window to do it might require some time.

Best regards,

Kristjan

Re: Postgres chooses slow query plan from time to time

From

Tom Lane

Date:

14 September 2021, 15:47:21

Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
>>> Filter: (((product_code)::text = ($1)::text) AND
>>> ((balance_type)::text = ($4)::text))

> But the Primary Key is defined as btree (cage_code, cage_player_id,
> product_code, balance_type, version) so this should be exactly that
> (apart from the extra "version" column).

Oh, interesting.  So this is really a datatype mismatch problem.
I'd wondered idly why you were getting the explicit casts to text
in these conditions, but now it seems that that's key to the
problem: the casts prevent these clauses from being matched to
the index.  What are the declared data types of product_code
and balance_type?  And of the parameters they're compared to?

            regards, tom lane

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

15 September 2021, 06:47:34

Hello!

Both are of type varchar(30).

So is this something odd: Filter: (((product_code)::text = ($1)::text)
AND ((balance_type)::text = ($4)::text)) ?

But why does it do the type-cast if both product_code and balance_type
are of type text (although with constraint 30) and the values are also
of type text?

Best regards,

Kristjan

On Tue, Sep 14, 2021 at 6:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
> >>> Filter: (((product_code)::text = ($1)::text) AND
> >>> ((balance_type)::text = ($4)::text))
>
> > But the Primary Key is defined as btree (cage_code, cage_player_id,
> > product_code, balance_type, version) so this should be exactly that
> > (apart from the extra "version" column).
>
> Oh, interesting.  So this is really a datatype mismatch problem.
> I'd wondered idly why you were getting the explicit casts to text
> in these conditions, but now it seems that that's key to the
> problem: the casts prevent these clauses from being matched to
> the index.  What are the declared data types of product_code
> and balance_type?  And of the parameters they're compared to?
>
>                         regards, tom lane



-- 
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com

Re: Postgres chooses slow query plan from time to time

From

Tom Lane

Date:

15 September 2021, 12:16:52

Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
> Both are of type varchar(30).

Ah, right, you showed that back at the top of the thread.

> So is this something odd: Filter: (((product_code)::text = ($1)::text)
> AND ((balance_type)::text = ($4)::text)) ?

Yes, that is very darn odd.  When I try this I get:

regression=# create table foo(f1 varchar(30), f2 int, primary key (f2,f1));
CREATE TABLE

regression=# explain select * from foo where f2 = 11 and f1 = 'bar';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.15..8.17 rows=1 width=37)
   Index Cond: ((f2 = 11) AND (f1 = 'bar'::text))
(2 rows)

regression=# explain select * from foo where f2 = 11 and f1::text = 'bar';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.15..8.17 rows=1 width=37)
   Index Cond: ((f2 = 11) AND (f1 = 'bar'::text))
(2 rows)

regression=# prepare p as select * from foo where f2 = $1 and f1 = $2;
PREPARE

regression=# explain execute p(11,'bar');
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.15..8.17 rows=1 width=37)
   Index Cond: ((f2 = 11) AND (f1 = 'bar'::text))
(2 rows)

-- repeat a few times till it switches to a generic plan ...

regression=# explain execute p(11,'bar');
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.15..8.17 rows=1 width=37)
   Index Cond: ((f2 = $1) AND (f1 = $2))
(2 rows)

Note the lack of any visible cast on the varchar column, in each one of
these queries, even where I tried to force one to appear.  There is
something happening in your database that is not happening in mine.

My mind is now running to the possibility that you've got some extension
that creates an "=" operator that is capturing the syntax.

            regards, tom lane

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

15 September 2021, 13:01:50

On Wed, Sep 15, 2021 at 3:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Note the lack of any visible cast on the varchar column, in each one of
> these queries, even where I tried to force one to appear.  There is
> something happening in your database that is not happening in mine.
>
> My mind is now running to the possibility that you've got some extension
> that creates an "=" operator that is capturing the syntax.
>
>                         regards, tom lane

The following extensions have been installed:

─[ RECORD 1 ]──────────────────────────────────────────────────────────
Name        │ btree_gist
Version     │ 1.5
Schema      │ public
Description │ support for indexing common datatypes in GiST
─[ RECORD 2 ]──────────────────────────────────────────────────────────
Name        │ pg_stat_statements
Version     │ 1.6
Schema      │ public
Description │ track execution statistics of all SQL statements executed
─[ RECORD 3 ]──────────────────────────────────────────────────────────
Name        │ pgcrypto
Version     │ 1.3
Schema      │ public
Description │ cryptographic functions
─[ RECORD 4 ]──────────────────────────────────────────────────────────
Name        │ plpgsql
Version     │ 1.0
Schema      │ pg_catalog
Description │ PL/pgSQL procedural language

Plus the some libraries preloaded: shared_preload_libraries =
'pg_stat_statements,pg_cron,auto_explain'

Thank you so much for looking into this!

Best regards,
-- 
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com

Re: Postgres chooses slow query plan from time to time

From

Tom Lane

Date:

15 September 2021, 14:34:37

Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
> On Wed, Sep 15, 2021 at 3:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note the lack of any visible cast on the varchar column, in each one of
>> these queries, even where I tried to force one to appear.  There is
>> something happening in your database that is not happening in mine.

> The following extensions have been installed:
> [ nothing very exciting ]

I still get the same results after installing those extensions.

I realized that the reason I don't see a cast is that
fix_indexqual_operand removes the cast from an index qualifier
expression's index-column side.  In any other context, there would
be a cast there, since the operator is =(text,text) not
=(varchar,varchar).  So that seems like a red herring ... or is it?
Now I'm confused by your original report, in which you show

>>>         ->  Index Scan using mytable_pk on mytable pbh (cost=0.70..176.82 rows=186 width=66) (actual
time=1.001..8.610rows=25 loops=1) 
>>>               Index Cond: ((cage_code = 123) AND (cage_player_id = '12345'::bigint) AND ((product_code)::text =
'PRODUCT'::text)AND ((balance_type)::text = 'TOTAL'::text)) 
>>>               Filter: (modified_time < '2021-09-13 04:00:00+00'::timestamp with time zone)

According to the code I just looked at, there should absolutely not
be casts on the product_code and balance_type index columns here.
So I'm not quite sure what's up ... -ENOCAFFEINE perhaps.

Nonetheless, this point is probably just a sideshow.  The above
EXPLAIN output proves that the planner *can* match this index,
which destroys my idea that you had a datatype mismatch preventing
it from doing so.

After looking again at the original problem, I think you are getting
bit by an issue we've seen before.  The planner is coming out with
a decently accurate cost estimate for the query when specific values
are inserted for the parameters.  However, when it considers a generic
version of the query with no known parameter values, the cost estimates
are not so good, and by chance it comes out estimating a very low cost
for the alternative plan that uses the other index.  That cost is not
right, but the planner doesn't know that, so it seizes on that plan.

This is a hard problem to fix, and we don't have a good answer for it.
In v12 and up, you can use the big hammer of disabling generic plans by
setting plan_cache_mode to "force_custom_plan", but v11 doesn't have
that parameter.  You might need to avoid using a prepared statement for
this query.

            regards, tom lane

Re: Postgres chooses slow query plan from time to time

From

Kristjan Mustkivi

Date:

16 September 2021, 07:09:09

Understood.

Thank you so much for looking into this!

Best regards,

Kristjan

On Wed, Sep 15, 2021 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kristjan Mustkivi <sonicmonkey@gmail.com> writes:
> > On Wed, Sep 15, 2021 at 3:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Note the lack of any visible cast on the varchar column, in each one of
> >> these queries, even where I tried to force one to appear.  There is
> >> something happening in your database that is not happening in mine.
>
> > The following extensions have been installed:
> > [ nothing very exciting ]
>
> I still get the same results after installing those extensions.
>
> I realized that the reason I don't see a cast is that
> fix_indexqual_operand removes the cast from an index qualifier
> expression's index-column side.  In any other context, there would
> be a cast there, since the operator is =(text,text) not
> =(varchar,varchar).  So that seems like a red herring ... or is it?
> Now I'm confused by your original report, in which you show
>
> >>>         ->  Index Scan using mytable_pk on mytable pbh (cost=0.70..176.82 rows=186 width=66) (actual
time=1.001..8.610rows=25 loops=1)
 
> >>>               Index Cond: ((cage_code = 123) AND (cage_player_id = '12345'::bigint) AND ((product_code)::text =
'PRODUCT'::text)AND ((balance_type)::text = 'TOTAL'::text))
 
> >>>               Filter: (modified_time < '2021-09-13 04:00:00+00'::timestamp with time zone)
>
> According to the code I just looked at, there should absolutely not
> be casts on the product_code and balance_type index columns here.
> So I'm not quite sure what's up ... -ENOCAFFEINE perhaps.
>
> Nonetheless, this point is probably just a sideshow.  The above
> EXPLAIN output proves that the planner *can* match this index,
> which destroys my idea that you had a datatype mismatch preventing
> it from doing so.
>
> After looking again at the original problem, I think you are getting
> bit by an issue we've seen before.  The planner is coming out with
> a decently accurate cost estimate for the query when specific values
> are inserted for the parameters.  However, when it considers a generic
> version of the query with no known parameter values, the cost estimates
> are not so good, and by chance it comes out estimating a very low cost
> for the alternative plan that uses the other index.  That cost is not
> right, but the planner doesn't know that, so it seizes on that plan.
>
> This is a hard problem to fix, and we don't have a good answer for it.
> In v12 and up, you can use the big hammer of disabling generic plans by
> setting plan_cache_mode to "force_custom_plan", but v11 doesn't have
> that parameter.  You might need to avoid using a prepared statement for
> this query.
>
>                         regards, tom lane



-- 
Kristjan Mustkivi

Email: kristjan.mustkivi@gmail.com