Thread: [GENERAL] Plan changes from index scan to seq scan after 5 executions

[GENERAL] Plan changes from index scan to seq scan after 5 executions

From
Alexander Kukushkin
Date:
Hi,

Recently I've been investigating a strange behavior of one stored procedure.
According to the statistics its execution time was very high (15 seconds), but if I run the same statement from console it was very fast, just a few milliseconds.

At the end I was able to prepare a short script, which reproduces the problem:

localhost/postgres=# create table test(id serial not null primary key, value text);         
CREATE TABLE
localhost/postgres=# create index ON test(id) where value = 'l';                                
CREATE INDEX
localhost/postgres=# insert into test(value) select 'h' FROM generate_series(1, 1000000);
INSERT 0 1000000
localhost/postgres=# analyze test;
ANALYZE
localhost/postgres=# prepare foo as select * FROM test where value = $1 limit 1;
PREPARE

Now we will run prepared statement. First 5 times it will produce following output:
localhost/postgres=# explain analyze execute foo('l');
                                                       QUERY PLAN                                                      
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=0.12..4.14 rows=1 width=6) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Index Scan using test_id_idx on test  (cost=0.12..4.14 rows=1 width=6) (actual time=0.002..0.002 rows=0 loops=1)
 Planning time: 0.188 ms
 Execution time: 0.014 ms
(4 rows)


But on the 6th time (and further) plan it starts using Seq Scan:
localhost/postgres=# explain analyze execute foo('l');
                                                   QUERY PLAN                                                  
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Limit  (cost=0.00..0.02 rows=1 width=6) (actual time=77.890..77.890 rows=0 loops=1)
   ->  Seq Scan on test  (cost=0.00..16925.00 rows=1000000 width=6) (actual time=77.889..77.889 rows=0 loops=1)
         Filter: (value = $1)
         Rows Removed by Filter: 1000000
 Planning time: 0.104 ms
 Execution time: 77.904 ms
(6 rows)


If I remove "LIMIT 1" from the query, everything is good, it will always use Index Scan.

Regards,
--
Alexander Kukushkin

Re: [GENERAL] Plan changes from index scan to seq scan after 5 executions

From
"David G. Johnston"
Date:
On Sat, Sep 30, 2017 at 10:57 AM, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Hi,

Recently I've been investigating a strange behavior of one stored procedure.

Please provide the output of:

SELECT version();

David J.

Re: [GENERAL] Plan changes from index scan to seq scan after 5 executions

From
Alexander Kukushkin
Date:
Hi David,

sorry, absolutely forgot about important stuff like version.
Original problem has been found on 9.4.9, but I was able to reproduce it on 10rc1.

localhost/postgres=# select version();
                                                     version                                                    
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 10rc1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
(1 row)

Re: [GENERAL] Plan changes from index scan to seq scan after 5 executions

From
Tom Lane
Date:
Alexander Kukushkin <cyberdemn@gmail.com> writes:
> Original problem has been found on 9.4.9, but I was able to reproduce it on
> 10rc1.

The problem here is that the generic estimate of the selectivity of
"where value = $1" is 1.0, because var_eq_non_const makes the not
unreasonable assumption that the unknown value being searched for is
one of the values appearing in the table, and there is only one.
This makes the estimated cost of the seqscan+limit plan nearly nil,
since it's expected to return the first row it comes to.  So that
plan gets chosen if we're considering a generic plan that doesn't
know the specific value of $1.  And at that point the plancache
says to itself "I'm not getting any benefit in estimated cost for
the custom plans I've been making, so I'll stop doing that".

This is not an easy thing to improve without making other cases
worse :-(.  One heuristic that I've been wondering about is whether
we could say "if the generic plan appears cheaper than any custom
plan we've made so far, disbelieve it, because probably it's based
on wrong estimates".  But I'm not sure if that would have its own
failure modes.  Anyway nobody's tried it yet.

You can find more discussion of this problem in the -hackers archives.

As for workarounds, the only short-term fix I can suggest is to use
EXECUTE for this query in your function, thus preventing caching of
a plan for it.
        regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general