On Thu, 22 Apr 2021 at 00:03, Jeremy Schneider <schnjere@amazon.com> wrote:
>
> Two years later, I still remember this. And today I just confirmed
> someone hitting this on open source PG13.
The only thing that changed about get_actual_variable_range() is that
it now uses a SnapshotNonVacuumable snapshot. Previously, a
long-running transaction could prevent vacuum from removing tuples,
which could make get_actual_variable_range() slow if it had to skip
over those unvacuumable tuples.
That's now changed: the SnapshotNonVacuumable snapshot will see any
tuples still required by that long-running transaction and use them to
determine the range instead of skipping over them.
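If it helps to picture that case, here's a rough two-session sketch
(the table name and session labels are only for illustration, and I
haven't included the expected output):

-- session 1: build the table and hold an old snapshot open
create table b (b int primary key) with (autovacuum_enabled = off);
insert into b select x from generate_series(1,10000000) x;
analyze b;
begin transaction isolation level repeatable read;
select count(*) from b;  -- takes the snapshot that holds back vacuum

-- session 2: vacuum can't remove the deleted tuples while session 1's
-- transaction is open, but planning the query stays fast because the
-- SnapshotNonVacuumable scan stops at the first nonvacuumable tuple
delete from b;
vacuum b;
explain select * from b where b < 10000000;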
Anyone with a large number of vacuum-removable tuples at either end of
the range of a btree-indexed column could still have issues. Vacuuming
more often might be a good thing to consider. With the original report
on this thread there were more dead tuples in the table than live
tuples; disabling autovacuum, or tuning it so that it only runs after
that many dead tuples have built up, is likely a bad idea.
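If vacuuming just that table more often is wanted, the per-table
autovacuum storage parameters are one way to do it; something along
these lines (the numbers here are only placeholders, not a
recommendation):

-- trigger autovacuum after ~1000 dead tuples plus 1% of the table,
-- rather than the default 20%
alter table a set (
  autovacuum_vacuum_threshold = 1000,
  autovacuum_vacuum_scale_factor = 0.01
);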
FWIW, here's a simple test case that shows the problem in current master.
create table a (a int primary key) with (autovacuum_enabled = off);
insert into a select x from generate_series(1,10000000) x;
analyze a;
delete from a;
\timing on
explain select * from a where a < 10000000;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on a  (cost=0.00..169247.71 rows=9998977 width=4)
   Filter: (a < 10000000)
(2 rows)
Time: 9062.600 ms (00:09.063)
vacuum a;
explain select * from a where a < 10000000;
                    QUERY PLAN
-------------------------------------------------
 Seq Scan on a  (cost=0.00..0.00 rows=1 width=4)
   Filter: (a < 10000000)
(2 rows)
Time: 2.665 ms
Notice that it became faster again after I did a vacuum.
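FWIW, a quick way to spot tables that have drifted into this state is
to compare the live and dead tuple counts kept by the statistics
collector (these counts are only approximate):

-- tables with more (approximate) dead tuples than live ones
select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_user_tables
where n_dead_tup > n_live_tup
order by n_dead_tup desc;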
David