Limit index pages visited in planner's get_actual_variable_range

From: Rian McGuire
Hi hackers,

It seems the get_actual_variable_range function has a long history of
fixes attempting to improve its worst-case behaviour, most recently in
9c6ad5eaa95, which limited the number of heap page fetches to 100.
There's currently no limit on the number of index pages fetched.

We managed to get into trouble after deleting a large number of rows
(~8 million) from the start of an index, which caused planning time to
blow up on a hot (~250 calls/s) query. During the incident `explain
(analyze, buffers)` looked like this:
Planning:
  Buffers: shared hit=88311
Planning Time: 249.902 ms
Execution Time: 0.066 ms

The planner was burning a huge amount of CPU time looking through
index pages for the first visible tuple. The problem eventually
resolved when the affected index was vacuumed, but that took several
hours to complete.

There's a reproduction with a smaller dataset below.

Our current workaround to safely bulk delete from these large tables
involves delaying deletion of the minimum row until after a vacuum has
run, so there's always a visible tuple near the start of the index.
It's not realistic for us to run vacuums more frequently (i.e. after
deleting a smaller number of rows) because they're so time-consuming.
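
A rough sketch of the pattern, for illustration only (the table name and
batch boundary below are made up):

-- Bulk delete everything except the current minimum id, so the start of
-- the index always keeps at least one visible tuple.
delete from big_table
where id <= 8000000
  and id > (select min(id) from big_table);

-- Once a vacuum of the table has completed (manual or autovacuum)...
vacuum big_table;

-- ...the old minimum row can finally be removed.
delete from big_table
where id = (select min(id) from big_table);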

The previous discussion [1] touched on the idea of also limiting the
number of index page fetches, but there were doubts about the safety
of back-patching and the ugliness of modifying the index AM API to
support this.

I would like to submit our experience as evidence that the lack of a
limit on index page fetches is a real problem. Even if a fix for this
doesn't get back-patched, it would be nice to see it in a major
version.

As a starting point, I've updated the WIP index page limit patch from
Simon Riggs [2] to apply cleanly to master.

Regards,
Rian

[1]
https://www.postgresql.org/message-id/flat/CAKZiRmznOwi0oaV%3D4PHOCM4ygcH4MgSvt8%3D5cu_vNCfc8FSUug%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CANbhV-GUAo5cOw6XiqBjsLVBQsg%2B%3DkpcCCWYjdTyWzLP28ZX-Q%40mail.gmail.com

=# create table test (id bigint primary key) with (autovacuum_enabled = 'off');
=# insert into test select generate_series(1,10000000);
=# analyze test;

An explain with no dead tuples looks like this:
=# explain (analyze, buffers) select id from test
   where id in (select id from test order by id desc limit 1);
Planning:
  Buffers: shared hit=8
Planning Time: 0.244 ms
Execution Time: 0.067 ms

But if we delete a large number of rows from the start of the index:
=# delete from test where id <= 4000000;

The performance doesn't become unreasonable immediately - each planning
run is limited to visiting 100 heap pages, and along the way it sets
killed bits on the index tuples:
=# explain (analyze, buffers) select id from test
   where id in (select id from test order by id desc limit 1);
Planning:
  Buffers: shared hit=1 read=168 dirtied=163
Planning Time: 5.910 ms
Execution Time: 0.107 ms
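
If the pgstattuple contrib extension is available, the dead entries can
also be inspected directly at this point (test_pkey is the default name
of the primary key index):

=# create extension if not exists pgstattuple;
=# select dead_tuple_count from pgstattuple('test');
=# select leaf_pages, avg_leaf_density from pgstatindex('test_pkey');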

But the number of index buffers visited increases on each query, and
eventually all the killed bits are set. At that point the heap page
limit no longer kicks in - the scan skips the heap for killed entries
and reads through every leaf page of dead entries looking for a visible
tuple:
$ for i in {1..500}; do psql test -c 'select id from test where id in
(select id from test order by id desc limit 1)' >/dev/null; done
=# explain (analyze, buffers) select id from test
   where id in (select id from test order by id desc limit 1);
Planning:
  Buffers: shared hit=11015
Planning Time: 35.772 ms
Execution Time: 0.070 ms

With the patch:
=# explain (analyze, buffers) select id from test
   where id in (select id from test order by id desc limit 1);
Planning:
  Buffers: shared hit=107
Planning Time: 0.377 ms
Execution Time: 0.045 ms
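
For completeness, vacuuming the table removes the dead index entries and
restores the original planning cost even without the patch - that's
effectively what resolved our production incident, except the vacuum took
several hours on the full-size table:

=# vacuum test;
=# explain (analyze, buffers) select id from test
   where id in (select id from test order by id desc limit 1);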


Re: Limit index pages visited in planner's get_actual_variable_range

From: Peter Geoghegan
On Thu, May 2, 2024 at 2:12 AM Rian McGuire <rian.mcguire@buildkite.com> wrote:
> The planner was burning a huge amount of CPU time looking through
> index pages for the first visible tuple. The problem eventually
> resolved when the affected index was vacuumed, but that took several
> hours to complete.

This is exactly the same problem that Mark Callaghan recently
encountered when benchmarking Postgres with a variant of his insert
benchmark:

https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html
https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_27.html
https://smalldatum.blogspot.com/2024/03/trying-to-tune-postgres-for-insert.html

This is a pretty nasty sharp edge. I bet many users encounter this
problem without ever understanding it.

> The previous discussion [1] touched on the idea of also limiting the
> number of index page fetches, but there were doubts about the safety
> of back-patching and the ugliness of modifying the index AM API to
> support this.

Fundamentally, the problem is that the heuristics that we have don't
care about the cost of reading index leaf pages. All that it takes is
a workload where that becomes the dominant cost -- such a workload
won't be helped at all by the existing heap-focussed heuristic. This
seems obvious to me.

It seems natural to fix the problem by teaching the heuristics to give
at least some consideration to the cost of reading index leaf pages --
more than zero. The patch from Simon seems like the right general
approach to me, since it precisely targets the underlying problem.
What other approach is there, really?

> I would like to submit our experience as evidence that the lack of a
> limit on index page fetches is a real problem. Even if a fix for this
> doesn't get back-patched, it would be nice to see it in a major
> version.

I find that very easy to believe.

--
Peter Geoghegan