Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers

From Robert Haas
Subject Re: MaxOffsetNumber for Table AMs
Msg-id CA+TgmoaQro9E6orfMgj-2oCmj7dCVRR24jW2htS6wuUsNLAx_w@mail.gmail.com
In response to Re: MaxOffsetNumber for Table AMs  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: MaxOffsetNumber for Table AMs
List pgsql-hackers
On Fri, Apr 30, 2021 at 6:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
> A remaining problem is that we must generate a new round of index
> tuples for each and every index when only one indexed column is
> logically modified by an UPDATE statement. I think that this is much
> less of a problem now due to bottom-up index deletion. Sure, it sucks
> that we still have to dirty the page at all. But it's nevertheless
> true that it all but eliminates version-driven page splits, which are
> where almost all of the remaining downside is. It's very reasonable to
> now wonder if this particular all-indexes problem is worth solving at
> all in light of that. (Modern hardware characteristics also make a
> comprehensive fix less valuable in practice.)

It's reasonable to wonder. I think it depends on whether the problem
is bloat or just general slowness. To the extent that the problem is
bloat, bottom-up index deletion will help a lot, but it's not going to
help with slowness because, as you say, we still have to dirty the
pages. And I am pretty confident that slowness is a very significant
part of the problem here. It's pretty common for people migrating from
another database system to have, for example, a table with 10 indexes
and then repeatedly update a column that is covered by only one of
those indexes. Now, with bottom-up index deletion, this should cause a
lot less bloat, and that's good. But you still have to update all 10
indexes in the foreground, and that's bad, because the alternative is
to find just the one affected index and update it twice -- once to
insert the new tuple, and a second time to delete-mark the old tuple.
10 is a lot more than 2, and that's even ignoring the cost of deferred
cleanup on the other 9 indexes. So I don't really expect this to get
us out of the woods. Somebody whose workload runs five times slower on
a pristine data load is quite likely to give up on using PostgreSQL
before bloat even enters the picture.
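
To put rough numbers on the 10-versus-2 comparison, here is a toy cost
model in Python. It is only a sketch under stated assumptions: the
function name and the scheme labels are made up for illustration, and it
counts foreground index modifications only, ignoring heap writes, WAL,
and the deferred cleanup mentioned above.

```python
def foreground_index_writes(total_indexes: int,
                            modified_indexes: int,
                            scheme: str) -> int:
    """Count index modifications done in the foreground by one UPDATE.

    Hypothetical cost model, not PostgreSQL code.
    """
    if modified_indexes > total_indexes:
        raise ValueError("cannot modify more indexes than exist")
    if scheme == "all_indexes":
        # Every index gets a new tuple, even if its column did not change.
        return total_indexes
    if scheme == "delete_marking":
        # Only affected indexes are touched, twice each:
        # insert the new tuple, then delete-mark the old one.
        return 2 * modified_indexes
    raise ValueError(f"unknown scheme: {scheme!r}")


all_idx = foreground_index_writes(10, 1, "all_indexes")
marked = foreground_index_writes(10, 1, "delete_marking")
print(all_idx, marked, all_idx / marked)  # prints: 10 2 5.0
```

Under this model the gap shrinks as more of the indexes are actually
affected, and the two schemes break even once half of them are.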

-- 
Robert Haas
EDB: http://www.enterprisedb.com


