Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CAD21AoAs-KxrWpo57C8KhBS_Mh4A=4VWV2_xZCbLmLsr3=Rhrw@mail.gmail.com
Whole thread Raw
In response to Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: New IndexAM API controlling index vacuum strategies  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Fri, Jan 22, 2021 at 2:00 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Looks good. I'll give this version a review now. I will do a lot more
> > soon. I need to come up with a good benchmark for this, that I can
> > return to again and again during review as needed.
>
> I performed another benchmark, similar to the last one but with the
> latest version (v2), and over a much longer period. Attached is a
> summary of the whole benchmark, and log_autovacuum output from the
> logs of both the master branch and the patch.

Thank you for performing the benchmark!

>
> This was pgbench scale 2000, 4 indexes on pgbench_accounts, and a
> transaction with one update and two selects. Each run was 4 hours, and
> we alternate between patch and master for each run, and alternate
> between 16 and 32 clients. There were 8 4 hour runs in total, meaning
> the entire set of runs took 8 * 4 hours = 32 hours (not including
> initial load time and a few other small things like that). I used a
> 10k TPS rate limit, so TPS isn't interesting here. Latency is
> interesting -- we see a nice improvement in latency (i.e. a reduction)
> for the patch (see all.summary.out).

What value is set to fillfactor?

>
> The benefits of the patch are clearly visible when I drill down and
> look at the details. Each pgbench_accounts autovacuum VACUUM operation
> can finish faster with the patch because they can often skip at least
> some indexes (usually the PK, sometimes 3 out of 4 indexes total). But
> it's more subtle than some might assume. We're skipping indexes that
> VACUUM actually would have deleted *some* index tuples from, which is
> very good. Bottom-up index deletion is usually lazy, and only
> occasionally very eager, so you still have plenty of "floating
> garbage" index tuples in most pages. And now we see VACUUM behave a
> little more like bottom-up index deletion -- it is lazy when that is
> appropriate (with indexes that really only have floating garbage that
> is spread diffusely throughout the index structure), and eager when
> that is appropriate (with indexes that get much more garbage).

That's very good. I'm happy that this patch efficiently utilizes
bottom-up index deletion feature.

Looking at the relation size growth, there is almost no difference
between master and patched in spite of skipping some vacuums in the
patched test, which is also good.

>
> The benefit is not really that we're avoiding doing I/O for index
> vacuuming (though that is one of the smaller benefits here). The real
> benefit is that VACUUM is not dirtying pages, since it skips indexes
> when it would be "premature" to vacuum them from an efficiency point
> of view. This is important because we know that Postgres throughput is
> very often limited by page cleaning. Also, the "economics" of this new
> behavior make perfect sense -- obviously it's more efficient to delay
> garbage cleanup until the point when the same page will be modified by
> a backend anyway -- in the case of this benchmark via bottom-up index
> deletion (which deletes all garbage tuples in the leaf page at the
> point that it runs for a subset of pointed-to heap pages -- it's not
> using an oldestXmin cutoff from 30 minutes ago). So whenever we dirty
> a page, we now get more value per additional-page-dirtied.
>
> I believe that controlling the number of pages dirtied by VACUUM is
> usually much more important than reducing the amount of read I/O from
> VACUUM, for reasons I go into on the recent "vacuum_cost_page_miss
> default value and modern hardware" thread. As a further consequence of
> all this, VACUUM can go faster safely and sustainably (since the cost
> limit is not affected so much by vacuum_cost_page_miss), which has its
> own benefits (e.g. oldestXmin cutoff doesn't get so old towards the
> end).
>
> Another closely related huge improvement that we see here is that the
> number of FPIs generated by VACUUM can be significantly reduced. This
> cost is closely related to the cost of dirtying pages, but it's worth
> mentioning separately. You'll see some of that in the log_autovacuum
> log output I attached.

Makes sense.

>
> There is an archive with much more detailed information, including
> dumps from most pg_stat_* views at key intervals. This has way more
> information than anybody is likely to want:
>
> https://drive.google.com/file/d/1OTiErELKRZmYnuJuczO2Tfcm1-cBYITd/view?usp=sharing
>
> I did notice a problem, though. I now think that the criteria for
> skipping an index vacuum in the third patch from the series is too
> conservative, and that this led to an excessive number of index
> vacuums with the patch.

Maybe that's why there are 5 autovacuum runs on pgbench_accounts in
the master branch whereas there are 7 runs in the patched?

>  This is probably because there was a tiny
> number of page splits in some of the indexes that were not really
> supposed to grow. I believe that this is caused by ANALYZE running --
> I think that it prevented bottom-up deletion from keeping a few of the
> hottest pages from splitting (that can take 5 or 6 seconds) at a few
> points over the 32 hour run. For example, the index named "tenner"
> grew by 9 blocks, starting out at 230,701 and ending up at 230,710 (to
> see this, extract the files from the archive and "diff
> patch.r1c16.initial_pg_relation_size.out
> patch.r2c32.after_pg_relation_size.out").
>
> I now think that 0 blocks added is unnecessarily restrictive -- a
> small tolerance still seems like a good idea, though (let's still be
> somewhat conservative about it).

Agreed.

>
> Maybe a better criteria would be for nbtree to always proceed with
> index vacuuming when the index size is less than 2048 blocks (16MiB
> with 8KiB BLCKSZ). If an index is larger than that, then compare the
> last/old block count to the current block count (at the point that we
> decide if index vacuuming is going to go ahead) by rounding up both
> values to the next highest 2048 block increment. This formula is
> pretty arbitrary, but probably works as well as most others. It's a
> good iteration for the next version of the patch/further testing, at
> least.

Also makes sense to me. The patch I recently submitted doesn't include
it but I'll do that in the next version patch.

Maybe the same is true for heap? I mean that skipping heap vacuum on a
too-small table will not bring the benefit but bloat. I think we could
proceed with heap vacuum if a table is smaller than a threshold, even
if one of the indexes wanted to skip.

>
> BTW, it would be nice if there was more instrumentation, say in the
> log output produced when log_autovacuum is on. That would make it
> easier to run these benchmarks -- I could verify my understanding of
> the work done for each particular av operation represented in the log.
> Though the default log_autovacuum log output is quite informative, it
> would be nice if the specifics were more obvious (maybe this could
> just be for the review/testing, but it might become something for
> users if it seems useful).

Yeah, I think the following information would also be helpful:

* did vacuum heap? or skipped it?
* how many indexes did/didn't bulk-deletion?
* time spent for each vacuum phase.
etc

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: FailedAssertion in heap_index_delete_tuples at heapam.c:7220
Next
From: Tatsuro Yamada
Date:
Subject: Re: simplifying foreign key/RI checks