Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Berserk Autovacuum (let's save next Mandrill) |
Date | |
Msg-id | 20200318033701.GF26184@telsasoft.com Whole thread Raw |
In response to | Re: Berserk Autovacuum (let's save next Mandrill) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: Berserk Autovacuum (let's save next Mandrill)
|
List | pgsql-hackers |
On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote: > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > > I think Andres was thinking this would maybe be an optimization independent of > > > > is_insert_only (?) > > > > > > I wasn't sure. > > > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > > will be ok. > > > > I was trying to say (in a later email) that I think it might be a good > > compromise to opportunistically freeze if we're dirtying the page > > anyway, but not optimize WAL emission etc. That's a pretty simple > > change, and it'd address a lot of the potential performance regressions, > > while still freezing for the "first" vacuum in insert only workloads. > > If we have truly insert-only tables, then doesn't vacuuming with > freezing every tuple actually decrease total vacuum cost (perhaps > significantly) since otherwise every vacuum keeps having to scan the > heap for dead tuples on pages where we know there are none? Those > pages could conceptually be frozen and ignored, but are not frozen > because of the default behavior, correct? The essential part of this patch is to trigger vacuum *at all* on an insert-only table. Before today's updated patch, it also used FREEZE on any table which hit the new insert threshold. The concern I raised is for insert-MOSTLY tables. I thought it might be an issue if repeatedly freezing updated tuples caused vacuum to be too slow, especially if they're distributed in pages all across the table rather than clustered. And I asked that the behavior (FREEZE) be configurable by a separate setting than the one that triggers autovacuum to run. FREEZE is already controlled by the vacuum_freeze_table_age param. I think you're right that VACUUM FREEZE on an insert-only table would be less expensive than vacuum once without freeze and vacuum again later, which uses freeze. To me, that suggests setting vacuum_freeze_table_age to a low value on those tables. Regular vacuum avoids scanning all-visible pages, so for an insert-only table pages should only be vacuumed once (if frozen the 1st time) or twice (if not). * Except when aggressive is set, we want to skip pages that are * all-visible according to the visibility map, but only when we can skip postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT generate_series(1,999999); VACUUM VERBOSE t; VACUUM VERBOSE t; ... INFO: "t": found 0 removable, 999999 nonremovable row versions in 4425 out of 4425 pages ... VACUUM Time: 106.038 ms INFO: "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 pages VACUUM Time: 1.828 ms => That's its not very clear way of saying that it only scanned 1 page the 2nd time around. > We have tables that log each change to a business object (as I suspect > many transactional workloads do), and I've often thought that > immediately freeze every page as soon as it fills up would be a real > win for us. > > If that's all true, it seems to me that removing that part of the > patch significantly lowers its value. > If we opportunistically freeze only if we're already dirtying a page, > would that help a truly insert-only workload? E.g., are there hint > bits on the page that would need to change the first time we vacuum a > full page with no dead tuples? I would have assumed the answer was > "no" (since if so I think it would follow that _all_ pages need > updated the first time they're vacuumed?). You probably know that hint bits are written by the first process to access the tuple after it was written. I think you're asking if the first *vacuum* requires additional writes beyond that. And I think vacuum wouldn't touch the page until it decides to freeze tuples. I do have a patch to display the number of hint bits written and pages frozen. https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com > But if that's the case, then this kind of opportunistic freezing wouldn't > help this kind of workload. Maybe there's something I'm misunderstanding > about how vacuum works though. I am reminding myself about vacuum with increasing frequency and usually still learn something new. -- Justin
pgsql-hackers by date: